NE7012 SOCIAL
NETWORK ANALYSIS
PREPARED BY: A.RATHNADEVI A.V.C COLLEGE OF
ENGINEERING
UNIT 1-INTRODUCTION
Introduction to Web - Limitations of current Web - Development of Semantic Web - Emergence
of the Social Web - Statistical Properties of Social Networks - Network analysis - Development
of Social Network Analysis - Key concepts and measures in network analysis - Discussion
networks - Blogs and online communities - Web-based networks
1.1 INTRODUCTION TO WEB
 The first web browser was invented in 1990 by Tim Berners-Lee.
 It was called WorldWideWeb and was later renamed Nexus.
 In 1993, Marc Andreessen created a browser that was easy to use and install with the
release of Mosaic (later Netscape).
1.2 LIMITATIONS OF THE CURRENT WEB
 There is a general consensus that the Web is one of the greatest inventions of the 20th
century. But could it be better?
 The reason that we do not often raise this question any more has to do with our unusual
ability to adapt to the limitations of our information systems. In the case of the Web this
means adaptation to our primary interface to the vast information that constitutes the
Web: the search engine.
 In the following we list four questions that search engines currently cannot answer
satisfactorily, or cannot answer at all.
1.2.1 What’s wrong with the Web?
The questions below are specific for the sake of example, but they represent very general
categories of search tasks.
1. Who is Frank van Harmelen?
 To answer such a question using the Web, one would go to a search engine and enter
the most logical keyword: Harmelen.
 The results returned by Google are shown in Figure 1.1. (Note that the results are slightly
different depending on whether one enters Google through the main site or a localized
version.)
 If this question and answer were part of a conversation, the dialogue would sound
like this:
Q: Who is Frank van Harmelen?
A: I don’t know but there are over a million documents with the word “harmelen” on them and
I found them all really fast (0.31s). Further, you can buy Harmelen at Amazon. Free Delivery on
Orders Over 15.
 Not only does the advertisement make little sense, but only six of the top ten results are
related to the Frank van Harmelen we are interested in. Upon closer inspection the
problem becomes clear: the word Harmelen means a number of things. It is the name of a
number of people, including the (unrelated) Frank van Harmelen and Mark van
Harmelen.
 Six of the hits from the top ten are related to the first person, one to the latter. Harmelen
is also a small town in the Netherlands (one hit) and the place for a tragic train accident
(one hit).
 The problem is thus that the keyword Harmelen (and even the term Frank van Harmelen)
is polysemous. The reason for the variety of the returned results is that designers of search
engines know that users are not likely to look at more than the top ten results. Search
engines are thus programmed so that the first page shows a diverse selection of the
most relevant links related to the keyword.
2. Show me photos of Paris
 The most straightforward solution to this search task is typing in “Paris photos” in the
search bar of our favorite search engine.
 Most advanced search engines, however, have specific facilities for image search where
we can drop the term photo from the query. Some of the results returned by Google
Image Search are shown in Figure 1.2.
 Again, what we immediately notice is that the search engine fails to discriminate between two
categories of images: those related to the city of Paris and those showing Paris Hilton, the
heiress to the Hilton fortune whose popularity on the Web could hardly be disputed.
 More striking is the quality of search results in general. While the search engine does a
good job with retrieving documents, the results of image searches in general are
disappointing.
 For the keyword Paris most of us would expect photos of places in Paris or maps of the
city.
 In reality only about half of the photos on the first page, a quarter of the photos on the
second page and a fifth on the third page are directly related to our concept of Paris. The
rest are about clouds, people, signs, diagrams etc.
 The problem is that associating photos with keywords is a much more difficult task than
simply looking for keywords in the texts of documents. Automatic image recognition is
currently a largely unsolved research problem, which means that our computers cannot
“see” what kind of object is on the photo.
 Search engines attempt to understand the meaning of the image solely from its context,
e.g. based on the name of the file and the text that surrounds the image. Inevitably, this
leads to rather poor results.
3. Find new music that I (might) like
 This query is at an even higher level of difficulty, so much so that most of us
wouldn’t even think of posing it to a search engine. First, from the perspective of
automation, music retrieval is just as problematic as image search.
 As in the previous case, a search engine could avoid the problem of understanding the
content of music and look at the filename and the text of the web page for clues about the
performer or the genre.
 We suspect that such search engines do not exist for different reasons: most music on the
internet is shared illegally through peer-to-peer systems that are completely out of reach
for search engines.
 Music is also a fast-moving good; search engines typically index the Web once a month
and are therefore too slow for the fast-moving world of music releases.
 But the reason we would not attempt to pose this query mostly has to do with the difficulty
of formulating the music we like. Most likely we would search for the names of our favorite
bands or music styles as a proxy, e.g. “new release”.
4. Tell me about music players with a capacity of at least 4GB.
 This is a typical e-commerce query: we are looking for a product with certain
characteristics.
 One of the immediate concerns is that translating this query from natural language to the
Boolean language of search engines is almost impossible. We could try the search “music
player” “4GB” but it is clear that the search engine will not know that 4GB is the
capacity of the music player and we are interested in all players with at least that much
memory (not just those that have exactly 4GB).
 Such a query would return only pages where these terms occur as they
are. The problem is that general-purpose search engines do not know anything about music
players or their properties and how to compare such properties.
 They are good at searching for specific information (e.g. the model number of an MP3
player), but not at searching for descriptions of items.
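The difference between matching terms literally and comparing a property can be sketched as follows. This is a minimal illustration, assuming a hypothetical catalogue: the product names, the page texts and the `capacity_gb` field are made up for the example and do not describe any real search engine.

```python
# Hypothetical catalogue and page texts, invented for illustration only.
products = [
    {"name": "Player A", "type": "music player", "capacity_gb": 2},
    {"name": "Player B", "type": "music player", "capacity_gb": 4},
    {"name": "Player C", "type": "music player", "capacity_gb": 8},
]
pages = {
    "Player A": "music player with 2GB of storage",
    "Player B": "music player with 4GB of storage",
    "Player C": "music player with 8GB of storage",
}


def keyword_search(pages, terms):
    """Return the pages whose text contains every term literally."""
    return [
        name
        for name, text in pages.items()
        if all(term.lower() in text.lower() for term in terms)
    ]


def structured_search(products, min_capacity_gb):
    """Return music players whose capacity is at least the requested value."""
    return [
        p["name"]
        for p in products
        if p["type"] == "music player" and p["capacity_gb"] >= min_capacity_gb
    ]


keyword_hits = keyword_search(pages, ["music player", "4GB"])
structured_hits = structured_search(products, 4)
```

The keyword search finds only the page that mentions "4GB" as a string, while the structured query can also return the 8 GB player because capacity is stored as a comparable number.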
1.2.2 Diagnosis: A lack of knowledge
 The questions above are arbitrary in their specificity but they illustrate a general problem
in accessing the vast amounts of information on the Web. Namely, in all four cases we
deal with a knowledge gap: what the computer understands and is able to work with is
much more limited than the knowledge of the user.
 The handicap of the computer is mostly due to technological difficulties in getting our
computers to understand natural language or to “see” the content of images and other
multimedia.
 Even if the information is there, and is blatantly obvious to a human reader, the computer
may not be able to see anything else of it other than a string of characters.
 This problem affects all of the above queries to some extent. A human can
quickly skim the returned snippets (showing the context in which the keyword occurs)
and realize that the different references to the word Harmelen do not all refer to persons,
and that even the persons named Harmelen cannot all be the same.
 In the second query, it is also blatantly obvious for the human observer that not all
pictures are of cities. However, even telling cities and celebrities apart is a difficult task
when it comes to image recognition.
 In the case of the second query, an important piece of knowledge that the computer
doesn’t possess is the common knowledge that there is a city named Paris and there is a
famous person named Paris Hilton (who is also different from the Hilton in Paris).
 Answering the third query requires the kind of extensive background knowledge about
musical styles, genres etc. that shop assistants and experts in music possess. This kind of
knowledge is well beyond the information that is in the database of a typical music store.
 The third case is also interesting because background knowledge about the user is lacking
as well. There has to be a way of providing this knowledge to the search engine in
a form that it understands.
 The fourth query is noteworthy because it highlights the problem of aggregating
information.
1.2.3 The semantic solution
 The idea of the Semantic Web is to apply advanced knowledge technologies in order to
fill the knowledge gap between human and machine.
 This knowledge can either be information that is already described in the content of the
Web pages but difficult to extract or additional background knowledge that can help to
answer queries in some way.
 In the following we describe the improvement one could expect in case of our four
queries based on examples of existing tools and applications that have been implemented
for specific domains or organizational settings.
 In the case of the first query the situation can be greatly improved by providing personal
information in a semantic format.
 The solution is to attach a semantic profile to personal web pages that describes the same
information that appears in the text of the web page, but in a machine-processable format.
 The Friend-of-a-Friend (FOAF) project provides a widely accepted vocabulary for such
descriptions. FOAF profiles listing attributes such as the name, address, interests of the
user can be linked to the web page or even encoded in the text of the page.
 As we will see, several profiles describing the same person may also exist on the Web. As
all profiles are readable and comparable by machines, all knowledge about a person can
be combined automatically.
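How several machine-readable profiles of one person can be combined may be sketched as below. This is only an illustration under stated assumptions: plain Python dictionaries stand in for RDF descriptions, the property names echo the FOAF terms `mbox`, `name`, `homepage` and `interest`, the people and addresses are invented, and the mailbox is assumed to identify a person uniquely.

```python
from collections import defaultdict

# Invented profiles; dictionaries stand in for FOAF/RDF descriptions.
profiles = [
    {"mbox": "mailto:alice@example.org", "name": "Alice Example",
     "interest": {"semantic web"}},
    {"mbox": "mailto:alice@example.org", "homepage": "http://example.org/~alice",
     "interest": {"social networks"}},
    {"mbox": "mailto:bob@example.org", "name": "Bob Example", "interest": set()},
]


def merge_profiles(profiles, key="mbox"):
    """Merge all profiles that share the same identifying property."""
    merged = defaultdict(dict)
    for profile in profiles:
        target = merged[profile[key]]
        for prop, value in profile.items():
            if isinstance(value, set):
                # Multi-valued properties are unioned across profiles.
                target.setdefault(prop, set()).update(value)
            else:
                # Single-valued properties keep the first value seen.
                target.setdefault(prop, value)
    return dict(merged)


people = merge_profiles(profiles)
```

After merging, the two descriptions of Alice collapse into one record that carries her name, her homepage and both interests.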
 The solution in the second case is to attach metadata to the images in question. For
example, the online photo sharing site Flickr allows annotating images using geographic
coordinates.
 After uploading some photos users can add keywords to describe their images (e.g.
“Paris, Eiffel-tower”) and drag and drop the images on a geographic map to indicate the
location where the photo was taken. In the background the system computes the latitude
and longitude of the place where the user pointed and attaches this information to the
image.
 Although in this case the system is not even aware that Paris is a city, minimal additional
information about photos (the geo-coordinates) enables a kind of visualization that makes
the searching task much easier.
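The point that coordinates alone already help can be sketched with a simple bounding-box filter. This is a rough illustration, not how Flickr works internally: the photo records are invented and the box around Paris is an approximate assumption chosen for the example.

```python
# Approximate (min_lat, max_lat, min_lon, max_lon) box around Paris; an assumption.
PARIS_BOX = (48.80, 48.91, 2.22, 2.47)

# Invented photo records carrying the latitude/longitude attached at upload time.
photos = [
    {"title": "Eiffel Tower at dusk", "lat": 48.8584, "lon": 2.2945},
    {"title": "Hotel lobby", "lat": 36.17, "lon": -115.14},
]


def in_box(photo, box):
    """Check whether a photo's coordinates fall inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= photo["lat"] <= max_lat and min_lon <= photo["lon"] <= max_lon


paris_photos = [p["title"] for p in photos if in_box(p, PARIS_BOX)]
```

The filter never needs to know that Paris is a city; comparing two numbers per photo is enough to separate the pictures taken there from the rest.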
 In the third case the background knowledge required for recommending music is already at
work behind the online radio called Pandora. Pandora is based on the Music Genome
Project, an attempt to create a vocabulary to describe characteristics of music from
melody, harmony and rhythm, to instrumentation, orchestration, arrangement, lyrics, and
the rich world of singing and vocal harmony.
 Over several years thousands of songs have been annotated by experts in music theory.
This knowledge is now used by the system to recommend unknown music to users based
on their existing favorites.
 Our fourth problem, the aggregation of product catalogs, can also be directly addressed
using semantic technology.
 As we have seen, the problem in this case is the difficulty of maintaining a unified catalog
in a way that does not require an exclusive commitment from the providers of product
information. (In practice, information providers often have their own product databases
with a proprietary classification system.) Further, we would like to keep the catalog open
to data providers adding new, emerging categories of products and their descriptions
(e.g. mp3 players as a subclass of music players with specific attributes such as capacity,
size, color etc.).
1.3 DEVELOPMENT OF THE SEMANTIC WEB
1.3.1 Research, development and standardization
 The vision of extending the current human-focused Web with machine-processable
descriptions of web content was first formulated in 1996 by Tim Berners-Lee, the
original inventor of the Web [BLFD99].
 The Semantic Web has been actively promoted since by the World Wide Web
Consortium (also led by Berners-Lee), the organization that is chiefly responsible for
setting technical standards on the Web.
 The Semantic Web has quickly attracted significant interest from funding agencies on
both sides of the Atlantic, reshaping much of the AI research agenda in a relatively short
period of time.
 In particular, the field of Knowledge Representation and Reasoning took center stage, but
outcomes from other fields of AI have also been put to use to support the move
towards the Semantic Web: for example, Natural Language Processing and Information
Retrieval have been applied to acquiring knowledge from the World Wide Web.
 The complete list of individuals in this community consists of 608 researchers, mostly
from academia (79%) and to a lesser degree from industry (21%). Geographically, the
community covers much of the United States and Europe, with some activity in Japan and
Australia.
 As Figure 1.5 shows, the participation rate at the individual ISWC events quickly
reached the level typical of large, established conferences and remained at that level even
for the last year of data (2004), when the conference was organized in Hiroshima, Japan.
The number of publications written by the members of the community that contain the
keyword “Semantic Web” has been rising sharply since the beginning.
 The core technologies of the Semantic Web, logic-based languages for knowledge
representation and reasoning, have been developed in the research field of Artificial
Intelligence.
 Tools for creating, storing and reasoning with ontologies have been primarily developed
by university-affiliated technology startups (for example, Aduna, OntoText and
Ontoprise) and at research labs of large corporations (see for example the work of the
advanced technology groups at IBM and Hewlett-Packard).
 Most of these tools are available as open source, since at the current stage vendors expect to
make a profit primarily by developing complete solutions and providing support for other
developers.
 The World Wide Web Consortium still plays a key role in standardization where the
interoperability of tools necessitates mediation between various developer and user
communities, as in the case of the development of a standard query language and
protocol to access ontology stores across the Web.
1.3.2 Technology adoption
 The Semantic Web was originally conceptualized as an extension of the current Web, i.e.
as the application of metadata for describing Web content. In this vision, the content that
is already on the Web (text, but also multimedia) would be enriched in a collaborative
effort by the users of the Web.
 The Semantic Web suffers from what the economist Kevin Kelly calls the fax-effect.
Kelly notes that when the first fax machines were introduced, they came with a very hefty
price tag.
 Yet they were almost useless: namely, the usefulness of a fax comes from being able to
communicate with other fax users.
 In this sense every fax unit sold increases the value of all fax machines in use. While
traditional goods such as land or precious metals become more valuable the less of them is
produced (called the law of scarcity), the fax machine network exhibits the opposite,
which is called the law of plentitude.
 What makes the case of the Semantic Web more difficult, however, is an additional cost
factor. Returning to the example of the fax network, we can say that it required a certain
kind of agreement to get the system working on a global scale: all fax machines needed to
adopt the same protocol for communicating over the telephone line. This is similar to the
case of the Web where global interoperability is guaranteed by the standard protocol for
communication (HTTP).
1.4 THE EMERGENCE OF THE SOCIAL WEB
 The first wave of socialization on the Web was due to the appearance of blogs, wikis and
other forms of web-based communication and collaboration.
 Blogs and wikis attracted mass popularity from around 2003. What they have
in common is that they both significantly lower the requirements for adding content to the
Web: editing blogs and wikis no longer required any knowledge of HTML. Blogs
and wikis allowed individuals and groups to claim their personal space on the Web and
fill it with content with relative ease.
 Although the example of Wikipedia, the online encyclopedia, is outstanding,
wikis large and small are used by groups of various sizes as an effective knowledge
management tool for keeping records, describing best practices or jointly developing
ideas.
 The first online social networks (also referred to as social networking services) entered
the field at the same time as blogging and wikis started to take off. In 2003, the
first-mover Friendster attracted over five million registered users in the span of a few
months, which was followed by Google and Microsoft starting or announcing similar
services.
1.4.1 Web 2.0 + Semantic Web = Web 3.0?
 Web 2.0 is often contrasted to the Semantic Web, which is a more conscious and
carefully orchestrated effort on the side of the W3C to trigger a new stage of
developments using semantic technologies.
 In practice the ideas of Web 2.0 and the Semantic Web are not exclusive alternatives:
while Web 2.0 mostly affects how users interact with the Web, the Semantic Web
opens new technological opportunities for web developers in combining data and services
from different sources.
 What the Semantic Web can offer to the Web 2.0 community is a standard infrastructure for
building creative combinations of data and services.
 Standard formats for exchanging data and schema information, support for data
integration, along with standard query languages and protocols for querying remote data
sources provide a platform for the easy development of mashups.
1.5 STATISTICAL PROPERTIES OF SOCIAL NETWORKS
 To study social networks, we first represent them as graphs. We want to understand the
structural patterns and properties of these graphs.
Two types of properties:
 Static properties: describing the structure of snapshots of graphs.
 Dynamic properties: describing how the structure evolves over time.
 These properties may be defined for unweighted or weighted graphs, where weights may represent
multi-edges (e.g. multiple phone calls from one person to another), or edge weights (e.g.
monetary amounts between a donor and a recipient in a political donation network).
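The two kinds of weight mentioned above can be sketched with a counter keyed by node pair. The call and donation records below are invented for illustration.

```python
from collections import Counter

# Multi-edges: every repeated (caller, callee) pair adds 1 to the edge weight.
calls = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("ann", "bob")]
multi_edge_weight = Counter(calls)

# Edge weights: monetary amounts are summed per (donor, recipient) pair.
donations = [("donor1", "cand1", 100.0), ("donor1", "cand1", 50.0),
             ("donor2", "cand1", 25.0)]
amount_weight = Counter()
for donor, recipient, amount in donations:
    amount_weight[(donor, recipient)] += amount
```

In both cases the graph keeps one edge per ordered pair, and the weight records either how often or how much.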
 Properties to Understand:
a) What do social networks look like, on a large scale?
b) How do networks behave over time?
c) How do the different components of an entire network form?
d) How do the non-giant weakly connected components behave over time?
e) What distributions and patterns do weighted graphs maintain?
f) What happens when we take into account multiple edges and weighted edges?
1.5.1 Static Properties
 While all networks we examine are evolving over time, there are properties that are
measured at single points in time, that is, static snapshots of the graphs. For the purposes
of organization we will further divide these properties into those applying to unweighted
graphs and to weighted graphs.
1.5.1.1 Static Unweighted Graphs
 Here, we present the ‘laws’ that apply to static snapshots of real graphs
without considering the weights on the edges. These include patterns in the
degree distribution, the number of hops in which pairs of nodes can reach each other, the
local number of triangles, eigenvalues and communities. Next, we describe these patterns
in more detail.
1) S-1: Heavy-tailed Degree Distribution
 The degree distribution of many real graphs obeys a power law $f(d) \propto d^{-\alpha}$, with $\alpha > 0$,
where $f(d)$ is the fraction of nodes with degree $d$.
 This means that real graphs contain many low-degree nodes but only a few high-degree
nodes.
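The quantity f(d) can be computed directly from an edge list, as in the following minimal sketch. The toy graph is an assumption made for illustration; real data would be needed to observe the heavy tail.

```python
from collections import Counter

# Toy undirected edge list, invented for illustration.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 5)]


def degree_distribution(edges):
    """Return {d: fraction of nodes with degree d} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {d: c / n for d, c in sorted(counts.items())}


f = degree_distribution(edges)
```

Plotting f(d) against d on log-log axes is the usual way to check whether the points fall roughly on a straight line, as a power law would predict.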
2) S-2: Small Diameter
 The diameter of a static graph is the maximum distance between any two nodes.
 Real world graphs often have small diameters.
 This is known as the ‘small-world phenomenon’ or the ‘six degrees of separation’.
 The diameter can be hijacked by a few long chains.
 Therefore we use the effective diameter which is the minimum number of hops in which
some fraction (usually 90%) of all connected node pairs can be reached.
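The difference between the diameter and the effective diameter can be sketched with breadth-first search. The toy graph (a 10-node clique with two short chains attached) and the function names are assumptions made for this illustration, and the all-pairs approach only suits small graphs.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pair_distances(adj):
    """Sorted hop counts over all connected unordered node pairs."""
    nodes = sorted(adj)
    distances = []
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        distances.extend(dist[v] for v in nodes[i + 1:] if v in dist)
    return sorted(distances)


def effective_diameter(adj, fraction=0.9):
    """Smallest hop count within which `fraction` of connected pairs are reachable."""
    distances = pair_distances(adj)
    k = math.ceil(fraction * len(distances))
    return distances[k - 1]


# A clique on nodes 0..9 with two chains (0-10-11 and 9-12-13) attached.
edges = [(i, j) for i in range(10) for j in range(i + 1, 10)]
edges += [(0, 10), (10, 11), (9, 12), (12, 13)]
adj = build_adjacency(edges)

diameter = pair_distances(adj)[-1]
eff_diameter = effective_diameter(adj)
```

In this toy graph a single pair of chain endpoints sets the diameter, while the 90% effective diameter stays smaller because most pairs are close together.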
3) S-3: Triangle Power Law (TPL)
 The number of nodes that participate in $\Delta$ triangles follows a power law of the
form $f(\Delta) \propto \Delta^{\sigma}$, with the exponent $\sigma < 0$.
 TPL means that
 Many nodes have only a few triangles in their neighborhoods and
 A few nodes participate in a large number of triangles with their neighbors.
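The per-node triangle counts that the TPL describes can be obtained as in the sketch below. The small graph is invented for illustration, and the pairwise neighbor check is a simple approach that is adequate only for small graphs.

```python
from collections import Counter
from itertools import combinations

# Toy undirected graph containing the triangles {0, 1, 2} and {0, 2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (2, 3), (3, 4)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def triangles_per_node(adj):
    """Count, for each node, the connected pairs among its neighbors."""
    counts = {}
    for u, neighbors in adj.items():
        counts[u] = sum(
            1 for v, w in combinations(sorted(neighbors), 2) if w in adj[v]
        )
    return counts


triangle_counts = triangles_per_node(adj)
# Number of nodes participating in a given number of triangles.
triangle_histogram = Counter(triangle_counts.values())
```

The histogram is the empirical counterpart of f(∆): it records how many nodes take part in each number of triangles.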
4) S-4: Eigenvalue Power Law (EPL)
 The eigenvalues of a graph are defined as the eigenvalues of its adjacency matrix. The set
of eigenvalues of a graph is called a graph spectrum.
 (For a matrix $A$, if there is a nonzero vector $X$ such that $AX = \lambda X$ for some scalar $\lambda$, then $\lambda$ is an
eigenvalue of $A$ associated with the eigenvector $X$.)
 EPL states that the 20 or so largest eigenvalues of the Internet graph are power-law
distributed. It has been shown that the Eigenvalue Power Law is a consequence of the
Degree Power Law.
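The largest eigenvalue of an adjacency matrix can be estimated without any matrix library by power iteration. The sketch below is one simple way to do it, not the method used in the studies cited: it iterates on A + I (an assumption made here so that bipartite graphs such as stars do not oscillate) and reads the eigenvalue of A off the Rayleigh quotient. The two test graphs are chosen because their largest eigenvalues are known exactly.

```python
import math


def principal_eigenvalue(adj, iterations=200):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I."""
    nodes = list(adj)
    x = {u: 1.0 for u in nodes}
    estimate = 0.0
    for _ in range(iterations):
        # y = (A + I) x; the identity shift prevents oscillation on bipartite graphs.
        y = {u: x[u] + sum(x[v] for v in adj[u]) for u in nodes}
        norm = math.sqrt(sum(value * value for value in y.values()))
        y = {u: value / norm for u, value in y.items()}
        # Rayleigh quotient y^T A y for the unit vector y.
        estimate = sum(y[u] * sum(y[v] for v in adj[u]) for u in nodes)
        x = y
    return estimate


# Complete graph on 4 nodes: largest eigenvalue is 3.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
# Star with 4 leaves: largest eigenvalue is sqrt(4) = 2.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

lambda_k4 = principal_eigenvalue(k4)
lambda_star = principal_eigenvalue(star)
```

Obtaining the 20 or so largest eigenvalues mentioned above would normally be done with a numerical library; this sketch covers only the first one.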
5) S-5: Community Structure
 Real-world graphs exhibit a modular structure, with nodes forming groups, and possibly
groups within groups.
 In other words, the nodes form communities where groups of nodes in the same
community are more tightly connected to each other than to nodes outside the
community.
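The idea of "more tightly connected inside than outside" can be checked for a given grouping by counting edges within and across communities. The graph and the community assignment below are invented; finding the communities in the first place is a separate problem that this sketch does not address.

```python
# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# Hypothetical community labels for each node.
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}


def edge_split(edges, community):
    """Return (edges inside a community, edges between communities)."""
    internal = sum(1 for u, v in edges if community[u] == community[v])
    return internal, len(edges) - internal


internal_edges, external_edges = edge_split(edges, community)
```

A grouping with many internal and few external edges, as here, matches the modular structure described above.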
1.5.1.2 Static Weighted Graphs
 We consider weighted directed graphs.
 Data set: records of the form (IP-source, IP-destination, timestamp, number-of-packets).
 We can have multi-edges and weights.
Notations:
 W(t): the total weight up to time t
 E(t): the number of distinct edges up to time t
 Ed(t): the number of multi-edges (d stands for duplicate edges) up to time t
 N(t): the number of nodes up to time t
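The four quantities can be computed from such records as sketched below. The records are invented, and one interpretation is assumed and should be checked against the original definitions: Ed(t) is taken to count every record up to time t, repeats included.

```python
# Invented records of the form (IP-source, IP-destination, timestamp, packets).
records = [
    ("10.0.0.1", "10.0.0.2", 1, 5),
    ("10.0.0.1", "10.0.0.2", 2, 3),
    ("10.0.0.2", "10.0.0.3", 3, 7),
    ("10.0.0.3", "10.0.0.1", 5, 1),
]


def snapshot_totals(records, t):
    """Compute W(t), E(t), Ed(t) and N(t) from all records with timestamp <= t."""
    seen = [r for r in records if r[2] <= t]
    distinct_edges = {(src, dst) for src, dst, _, _ in seen}
    nodes = {src for src, _, _, _ in seen} | {dst for _, dst, _, _ in seen}
    return {
        "W": sum(packets for _, _, _, packets in seen),  # total weight
        "E": len(distinct_edges),                        # distinct edges
        "Ed": len(seen),                                 # assumed: all records, repeats included
        "N": len(nodes),                                 # nodes
    }


totals_t3 = snapshot_totals(records, 3)
totals_t5 = snapshot_totals(records, 5)
```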
1) SW-1: Weight Power Law (WPL)
 Between W(t) and E(t), we observe that $W(t) = E(t)^{w}$, where $w$ ranges from 1.01 to 1.5.
 This means that more edges in the graph imply superlinearly higher total weight.
 We also have
$N(t) = E(t)^{n}$, $E_d(t) = E(t)^{dupe}$,
$N_{src}(t) = E(t)^{nsrc}$, $N_{dst}(t) = E(t)^{dst}$
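An exponent such as w is commonly estimated as the slope of a least-squares line through the points (log E(t), log W(t)). The sketch below shows that calculation on synthetic data generated with exponent 1.2; the numbers are made up to show that the fit recovers the exponent and are not measurements.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    variance = sum((a - mean_x) ** 2 for a in log_x)
    return covariance / variance


# Synthetic snapshots: E(t) grows tenfold each step and W(t) = E(t)^1.2 exactly.
edge_counts = [10, 100, 1000, 10000]
total_weights = [e ** 1.2 for e in edge_counts]

w_exponent = fit_power_law_exponent(edge_counts, total_weights)
```

The same function applies to any of the other pairs of quantities listed above by swapping in the corresponding series.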
2) SW-2: Edge Weights Power Law
 Given a real-world graph, nodes i and j with weights $w_i$ and $w_j$, the edge $e_{i,j}$ with weight
$w_{i,j}$, then we have the power law
 This means that the weight of a given edge and weights of its neighboring two nodes are
correlated (similar to Newton’s Gravitational Law).
3) SW-3: Snapshot Power Laws (SPL)
 Consider the i-th node of a weighted graph at time t (a snapshot), and let $out_i$ and $outw_i$ be
its out-degree and out-weight. Then
$outw_i \propto out_i^{ow}$
 where ow is the out-weight exponent of the SPL. The same holds for the in-degree, with
in-weight exponent iw.
 The exponents iw and ow take values in the ranges [0.9, 1.2] and [0.95, 1.35], respectively.
 The exponent over time remains almost constant.
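The two per-node quantities that the SPL relates, out-degree and out-weight, can be collected from one snapshot as follows. The weighted edges are invented for illustration.

```python
from collections import defaultdict

# One snapshot of a weighted directed graph: (source, target) -> weight.
weighted_edges = {("a", "b"): 4, ("a", "c"): 1, ("b", "c"): 2}


def out_degree_and_weight(weighted_edges):
    """Return per-node out-degree and out-weight for one snapshot."""
    out_degree = defaultdict(int)
    out_weight = defaultdict(float)
    for (source, _target), weight in weighted_edges.items():
        out_degree[source] += 1
        out_weight[source] += weight
    return dict(out_degree), dict(out_weight)


out_degree, out_weight = out_degree_and_weight(weighted_edges)
```

Fitting a line through the points (log out-degree, log out-weight) over all nodes would then give an estimate of the exponent ow for that snapshot.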
1.5.2 Dynamic Properties
 These are typically studied by looking at a series of static snapshots and seeing how
measurements of these snapshots compare. Like the static properties we presented
previously, we also divide these into properties that take into account weights and those
that don’t.
1.5.2.1 Dynamic Unweighted Graphs
 The patterns in dynamic, time-evolving graphs that do not consider edge
weights include the shrinking diameter property, the densification law, secondary
connected components whose size oscillates around a constant, the largest eigenvalue law and
the bursty and self-similar edge additions over time. We next describe these laws in
detail.
1) D-1: Shrinking Diameter
 It can be observed that not only is the diameter of real graphs small, but it also shrinks
and then stabilizes over time.
 There is a ‘gelling point’ at which many small disconnected components merge and form
the largest connected component in the graph.
 This is like the ‘coalescence’ of the graph at which point the diameter ‘spikes’.
 Afterwards, with new edges the diameter keeps shrinking until it reaches an equilibrium.
 The vertical line marks the gelling point.
2) D-2: Densification Power Law (DPL)
 The relationship between E(t) and N(t) (the number of edges and nodes at time t) follows
the Densification Power Law
$E(t) \propto N(t)^{\beta}$
 where β is the densification exponent, with a value between 1.03 and 1.7.
 This indicates a superlinear relationship between the number of nodes and the number of edges.
 Also explain the densification effect.
 For (c) the good linear fit agrees with the DPL.
 (d) is the corresponding component sizes.
3) D-3: Diameter-plot and Gelling point
 Real graphs exhibit a gelling point, at which the diameter spikes and (several)
disconnected components gel into a giant component.
 Before that point, the graph is more or less in an establishment period, typically
consisting of a collection of small, disconnected components.
 After the gelling point, the graph obeys the expected rules.
 Example: PostNet data on slide 30 & 32.
4) D-4: Constant/Oscillating NLCCs
 After the gelling point, the secondary and tertiary connected components remain of
approximately constant size, with small oscillations.
 New nodes typically link to the GCC
 Very few of the newcomers link to the 2nd (or 3rd) CC, helping them to grow slowly
 In very rare cases, a newcomer links both to an NLCC and GCC, thus leading to the
absorption of the NLCC into the GCC
 At that point, we have a drop in the size of the 2nd CC
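How the sizes of the largest and second-largest components change as edges arrive can be tracked with a disjoint-set (union-find) structure. This is a minimal sketch on an invented edge stream; GCC and 2nd CC are read off as the two largest component sizes after each new edge.

```python
class DisjointSet:
    """Union-find that keeps the size of every component at its root."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        del self.size[root_b]

    def component_sizes(self):
        return sorted(self.size.values(), reverse=True)


def track_components(edge_stream):
    """Return (largest, second largest) component size after each new edge."""
    components = DisjointSet()
    history = []
    for u, v in edge_stream:
        components.union(u, v)
        sizes = components.component_sizes()
        history.append((sizes[0], sizes[1] if len(sizes) > 1 else 0))
    return history


# Invented stream: node 7 joins the GCC, then edge (5, 1) absorbs the 2nd CC.
edge_stream = [(1, 2), (3, 4), (5, 6), (2, 3), (7, 1), (5, 1)]
history = track_components(edge_stream)
```

The last step of the stream reproduces the rare case described above: one edge links the second component to the GCC, so the GCC grows and the size of the second component drops.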
5) D-5: LPL: Principal eigenvalue over time
 The principal eigenvalue $\lambda_1(t)$ of the 0-1 adjacency matrix A and the number of edges
E(t) over time follow a power law $\lambda_1(t) \propto E(t)^{\alpha}$ with an exponent $\alpha$ less than 0.5,
especially after the ‘gelling point’.
1.5.2.2 Dynamic Weighted Graphs
1) DW-1: Bursty/self-similar weight additions
 We track how much weight a graph gains in each time interval (i.e. ΔW(t)) and look
at the entropy plots.
 The weight additions over time show self-similarity.
 If the edge weight is the number of reoccurrences of that edge, the slope of the plot is
greater than 0.95 (more uniform). For other features used as edge weight, the weight additions
are more bursty, with the slope being as low as 0.6 for the Network Traffic dataset.
2) DW-2: LWPL: Weighted principal eigenvalue over time
 λ1,w Power Law (LWPL): weighted real graphs exhibit a power law between the largest
eigenvalue $\lambda_{1,w}(t)$ of the weighted adjacency matrix $A_w$ and the number of edges
E(t) over time, that is, $\lambda_{1,w}(t) \propto E(t)^{\beta}$. The exponent β ranges from 0.5 to 1.6.
Applications of these Laws
 These patterns are helpful for
1. Spotting anomalous graphs and sub-graphs,
2. Answering questions about entities in a network, and
3. Answering questions about what-if scenarios.
 Spotting anomalies is vital for
1. Determining abuse of networks
2. Fraudulent reputation building (in e-auction systems)
3. Detection of dwindling/abnormal social sub-groups
4. Network intrusion detection
 Analyzing network properties is also useful for
1. Identifying authorities and search algorithms,
2. Discovering the “network value” of customers
3. Improving recommendation systems
 What-if scenarios are vital for
1. Extrapolation,
2. Provisioning and
3. Algorithm design
1.6 NETWORK ANALYSIS
 Social Network Analysis (SNA) is the study of social relations among a set of actors.
 The key difference between network analysis and other approaches to social science is
the focus on relationships between actors rather than the attributes of individual actors.
 Network analysis takes a global view on social structures based on the belief that types
and patterns of relationships emerge from individual connectivity and that the presence
(or absence) of such types and patterns has substantial effects on the network and its
constituents.
 The network structure provides opportunities and imposes constraints on the individual
actors by determining the transfer or flow of resources (material or immaterial) across the
network.
 SNA is thus a different approach to social phenomena and therefore requires a new set of
concepts and new methods for data collection and analysis.
 Network analysis provides a vocabulary for describing social structures, provides formal
models that capture the common properties of all (social) networks and a set of methods
applicable to the analysis of networks in general.
 The concepts and methods of network analysis are grounded in a formal description of
networks as graphs. Methods of analysis primarily originate from graph theory as these
are applied to the graph representation of social network data.
 The methods of data collection in network analysis are aimed at collecting relational data
in a reliable manner. Data collection is typically carried out using standard questionnaires
and observation techniques that aim to ensure the correctness and completeness of
network data.
 Often records of social interaction (publication databases, meeting notes, newspaper
articles, documents and databases of different sorts) are used to build a model of social
networks.
1.7 DEVELOPMENT OF SOCIAL NETWORK ANALYSIS
 The field of Social Network Analysis today is the result of the convergence of several
streams of applied research in sociology, social psychology and anthropology.
 Many of the concepts of network analysis have been developed independently by various
researchers often through empirical studies of various social settings.
 For example, many social psychologists of the 1940s found a formal description of social
groups useful in depicting communication channels in the group when trying to explain
processes of group communication.
 Already in the mid-1950s anthropologists found network representations useful in
generalizing actual field observations, for example when comparing the level of
reciprocity in marriage and other social exchanges across different cultures.
 Despite the various efforts, each of the early studies used a different set of concepts and
different methods of representation and analysis of social networks.
 The term “social network” was introduced by Barnes in 1954.
 This convergence was facilitated by the adoption of a graph representation of social
networks usually credited to Moreno.
 What Moreno called a sociogram was a visual representation of social networks as a set of
nodes connected by directed links.
 The nodes represented individuals in Moreno’s work, while the edges stood for personal
relations.
 However, similar representations can be used to depict a set of relationships between any
kind of social unit such as groups, organizations, nations etc.
 It is a network image between workers (W), solderers(S) and inspectors (I).
 While 2D and 3D visual modeling is still an important technique of network analysis, the
sociogram is honored mostly for opening the way to a formal treatment of network
analysis based on graph theory.
 One of the relatively new areas of network analysis is the analysis of networks in
entrepreneurship, an active area of research that builds and contributes to organization
and management science.
 The vocabulary, models and methods of network analysis also expand continuously
through applications that require handling ever more complex data sets.
 An example of this process is the advances in dealing with longitudinal data. New
probabilistic models are capable of modelling the evolution of social networks and
answering questions regarding the dynamics of communities.
 Formalizing an increasing set of concepts in terms of networks also contributes to both
developing and testing theories in more theoretical branches of sociology.
 The increasing variety of applications and related advances in methodology can be best
observed at the yearly Sunbelt Social Networks Conference series, which started in 1980.
The field of Social Network Analysis has also had its own journal, Social Networks, since 1978.
 While the field of network analysis has been growing steadily from the beginning, there
have been two developments in the last two decades that led to an explosion in network
literature.
 First, advances in information technology brought a wealth of electronic data and
significantly increased analytical power.
 Second, the methods of SNA are increasingly applied to networks other than social
networks such as the hyperlink structure on the Web or the electric grid.
1.8 KEY CONCEPTS AND MEASURES IN NETWORK ANALYSIS
1.8.1 Network components
 Actors (nodes, points, vertices):
1. Individuals, Organizations, Events …
2. Can have properties (attributes)
 Relations (lines, arcs, edges, ties): between pairs of actors.
1. Undirected (symmetric) / Directed (asymmetric)
2. Binary / Valued
 Most network analysis methods work on an abstract, graph based representation of real
world networks.
 The units of interest in a network are the combined sets of actors and their relations.
 We represent actors with points and relations with lines.
 In general, a relation can be:
 Undirected / Directed
 Binary / Valued
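The components listed above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Network`, its methods, and the sample actors are assumptions made for the example and are not part of the notes.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical container for actors and the relations between them."""
    directed: bool = False
    actors: dict = field(default_factory=dict)  # actor name -> attributes
    ties: dict = field(default_factory=dict)    # (source, target) -> value

    def add_actor(self, name, **attributes):
        # Actors can carry properties (attributes).
        self.actors.setdefault(name, {}).update(attributes)

    def add_tie(self, source, target, value=1):
        # value=1 models a binary relation; any other number a valued one.
        self.actors.setdefault(source, {})
        self.actors.setdefault(target, {})
        self.ties[(source, target)] = value
        if not self.directed:
            # An undirected (symmetric) relation holds in both directions.
            self.ties[(target, source)] = value

    def has_tie(self, source, target):
        return (source, target) in self.ties


# Undirected, binary relation: friendship.
friends = Network(directed=False)
friends.add_actor("Ann", role="worker")
friends.add_actor("Bob", role="inspector")
friends.add_tie("Ann", "Bob")

# Directed, valued relation: number of emails sent.
emails = Network(directed=True)
emails.add_tie("Ann", "Bob", value=12)
```

In the undirected network the tie can be read from either end, while in the directed network only the Ann to Bob direction exists.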
1.8.2 Types of networks
 We can examine networks across multiple levels:
1. Ego network
2. Partial network
3. Complete or “Whole” network
1. Ego network
 Have data on a respondent (ego) and the people they are connected to (alters).
 May include estimates of connections among alters
 Measures:
1. Size
2. Types of relations
2. Partial network
 Ego networks plus some amount of tracing to reach contacts of contacts
 Something less than full account of connections among all pairs of actors in the relevant
population
3. Complete or “Whole” network
 Connections among all members of a population.
 Data on all actors within a particular (relevant) boundary.
 Never exactly complete (due to missing data), but boundaries are set
 E.g.: Friendships among workers in a company
 Measures:
1. Graph properties
2. Density
3. Sub-groups
4. Positions
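As a rough illustration of the ego-network level, the sketch below pulls one ego network out of a larger network and reports its size. The adjacency data and the function name `ego_network` are made up for the example.

```python
def ego_network(adjacency, ego):
    """Return the alters of ego and the ties that exist among those alters."""
    alters = set(adjacency[ego])
    ties_among_alters = {
        frozenset((a, b))          # frozenset: the tie a-b equals the tie b-a
        for a in alters
        for b in adjacency[a]
        if b in alters
    }
    return alters, ties_among_alters


# Hypothetical undirected network stored as actor -> set of neighbours.
adjacency = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b"},
    "b": {"ego", "a"},
    "c": {"ego", "d"},
    "d": {"c"},
}

alters, alter_ties = ego_network(adjacency, "ego")
size = len(alters)  # the "size" measure of the ego network
```

Here actor d is a contact of a contact, so it would only appear in a partial network obtained by tracing one step further.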
1.8.3 Basic data structures
1. from picture to matrices
2. from matrices to list
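The two conversions named above can be sketched as follows, assuming an undirected, binary network. The actor labels, the matrix and the function names are illustrative assumptions.

```python
# Adjacency matrix of a hypothetical drawing: cell [i][j] is 1 if a line
# connects actor i and actor j in the picture, otherwise 0.
actors = ["a", "b", "c", "d"]
matrix = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def matrix_to_edge_list(actors, matrix):
    """List each undirected tie once by scanning the upper triangle."""
    edges = []
    for i in range(len(actors)):
        for j in range(i + 1, len(actors)):
            if matrix[i][j]:
                edges.append((actors[i], actors[j]))
    return edges


def matrix_to_adjacency_list(actors, matrix):
    """Map every actor to the list of actors it is tied to."""
    return {
        actors[i]: [actors[j] for j, cell in enumerate(row) if cell]
        for i, row in enumerate(matrix)
    }


edge_list = matrix_to_edge_list(actors, matrix)
adjacency_list = matrix_to_adjacency_list(actors, matrix)
```

Lists are usually the more compact choice for large, sparse networks, because the matrix stores a cell for every pair of actors whether or not a tie exists.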
1.8.4 Measuring networks
1. Connectivity
 Indirect connections are what make networks systems. One actor can reach another if
there is a path in the graph connecting them.
Basic elements:
 A path is a sequence of nodes and edges starting with one node and ending with another,
tracing the indirect connection between the two. On a path, you never go backwards or
revisit the same node twice.
Example: a → b → c → d
 A walk is any sequence of nodes and edges, and may go backwards.
Example: a → b → c → b → c → d
 A cycle is a path that starts and ends with the same node.
Example: a → b → c → a
 If you can trace a sequence of relations from one actor to another, then the two are
connected. If there is at least one path connecting every pair of actors in the graph, the
graph is connected and is called a component.
 Intuitively, a component is the set of people who are all connected by a chain of relations.
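The definitions above can be checked mechanically. The following is a minimal sketch for an undirected network stored as an adjacency list; the graph and the function names are assumptions chosen for the example.

```python
from collections import deque


def is_walk(adjacency, nodes):
    """Every consecutive pair of nodes must be joined by an edge."""
    return all(b in adjacency[a] for a, b in zip(nodes, nodes[1:]))


def is_path(adjacency, nodes):
    """A path is a walk that never visits the same node twice."""
    return is_walk(adjacency, nodes) and len(set(nodes)) == len(nodes)


def components(adjacency):
    """Group nodes into components using breadth-first search."""
    seen = set()
    result = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result


# Hypothetical network: a chain a-b-c-d and a separate pair e-f.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["f"],
    "f": ["e"],
}

walk = ["a", "b", "c", "b", "c", "d"]
path = ["a", "b", "c", "d"]
parts = components(adjacency)
```

The sequence that revisits b and c is a walk but not a path, and the network falls into two components because no chain of relations links e or f to the others.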
2. Distance and number of paths
 Distance is measured by the (weighted) number of relations separating a pair, using the
shortest path.
In the accompanying figure, actor “a” is:
1 step from 4 actors
2 steps from 5 actors
3 steps from 4 actors
4 steps from 3 actors
5 steps from 1 actor
 Paths are the different routes one can take. Node-independent paths are particularly
important.
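For an unweighted network, shortest-path distances from one actor can be computed with a breadth-first search. The sketch below also counts how many actors lie at each distance; the graph is a made-up example and is not the figure from the notes.

```python
from collections import Counter, deque


def distances_from(adjacency, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical undirected network.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

dist = distances_from(adjacency, "a")
# Number of actors at each distance from "a" (excluding "a" itself).
reach = Counter(d for node, d in dist.items() if node != "a")
```

In this example, actor a is 1 step from two actors, 2 steps from one actor and 3 steps from one actor.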
3. Centrality
 Centrality refers to (one dimension of) location, identifying where an actor resides in a
network.
 Centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of
the network, in the sense that they have many and important connections.
 Three standard centrality measures capture a wide range of “importance” in a network:
1. Degree
2. Closeness
3. Betweenness
3.1. Degree centrality
 No. of nodes adjacent to given node
 Often used as measure of a node’s degree of connectedness and hence also influence
and/or popularity
 Useful in assessing which nodes are central with respect to spreading information and
influencing others in their immediate ‘neighborhood’
 Nodes 3 and 4 have the highest degree (4).
 Formula
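As a minimal sketch, degree centrality can be computed by counting each node's neighbours. Dividing by n - 1, the largest degree a node can have in a network of n nodes, is a common normalization that is assumed here rather than taken from the notes. The graph is a hypothetical example.

```python
def degree_centrality(adjacency):
    """Return (raw degree, degree divided by n - 1) for every node."""
    n = len(adjacency)
    raw = {node: len(neighbours) for node, neighbours in adjacency.items()}
    normalized = {node: degree / (n - 1) for node, degree in raw.items()}
    return raw, normalized


# Hypothetical undirected network with at least two nodes.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

raw, normalized = degree_centrality(adjacency)
```

Node d has the highest degree in this example (3 of a possible 4 ties).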
3.2. Closeness centrality
 An actor is considered important if he/she is relatively close to all other actors.
 Based on the sum of geodesic distances to all other nodes.
 This sum is an inverse measure of centrality: the larger the sum, the less central the actor.
 It is a measure of reach, i.e. the speed with which information can reach other nodes from
a given starting node
 Nodes 3 and 5 have the highest closeness, while node 2 fares almost as well.
 Formula
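A rough sketch of closeness follows. It sums the geodesic distances from each node and then inverts the sum, scaled by n - 1, so that larger values mean more central actors. The scaling and the sample graph are assumptions for illustration, and the code assumes a connected network with at least two nodes.

```python
from collections import deque


def farness(adjacency, source):
    """Sum of geodesic distances from source to all other nodes (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return sum(dist.values())


def closeness_centrality(adjacency):
    """(n - 1) divided by farness; assumes the network is connected."""
    n = len(adjacency)
    return {node: (n - 1) / farness(adjacency, node) for node in adjacency}


# Hypothetical connected, undirected network.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

closeness = closeness_centrality(adjacency)
```

Node d has the smallest distance sum (5) and therefore the highest closeness in this example.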
3.3. Betweenness centrality
 Number of times a node lies along the shortest path between two others
 Shows which nodes are more likely to be in communication paths between other nodes
 Also useful in determining points where the network would break apart
 Node 5 has the highest betweenness centrality, followed by node 3.
 Betweenness centrality can be defined in terms of probability ($1/g_{ij}$):
$g_{ij}$ = number of geodesics that link actors $p_i$ and $p_j$.
$g_{ij}(p_k)$ = number of geodesics that link $p_i$ and $p_j$ and contain $p_k$.
$b_{ij}(p_k) = g_{ij}(p_k) / g_{ij}$ = probability that actor $p_k$ is on a geodesic chosen at
random among the ones that join $p_i$ and $p_j$.
 Betweenness centrality is the sum of these probabilities over all pairs of actors
(Freeman, 1979): $C_B(p_k) = \sum_{i<j} b_{ij}(p_k)$
 Normalized: $C'_B(p_k) = C_B(p_k) / [(n-1)(n-2)/2]$
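The pairwise definition above can be turned into code fairly directly. The sketch below counts geodesics with a breadth-first search and, for every pair of actors, adds the share of geodesics that pass through each third actor. It favours clarity over speed, assumes an undirected network, and uses a made-up graph.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adjacency, source):
    """Distance to, and number of geodesics reaching, each node from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                count[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                # Every geodesic to node extends to a geodesic to neighbour.
                count[neighbour] += count[node]
    return dist, count


def betweenness(adjacency):
    """Sum over pairs (i, j) of the share of i-j geodesics through k."""
    nodes = list(adjacency)
    info = {s: shortest_path_counts(adjacency, s) for s in nodes}
    score = {k: 0.0 for k in nodes}
    for i, j in combinations(nodes, 2):
        dist_i, count_i = info[i]
        if j not in dist_i:
            continue  # i and j are in different components
        for k in nodes:
            if k in (i, j):
                continue
            dist_k, count_k = info[k]
            # k lies on an i-j geodesic only if the two legs add up exactly.
            if k in dist_i and j in dist_k and dist_i[k] + dist_k[j] == dist_i[j]:
                score[k] += count_i[k] * count_k[j] / count_i[j]
    return score


# Hypothetical undirected network.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

raw = betweenness(adjacency)
n = len(adjacency)
normalized = {k: v / ((n - 1) * (n - 2) / 2) for k, v in raw.items()}
```

Node d scores highest here because every geodesic to e passes through it, which also makes d the point where the network would break apart.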
3.3.1 Centralization
 If we want to measure the degree to which the graph as a whole is centralized, we look at
the dispersion of centrality
 Freeman’s general formula for centralization (which ranges from 0 to 1):
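A commonly cited form of Freeman's index, written in the same $p_k$ notation used for betweenness, is sketched below. The symbols $C_X$ and $p^*$ come from that standard presentation and are assumptions here rather than something taken from these notes.

```latex
C_X = \frac{\sum_{i=1}^{n} \left[ C_X(p^*) - C_X(p_i) \right]}
           {\max \sum_{i=1}^{n} \left[ C_X(p^*) - C_X(p_i) \right]}
```

Here $C_X(p_i)$ is any one of the centrality measures (degree, closeness or betweenness) for actor $p_i$, $C_X(p^*)$ is the largest value observed in the network, and the denominator is the largest sum of differences possible in any network of $n$ actors. For degree centrality that maximum is reached by a star network and equals $(n-1)(n-2)$.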
3.3.2 Density
 The more actors are connected to one another, the more dense the network will be.
 Density is the number of ties actually present divided by the number of ties possible.
 Undirected network: n(n-1)/2 possible pairs of actors.
 Directed network: n(n-1) possible lines.
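As a small worked example, density can be computed as the number of observed ties divided by the number of possible ties. The function name and the sample numbers are illustrative assumptions, and the network is assumed to have at least two actors.

```python
def density(n_actors, n_ties, directed=False):
    """Observed ties divided by possible ties (requires n_actors >= 2)."""
    possible = n_actors * (n_actors - 1)  # ordered pairs: directed case
    if not directed:
        possible //= 2                    # unordered pairs: undirected case
    return n_ties / possible


undirected_density = density(5, 5)               # 5 ties out of 10 possible
directed_density = density(5, 5, directed=True)  # 5 lines out of 20 possible
```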
1.8.5 Comparing across centrality values
 Generally, the 3 centrality types will be positively correlated
 When they are not correlated, it probably tells you something interesting about the
network.
| | Low Degree | Low Closeness | Low Betweenness |
|---|---|---|---|
| High Degree | | Embedded in cluster that is far from the rest of the network | Ego's connections are redundant; communication bypasses him/her |
| High Closeness | Key player tied to important/active alters | | Probably multiple paths in the network, ego is near many people, but so are many others |
| High Betweenness | Ego's few ties are crucial for network flow | Very rare cell. Would mean that ego monopolizes the ties from a small number | |
1.8.6 Social network software
1. UCINET
 The Standard network analysis program, runs in Windows
 Good for computing measures of network topography for single nets
 Data input and output use a special two-file format, but UCINET can now read PAJEK files
directly.
 Not optimal for large networks
 Available from: Analytic Technologies
2. PAJEK
 Program for analyzing and plotting very large networks
 Intuitive Windows interface
 Started mainly as a graphics program, but has expanded to a wide range of analytic
capabilities
 Can link to the R statistical package
 Free
 Available from: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
3. NetDraw
 Also very new, but by one of the best known names in network analysis software.
 Free
1.9 DISCUSSION NETWORKS
1.9.1 Electronic discussion networks
 Tyler, Wilkinson and Huberman analyze communication among employees of their own
lab by using the corporate email archive. Two employees are linked if they had exchanged
at least a minimum number of total emails in a given period, filtering out one-way
relationships.
 Adamic and Adar revisit one of the oldest problems of network research, namely the
question of local search.
 How do people find short paths in social networks based on only local information about
their immediate contacts?
 Their findings support earlier results that additional knowledge on contacts such as their
physical location and position in the organization allows employees to conduct their
search much more efficiently than using the simple strategy of always passing the
message to the most connected neighbor.
 Discussions are largely in email and to a smaller part on the phone and in face-to-face
meetings.
 Group communication and collective decision making in various settings are traditionally
studied using much more limited written information such as transcripts and records of
attendance and voting.
 The main technical contribution of Gloor is a dynamic visualization of the discussion
network that makes it possible to quickly identify the moments when key discussions take
place that activate the entire group and not just a few select members.
 Gloor also performs a comparative study across the various groups based on the
structures that emerge over time.
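The tie rule described for the email study above (a minimum number of total emails, with one-way relationships filtered out) can be sketched as follows. The threshold value, the message log and the function name are hypothetical and are not taken from the study.

```python
from collections import Counter


def discussion_ties(messages, min_total=3):
    """Link two people if they emailed each other in both directions
    and exchanged at least min_total messages in total."""
    sent = Counter(messages)  # (sender, recipient) -> number of emails
    ties = set()
    for (a, b), a_to_b in sent.items():
        if a == b:
            continue                      # ignore mail sent to oneself
        b_to_a = sent.get((b, a), 0)
        if b_to_a == 0:
            continue                      # one-way relationship: filtered out
        if a_to_b + b_to_a >= min_total:
            ties.add(frozenset((a, b)))   # undirected tie
    return ties


# Hypothetical email log for one period: (sender, recipient).
messages = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"),
    ("ann", "cat"), ("ann", "cat"), ("ann", "cat"),
    ("cat", "dan"), ("dan", "cat"),
]

ties = discussion_ties(messages, min_total=3)
```

Only ann and bob end up linked: ann's mail to cat is never answered, and cat and dan exchange too few messages to reach the threshold.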
1.9.2 Blogs and online communities
 Content analysis has also been the most commonly used tool in the computer-aided
analysis of blogs (web logs), primarily with the intention of trend analysis for the
purposes of marketing.
 While blogs are often considered as “personal publishing” or a “digital diary”, bloggers
themselves know that blogs are much more than that: modern blogging tools make it easy
to comment on and react to the comments of other bloggers, resulting in webs of
communication among bloggers.
 The accompanying figure shows some of the features of blogs that have been used in
various studies to establish the networks of bloggers.
 Blogs make a particularly appealing research target due to the availability of
structured electronic data in the form of RSS (Rich Site Summary) feeds.
 RSS feeds contain the text of the blog posts as well as valuable metadata such as the
timestamp of posts, which is the basis of dynamic analysis.
 The 2004 US election campaign represented a turning point in blog research, as it was
the first major electoral contest where blogs were exploited as a method of building
networks among individual activists and supporters.
 Online community spaces and social networking services such as MySpace and
LiveJournal cater to socialization even more directly than blogs, with features such as
social networking (maintaining lists of friends, joining groups), messaging and photo
sharing.
 Most online social networking services (Friendster, Orkut, LinkedIn and the like)
closely guard their data even from their own users.
 A technological alternative to these centralized services is the FOAF network.
 FOAF profiles are stored on the web site of the users and linked together using
hyperlinks.
 The drawback of FOAF is that at the moment there is a lack of tools for creating and
maintaining profiles as well as useful services for exploiting this network.
 Advantages
1. Easy to create and fast
2. Easy to add links, photos, videos
3. It can be used to create community
 Disadvantages
1. Generally one author
2. Used for personal opinions and reflection
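As noted earlier in this section, RSS feeds carry post timestamps that form the basis of dynamic analysis. The sketch below reads them with Python's standard library, assuming an RSS 2.0 feed whose items use the `pubDate` element. The feed content is invented for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up RSS 2.0 feed used only for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Second post</title>
      <pubDate>Tue, 02 Nov 2004 09:30:00 +0000</pubDate>
    </item>
    <item>
      <title>First post</title>
      <pubDate>Mon, 01 Nov 2004 18:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def posts_by_time(feed_xml):
    """Return (timestamp, title) pairs sorted from oldest to newest."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        # pubDate uses the email-style date format, hence email.utils.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        posts.append((published, item.findtext("title")))
    return sorted(posts)


timeline = posts_by_time(FEED)
```

Ordering posts by time in this way is a first step towards studying how a network of bloggers changes from one period to the next.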
1.9.3 Web-based networks
 The content of Web pages is a practically inexhaustible source of information for social
network analysis.
 This content is not only vast, diverse and free to access but also in many cases more up to
date than any specialized database.
 On the downside, the quality of information varies significantly and reusing it for
network analysis poses significant technical challenges.
 There are two features of web pages that are considered as the basis of extracting social
relations: links and co-occurrences.
 The linking structure of the Web is considered a proxy for real-world relationships, as
links are chosen by the author of the page and connect to other information sources that
are considered authoritative and relevant enough to be mentioned.
 The biggest drawback of this approach is that such direct links between personal pages
are very sparse: due to the increasing size of the Web, searching has taken over browsing
as the primary mode of navigation on the Web.
 As a result, most individuals put little effort into creating new links and updating link
targets, or have given up linking to other personal pages altogether.
 Figure: Features in web pages that can be used for social network extraction.
 Co-occurrences of names in web pages can also be taken as evidence of relationships and
are a more frequent phenomenon.
 On the other hand, extracting relationships based on co-occurrence of the names of
individuals or institutions requires web mining as names are typically embedded in the
natural text of web pages.
 The techniques employed here are statistical methods possibly combined with an analysis
of the contents of web pages.
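One simple statistical method for turning co-occurrence into ties is sketched below. It uses the Jaccard coefficient (pages mentioning both names divided by pages mentioning either name) with a cut-off. The choice of this coefficient, the threshold, the plain substring matching and the sample pages are all assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_ties(pages, names, threshold=0.5):
    """Link two names if they are mentioned together often enough."""
    # For each name, the set of page indexes whose text mentions it.
    pages_with = {
        name: {i for i, text in enumerate(pages) if name in text}
        for name in names
    }
    ties = {}
    for a, b in combinations(names, 2):
        both = pages_with[a] & pages_with[b]
        either = pages_with[a] | pages_with[b]
        if not either:
            continue  # neither name appears on any page
        jaccard = len(both) / len(either)
        if jaccard >= threshold:
            ties[(a, b)] = jaccard
    return ties


# Hypothetical page texts and names.
pages = [
    "Ann Lee and Bob Roy presented a paper.",
    "Bob Roy met Ann Lee at the workshop.",
    "Cat Poe wrote a blog post.",
    "Ann Lee cited Cat Poe.",
]
names = ["Ann Lee", "Bob Roy", "Cat Poe"]

ties = cooccurrence_ties(pages, names, threshold=0.5)
```

Real pages would need name disambiguation and more careful text processing than substring matching, which is where the web mining mentioned above comes in.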

Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 

Recently uploaded (20)

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 

NE7012- SOCIAL NETWORK ANALYSIS

  • 1. NE7012 SOCIAL NETWORK ANALYSIS PREPARED BY: A.RATHNADEVI A.V.C COLLEGE OF ENGINEERING UNIT 1-INTRODUCTION
  • 2. UNIT I - INTRODUCTION
    Introduction to Web - Limitations of current Web - Development of Semantic Web - Emergence of the Social Web - Statistical Properties of Social Networks - Network analysis - Development of Social Network Analysis - Key concepts and measures in network analysis - Discussion networks - Blogs and online communities - Web-based networks
    1.1 INTRODUCTION TO WEB
    - The first web browser was invented in 1990 by Tim Berners-Lee.
    - It was called WorldWideWeb and was later renamed Nexus.
    - In 1993, Marc Andreessen created a browser that was easy to use and install with the release of Mosaic (later Netscape).
    1.2 LIMITATIONS OF THE CURRENT WEB
    - There is a general consensus that the Web is one of the greatest inventions of the 20th century. But could it be better?
    - The reason we no longer raise this question often has to do with our unusual ability to adapt to the limitations of our information systems. In the case of the Web this means adapting to our primary interface to the vast information that constitutes the Web: the search engine.
    - In the following we list four questions that search engines currently cannot answer satisfactorily, or cannot answer at all.
    1.2.1 What’s wrong with the Web?
    The questions below are specific for the sake of example, but they represent very general categories of search tasks.
    1. Who is Frank van Harmelen?
    - To answer such a question using the Web one would go to the search engine and enter the most logical keyword: Harmelen.
    - The results returned by Google are shown in Figure 1.1. (Note that the results are slightly different depending on whether one enters Google through the main site or a localized version.)
  • 3.
    - If this question and answer were part of a conversation, the dialogue would sound like this:
      Q: Who is Frank van Harmelen?
      A: I don’t know, but there are over a million documents with the word “harmelen” on them and I found them all really fast (0.31s). Further, you can buy Harmelen at Amazon. Free Delivery on Orders Over 15.
    - Not only does the advertisement make little sense, but of the top ten results only six are related to the Frank van Harmelen we are interested in. Upon closer inspection the problem becomes clear: the word Harmelen means a number of things. It is the name of a number of people, including the (unrelated) Frank van Harmelen and Mark van Harmelen.
    - Six of the top ten hits are related to the first person, one to the latter. Harmelen is also a small town in the Netherlands (one hit) and the site of a tragic train accident (one hit).
    - The problem is thus that the keyword Harmelen (and even the term Frank van Harmelen) is polysemous. The results are varied because designers of search engines know that users are unlikely to look beyond the top ten results. Search engines are thus programmed so that the first page shows a diversity of the most relevant links related to the keyword.
  • 4. 2. Show me photos of Paris
    - The most straightforward solution to this search task is typing “Paris photos” into the search bar of our favorite search engine.
    - Most advanced search engines, however, have specific facilities for image search, where we can drop the term photo from the query. Some of the results returned by Google Image Search are shown in Figure 1.2.
    - Again, what we immediately notice is that the search engine fails to discriminate between two categories of images: those related to the city of Paris and those showing Paris Hilton, the heiress to the Hilton fortune whose popularity on the Web could hardly be disputed.
    - More striking is the quality of search results in general. While the search engine does a good job of retrieving documents, the results of image searches are in general disappointing.
  • 5.
    - For the keyword Paris most of us would expect photos of places in Paris or maps of the city.
    - In reality only about half of the photos on the first page, a quarter of the photos on the second page and a fifth on the third page are directly related to our concept of Paris. The rest are about clouds, people, signs, diagrams etc.
    - The problem is that associating photos with keywords is a much more difficult task than simply looking for keywords in the texts of documents. Automatic image recognition is currently a largely unsolved research problem, which means that our computers cannot “see” what kind of object is in the photo.
    - Search engines attempt to understand the meaning of the image solely from its context, e.g. based on the name of the file and the text that surrounds the image. Inevitably, this leads to rather poor results.
    3. Find new music that I (might) like
    - This query is at an even higher level of difficulty, so much so that most of us wouldn’t even think of posing it to a search engine. First, from the perspective of automation, music retrieval is just as problematic as image search.
  • 6.
    - As in the previous case, a search engine could avoid the problem of understanding the content of music and look at the filename and the text of the web page for clues about the performer or the genre.
    - We suspect that such search engines do not exist for different reasons: most music on the internet is shared illegally through peer-to-peer systems that are completely out of reach for search engines.
    - Music is also a fast-moving good; search engines typically index the Web once a month and are therefore too slow for the fast-moving world of music releases.
    - But the reason we would not attempt to pose this query mostly has to do with formulating the music we like. Most likely we would search for the names of our favorite bands or music styles as a proxy, e.g. “new release”.
    4. Tell me about music players with a capacity of at least 4GB.
    - This is a typical e-commerce query: we are looking for a product with certain characteristics.
    - One of the immediate concerns is that translating this query from natural language to the Boolean language of search engines is almost impossible. We could try the search “music player” “4GB”, but it is clear that the search engine will not know that 4GB is the capacity of the music player and that we are interested in all players with at least that much memory (not just those that have exactly 4GB).
    - Such a query would return only pages where these terms occur as they are. The problem is that general-purpose search engines do not know anything about music players or their properties, or how to compare such properties.
    - They are good at searching for specific information (e.g. the model number of an MP3 player), but not at searching for descriptions of items.
    1.2.2 Diagnosis: A lack of knowledge
    - The questions above are arbitrary in their specificity, but they illustrate a general problem in accessing the vast amounts of information on the Web.
    - Namely, in all four cases we deal with a knowledge gap: what the computer understands and is able to work with is much more limited than the knowledge of the user.
  • 7.
    - The handicap of the computer is mostly due to technological difficulties in getting our computers to understand natural language or to “see” the content of images and other multimedia.
    - Even if the information is there, and is blatantly obvious to a human reader, the computer may not be able to see anything in it other than a string of characters.
    - This problem affects all of the above queries to some extent. A human can quickly skim the returned snippets (showing the context in which the keyword occurs) and realize that the different references to the word Harmelen do not all refer to persons, and that even the persons named Harmelen cannot all be the same.
    - In the second query, it is also blatantly obvious to the human observer that not all pictures are of cities. However, even telling cities and celebrities apart is a difficult task when it comes to image recognition.
    - In the case of the second query, an important piece of knowledge that the computer doesn’t possess is the common knowledge that there is a city named Paris and there is a famous person named Paris Hilton (who is also different from the Hilton in Paris).
    - Answering the third query requires the kind of extensive background knowledge about musical styles, genres etc. that shop assistants and experts in music possess. This kind of knowledge is well beyond the information that is in the database of a typical music store.
    - The third case is also interesting because background knowledge about the user is lacking as well. There has to be a way of providing this knowledge to the search engine in a form that it understands.
    - The fourth query is noteworthy because it highlights the problem of aggregating information.
    1.2.3 The semantic solution
    - The idea of the Semantic Web is to apply advanced knowledge technologies in order to fill the knowledge gap between human and machine.
    - This knowledge can either be information that is already described in the content of the Web pages but difficult to extract, or additional background knowledge that can help to answer queries in some way.
  • 8.
    - In the following we describe the improvement one could expect for our four queries, based on examples of existing tools and applications that have been implemented for specific domains or organizational settings.
    - In the case of the first query the situation can be greatly improved by providing personal information in a semantic format.
    - The solution is to attach a semantic profile to personal web pages that describes the same information that appears in the text of the web page, but in a machine-processable format.
    - The Friend-of-a-Friend (FOAF) project provides a widely accepted vocabulary for such descriptions. FOAF profiles listing attributes such as the name, address and interests of the user can be linked to the web page or even encoded in the text of the page.
    - As we will see, several profiles describing the same person may also exist on the Web. As all profiles are readable and comparable by machines, all knowledge about a person can be combined automatically.
    - The solution in the second case is to attach metadata to the images in question. For example, the online photo sharing site Flickr allows annotating images using geographic coordinates.
    - After uploading some photos users can add keywords to describe their images (e.g. “Paris, Eiffel-tower”) and drag and drop the images on a geographic map to indicate the location where the photo was taken. In the background the system computes the latitude and longitude of the place where the user pointed and attaches this information to the image.
    - Although in this case the system is not even aware that Paris is a city, minimal additional information about photos (the geo-coordinates) enables a kind of visualization that makes the searching task much easier.
    - In the third case the background knowledge required for recommending music is already at work behind the online radio called Pandora.
    - Pandora is based on the Music Genome Project, an attempt to create a vocabulary to describe characteristics of music from melody, harmony and rhythm, to instrumentation, orchestration, arrangement, lyrics, and the rich world of singing and vocal harmony.
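The remark above that several FOAF profiles of the same person can be combined automatically can be illustrated with a minimal Python sketch. It assumes profiles are already available as (subject, property, value) triples and that two subjects sharing a `foaf:mbox` value denote the same person (FOAF treats the mailbox as identifying). The subject identifiers, the e-mail address and the helper name `merge_by_mbox` are made up for illustration; a real application would parse RDF rather than hard-code tuples.

```python
from collections import defaultdict

# Hypothetical profiles found on two different sites, written as
# (subject, property, value) triples using FOAF property names.
# The mailbox address is invented for this example.
PROFILE_A = [
    ("site-a#me", "foaf:name", "Frank van Harmelen"),
    ("site-a#me", "foaf:mbox", "mailto:frank@example.org"),
    ("site-a#me", "foaf:interest", "Semantic Web"),
]
PROFILE_B = [
    ("site-b#fvh", "foaf:mbox", "mailto:frank@example.org"),
    ("site-b#fvh", "foaf:interest", "Knowledge Representation"),
]


def merge_by_mbox(*profiles):
    """Group all property values under the mailbox that identifies a person."""
    triples = [t for profile in profiles for t in profile]
    # Map each subject identifier to the mailbox it declares.
    mbox_of = {s: v for s, p, v in triples if p == "foaf:mbox"}
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, v in triples:
        if s in mbox_of and p != "foaf:mbox":
            merged[mbox_of[s]][p].add(v)
    return merged


person = merge_by_mbox(PROFILE_A, PROFILE_B)["mailto:frank@example.org"]
```

After the merge, `person["foaf:interest"]` holds the interests from both profiles, even though the two sites used different identifiers for the same person.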
  • 9.
    - Over several years thousands of songs have been annotated by experts in music theory. This knowledge is now used by the system to recommend unknown music to users based on their existing favorites.
    - Our fourth problem, the aggregation of product catalogs, can also be directly addressed using semantic technology.
    - As we have seen, the problem in this case is the difficulty of maintaining a unified catalog in a way that does not require an exclusive commitment from the providers of product information. (In practice, information providers often have their own product databases with a proprietary classification system.) Further, we would like to keep the catalog open to data providers adding new, emerging categories of products and their descriptions (e.g. mp3 players as a subclass of music players with specific attributes such as capacity, size, color etc.).
    1.3 DEVELOPMENT OF THE SEMANTIC WEB
    1.3.1 Research, development and standardization
    - The vision of extending the current human-focused Web with machine-processable descriptions of web content was first formulated in 1996 by Tim Berners-Lee, the original inventor of the Web [BLFD99].
    - The Semantic Web has been actively promoted since by the World Wide Web Consortium (also led by Berners-Lee), the organization that is chiefly responsible for setting technical standards on the Web.
    - The Semantic Web has quickly attracted significant interest from funding agencies on both sides of the Atlantic, reshaping much of the AI research agenda in a relatively short period of time.
    - In particular, the field of Knowledge Representation and Reasoning took center stage, but outcomes from other fields of AI have also been put to use to support the move towards the Semantic Web: for example, Natural Language Processing and Information Retrieval have been applied to acquiring knowledge from the World Wide Web.
  • 10.
    - The complete list of individuals in this community consists of 608 researchers, mostly from academia (79%) and to a lesser degree from industry (21%). Geographically, the community covers much of the United States and Europe, with some activity in Japan and Australia.
    - As Figure 1.5 shows, the participation rate at the individual ISWC events quickly reached the level typical of large, established conferences and remained at that level even for the last year of data (2004), when the conference was organized in Hiroshima, Japan. The number of publications written by the members of the community that contain the keyword “Semantic Web” has been rising sharply since the beginning.
    - The core technologies of the Semantic Web, logic-based languages for knowledge representation and reasoning, have been developed in the research field of Artificial Intelligence.
    - Tools for creating, storing and reasoning with ontologies have been primarily developed by university-affiliated technology startups (for example, Aduna, Ontotext and Ontoprise) and at research labs of large corporations (see for example the work of the advanced technology groups at IBM and Hewlett-Packard).
    - Most of these tools are available as open source, as at the current stage vendors expect to make profit primarily by developing complete solutions and providing support for other developers.
    - The World Wide Web Consortium still plays a key role in standardization where the interoperability of tools necessitates mediation between various developer and user communities, as in the case of the development of a standard query language and protocol to access ontology stores across the Web.
  • 11. 1.3.2 Technology adoption
    - The Semantic Web was originally conceptualized as an extension of the current Web, i.e. as the application of metadata for describing Web content. In this vision, the content that is already on the Web (text, but also multimedia) would be enriched in a collaborative effort by the users of the Web.
    - The Semantic Web suffers from what the economist Kevin Kelly calls the fax-effect. Kelly notes that when the first fax machines were introduced, they came with a very hefty price tag.
    - Yet they were almost useless: the usefulness of a fax comes from being able to communicate with other fax users.
    - In this sense every fax unit sold increases the value of all fax machines in use. While traditional goods such as land or precious metals become more valuable the less is produced (called the law of scarcity), the fax machine network exhibits the opposite, which is called the law of plentitude.
    - What makes the case of the Semantic Web more difficult, however, is an additional cost factor. Returning to the example of the fax network, we can say that it required a certain kind of agreement to get the system working on a global scale: all fax machines needed to adopt the same protocol for communicating over the telephone line. This is similar to the case of the Web, where global interoperability is guaranteed by the standard protocol for communication (HTTP).
    1.4 THE EMERGENCE OF THE SOCIAL WEB
    - The first wave of socialization on the Web was due to the appearance of blogs, wikis and other forms of web-based communication and collaboration.
    - Blogs and wikis attracted mass popularity from around 2003. What they have in common is that they both significantly lower the requirements for adding content to the Web: editing blogs and wikis no longer required any knowledge of HTML. Blogs and wikis allowed individuals and groups to claim their personal space on the Web and fill it with content with relative ease.
  • 12.
    - Although the example of Wikipedia, the online encyclopedia, is outstanding, wikis large and small are used by groups of various sizes as an effective knowledge management tool for keeping records, describing best practices or jointly developing ideas.
    - The first online social networks (also referred to as social networking services) entered the field at the same time as blogging and wikis started to take off. In 2003, the first-mover Friendster attracted over five million registered users in the span of a few months, which was followed by Google and Microsoft starting or announcing similar services.
    1.4.1 Web 2.0 + Semantic Web = Web 3.0?
    - Web 2.0 is often contrasted with the Semantic Web, which is a more conscious and carefully orchestrated effort on the side of the W3C to trigger a new stage of developments using semantic technologies.
  • 13.
    - In practice the ideas of Web 2.0 and the Semantic Web are not exclusive alternatives: Web 2.0 mostly affects how users interact with the Web, while the Semantic Web opens new technological opportunities for web developers in combining data and services from different sources.
    - What the Semantic Web can offer to the Web 2.0 community is a standard infrastructure for building creative combinations of data and services.
    - Standard formats for exchanging data and schema information, support for data integration, along with standard query languages and protocols for querying remote data sources provide a platform for the easy development of mashups.
    1.5 STATISTICAL PROPERTIES OF SOCIAL NETWORKS
    - To study social networks, we first represent them as graphs. We want to understand the structural patterns and properties of these graphs.
    - Two types of properties:
      - Static properties: describing the structure of snapshots of graphs.
      - Dynamic properties: describing how the structure evolves over time.
    - These properties may be defined for unweighted or weighted graphs, where weights may represent multi-edges (e.g. multiple phone calls from one person to another) or edge weights (e.g. monetary amounts between a donor and a recipient in a political donation network).
    - Properties to understand:
      a) What do social networks look like, on a large scale?
      b) How do networks behave over time?
      c) How do the different components of an entire network form?
      d) How do the non-giant weakly connected components behave over time?
      e) What distributions and patterns do weighted graphs maintain?
      f) What happens when we take into account multiple edges and weighted edges?
    1.5.1 Static Properties
  • 14.
    - While all networks we examine are evolving over time, there are properties that are measured at single points in time, that is, on static snapshots of the graphs. For the purposes of organization we will further divide these properties into those applying to unweighted graphs and to weighted graphs.
    1.5.1.1 Static Unweighted Graphs
    - Here, we present the ‘laws’ that apply to static snapshots of real graphs without considering the weights on the edges. Those include the patterns in degree distributions, the number of hops in which pairs of nodes can reach each other, the local number of triangles, eigenvalues and communities. Next, we describe the related patterns in more detail.
    1) S-1: Heavy-tailed Degree Distribution
    - The degree distribution of many real graphs obeys a power law $f(d) \propto d^{-\alpha}$, with $\alpha > 0$, where $f(d)$ is the fraction of nodes with degree $d$.
    - This means that real graphs contain many low-degree nodes but only a few high-degree nodes.
    2) S-2: Small Diameter
    - The diameter of a static graph is the maximum distance between any two nodes.
    - Real-world graphs often have small diameters.
    - This is known as the ‘small-world phenomenon’ or the ‘six degrees of separation’.
    - The diameter can be hijacked by long chains.
    - Therefore we use the effective diameter, which is the minimum number of hops in which some fraction (usually 90%) of all connected node pairs can be reached.
    3) S-3: Triangle Power Law (TPL)
    - The number of nodes that participate in $\Delta$ triangles follows a power law of the form $f(\Delta) \propto \Delta^{\sigma}$, with the exponent $\sigma < 0$.
    - TPL means that
  • 15.
      - many nodes have only a few triangles in their neighborhoods, and
      - a few nodes participate in a large number of triangles with their neighbors.
    4) S-4: Eigenvalue Power Law (EPL)
    - The eigenvalues of a graph are defined as the eigenvalues of its adjacency matrix. The set of eigenvalues of a graph is called the graph spectrum.
    - (For a matrix $A$, if there is a vector $X$ such that $AX = \lambda X$ for some scalar $\lambda$, then $\lambda$ is the eigenvalue of $A$ associated with eigenvector $X$.)
    - EPL states that the 20 or so largest eigenvalues of the Internet graph are power-law distributed. It has been shown that the Eigenvalue Power Law is a consequence of the Degree Power Law.
    5) S-5: Community Structure
    - Real-world graphs exhibit a modular structure, with nodes forming groups, and possibly groups within groups.
    - In other words, the nodes form communities, where nodes in the same community are more tightly connected to each other than to nodes outside the community.
    1.5.1.2 Static Weighted Graphs
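Before moving on to weighted graphs, the unweighted patterns S-1 to S-3 above can be measured on a small example. The following is a minimal Python sketch, not a reference implementation: it assumes an undirected graph stored as a dictionary of neighbor sets, and the function names and the toy graph are illustrative only. The effective diameter is computed by brute-force breadth-first search from every node, which is fine for small graphs but too slow for large ones.

```python
import math
from collections import Counter, deque
from itertools import combinations


def degree_distribution(adj):
    """Return {degree: fraction of nodes with that degree} (pattern S-1)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {d: c / len(adj) for d, c in counts.items()}


def hop_counts(adj, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adj, fraction=0.9):
    """Smallest hop count within which `fraction` of connected pairs lie (S-2)."""
    hops = sorted(d for s in adj for t, d in hop_counts(adj, s).items() if t != s)
    if not hops:
        return 0
    return hops[math.ceil(fraction * len(hops)) - 1]


def triangles_per_node(adj):
    """Number of triangles each node participates in (input to S-3)."""
    return {u: sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
            for u in adj}


# Toy graph: a triangle a-b-c with a chain c-d-e attached.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e"}, "e": {"d"}}
```

On real data one would plot these quantities on log-log axes to check for the heavy tails described above; the toy graph is far too small to show a power law.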
  • 16.
    - We consider weighted directed graphs.
    - Data set: records in the form (IP-source, IP-destination, timestamp, number-of-packets).
    - We can have multi-edges and weights.
    - Notation:
      - $W(t)$: the total weight up to time $t$
      - $E(t)$: the number of distinct edges up to time $t$
      - $E_d(t)$: the number of multi-edges ($d$ stands for duplicate edges) up to time $t$
      - $N(t)$: the number of nodes up to time $t$
    1) SW-1: Weight Power Law (WPL)
    - Between $W(t)$ and $E(t)$, we observe that $W(t) = E(t)^{w}$ ($w$ ranges from 1.01 to 1.5).
    - This means that more edges in the graph imply superlinearly higher total weight.
    - We also have $N(t) = E(t)^{n}$, $E_d(t) = E(t)^{dupe}$, $N_{src}(t) = E(t)^{nsrc}$, $N_{dst}(t) = E(t)^{dst}$.
    2) SW-2: Edge Weights Power Law
    - Given a real-world graph, nodes $i$ and $j$ with weights $w_i$ and $w_j$, and the edge $e_{i,j}$ with weight $w_{i,j}$, we have a power law relating $w_{i,j}$ to $w_i$ and $w_j$ (the equation is not preserved in this transcript).
  • 17.
    - This means that the weight of a given edge and the weights of its two neighboring nodes are correlated (similar to Newton’s Gravitational Law).
    3) SW-3: Snapshot Power Laws (SPL)
    - Consider the $i$-th node of a weighted graph at time $t$ (a snapshot), and let $out_i$ and $outw_i$ be its out-degree and out-weight. Then $outw_i \propto out_i^{ow}$,
    - where $ow$ is the out-weight-exponent of the SPL. Similarly for the in-degree, with in-weight-exponent $iw$.
    - The exponents $iw$ and $ow$ take values in the range [0.9-1.2] and [0.95-1.35], respectively.
    - The exponent remains almost constant over time.
    1.5.2 Dynamic Properties
    - These are typically studied by looking at a series of static snapshots and seeing how measurements of these snapshots compare. Like the static properties we presented previously, we also divide these into properties that take into account weights and those that don’t.
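Laws such as WPL and SPL above state that one quantity grows as a power of another, so the exponent shows up as the slope of a straight line on log-log axes. As a hedged illustration, the sketch below estimates that slope with ordinary least squares on the logarithms. The function name and the synthetic data are assumptions made for this example; the data are generated with a known exponent rather than taken from any real network.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. a in y = c * x**a."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


# Synthetic snapshots: edge counts E(t) and total weights W(t) = E(t)**1.2.
edges = [10, 100, 1000, 10000]
weights = [e ** 1.2 for e in edges]
w_exponent = fit_power_law_exponent(edges, weights)
```

Here `w_exponent` recovers the exponent 1.2 used to generate the data. With noisy real measurements the fit only approximates the exponent, and a log-log regression is a simple choice rather than the only possible estimator.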
  • 18. 1.5.2.1 Dynamic Unweighted Graphs
    - The patterns in dynamic time-evolving graphs that do not consider edge weights include the shrinking diameter property, the densification law, secondary connected components whose size oscillates around a constant, the largest eigenvalue law, and bursty and self-similar edge additions over time. We next describe these laws in detail.
    1) D-1: Shrinking Diameter
    - It can be observed that not only is the diameter of real graphs small, but it also shrinks and then stabilizes over time.
    - There is a ‘gelling point’ at which many small disconnected components merge and form the largest connected component in the graph.
    - This is like the ‘coalescence’ of the graph, at which point the diameter ‘spikes’.
    - Afterwards, with new edges the diameter keeps shrinking until it reaches an equilibrium.
    - The vertical line marks the gelling point.
    2) D-2: Densification Power Law (DPL)
    - The relationship between $E(t)$ and $N(t)$ (the number of edges and nodes at time $t$) follows the Densification Power Law $E(t) \propto N(t)^{\beta}$,
  • 19.
    - where $\beta$ is the densification exponent, with a value between 1.03 and 1.7.
    - This indicates a superlinear relationship between the number of nodes and the number of edges.
    - Also explain the densification effect.
    - For (c) the good linear fit agrees with the DPL.
    - (d) shows the corresponding component sizes.
    3) D-3: Diameter-plot and Gelling point
    - Real graphs exhibit a gelling point, at which the diameter spikes and (several) disconnected components gel into a giant component.
    - Before that point, the graph is more or less in an establishment period, typically consisting of a collection of small, disconnected components.
    - After the gelling point, the graph obeys the expected rules.
    - Example: PostNet data on slide 30 & 32.
    4) D-4: Constant/Oscillating NLCCs
    - After the gelling point, the secondary and tertiary connected components remain of approximately constant size, with small oscillations.
    - New nodes typically link to the giant connected component (GCC).
    - Very few of the newcomers link to the 2nd (or 3rd) connected component (CC), helping it to grow slowly.
    - In very rare cases, a newcomer links both to a next-largest connected component (NLCC) and to the GCC, thus leading to the absorption of the NLCC into the GCC.
  • 20.
    - At that point, we have a drop in the size of the 2nd CC.
  • 21. 5) D-5: LPL: Principal eigenvalue over time
 The principal eigenvalue λ1(t) of the 0-1 adjacency matrix A and the number of edges E(t) over time follow a power law with exponent less than 0.5, especially after the ‘gelling point’, i.e. $\lambda_1(t) \propto E(t)^{\alpha}$ with $\alpha \le 0.5$.
1.5.2.2 Dynamic Weighted Graphs
1) DW-1: Bursty/self-similar weight additions
 This pattern is found by tracking how much weight a graph gains in each time interval (i.e. ΔW(t)) and looking at the entropy plots.
 The weight additions over time show self-similarity.
 If the edge weight is the number of reoccurrences of that edge, the slope of the entropy plot is above 0.95 (more uniform). For other features used as edge weight, the weight additions are more bursty, with slopes as low as 0.6 for the Network Traffic dataset.
2) DW-2: LWPL: Weighted principal eigenvalue over time
 (λ1,w Power Law (LWPL)) Weighted real graphs exhibit a power law between the largest eigenvalue λ1,w(t) of the weighted adjacency matrix Aw and the number of edges E(t) over time, that is, $\lambda_{1,w}(t) \propto E(t)^{\beta}$, where the exponent β ranged from 0.5 to 1.6.
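The power laws in this section (DPL, LPL, LWPL) all state that one quantity grows as a power of another, so their exponents can be estimated as the slope of a straight line fitted on log-log axes. The following is a minimal sketch of that idea, not the procedure used in the original studies; the function name `fit_power_law_exponent` and the snapshot values are illustrative assumptions (the data is synthetic, generated to follow E = N^1.2).

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x).

    If y is proportional to x**beta, the returned slope estimates beta.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((lx - mean_x) * (ly - mean_y)
                     for lx, ly in zip(log_x, log_y))
    variance = sum((lx - mean_x) ** 2 for lx in log_x)
    return covariance / variance


# Hypothetical graph snapshots: N(t) nodes and E(t) edges at successive times.
# The edge counts are synthetic and follow E = N**1.2 exactly.
nodes = [100, 200, 400, 800, 1600]
edges = [n ** 1.2 for n in nodes]

beta = fit_power_law_exponent(nodes, edges)
print(f"estimated densification exponent: {beta:.3f}")
```

The same function could be applied to pairs (E(t), λ1(t)) to estimate the LPL exponent; on real data the fit is usually restricted to snapshots after the gelling point.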
  • 22. Applications of these Laws
 These patterns are helpful for
1. Spotting anomalous graphs and sub-graphs,
2. Answering questions about entities in a network, and
3. Answering questions about what-if scenarios.
 Spotting anomalies is vital for
1. Determining abuse of networks
2. Detecting fraudulent reputation building (in e-auction systems)
3. Detecting dwindling/abnormal social sub-groups
4. Network intrusion detection
 Analyzing network properties is also useful for
1. Identifying authorities and search algorithms,
2. Discovering the “network value” of customers
3. Improving recommendation systems
 What-if scenarios are vital for
1. Extrapolation,
2. Provisioning and
3. Algorithm design
1.6 NETWORK ANALYSIS
 Social Network Analysis (SNA) is the study of social relations among a set of actors.
 The key difference between network analysis and other approaches to social science is the focus on relationships between actors rather than the attributes of individual actors.
 Network analysis takes a global view of social structures, based on the belief that types and patterns of relationships emerge from individual connectivity and that the presence (or absence) of such types and patterns has substantial effects on the network and its constituents.
  • 23.  The network structure provides opportunities and imposes constraints on the individual actors by determining the transfer or flow of resources (material or immaterial) across the network.
 SNA is thus a different approach to social phenomena and therefore requires a new set of concepts and new methods for data collection and analysis.
 Network analysis provides a vocabulary for describing social structures, formal models that capture the common properties of all (social) networks, and a set of methods applicable to the analysis of networks in general.
 The concepts and methods of network analysis are grounded in a formal description of networks as graphs. Methods of analysis primarily originate from graph theory, as these are applied to the graph representation of social network data.
 The methods of data collection in network analysis are aimed at collecting relational data in a reliable manner. Data collection is typically carried out using standard questionnaires and observation techniques that aim to ensure the correctness and completeness of network data.
 Often records of social interaction (publication databases, meeting notes, newspaper articles, documents and databases of different sorts) are used to build a model of social networks.
1.7 DEVELOPMENT OF SOCIAL NETWORK ANALYSIS
 The field of Social Network Analysis today is the result of the convergence of several streams of applied research in sociology, social psychology and anthropology.
 Many of the concepts of network analysis have been developed independently by various researchers, often through empirical studies of various social settings.
 For example, many social psychologists of the 1940s found a formal description of social groups useful in depicting communication channels in the group when trying to explain processes of group communication.
 As early as the mid-1950s, anthropologists found network representations useful in generalizing actual field observations, for example when comparing the level of reciprocity in marriage and other social exchanges across different cultures.
  • 24.  Despite the various efforts, each of the early studies used a different set of concepts and different methods of representation and analysis of social networks.
 The term “social network” was introduced by Barnes in 1954.
 The eventual convergence of these strands was facilitated by the adoption of a graph representation of social networks, usually credited to Moreno.
 What Moreno called a sociogram was a visual representation of social networks as a set of nodes connected by directed links.
 The nodes represented individuals in Moreno’s work, while the edges stood for personal relations.
 However, similar representations can be used to depict a set of relationships between any kind of social unit such as groups, organizations, nations etc.
 The figure shows a network of relations among workers (W), solderers (S) and inspectors (I).
 While 2D and 3D visual modeling is still an important technique of network analysis, the sociogram is honored mostly for opening the way to a formal treatment of network analysis based on graph theory.
 One of the relatively new areas of network analysis is the analysis of networks in entrepreneurship, an active area of research that builds on and contributes to organization and management science.
  • 25.  The vocabulary, models and methods of network analysis also expand continuously through applications that require handling ever more complex data sets.
 An example of this process is the advances in dealing with longitudinal data. New probabilistic models are capable of modelling the evolution of social networks and answering questions regarding the dynamics of communities.
 Formalizing an increasing set of concepts in terms of networks also contributes to both developing and testing theories in more theoretical branches of sociology.
 The increasing variety of applications and related advances in methodology can be best observed at the yearly Sunbelt Social Networks Conference series, which started in 1980. The field has also had its own journal, Social Networks, since 1978.
 While the field of network analysis has been growing steadily from the beginning, there have been two developments in the last two decades that led to an explosion in network literature.
 First, advances in information technology brought a wealth of electronic data and significantly increased analytical power.
 Second, the methods of SNA are increasingly applied to networks other than social networks, such as the hyperlink structure of the Web or the electric grid.
1.8 KEY CONCEPTS AND MEASURES IN NETWORK ANALYSIS
1.8.1 Network components
 Actors (nodes, points, vertices):
1. Individuals, Organizations, Events …
2. Can have properties (attributes)
 Relations (lines, arcs, edges, ties): between pairs of actors.
1. Undirected (symmetric) / Directed (asymmetric)
2. Binary / Valued
  • 26.  Most network analysis methods work on an abstract, graph-based representation of real-world networks.
 The units of interest in a network are the combined sets of actors and their relations.
 We represent actors with points and relations with lines.
 In general, a relation can be:
 Undirected / Directed
 Binary / Valued
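The distinctions above (directed vs. undirected, binary vs. valued) map directly onto how a network is stored. The sketch below uses an adjacency matrix for a small directed, valued network and converts it to an edge list, then derives binary and undirected versions. The actors, tie strengths and function names are invented for illustration.

```python
# Hypothetical actors and a directed, valued adjacency matrix:
# matrix[i][j] is the strength of the tie from actors[i] to actors[j].
actors = ["a", "b", "c", "d"]
matrix = [
    [0, 2, 0, 0],
    [1, 0, 3, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def matrix_to_edge_list(actors, matrix):
    """List every non-zero cell as a (sender, receiver, value) triple."""
    return [
        (actors[i], actors[j], value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


def binarize(matrix):
    """Valued -> binary: keep only whether a tie exists."""
    return [[1 if value else 0 for value in row] for row in matrix]


def symmetrize(matrix):
    """Directed -> undirected: a tie exists if it exists in either direction."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)]
            for i in range(n)]


edge_list = matrix_to_edge_list(actors, matrix)
undirected_binary = symmetrize(binarize(matrix))
print(edge_list)
print(undirected_binary)
```

An edge list is usually the more compact choice for sparse networks, while the matrix form is convenient for algebraic measures such as eigenvalues.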
  • 27. 1.8.2 Types of networks
 We can examine networks across multiple levels:
1. Ego network
2. Partial network
3. Complete or “Whole” network
1. Ego network
 Have data on a respondent (ego) and the people they are connected to (alters).
 May include estimates of connections among alters
 Measures:
1. Size
2. Types of relations
  • 28. 2. Partial network
 Ego networks plus some amount of tracing to reach contacts of contacts
 Something less than a full account of connections among all pairs of actors in the relevant population
3. Complete or “Whole” network
 Connections among all members of a population.
 Data on all actors within a particular (relevant) boundary.
 Never exactly complete (due to missing data), but boundaries are set
 E.g.: Friendships among workers in a company
 Measures:
1. Graph properties
2. Density
3. Sub-groups
4. Positions
1.8.3 Basic data structures
1. From pictures to matrices
  • 29. 2. From matrices to lists
1.8.4 Measuring networks
1. Connectivity
 Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them.
  • 30. Basic elements:
 A path is a sequence of nodes and edges starting with one node and ending with another, tracing the indirect connection between the two. On a path, you never go backwards or revisit the same node twice. Example: a → b → c → d
 A walk is any sequence of nodes and edges, and may go backwards. Example: a → b → c → b → c → d
 A cycle is a path that starts and ends with the same node. Example: a → b → c → a
 If you can trace a sequence of relations from one actor to another, then the two are connected. If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component.
 Intuitively, a component is the set of people who are all connected by a chain of relations.
2. Distance and number of paths
 Distance is measured by the (weighted) number of relations separating a pair, using the shortest path.
  • 31. Actor “a” is:
1 step from 4 actors
2 steps from 5 actors
3 steps from 4 actors
4 steps from 3 actors
5 steps from 1 actor
 Paths are the different routes one can take between a pair of actors. Node-independent paths (paths that share no intermediate nodes) are particularly important.
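Reachability, components and shortest-path distance in an unweighted network can all be computed with breadth-first search. The sketch below assumes an undirected network stored as an adjacency list; the toy graph and the function names are illustrative and do not correspond to the figure in the slides.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distance (number of steps) from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def connected_components(graph):
    """Group nodes into components: sets of mutually reachable actors."""
    seen = set()
    components = []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components


# Hypothetical undirected network: a chain a-b-c-d and a separate pair e-f.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["f"],
    "f": ["e"],
}

distances_from_a = bfs_distances(toy_graph, "a")
components = connected_components(toy_graph)
print(distances_from_a)
print(components)
```

Counting how many nodes sit at each distance from a source gives exactly the kind of "1 step from ..., 2 steps from ..." summary listed above.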
  • 32. 3. Centrality
 Centrality refers to (one dimension of) location, identifying where an actor resides in a network.
 Centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network, in the sense that they have many and important connections.
 Three standard centrality measures capture a wide range of “importance” in a network:
1. Degree
2. Closeness
3. Betweenness
3.1. Degree centrality
 Number of nodes adjacent to a given node
 Often used as a measure of a node’s degree of connectedness and hence also influence and/or popularity
 Useful in assessing which nodes are central with respect to spreading information and influencing others in their immediate ‘neighborhood’
 Nodes 3 and 4 have the highest degree (4)
 Formula: $C_D(p_k) = \sum_{i=1}^{n} a(p_i, p_k)$, where $a(p_i, p_k) = 1$ if $p_i$ and $p_k$ are directly connected and 0 otherwise; normalized: $C'_D(p_k) = C_D(p_k) / (n-1)$
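Degree centrality as described above is simply a count of neighbours, optionally divided by n-1 so that values are comparable across networks of different size. A minimal sketch follows, assuming an undirected network stored as an adjacency list; the five-node graph is made up for illustration and is not the graph from the slide's figure.

```python
def degree_centrality(graph, normalized=False):
    """Degree of each node in an undirected adjacency-list graph.

    With normalized=True the degree is divided by n - 1, the largest
    degree a node can have in a graph of n nodes.
    """
    n = len(graph)
    result = {}
    for node, neighbours in graph.items():
        degree = len(set(neighbours) - {node})  # ignore duplicates and self-loops
        if normalized and n > 1:
            result[node] = degree / (n - 1)
        else:
            result[node] = degree
    return result


# Hypothetical network: node 3 is tied to everyone, nodes 4 and 5 are also tied to each other.
toy_graph = {1: [3], 2: [3], 3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}

raw = degree_centrality(toy_graph)
scaled = degree_centrality(toy_graph, normalized=True)
most_central = max(raw, key=raw.get)
print(raw, scaled, most_central)
```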
  • 33. 3.2. Closeness centrality
 An actor is considered important if he/she is relatively close to all other actors.
 Based on the sum of geodesic distances to all other nodes.
 This sum is an inverse measure of centrality: the smaller the sum, the more central the actor.
 It is a measure of reach, i.e. the speed with which information can reach other nodes from a given starting node
  • 34.  Nodes 3 and 5 have the highest closeness, while node 2 fares almost as well.
 Formula: $C_C(p_k) = \left[ \sum_{i=1}^{n} d(p_i, p_k) \right]^{-1}$, where $d(p_i, p_k)$ is the geodesic distance between $p_i$ and $p_k$
3.3. Betweenness centrality
 Number of times a node lies along the shortest path between two others
 Shows which nodes are more likely to be in communication paths between other nodes
 Also useful in determining points where the network would break apart
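  • 35.  Node 5 has the highest betweenness centrality, followed by node 3.
 Betweenness centrality can be defined in terms of probability ($1/g_{ij}$):
$g_{ij}$ = number of geodesics that link actors $p_i$ and $p_j$.
$g_{ij}(p_k)$ = number of geodesics that link $p_i$ and $p_j$ and contain $p_k$.
$b_{ij}(p_k) = g_{ij}(p_k) / g_{ij}$ = probability that actor $p_k$ is on a geodesic randomly chosen among the ones that join $p_i$ and $p_j$.
 Betweenness centrality is the sum of these probabilities over all pairs of other actors (Freeman, 1979): $C_B(p_k) = \sum_{i<j} b_{ij}(p_k)$
 Normalized: $C'_B(p_k) = C_B(p_k) / [(n-1)(n-2)/2]$
3.3.1 Centralization
 If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality
 Freeman’s general formula for centralization (which ranges from 0 to 1): $C_X = \frac{\sum_{i=1}^{n} [C_X(p^*) - C_X(p_i)]}{\max \sum_{i=1}^{n} [C_X(p^*) - C_X(p_i)]}$, where $C_X(p^*)$ is the largest centrality value in the graph and the denominator is the largest value the sum can take in any graph of n nodes.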
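The definitions of closeness, betweenness and centralization above can be turned into code almost literally. The sketch below counts geodesics with breadth-first search and sums the fractions of geodesics passing through each node. It is a direct, unoptimized reading of the definitions for small, connected, undirected networks, not the algorithm used by any particular package; all function names and the test graphs are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """BFS from source: geodesic distance and number of geodesics to each reachable node."""
    distance = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                count[neighbour] = 0
                queue.append(neighbour)
            if distance[neighbour] == distance[node] + 1:
                count[neighbour] += count[node]
    return distance, count


def closeness_centrality(graph):
    """Normalized closeness: (n - 1) divided by the sum of geodesic distances.

    Assumes the graph is connected.
    """
    n = len(graph)
    result = {}
    for node in graph:
        distance, _ = shortest_path_counts(graph, node)
        total = sum(distance.values())
        result[node] = (n - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph, normalized=False):
    """Sum over pairs (i, j) of the fraction of i-j geodesics that contain k."""
    info = {node: shortest_path_counts(graph, node) for node in graph}
    score = {node: 0.0 for node in graph}
    for s, t in combinations(graph, 2):
        dist_s, count_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        for k in graph:
            if k == s or k == t or k not in dist_s:
                continue
            dist_k, count_k = info[k]
            # k lies on an s-t geodesic iff the distances add up exactly
            if t in dist_k and dist_s[k] + dist_k[t] == dist_s[t]:
                score[k] += count_s[k] * count_k[t] / count_s[t]
    n = len(graph)
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2
        score = {node: value / pairs for node, value in score.items()}
    return score


def centralization(scores, max_total):
    """Freeman centralization; max_total is the largest possible sum of differences."""
    top = max(scores.values())
    return sum(top - value for value in scores.values()) / max_total


# Hypothetical test graphs: a star (one hub, four leaves) and a 4-node ring.
star = {"hub": ["w", "x", "y", "z"],
        "w": ["hub"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
ring = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}

star_closeness = closeness_centrality(star)
star_betweenness = betweenness_centrality(star, normalized=True)
ring_betweenness = betweenness_centrality(ring)
# For normalized betweenness the star attains the maximum sum of differences, n - 1.
star_centralization = centralization(star_betweenness, max_total=len(star) - 1)
print(star_closeness, star_betweenness, ring_betweenness, star_centralization)
```

The star is the most centralized graph possible, so its centralization is 1; in the ring every node is equivalent and each pair of opposite nodes splits its two geodesics evenly between the two intermediaries.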
  • 36. 3.3.2 Density
 The more actors are connected to one another, the denser the network will be. Density is the number of ties present divided by the number of ties possible.
 Undirected network: n(n-1)/2 possible pairs of actors.
 Directed network: n(n-1) possible lines.
1.8.5 Comparing across centrality values
 Generally, the 3 centrality types will be positively correlated
 When they are not correlated, it probably tells you something interesting about the network.
High Degree, Low Closeness: Embedded in a cluster that is far from the rest of the network
High Degree, Low Betweenness: Ego's connections are redundant; communication bypasses him/her
High Closeness, Low Degree: Key player tied to important/active alters
High Closeness, Low Betweenness: Probably multiple paths in the network; ego is near many people, but so are many others
  • 37. High Betweenness, Low Degree: Ego's few ties are crucial for network flow
High Betweenness, Low Closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number
1.8.6 Social network software
1. UCINET
 The standard network analysis program; runs in Windows
 Good for computing measures of network topography for single nets
 Data input/output uses a special 2-file format, but UCINET can now read PAJEK files directly.
 Not optimal for large networks
 Available from: Analytic Technologies
2. PAJEK
 Program for analyzing and plotting very large networks
 Intuitive Windows interface
 Started mainly as a graphics program, but has expanded to a wide range of analytic capabilities
 Can link to the R statistical package
 Free
 Available from: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
3. NetDraw
 Very new, but by one of the best-known names in network analysis software.
 Free
1.9 DISCUSSION NETWORKS
1.9.1 Electronic discussion networks
  • 38.  Tyler, Wilkinson and Huberman analyze communication among employees of their own lab by using the corporate email archive. They recognize a link between two employees if they had exchanged at least a minimum number of total emails in a given period, filtering out one-way relationships.
 Adamic and Adar revisit one of the oldest problems of network research, namely the question of local search.
 How do people find short paths in social networks based on only local information about their immediate contacts?
 Their findings support earlier results that additional knowledge about contacts, such as their physical location and position in the organization, allows employees to conduct their search much more efficiently than the simple strategy of always passing the message to the most connected neighbor.
 Discussions are largely in email and, to a lesser extent, on the phone and in face-to-face meetings.
 Group communication and collective decision making in various settings are traditionally studied using much more limited written information such as transcripts and records of attendance and voting.
 The main technical contribution of Gloor is a dynamic visualization of the discussion network that makes it possible to quickly identify the moments when key discussions take place that activate the entire group and not just a few select members.
 Gloor also performs a comparative study across the various groups based on the structures that emerge over time.
1.9.2 Blogs and online communities
 Content analysis has also been the most commonly used tool in the computer-aided analysis of blogs (web logs), primarily with the intention of trend analysis for the purposes of marketing.
 While blogs are often considered as “personal publishing” or a “digital diary”, bloggers themselves know that blogs are much more than that: modern blogging tools make it easy to comment on and react to the comments of other bloggers, resulting in webs of communication among bloggers.
 The figure shows some of the features of blogs that have been used in various studies to establish the networks of bloggers.
  • 39.  Blogs make a particularly appealing research target due to the availability of structured electronic data in the form of RSS (Rich Site Summary) feeds.
 RSS feeds contain the text of the blog posts as well as valuable metadata such as the timestamp of posts, which is the basis of dynamic analysis.
 The 2004 US election campaign represented a turning point in blog research, as it was the first major electoral contest where blogs were exploited as a method of building networks among individual activists and supporters.
 Online community spaces and social networking services such as MySpace and LiveJournal cater to socialization even more directly than blogs, with features such as social networking (maintaining lists of friends, joining groups), messaging and photo sharing.
 Most online social networking services (Friendster, Orkut, LinkedIn and similar services) closely guard their data even from their own users.
 A technological alternative to these centralized services is the FOAF (Friend of a Friend) network.
  • 40.  FOAF profiles are stored on the web sites of the users and linked together using hyperlinks.
 The drawback of FOAF is that at the moment there is a lack of tools for creating and maintaining profiles as well as useful services for exploiting this network.
 Advantages of blogs
1. Easy and fast to create
2. Easy to add links, photos, videos
3. Can be used to create a community
 Disadvantages of blogs
1. Generally one author
2. Used for personal opinions and reflection
1.9.3 WEB BASED NETWORKS
 The content of Web pages is an inexhaustible source of information for social network analysis.
 This content is not only vast, diverse and free to access but also in many cases more up to date than any specialized database.
 On the downside, the quality of information varies significantly and reusing it for network analysis poses significant technical challenges.
 There are two features of web pages that are considered as the basis of extracting social relations: links and co-occurrences.
 The linking structure of the Web is considered a proxy for real-world relationships, as links are chosen by the author of the page and connect to other information sources that are considered authoritative and relevant enough to be mentioned.
 The biggest drawback of this approach is that such direct links between personal pages are very sparse: due to the increasing size of the Web, searching has taken over browsing as the primary mode of navigation on the Web.
 As a result, most individuals put little effort into creating new links and updating link targets, or have given up linking to other personal pages altogether.
  • 41.  Figure: Features in web pages that can be used for social network extraction.
 Co-occurrences of names in web pages can also be taken as evidence of relationships and are a more frequent phenomenon.
 On the other hand, extracting relationships based on co-occurrence of the names of individuals or institutions requires web mining, as names are typically embedded in the natural text of web pages.
 The techniques employed here are statistical methods, possibly combined with an analysis of the contents of web pages.
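To make the co-occurrence idea concrete, the sketch below counts how often pairs of names appear on the same page and keeps the pairs that co-occur at least a minimum number of times. It is a deliberately naive illustration: names are matched as plain substrings, whereas real web mining needs name disambiguation and statistical weighting. The pages, the names and the function name `cooccurrence_network` are all invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(pages, names, min_count=1):
    """Return {(name_a, name_b): count} for name pairs found on the same page.

    A tie is kept only if the pair co-occurs on at least min_count pages.
    Matching is a case-insensitive substring test (an assumption made for
    brevity, not a robust name-recognition method).
    """
    pair_counts = Counter()
    for text in pages:
        lowered = text.lower()
        present = sorted(name for name in names if name.lower() in lowered)
        pair_counts.update(combinations(present, 2))
    return {pair: count for pair, count in pair_counts.items()
            if count >= min_count}


# Hypothetical page texts and person names.
pages = [
    "Alice Example and Bob Sample presented a joint paper.",
    "Bob Sample thanked Alice Example for her comments.",
    "Carol Test gave a talk; Bob Sample chaired the session.",
]
names = ["Alice Example", "Bob Sample", "Carol Test"]

all_ties = cooccurrence_network(pages, names)
strong_ties = cooccurrence_network(pages, names, min_count=2)
print(all_ties)
print(strong_ties)
```

Raising `min_count` trades coverage for reliability: a single shared page is weak evidence of a relationship, while repeated co-occurrence is harder to explain by chance.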