Tutorial on Relationship Mining In Online Social Networks

Peifeng Jing NetID: pjing2
Department of Computer Science
University of Illinois Urbana-Champaign
Prepared as an assignment for CS410: Text Information Systems in Spring 2016

Agenda
•  Introduction
•  Basic Concepts
•  Text Mining in Social Network
•  Sub-fields of Online Social Networks
•  Relationship Mining Systems
•  Summary

Social Networks Play An Important Role In
Our Daily Lives
•  People communicate and share
information through social relations with others
such as friends, family, colleagues, collaborators,
and business partners.
http://socialnetworking.lovetoknow.com/image/37420~SocialNetworkingAnalysisSoftware.jpg

Large-Scale Online Social Networks
Are Growing
•  For example, Facebook, Twitter, Google, etc.
•  Over 1.59 billion users on Facebook
•  Over 555 million users on Twitter
http://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

http://libeltyseo.com/wp-content/uploads/2013/03/social-networking.png

http://www.bramblingdesign.com/being-visible-tips-on-how-to-best-use-social-network/

There Are Many Cool Applications In
Social Network and Relationship Mining
•  Business intelligence and market analysis
•  Discovery of community structures
•  Inferring social relationships
•  Personal profiling
•  E-mail filtering
•  Mining hidden advisor-advisee relationships in
academic networks
•  Positive/Negative relationship mining, e.g. trust
network and voting network
•  Biological networks, e.g. predicting food webs,
protein-protein interactions, metabolic networks, etc.

But It Is Hard To Find
Relations In Large-Scale Networks
•  We used to depend on users to label
relations
•  Some personal information is hidden in
the online social network
•  People generally use unstructured or semi-structured
language for communication

Can We Automatically Find Relations?
•  More digital data are available online
•  More people share personal information
through social media
http://www.slideshare.net/julia594/social-networking-sites-selling-information-to-third-parties

Agenda
•  Introduction
–  Problem Definition
–  Relationship Concepts in Sociology
•  Summary

What Is The Task of
Relationship Mining?
•  Introduction
•  Summary

What Is Relationship Mining?
•  User information and
relationships are often
incomplete in online social
networks
•  Relationship mining aims to
predict or discover the
hidden relationships in
social networks

Formal Definition of Relationship Mining
•  Given a network G(V,E) and a list of
attributes A = {A1,…,Am}:
–  V = {v0,…,vn} is the set of users.
–  E = {eij} is the set of connections
between users.
–  The attributes A represent the profiles of
users.
–  The labels L = {lij} are the types of the edges
{eij}.
•  For a user vi, some information might
be missing:
–  attributes aij are unknown
–  edges eij are missing
–  labels lij are unknown
•  The task is to discover or predict the missing edges eij and the unknown
labels lij
[1] Wenbin Tang, Honglei Zhuang, and Jie Tang, Learning to Infer Social Ties in Large Networks, ECML/PKDD’11, 2011
[2] Rui Li, Chi Wang, Kevin Chen-Chuan Chang, User Profiling in an Ego Network: Co-profiling Attributes and Relationships, WWW’14, April 7-11, 2014, Seoul, Korea, ACM 978-1-4503-2744-2/14/04
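
The formulation above can be sketched as a small data structure. This is only an illustration under stated assumptions: the user names, attribute keys, and label values are invented, and `None` is used here to mark an unknown attribute or label.

```python
from itertools import combinations

# Hypothetical toy network; every name and value below is an illustrative assumption.
V = ["v0", "v1", "v2", "v3"]                      # users
E = {("v0", "v1"), ("v1", "v2")}                  # observed connections e_ij (undirected)
A = {                                             # attribute profiles; None = unknown a_ij
    "v0": {"employer": "UIUC", "city": "Urbana"},
    "v1": {"employer": "UIUC", "city": None},
    "v2": {"employer": None, "city": "Chicago"},
    "v3": {"employer": None, "city": None},
}
L = {("v0", "v1"): "colleague", ("v1", "v2"): None}  # edge labels l_ij; None = unknown


def candidate_edges(users, edges):
    """Pairs with no observed edge: the candidates for missing-edge prediction."""
    observed = {frozenset(e) for e in edges}
    return [p for p in combinations(users, 2) if frozenset(p) not in observed]


def unlabeled_edges(edges, labels):
    """Observed edges whose type l_ij still has to be inferred."""
    return sorted(e for e in edges if labels.get(e) is None)
```

A relationship mining method would score each pair returned by `candidate_edges` and assign a type to each edge returned by `unlabeled_edges`.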

How Do We Measure The Edges
(Relations) In Social Network?
•  Binary Measurement
– 0: absent
– 1: existing
•  Strength of Social Tie
– A concept from Sociology
[1] Mark Granovetter, The Strength of Weak Ties: A Network Theory Revisited, Sociological Theory, Vol. 1 (1983), pp. 201-233
[2] Mark S. Granovetter, The Strength of Weak Ties, American Journal of Sociology, Vol. 78, No. 6 (1973), pp. 1360-1380
[3] David Easley, Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press 2010, pp. 47-83

What Does Sociology Tell Us?
•  Introduction
•  Summary

Relationships Have Been Studied For A Long
Time In Sociology (Since 1954)
•  Sociologists consider relationships as
“social ties”
•  In sociology, a social tie is measured by its
“strength”
David Easley, Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press 2010, pp. 47-83

What Is a Social Tie?
•  A social tie (also called an interpersonal tie) is an
information-carrying connection between people
•  Types of ties
–  Strong tie: a connection to someone you really trust,
whose social circles tightly overlap with your own.
–  Weak tie: a connection to someone who is merely an
acquaintance. Weak ties often provide access to novel
information that does not circulate in the closely knit network of
strong ties
•  How do we distinguish strong and weak ties?
–  By the strength of the social tie
Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI ’09 Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, pp. 211-220

Property of Social Tie: Strength
•  Strength of ties: the strength of a tie is a (probably
linear) combination of the amount of time, the
emotional intensity, the intimacy (mutual confiding),
and the reciprocal services which characterize the
tie
•  Dimensions of Tie Strength
–  Amount of time, intimacy, intensity, reciprocal services,
network topology and informal social circles, emotional
support, socioeconomic status, education level, political
affiliation, race, gender, etc.
[1] Mohammad Karim Sohrabi, Soodeh Akbar, A comprehensive study on the effects of using data mining techniques to predict tie strength, Computers in
Human Behavior, 60 (2016), pp. 534-541
[2] Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (2009) pp. 211-220

History of Tie Strength Research
•  The concept of tie strength was first introduced by Granovetter
(1973), who split ties into “strong” and “weak”
•  The dimensions of tie strength were refined by Lin, Ensel, and Vaughn
(1981) and Wellman and Wortley (1990)
•  Marsden and Campbell (1984) were the first to study the prediction of tie
strength
•  Krackhardt and Stern (1988) demonstrated that strong relationships
between employees of different organizational sub-units can help
an organization withstand a crisis
•  Strong partners were shown to create crisis and pressure for
institutional change in an organization (Krackhardt 1992)
•  Granovetter (1995) demonstrated that weak ties are more
beneficial for job seekers

History of Tie Strength Research (Cont.)
•  Burt (2009) argues that structural factors, such as network topology
and informal social circles, are effective in shaping tie strength.
•  Wilson et al., Viswanath et al., and Kahanda and Neville (2009) studied
social graphs and different interaction patterns. They also used
characteristic features, topological features, transactional
characteristics and network-transactional features and concluded
that the most prominent features in predicting tie strength are
network-transactional characteristics
•  Gilbert and Karahalios (2009) used 70 variables and achieved 85%
accuracy in predicting tie strength
•  In 2011, a study was conducted using regression analysis based
on the principle that tie strength is a combination of variables
such as friendship.

History of Tie Strength Research (Cont.)
•  In 2013, Servia-Rodriguez, Diaz-Redondo, Fernandez-Vilas and
Pazos-Arias extracted information through Facebook APIs (with
users’ permission)
•  In 2014, a performance evaluation was carried out by means
of BFF (Fogues, Such, Espinosa and Garcia-Fornes)
•  Lee, Lee and Hwang (2014) used perceived business ties to
investigate trust transfer
•  Lin and Utz (2015) studied the role of tie strength in predicting
the emotional outcomes of reading a post on Facebook
•  Chen, Liy, and Zou (2016) proposed a Social Tie Factor Graph
(STFG) model to estimate the home locations of users in the
Twitter network based on user-centric data and tie strength

Another Approach To Categorize
Relationships: Positive/Negative Ties
•  Positive and Negative Social Ties
•  Also called “Signed Networks”
•  Positive Relationship: links to indicate friendship, support or
approval
•  Negative Relationship: links to indicate disapproval,
disagreement or distrust of opinions
•  It has cool applications in predicting voting and elections
Jure Leskovec, Daniel Huttenlocher, Jon Kleinberg, Predicting Positive and Negative Links in Online Social Networks, WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA, ACM 978-1-60558-799-8/10/04

Why Do We Care About Social Ties In
Relationship Mining?
•  We can use the strength of a social tie to
determine whether there is a relation
between users
•  According to sociological theory, different
types of social ties represent different
properties of the social network

How To Do Relationship Mining?
•  Introduction
•  Text Mining in Relationship Mining
•  Summary

Text Mining Is An Important Technology In
Relationship Mining
•  The most popular social networking websites are
Facebook, LinkedIn, and MySpace where text is
the dominant way of communication
•  People in online social networks generally use
unstructured or semi-structured languages for
communication
https://dcurt.is/facebooks-predicament http://www.forbes.com/forbes/welcome/

What Can Text Mining Do?
•  Pre-processing:
–  Feature Extraction
–  Feature Selection
–  Document Representation
Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170

What Can Text Mining Do? (cont.)
•  Classification
–  Ontology Based
–  Machine Learning Based
Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170

What Can Text Mining Do? (cont.)
•  Clustering
–  Hierarchical Clustering
–  Partitional Clustering
–  Semantic-based Clustering
Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170

Three Sub-fields of Online Social Networks
•  Data
–  Data acquisition, storage and visualization
–  Scalability for large-scale network
•  Approaches on Relationship Mining
–  Positive/Negative Social Ties Prediction
–  Relationship classification
–  Relationship mining
•  Association between users’ attributes and relationships

Data Acquisition, Storage and
Visualization
•  Acquisition: HTML Web page; FOAF profiles from
the Semantic Web (using RDF crawler); Collection
of emails (from POP3 or IMAP store); Bibliographic
data; Publication data and research profile from
Web; Telecommunication data; and so on
•  Storage: Sesame server (Flink); RNKB (researcher
network knowledge base, ArnetMiner); Hadoop
(BC-PDM)
•  Visualization: Model-View-Controller, Java Server
Pages and Java Standard Tag Library (Flink)
[1] Yutaka Matsuo, et al., POLYPHONET: An advanced social network extraction system from the Web, Web Semantics: Science, Services and Agents on the World Wide Web, 2007
[2] Peter Mika, Flink: Semantic Web technology for the extraction and analysis of social networks, J. Web Semantics 3 (2) (2005) 211–223
[3] Jie Tang, et al., ArnetMiner: Extraction and Mining of Academic Social Networks, KDD’08, August 24-27, 2008, Las Vegas, Nevada, USA.
[4] Le Yu, et al. BC-PDM: Data Mining, Social Network Analysis and Text Mining System Based on Cloud Computing, KDD’12, Beijing China (2012) 1496-1499

Scalability for Large-Scale Network
•  Filtering out pairs of persons that seem to have no
relation (POLYPHONET, TPFG model)
•  Parallel and cloud computing: Sesame Server
(Flink), Hadoop (BC-PDM)
https://clinked.com/2016/02/23/cloud-computing-benefits-drawbacks/
http://www.ahay.org/wiki/Parallel_Computing

Strength Mining of Social Ties
•  Model strength as a linear combination of the
predictive variables and network structure
•  Predictive variables: friendship, the intensity of
feelings, intimacy, mutual trust and mutual services,
and so on
•  In their study, Gilbert & Karahalios achieved an accuracy
of about 85% with more than 70 predictive variables
[1] Mohammad Karim Sohrabi, Soodeh Akbar, A comprehensive study on the effects of using data mining techniques to predict tie strength, Computers in
Human Behavior, 60 (2016), pp. 534-541
[2] Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (2009) pp. 211-220

Example (Gilbert & Karahalios 2009)
•  Predictive variables from online social media
–  Intensity: wall words exchanged, participant-initiated wall posts, friend-initiated wall posts,
inbox messages exchanged, inbox thread depth, participant’s status updates, friend’s status
updates
–  Intimacy: participant’s number of friends, friend’s number of friends, days since last
communication, wall intimacy words, inbox intimacy words, appearances together in photo,
participant’s appearance in photo, distance between hometowns (mi), friend’s relationship
status
–  Duration: days since first communication
–  Reciprocal Services: links exchanged by wall post, applications in common
–  Structural: number of mutual friends, groups in common, Norm. TF-IDF of interests and
about
–  Emotional Support: wall & inbox positive emotion words, wall & inbox negative emotion
words
–  Social Distance: age difference (days), number of occupations difference, educational
difference (degrees), overlapping words in religion, political difference (scale)
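•  Model
\[ s_i = \alpha + \beta R_i + \gamma D_i + N(i) + \varepsilon_i \]
\[ N(i) = \lambda_0 \mu_M + \lambda_1 \mathrm{med}_M + \sum_{k=2}^{4} \lambda_k \sum_{s \in M} (s - \mu_M)^k + \lambda_5 \min_M + \lambda_6 \max_M \]
\[ M = \{ s_j : j \text{ and } i \text{ are mutual friends} \} \]
Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2009) pp. 211-220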
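
The tie strength model on this slide can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: `R` and `D` are treated simply as numeric feature vectors, the coefficients are placeholders instead of fitted values, the noise term is omitted, and returning 0 for a tie with no mutual friends is an assumption of this sketch.

```python
from statistics import mean, median


def network_structure_term(M, lam):
    """N(i) for one tie, given the strengths M of the mutual-friend ties.

    lam holds the seven coefficients lambda_0 .. lambda_6 (placeholders,
    not the fitted values from the paper).
    """
    if not M:                      # assumption: no mutual friends contributes nothing
        return 0.0
    mu = mean(M)
    total = lam[0] * mu + lam[1] * median(M)
    for k in (2, 3, 4):            # sums of centered powers of order 2..4
        total += lam[k] * sum((s - mu) ** k for s in M)
    total += lam[5] * min(M) + lam[6] * max(M)
    return total


def dot(weights, values):
    """Plain dot product of two equal-length sequences."""
    return sum(w * x for w, x in zip(weights, values))


def tie_strength(R, D, M, alpha, beta, gamma, lam):
    """s_i = alpha + beta.R_i + gamma.D_i + N(i); the noise term eps_i is omitted."""
    return alpha + dot(beta, R) + dot(gamma, D) + network_structure_term(M, lam)
```

In practice the coefficients would be estimated by regression against tie strengths reported by users.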

Positive/Negative Prediction
•  Edge Sign Prediction Problem
Given a social network with signs on all its edges, suppose the
sign on the edge from node u to node v, denoted s(u, v), has
been “hidden.” How reliably can we infer this sign s(u, v)
using the information provided by the rest of the network?
•  Related Work
Leskovec et al. (2010) proposed a method to predict the
signs of links (positive or negative), yet the prediction of
both the existence of a link and its sign has not been well
studied. Recent development of social balance theory may
provide useful hints

Example (Leskovec et al. 2010)
•  Features: 23 features in the machine learning classification
–  We use $d_{in}^{+}(v)$ and $d_{in}^{-}(v)$ to denote the number of incoming positive and
negative edges to v, respectively. Similarly we use $d_{out}^{+}(u)$ and $d_{out}^{-}(u)$ to
denote the number of outgoing positive and negative edges from u,
respectively. We use C(u, v) to denote the total number of common
neighbors of u and v in an undirected sense
–  The second class of features covers each triad involving the edge (u, v),
consisting of a node w such that w has an edge either to or from u
and also an edge either to or from v; this leads to 16 possibilities
•  Logistic Regression Model
\[ P(+ \mid x) = \frac{1}{1 + \exp\left[-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)\right]} \]
Jure Leskovec, Daniel Huttenlocher, Jon Kleinberg, Predicting Positive and Negative Links in Online Social Networks, WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA, ACM 978-1-60558-799-8/10/04
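
The degree features and the logistic model on this slide can be sketched as follows. The toy edge list, node names, and coefficients are hypothetical, and only five of the 23 features are computed; the triad features are left out to keep the sketch short.

```python
import math

# Hypothetical signed, directed edges: (source, target) -> +1 or -1.
# The edge (u, v) whose sign we want to predict is deliberately absent.
edges = {("a", "v"): +1, ("b", "v"): -1, ("u", "a"): +1, ("u", "b"): +1, ("c", "v"): +1}


def degree_features(edges, u, v):
    """[d_in^+(v), d_in^-(v), d_out^+(u), d_out^-(u), C(u, v)] for the edge (u, v)."""
    d_in_pos = sum(1 for (_, t), s in edges.items() if t == v and s > 0)
    d_in_neg = sum(1 for (_, t), s in edges.items() if t == v and s < 0)
    d_out_pos = sum(1 for (f, _), s in edges.items() if f == u and s > 0)
    d_out_neg = sum(1 for (f, _), s in edges.items() if f == u and s < 0)

    def neighbours(n):
        # Neighbours in the undirected sense: linked by an edge in either direction.
        return {t for f, t in edges if f == n} | {f for f, t in edges if t == n}

    common = (neighbours(u) & neighbours(v)) - {u, v}
    return [d_in_pos, d_in_neg, d_out_pos, d_out_neg, len(common)]


def prob_positive(x, b0, b):
    """P(+ | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))
```

With learned coefficients `b0` and `b`, an edge would be predicted positive when `prob_positive` exceeds a chosen threshold such as 0.5.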

Relationship Classification
•  Real-world domains are richly structured; entities of
multiple types are related to each other
•  Many relationships (links) are hidden in online social
networks
•  Classes of relationships: friends, colleagues, family,
teammates, etc.
•  The relationship classification task: build a model for
effectively and efficiently mining relationship types

Example (Tang, et al. 2011)
•  Features: user-specific information, link-specific information
and global constraints
•  Relationship semantics: a triple (eij, rij, pij), where eij ∈ E is
a social relationship, rij ∈ Y is a label associated with the
relationship, and pij is the probability (confidence) obtained
by an algorithm for inferring relationship type
•  Factor functions: attribute factor, correlation factor
(correlation between the relationships), and constraint factor
(constraints between relationships)
\[ p(Y \mid G) = \frac{1}{Z} \exp\Big\{ \theta^{T} \sum_{i} s(y_i) \Big\} \]
where $\theta$ is the parameter configuration and $s$ is the factor function
Wenbin Tang, Honglei Zhuang, and Jie Tang, Learning to Infer Social Ties in Large Networks, ECML/PKDD’11, 2011

Relationship Mining and Prediction
•  Many links are hidden in online networks. Generally, we do
not know which links are missing, so we need to predict or
mine the hidden relationships
•  Two standard metrics are used to quantify the accuracy of
prediction algorithms
–  Area under the receiver operating characteristic curve (AUC):
given the ranking of all non-observed links, the AUC value can be
interpreted as the probability that a randomly chosen missing link (i.e.,
a link in the held-out probe set $E^P$) is given a higher score than a randomly chosen
nonexistent link
AUC = [(# of comparisons in which the missing link has the higher score) + 0.5 x (# of comparisons
in which both links have the same score)] / (# of total comparisons)
–  Precision: given the ranking of the non-observed links, the precision
is defined as the ratio of relevant items selected to the number of
items selected
Linyuan Lu, Tao Zhou, Link prediction in complex networks: A survey, Physica A: Statistical Mechanics and its Applications, 390 (6), 2011, pp. 1150-1170
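
Both metrics can be computed directly from their definitions. The sketch below is a minimal illustration with hypothetical function names; it compares every missing link with every nonexistent link, whereas on large networks the comparisons are usually sampled.

```python
def auc(missing_scores, nonexistent_scores):
    """AUC from exhaustive pairwise comparisons of missing vs. nonexistent links."""
    higher = same = 0
    for m in missing_scores:
        for n in nonexistent_scores:
            if m > n:
                higher += 1
            elif m == n:
                same += 1
    total = len(missing_scores) * len(nonexistent_scores)
    return (higher + 0.5 * same) / total


def precision_at(ranked_links, relevant, top):
    """Fraction of the top-ranked predicted links that are truly missing links."""
    selected = ranked_links[:top]
    return sum(1 for link in selected if link in relevant) / len(selected)
```

An AUC of 0.5 corresponds to scores that are no better than chance, so the amount by which a method exceeds 0.5 indicates how much it improves on random guessing.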

Example (Wang et al. 2010)
•  Mining advisor-advisee relationships from research
publication networks
•  Challenges
–  Latent relation
–  Time-dependent
–  Scalability
•  Joint probability
\[ P(\{y_i, st_i, ed_i\}_{a_i \in V^a}) = \frac{1}{Z} \prod_{a_i \in V^a} g(y_i, st_i, ed_i) \]
•  Rank score
\[ r_{ij} = \max P(y_1, \ldots, y_{n_a} \mid y_i = j) \]
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, Jingyi Guo, Mining Advisor-Advisee Relationships from Research Publication Networks,
KDD’10 July 25-28, 2010, Washington, DC, USA

Association of Users’ Attributes and
Relationships
•  The attributes of users and their relationships are
not independent from each other.
•  Two Tasks at The Same Time
–  User attribute profiling
–  Relationship type profiling
•  Advantages
–  Can achieve higher accuracy (e.g. 70%-90% precision in
Li et al, 2014)

Example (Staiano et al, 2012)
•  Inferring personality traits from social networks
•  Features: centrality measures, small world and
efficiency measures, transitivity measures, triadic
measures
•  Quantize personality trait scores into two classes
(Low/High). Classification was performed by means
of a Random Forest
Jacopo Staiano, et al. Friends don’t Lie – Inferring Personality Traits from Social Network Structure, UbiComp’12 Pittsburgh, USA (2012) 321-330

Example (Li et al, 2014)
•  User profiling in an ego network
•  Social connections are discriminatively correlated with attributes via a
hidden factor: the relationship type
•  Features
–  The circle: a set of friends who have the same type of connections with the
ego
–  The attribute-circle dependency: the friends in a circle share the same value
with the ego user for certain attributes.
–  The circle-connection dependency: friends across circles are loosely
connected
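•  Cost function: the linear combination of the three features
\[ \text{cost} = \lambda_1 \sum_{t=1}^{K} \Big\{ \sum_{e_{ij} \in E',\, v_i, v_j \in C_t} \big(w_t \cdot (f_i - f_j)\big)^2 + \sum_{v_i \in C_t} \big(w_t \cdot (f_0 - f_i)\big)^2 \Big\} + \lambda_2 \sum_{t=1}^{K} \sum_{v_i \in L \cap C_t} (w_t \cdot f_i - 1)^2 + \lambda_3 \sum_{e_{ij} \in E',\, x_i \neq x_j} 1 \qquad (1) \]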
Rui Li, Chi Wang, Kevin Chen-Chuan Chang, User Profiling in an Ego Network: Co-profiling Attributes and Relationships, WWW’14, April 7-11, 2014, Seoul, Korea, ACM 978-1-4503-2744-2/14/04

There Are Many Well-Developed Relationship
Mining Systems
•  Flink (Peter Mika, 2005)
•  POLYPHONET (Matsuo et al., 2007)
•  BC-PDM (Yu et al., 2012)
•  etc.

Flink
Semantic Web technology for the extraction and
analysis of social networks
First Layer: metadata acquisition
Second Layer: Storage and Inference
Third Layer: visualization
Peter Mika, Flink: Semantic Web technology for the extraction and analysis of social networks, J. Web Semantics 3 (2) (2005) 211–223

POLYPHONET
•  Social network extraction system that extracts relations of persons,
detects groups of persons and obtains keywords for a person
•  Algorithms for social network extraction
–  Basic algorithm: co-occurrence, matching coefficient, Jaccard coefficient,
overlap coefficient
–  Advanced algorithm: classifying relations
–  Scalability: GoogleCooc, GoogleCoocTop
[Figures: overview of module dependency; relate–identify process of iterative social network mining]
Yutaka Matsuo, et al., POLYPHONET: An advanced social network extraction system from the Web, Web Semantics: Science, Services and Agents on the World Wide Web, 2007
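
The basic co-occurrence measures named on this slide have standard definitions that are easy to state in code. In this sketch `n_x` and `n_y` are the numbers of pages mentioning each person and `n_xy` is the number mentioning both; the function names and the zero-count handling are assumptions of the sketch, not POLYPHONET's actual API.

```python
def matching(n_xy):
    """Matching coefficient: the raw co-occurrence count."""
    return n_xy


def jaccard(n_x, n_y, n_xy):
    """Jaccard coefficient: co-occurrences over pages mentioning either person."""
    union = n_x + n_y - n_xy
    return n_xy / union if union else 0.0


def overlap(n_x, n_y, n_xy):
    """Overlap coefficient: co-occurrences over the smaller of the two counts."""
    smaller = min(n_x, n_y)
    return n_xy / smaller if smaller else 0.0
```

The overlap coefficient is less sensitive than Jaccard to one person being far more prominent than the other, because it normalizes by the smaller count only.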

BC-PDM
•  Data mining, social network analysis and text mining
system based on cloud computing
[Figures: the result of community detection; the architecture of BC-PDM]
Le Yu, et al. BC-PDM: Data Mining, Social Network Analysis and Text Mining System Based on Cloud Computing, KDD’12, Beijing China (2012) 1496-1499

Summary
•  Online social networks play an increasingly important role
in our lives, and digital data allows us to
automatically find relationships in our social
networks
•  Text Mining plays an important role in social
network applications
–  Pre-processing: feature extraction, feature
selection, document representation
–  Classification: Ontology Based, Machine Learning
Based
–  Clustering: Hierarchical Clustering, Partitional
Clustering, Semantic-based Clustering

Summary (cont.)
Research in Relationship Mining
•  Data
–  Data acquisition, storage and visualization
–  Scalability for large-scale network
•  Approaches on Relationship Mining
–  Positive/Negative Social Ties Prediction
–  Relationship classification
–  Relationship mining
•  Association between users’ attributes and relationships
There are Many Systems for relationship extraction and mining
•  Flink, POLYPHONET, BC-PDM, etc.
