2013 KDD conference presentation--"Multi-Label Relational Neighbor Classification using Social Context Features"

Multi-label Relational Neighbor Classification
using Social Context Features
Xi Wang and Gita Sukthankar
Department of EECS
University of Central Florida

Motivation
 The conventional relational
classification model focuses on
the single-label classification
problem.
 Real-world relational datasets
contain instances associated
with multiple labels.
 Connections between instances
in multi-label networks are
driven by various causal
reasons.
Example: Scientific collaboration network
Machine Learning
Data Mining
Artificial
Intelligence
1

Problem Formulation
 Node classification in multi-relational networks
 Input:
 Network structure (i.e., connectivity information)
 Labels of some actors in the network
 Output:
 Labels of the other actors
2

Classification in Networked Data
 Homophily: nodes with similar labels are more likely to be
connected
 Markov assumption:
 The label of one node depends on that of its immediate neighbors in
the graph
 Relational models are built based on the labels of neighbors.
 Predictions are made using collective inference.
3

Contribution
 A new multi-label iterative relational neighbor classifier
(SCRN)
 Extract social context features using edge clustering to
represent a node’s potential group membership
 Use of social features boosts classification performance
over benchmarks on several real-world collaborative
networked datasets
4

Relational Neighbor Classifier
 The Relational Neighbor (RN) classifier, proposed by Macskassy and Provost
(MRDM’03), is a simple relational probabilistic model that makes
predictions for a given node based solely on the class labels of its
neighbors.
Figure panels: Training Graph, Iteration 1, Iteration 2
5

Relational Neighbor Classifier
 Weighted-vote relational neighbor classifier (wvRN)
estimates prediction probability as:
$$P(L_i = c \mid N_i) = \frac{1}{z} \sum_{v_j \in N_i} w(v_i, v_j)\, P(L_j = c \mid N_j)$$
Here $z$ is the usual normalization factor, and $w(v_i, v_j)$
is the weight of the link between node $v_i$ and $v_j$.
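The weighted vote above can be sketched in a few lines of Python. This is a minimal illustration for a single class c, not the authors' implementation: the function name `wvrn_estimate`, the toy graph, and the choice of z as the total weight of a node's links are assumptions.

```python
from typing import Dict, Hashable

Node = Hashable


def wvrn_estimate(
    node: Node,
    adjacency: Dict[Node, Dict[Node, float]],
    class_prob: Dict[Node, float],
) -> float:
    """Return the weighted-vote estimate P(L_i = c | N_i) for one class c."""
    neighbors = adjacency.get(node, {})
    z = sum(neighbors.values())  # assumed normalizer: total link weight
    if z == 0:
        return 0.0  # isolated node: no neighbor evidence (assumed fallback)
    vote = sum(w * class_prob.get(nbr, 0.0) for nbr, w in neighbors.items())
    return vote / z


# Hypothetical toy graph: "t" links to "u" (weight 2) and "v" (weight 1).
adjacency = {"t": {"u": 2.0, "v": 1.0}, "u": {"t": 2.0}, "v": {"t": 1.0}}
class_prob = {"u": 1.0, "v": 0.0}  # "u" carries class c, "v" does not
p_t = wvrn_estimate("t", adjacency, class_prob)
```

Because the heavier link points at the node that carries the class, the estimate for "t" leans toward that class (two thirds of the total weight here).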
6

Apply RN in Multi-relational Network
Figure: Ground truth. Legend: nodes with both labels (red, green); nodes with green label only; nodes with red label only.
7

Edge-Based Social Feature Extraction
 Connections in human networks are mainly affiliation-
driven.
 Since each connection can often be regarded as principally
resulting from one affiliation, links possess a strong
correlation with a single affiliation class.
 The edge class information is not readily available in most
social media datasets, but an unsupervised clustering
algorithm can be applied to partition the edges into disjoint
sets (KDD’09, CIKM’09).
8

Cluster edges using K-Means
 Scalable edge clustering method proposed by Tang et al.
(CIKM’09).
 Each edge is represented in a feature-based format, where
each edge is characterized by its adjacent nodes.
 K-means clustering is used to separate the edges into
groups, and the social feature (SF) vector is constructed
based on edge cluster IDs.
Figure panels: Original network; Step 1: Edge representations; Step 2: Construct social features
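The two steps can be sketched as follows. This is a simplified illustration rather than the scalable method of Tang and Liu: it uses dense indicator vectors, squared Euclidean distance, and evenly spaced initial centroids, and it defines SF(v) as the fraction of a node's edges in each cluster. All of those choices, and every function name, are assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_vector(edge: Edge, n_nodes: int) -> List[float]:
    """Step 1: represent an edge by an indicator vector over its two end nodes."""
    vec = [0.0] * n_nodes
    vec[edge[0]] = 1.0
    vec[edge[1]] = 1.0
    return vec


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_edges(edges: List[Edge], n_nodes: int, k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means over edge vectors; returns one cluster ID per edge."""
    vectors = [edge_vector(e, n_nodes) for e in edges]
    # Deterministic, evenly spaced initial centroids (an assumption).
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    assign = [-1] * len(edges)
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def social_features(
    edges: List[Edge], assign: List[int], n_nodes: int, k: int
) -> Dict[int, List[float]]:
    """Step 2: SF(v) = fraction of v's edges that fall in each edge cluster."""
    feats = {v: [0.0] * k for v in range(n_nodes)}
    for (u, v), c in zip(edges, assign):
        feats[u][c] += 1.0
        feats[v][c] += 1.0
    for v in range(n_nodes):
        total = sum(feats[v])
        if total:
            feats[v] = [x / total for x in feats[v]]
    return feats


# Hypothetical toy network: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
assign = kmeans_edges(edges, n_nodes=6, k=2)
sf = social_features(edges, assign, n_nodes=6, k=2)
```

In the toy network, node 2 touches edges from both clusters, so its social feature vector is mixed, while nodes inside one triangle get a one-hot vector.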
9

Edge-Clustering Visualization
Figure: A subset of DBLP with 95 instances. Edges are clustered into 10
groups, with each shown in a different color.
10

Proposed Method: SCRN
 The initial set of reference features for class c can be
defined as the weighted sum of social feature vectors for
nodes known to be in class c:
$$RV(c) = \frac{1}{|V_c^K|} \sum_{v_i \in V_c^K} P(l_i^c = 1) \times SF(v_i)$$
 Then node $v_i$’s class propagation probability for class c
conditioned on its social features is:
$$P_{CP}(l_i^c \mid SF(v_i)) = \mathrm{sim}(SF(v_i), RV(c))$$
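A small sketch of both quantities follows. For sim it uses the Generalized Histogram Intersection kernel named on the Experiment Setting slide, written here as the sum of min(|x_d|^beta, |y_d|^beta); the parameter `beta`, the function names, and the toy vectors are assumptions for illustration.

```python
from typing import Dict, List


def reference_vector(
    social_feats: Dict[str, List[float]],
    prob_c: Dict[str, float],
) -> List[float]:
    """RV(c): mean of P(l_i^c = 1) * SF(v_i) over the nodes known to be in class c.

    prob_c maps each known member of class c to P(l_i^c = 1).
    """
    members = list(prob_c)
    if not members:
        raise ValueError("class c has no known members")
    dim = len(social_feats[members[0]])
    rv = [0.0] * dim
    for v in members:
        for d, x in enumerate(social_feats[v]):
            rv[d] += prob_c[v] * x
    return [x / len(members) for x in rv]


def ghi_kernel(x: List[float], y: List[float], beta: float = 1.0) -> float:
    """Generalized Histogram Intersection: sum_d min(|x_d|^beta, |y_d|^beta)."""
    return sum(min(abs(a) ** beta, abs(b) ** beta) for a, b in zip(x, y))


def class_propagation_prob(sf_node: List[float], rv: List[float]) -> float:
    """P_CP(l_i^c | SF(v_i)) = sim(SF(v_i), RV(c)), with sim = GHI (beta = 1)."""
    return ghi_kernel(sf_node, rv)


# Hypothetical social features: "a" and "b" are known members of class c.
social_feats = {"a": [1.0, 0.0], "b": [0.5, 0.5], "t": [0.25, 0.75]}
rv_c = reference_vector(social_feats, {"a": 1.0, "b": 1.0})
p_cp_t = class_propagation_prob(social_feats["t"], rv_c)
```

With L1-normalized social features and beta = 1, the kernel value stays in [0, 1], which is why it can stand in for a probability in this sketch.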
11

SCRN
 SCRN estimates the class-membership probability of node $v_i$
belonging to class c using the following equation:
$$P(l_i^c \mid N_i, SF(v_i)) = \frac{1}{z} \sum_{v_j \in N_i} P_{CP}(l_i^c \mid SF(v_i)) \times w(v_i, v_j) \times P(l_j^c \mid N_j)$$
where $P_{CP}(l_i^c \mid SF(v_i))$ is the class propagation probability, $w(v_i, v_j)$ is the similarity between connected nodes (link weight), and $P(l_j^c \mid N_j)$ is the class probability of its neighbors.
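One update of this equation for a single node and class might look like the sketch below. As with wvRN, taking z to be the total link weight of the node is an assumption, and `scrn_estimate` and the toy values are hypothetical.

```python
from typing import Dict


def scrn_estimate(
    node: str,
    adjacency: Dict[str, Dict[str, float]],
    p_cp: float,
    neighbor_prob: Dict[str, float],
) -> float:
    """One SCRN update for a single class c.

    p_cp is P_CP(l_i^c | SF(v_i)); neighbor_prob maps v_j to P(l_j^c | N_j).
    """
    neighbors = adjacency.get(node, {})
    z = sum(neighbors.values())  # assumed normalizer: total link weight
    if z == 0:
        return 0.0  # isolated node (assumed fallback)
    total = sum(
        p_cp * w * neighbor_prob.get(nbr, 0.0) for nbr, w in neighbors.items()
    )
    return total / z


# Hypothetical values: "t" links to "u" (weight 2) and "v" (weight 1).
adjacency = {"t": {"u": 2.0, "v": 1.0}}
p_t = scrn_estimate("t", adjacency, p_cp=0.5, neighbor_prob={"u": 1.0, "v": 0.0})
```

Since the class propagation probability does not depend on the neighbor index, it factors out of the sum: under these assumptions the estimate equals P_CP times the wvRN vote.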
12

SCRN Overview
Input: $\{G, V, E, C, L^K\}$, Max_Iter
Output: $L^U$ for nodes in $V^U$
1. Construct nodes’ social feature space
2. Initialize the class reference vectors for each class
3. Calculate the class-propagation probability for each test
node
4. Repeat until # of iterations > Max_Iter or predictions
converge
 Estimate test node’s class probability
 Update the test node’s class probability in collective inference
 Update the class reference vectors
 Re-calculate each node’s class-propagation probability
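The loop above can be sketched end to end as follows, assuming the social features from step 1 are already computed. Several details are assumptions, since the slide does not spell them out: synchronous updates, plain histogram intersection as the similarity, z as total link weight, and reference vectors averaged over every node with a nonzero current probability for the class. The function `scrn` and its arguments are hypothetical names.

```python
from typing import Dict, List, Set


def hist_intersection(x: List[float], y: List[float]) -> float:
    return sum(min(a, b) for a, b in zip(x, y))


def scrn(
    adj: Dict[str, Dict[str, float]],
    sf: Dict[str, List[float]],
    known: Dict[str, Set[str]],
    classes: List[str],
    max_iter: int = 10,
    tol: float = 1e-6,
) -> Dict[str, Dict[str, float]]:
    """adj: node -> {neighbor: weight}; sf: node -> social feature vector;
    known: training node -> its label set. Returns test node -> {class: prob}."""
    test_nodes = [v for v in adj if v not in known]
    dim = len(next(iter(sf.values())))
    # Current beliefs: 1/0 for training nodes, 0 for test nodes initially.
    prob = {v: {c: float(c in known[v]) for c in classes} for v in known}
    prob.update({v: {c: 0.0 for c in classes} for v in test_nodes})

    def reference_vectors() -> Dict[str, List[float]]:
        rvs = {}
        for c in classes:
            members = [v for v in adj if prob[v][c] > 0]
            rv = [0.0] * dim
            for v in members:
                for d, x in enumerate(sf[v]):
                    rv[d] += prob[v][c] * x
            rvs[c] = [x / max(len(members), 1) for x in rv]
        return rvs

    for _ in range(max_iter):
        rvs = reference_vectors()  # steps 2 and 4: (re)build reference vectors
        new = {}
        for v in test_nodes:
            z = sum(adj[v].values()) or 1.0
            new[v] = {
                # class propagation probability times the weighted neighbor vote
                c: hist_intersection(sf[v], rvs[c])
                * sum(w * prob[u][c] for u, w in adj[v].items()) / z
                for c in classes
            }
        delta = max(
            (abs(new[v][c] - prob[v][c]) for v in test_nodes for c in classes),
            default=0.0,
        )
        prob.update(new)  # synchronous update of all test nodes
        if delta < tol:
            break  # predictions converged
    return {v: prob[v] for v in test_nodes}


# Hypothetical toy input: "a" is labeled ML, "b" is labeled DM, "t" is unlabeled.
adj = {"a": {"t": 2.0}, "b": {"t": 1.0}, "t": {"a": 2.0, "b": 1.0}}
sf = {"a": [1.0, 0.0], "b": [0.0, 1.0], "t": [0.75, 0.25]}
result = scrn(adj, sf, known={"a": {"ML"}, "b": {"DM"}}, classes=["ML", "DM"])
```

In the toy input, "t" is both more strongly linked to the ML node and closer to it in social feature space, so its ML probability ends up well above its DM probability.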
13

SCRN Visualization
Figure: SCRN on synthetic multi-label network with 1000 nodes and 32 classes
(15 iterations).
14

Datasets
DBLP
 We construct a weighted collaboration network for
authors who published at least 2 papers between
2000 and 2010.
 We selected 15 representative conferences in 6 research
areas:
Database: ICDE, VLDB, PODS, EDBT
Data Mining: KDD, ICDM, SDM, PAKDD
Artificial Intelligence: IJCAI, AAAI
Information Retrieval: SIGIR, ECIR
Computer Vision: CVPR
Machine Learning: ICML, ECML
15

Datasets
IMDb
 We extract movies and TV shows released between
2000 and 2010, and link those directed by the same director.
 We retain only movies and TV programs with more
than 5 links.
 Each movie can be assigned to a subset of 27 different
candidate movie genres in the database, such as
“Drama”, “Comedy”, “Documentary” and “Action”.
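The construction described above can be sketched as below. The sketch assumes one director per title and applies the "more than 5 links" filter once; the slide does not say how multi-director titles or repeated filtering were handled, and `build_director_network` and the toy titles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def build_director_network(
    director_of: Dict[str, str], min_links: int = 5
) -> Dict[str, Set[str]]:
    """Link titles that share a director, then keep titles with > min_links links."""
    by_director = defaultdict(list)
    for title, director in director_of.items():
        by_director[director].append(title)

    neighbors = defaultdict(set)
    for titles in by_director.values():
        for a, b in combinations(titles, 2):  # every pair by the same director
            neighbors[a].add(b)
            neighbors[b].add(a)

    kept = {t for t, nbrs in neighbors.items() if len(nbrs) > min_links}
    # Drop links that point at removed titles.
    return {t: neighbors[t] & kept for t in kept}


# Hypothetical catalog: director "X" has 7 titles, director "Y" has 3.
director_of = {f"x{i}": "X" for i in range(7)}
director_of.update({f"y{i}": "Y" for i in range(3)})
network = build_director_network(director_of)
```

Only the seven titles by the first director survive, because each has six links, while the three titles by the second director have two links each and are dropped.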
16

Datasets
YouTube
 A subset of data (15000 nodes) sampled from the original
YouTube dataset [1] using snowball sampling.
 Each user in YouTube can subscribe to different interest
groups and add other users as his/her contacts.
 Class labels are 47 interest groups.
[1] http://www.public.asu.edu/~ltang9/social_dimension.html
17

Comparative Methods
Edge (EdgeCluster)
wvRN
Prior
Random
18

Experiment Setting
 Size of social feature space:
 1000 for DBLP and YouTube; 10000 for IMDb
 Class propagation probability is calculated with the
Generalized Histogram Intersection Kernel.
 Relaxation Labeling is used in the collective inference
framework for SCRN and wvRN.
 We assume the number of labels for testing nodes is known.
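Assuming the number of labels k for a test node is known, turning class probabilities into a label set reduces to keeping the k highest-scoring classes. The sketch below shows one way to do that; the function name and the alphabetical tie-break are assumptions.

```python
from typing import Dict, List


def assign_top_k(class_probs: Dict[str, float], k: int) -> List[str]:
    """Return the k most probable classes (ties broken alphabetically)."""
    ranked = sorted(class_probs, key=lambda c: (-class_probs[c], c))
    return ranked[:k]


# Hypothetical scores for one test node that is known to carry 2 labels.
labels = assign_top_k({"ML": 0.5, "DM": 0.3, "AI": 0.1}, k=2)
```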
19

Experiment Setting
 We employ the network cross-validation (NCV) method
(KAIS’11) to reduce the overlap between test samples.
 Classification performance is evaluated based on Micro-F1,
Macro-F1 and Hamming Loss.
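For reference, the three metrics can be computed from per-class true positive, false positive, and false negative counts, as in the sketch below. Scoring a class with no true or predicted instances as F1 = 0 is an assumption, and the toy label sets are hypothetical.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # empty class scored as 0 (assumption)


def multilabel_scores(
    true_sets: List[Set[str]], pred_sets: List[Set[str]], classes: List[str]
) -> Dict[str, float]:
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for truth, pred in zip(true_sets, pred_sets):
        for c in classes:
            if c in pred and c in truth:
                tp[c] += 1
            elif c in pred:
                fp[c] += 1
            elif c in truth:
                fn[c] += 1
    errors = sum(fp.values()) + sum(fn.values())
    return {
        # Micro-F1 pools the counts over all classes before computing F1.
        "micro_f1": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Macro-F1 averages the per-class F1 scores.
        "macro_f1": sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes),
        # Hamming loss is the fraction of wrong (node, class) decisions.
        "hamming_loss": errors / (len(true_sets) * len(classes)),
    }


# Hypothetical predictions for two nodes over three classes.
scores = multilabel_scores(
    true_sets=[{"A", "B"}, {"C"}],
    pred_sets=[{"A", "C"}, {"C"}],
    classes=["A", "B", "C"],
)
```

Micro-F1 is dominated by frequent classes, while Macro-F1 weights every class equally, so reporting both shows how a method handles rare labels.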
20

Results (Micro-F1)
DBLP
Figure: Micro-F1 accuracy (%) versus training data percentage (5% to 30%), with curves for SCRN, Edge, wvRN, Prior, and Random.
21

Results (Macro-F1)
DBLP
Figure: Macro-F1 accuracy (%) versus training data percentage (5% to 30%), with curves for SCRN, Edge, wvRN, Prior, and Random.
22

Results (Hamming Loss)
DBLP
23

IMDb
24

YouTube
25

Conclusion
 Links in multi-relational networks are heterogeneous.
 SCRN exploits label homophily while simultaneously
leveraging social feature similarity through the introduction
of class propagation probabilities.
 SCRN significantly boosts classification performance on
multi-label collaboration networks.
 Our open-source implementation of SCRN is available at:
http://code.google.com/p/multilabel-classification-on-social-network/
26

References
 MACSKASSY, S. A., AND PROVOST, F. A simple relational classifier. In
Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM) at
KDD, 2003, pp. 64–76.
 TANG, L., AND LIU, H. Relational learning via latent social dimensions. In
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD), 2009, pp. 817–826.
 TANG, L., AND LIU, H. Scalable learning of collective behavior based on sparse
social dimensions. In Proceedings of International Conference on Information and
Knowledge Management (CIKM), 2009, pp. 1107–1116.
 NEVILLE, J., GALLAGHER, B., ELIASSI-RAD, T., AND WANG, T. Correcting
evaluation bias of relational classifiers with network cross validation. Knowledge
and Information Systems (KAIS), 2011, pp. 1–25.
27
