Graph Based Machine Learning with Applications to Media Analytics

Lei Ding, PhD
9-1-2011

with collaborators at

Outline
•  Graph based machine learning
–  Basic structures
–  Algorithms
–  Examples
•  Applications in media analytics
–  Social analysis of videos
–  Content analysis of images

What is a graph

Not the graph we are going to talk about

What is a graph
•  A graph is composed of
–  Vertices (nodes): pixels, actors in videos, genes, ads, etc.
–  Edges: their relations
–  In machine learning, we are interested in predicting some quantity
(a class label, or a continuous value) at each unlabeled vertex
•  Broadly speaking, there are two kinds of graphs: undirected and directed

Graph based machine learning for media analytics
•  Oftentimes, media content can be represented using graphs
•  Therefore, challenging inference problems with media content
can be answered by learning on graphs

Social content model

•  Content network: encodes content similarity (videos, audio, etc.)
•  Content generation process
•  Social network: encodes people's social connections

Can be used for media genre classification, media recommendation, etc.

Graph based machine learning
•  On undirected graphs
–  Optimization based approaches (e.g. energy minimization)
–  Probabilistic models (e.g. random fields)
•  On directed graphs
–  Optimization based approaches (e.g. directed energy minimization)
–  Probabilistic models (e.g. latent Dirichlet allocation, Bayesian networks)

Relations
•  How are they related to traditional statistical learning (e.g. logistic regression)?

(Sutton & McCallum, 2007)

Learning on undirected graphs
•  Classification methods
–  We have some labeled data, and
want to predict labels for others
–  e.g. manifold regularization

•  Clustering methods
–  We would like to partition data
into clusters
–  e.g. spectral clustering

Constructing data graphs
•  How to transform a dataset $\{x_i\}_{i=1}^{m}$ into a graph

Affinity matrix
•  A graph is usually represented using an affinity matrix W,
whose entry $W_{ij}$ is 1 if vertices i and j are connected,
and 0 otherwise
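
One common way to obtain such a matrix from a dataset is to connect each point to its k nearest neighbors. The sketch below assumes Euclidean distance and a symmetric 0/1 affinity; the helper name `knn_affinity` and the k-nearest-neighbor construction itself are illustrative assumptions, not necessarily the construction shown in the talk.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric 0/1 affinity matrix of a k-nearest-neighbor graph (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # Pairwise Euclidean distances via broadcasting: shape (m, m).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Column indices of the k closest points for every row.
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((m, m))
    W[np.repeat(np.arange(m), k), nearest.ravel()] = 1.0
    # Connect i and j if either one lists the other as a neighbor.
    return np.maximum(W, W.T)

# Four made-up points on a line.
X = np.array([[0.0], [1.0], [3.0], [10.0]])
W = knn_affinity(X, k=1)
```

Symmetrizing with an element-wise maximum keeps the graph undirected; requiring mutual neighbors (an element-wise minimum) would be a stricter alternative.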

Graph Laplacians
•  $L = D - W$, where W is the affinity matrix and D is the diagonal
matrix of W's row sums (the vertex degrees)

•  A discretization of the Laplace-Beltrami operator on manifolds, which
is the sum of second-order derivatives on the tangent space (more
details later)
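
As a minimal sketch of the definition L = D - W, the snippet below builds the Laplacian of a small path graph. The four-vertex graph and the helper name `graph_laplacian` are made up for illustration.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # diagonal matrix of row sums (degrees)
    return D - W

# Path graph 0 - 1 - 2 - 3, chosen only as an example.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = graph_laplacian(W)
eigenvalues = np.linalg.eigvalsh(L)  # ascending order
```

Every row of L sums to zero, so the constant vector is always an eigenvector with eigenvalue 0.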

Function on graph
•  A vector can be used to represent a function over the graph
–  We can encode what we already know or what we want to predict in a
label function
–  For example, in this graph a vertex can represent a person, and the
function can represent whether that person is a likely customer

f = [1, 1, 0, 0, 1, 0]^T

Properties of graph Laplacians
•  Symmetric and positive semi-definite
•  Graph Laplacian induces a smoothness term
–  The quadratic form $f^\top L f = \frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2$
(transposed label function times Laplacian times label function) is always
non-negative
–  This smoothness term measures how much the function f varies with
respect to the underlying graph
–  We have labels on some vertices, and want to predict labels on other
vertices. A smooth function (small $f^\top L f$) typically predicts well
•  Laplacian eigenvectors with small eigenvalues can be used for
data clustering / classification, data set parametrization, image
segmentation, etc.
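
The smoothness term can be checked numerically. The sketch below uses a made-up six-vertex path graph (not the graph from the slide) and compares a label function that changes once along the path with one that alternates at every step. It also evaluates the pairwise form 1/2 * sum_ij W_ij (f_i - f_j)^2, which equals the quadratic form for any symmetric W.

```python
import numpy as np

def smoothness(W, f):
    """Quadratic form f^T L f with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(f @ L @ f)

def pairwise_form(W, f):
    """Equivalent pairwise expression 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    diff = f[:, None] - f[None, :]
    return float(0.5 * (W * diff ** 2).sum())

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 - 5.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

f_smooth = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # changes across one edge
f_rough = np.array([1, 0, 1, 0, 1, 0], dtype=float)   # changes across every edge

s_smooth = smoothness(W, f_smooth)  # 1.0
s_rough = smoothness(W, f_rough)    # 5.0
```

With unit edge weights the term simply counts the edges whose endpoints carry different labels, which is why the alternating function scores five times worse.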

Now we are ready to see the algorithms, but let’s
take a little break to understand things even further

Manifold perspective of data modeling

Why graphs encode underlying data geometry
If we consider data as samples from an underlying manifold (which is a fairly weak
assumption), and construct the corresponding adjacency graph, then the eigenvectors
of the graph Laplacian approximate the eigenfunctions of the Laplace-Beltrami operator
of the underlying data manifold

(Belkin & Niyogi, 2008)

Laplacian eigenvectors “understand” geometry

(Rustamov, 2007)

Spectral clustering

More information in von Luxburg (2007)

Spectral clustering explained
•  Why are the eigenvectors of L with small eigenvalues used as the new
representation?
•  The minimizers $f_i$ of the total smoothness term $\sum_i f_i^\top L f_i$,
subject to the $f_i$ being orthonormal, are the eigenvectors of L with the
smallest eigenvalues
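
A minimal two-cluster sketch of this idea follows. It assumes a fully connected graph with Gaussian affinities and the unnormalized Laplacian, and it thresholds the eigenvector of the second-smallest eigenvalue at zero. That thresholding is a simplification that stands in for the k-means step of the general algorithm described in von Luxburg (2007). The data points and function names are made up.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    diff = X[:, None, :] - X[None, :, :]
    W = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def two_way_spectral_clustering(W):
    """Split vertices by the sign of the second-smallest Laplacian eigenvector."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigenvectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    second = eigenvectors[:, 1]          # first one is the constant vector
    return (second > 0).astype(int)

# Two made-up groups: a unit square and a copy shifted 5 units to the right.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([square, square + np.array([5.0, 0.0])])
labels = two_way_spectral_clustering(gaussian_affinity(X))
```

Which group receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the partition itself is meaningful.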

Laplacian eigenmap
•  Using Laplacian eigenvectors with the smallest eigenvalues as
the new representation
•  Can be seen as a non-linear extension of PCA
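
The embedding step can be sketched in a few lines. This version uses the eigenvectors of the unnormalized Laplacian, as the slide describes; Belkin and Niyogi's formulation solves the generalized problem L y = λ D y instead, so treat the snippet as a simplified illustration. The path graph and the helper name are assumptions made for the example.

```python
import numpy as np

def laplacian_eigenmap(W, dim):
    """Embed vertices using the `dim` smallest non-trivial Laplacian eigenvectors.

    Assumes a connected graph, so that only the first eigenvector is constant.
    """
    L = np.diag(W.sum(axis=1)) - W
    _, eigenvectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigenvectors[:, 1:dim + 1]    # skip the constant eigenvector

# Hypothetical path graph with 10 vertices.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

Y = laplacian_eigenmap(W, dim=2)  # one 2-D coordinate per vertex
```

On a path graph the first embedding coordinate is monotone along the path, so the ordering of the vertices is recovered from the graph alone.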


Results on real data
•  Transform data using Laplacian eigenmap, and use linear
regression on the new representation


Manifold regularization
•  A comprehensive regularization framework

•  Through applying the representer theorem in functional
analysis, the optimal solution is as follows

(Belkin et al., 2006)
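
For reference, the framework of Belkin et al. (2006) can be written as below. The notation follows that paper and is an assumption about what the slide displayed: $l$ labeled and $u$ unlabeled examples, a loss function $V$, a kernel $K$ with reproducing kernel Hilbert space $\mathcal{H}_K$, regularization weights $\gamma_A$ and $\gamma_I$, and a graph Laplacian $L$ built on all $l + u$ points. The second line is the form of the solution given by the representer theorem.

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f)
  + \gamma_A \, \lVert f \rVert_K^{2}
  + \frac{\gamma_I}{(u + l)^{2}} \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f} = [f(x_1), \dots, f(x_{l+u})]^{\top}

f^{*}(x) = \sum_{i=1}^{l+u} \alpha_i \, K(x_i, x)
```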

Results on real data

(Belkin et al., 2006)

Summary
•  Learning on graphs provides a set of powerful techniques for
data analysis and predictive analytics that “understand” the
geometry of underlying data
•  Spectral clustering: addresses the limitations of traditional
K-means
•  Laplacian eigenmap and manifold regularization: learn a label
function that respects the underlying data geometry, and hence
provide benefits over standard methods such as PCA and linear
regression
•  Lots of other approaches as well – will talk about label
propagation based on graphs later in this presentation

Applications in media analytics
High-level analysis
Social relational inference
People to communities

Mid-level analysis
Event detection
Visual features to events

Low-level analysis
Segmentation
Pixels to semantic objects

Application 1: social analysis of multimedia data

Friends or foes? Acquaintances or strangers? In same or different teams?

Social network learning and analysis

(Ding & Yilmaz, 2010; 2011)

Application areas
•  Social content: given the growing popularity of social media, inferring
relations among people is becoming important
•  Visual recognition: social context is shown to help improve
recognition results from images (e.g. Wang et al., ECCV 2010)
•  Surveillance: social network learning and analysis for surveillance
applications (e.g. Yu et al., CVPR 2009)
•  Sociology: a necessary step in building intelligent systems for aiding
sociological discovery

Basic video processing
•  Videos segmented into semantic segments
–  Scenes, or visually coherent sets of shots, for movies and TV shows
–  Shot detection and merging based on key-frame similarity (Rasheed &
Shah, 2003)
•  Identifying the actors appearing in each segment
–  Using scripts and closed captions for movies
–  Face detection and recognition for other videos

Overall process
•  Video-level: social relations
•  Scene-level
–  Grouping cues: a number in [-1, +1] for each scene, positive if the
actors in the scene are likely in the same community, negative otherwise
–  Event estimates: estimate the likely events in a scene
–  Scene models: dynamic systems represent scenes
•  Frame-level: feature observations

Visual features
•  Generic optical flow orientation histogram

Using visual concepts
•  Visual concept detection provides useful semantic features for inferring
social relations
•  Using Columbia's 374 SVM concept detectors on color/texture/edge
features, a concept score vector is generated for each scene

Evidence synthesis by Gaussian processes

Learned social affinity

•  The learned social network is represented by an affinity matrix K

RACOM dataset
•  Ten example movies: (1) G.I. Joe: The Rise of Cobra (2009); (2) Harry
Potter and the Half-Blood Prince (2009); (3) Public Enemies (2009); (4)
Troy (2004); (5) Braveheart (1995); (6) Year One (2009); (7) Coraline
(2009); (8) True Lies (1994); (9) The Chronicles of Narnia: The Lion, the
Witch and the Wardrobe (2005); (10) The Lord of the Rings: The Return
of the King (2003).

Analyzing social networks
•  We extend the max-min modularity principle such that it works with the
learned social networks, in order to detect the two communities for each
movie
•  We also identify the leaders of each community, which, interestingly,
correspond to the hero and the villain most of the time

YouTube dataset
•  10 videos of soccer games; 10 videos of demonstrations
•  The goal here is to predict a grouping cue for each scene.
We evaluate against ground-truth labeling

YouTube results
•  Event categories are considered and labeled as an intermediate step
–  Soccer: (chasing, confronting, hugging, others)
–  Demonstration: (marching, confronting, public speaking, others)
•  Precision (+) for within-community instances and Precision (-) for
across-community instances are reported separately
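
The two precision values could be computed along the following lines. This is a sketch under stated assumptions: grouping cues are thresholded at zero, ground truth is coded as +1 (within-community) or -1 (across-community), and the numbers are made up rather than taken from the reported results.

```python
import numpy as np

def signed_precisions(predicted_cues, true_labels):
    """Precision for predicted-positive and predicted-negative scenes.

    predicted_cues: real-valued grouping cues in [-1, +1], one per scene.
    true_labels: ground truth, +1 (same community) or -1 (different).
    """
    pred = np.sign(np.asarray(predicted_cues, dtype=float))
    true = np.asarray(true_labels)
    result = {}
    for name, sign in (("+", 1), ("-", -1)):
        selected = pred == sign
        # Fraction of scenes predicted with this sign whose label agrees.
        if selected.any():
            result[name] = float((true[selected] == sign).mean())
        else:
            result[name] = float("nan")
    return result

cues = [0.8, 0.3, -0.5, -0.9, 0.1]  # hypothetical predictions
truth = [1, -1, -1, -1, 1]          # hypothetical ground truth
precisions = signed_precisions(cues, truth)
```

Reporting the two values separately shows whether errors are concentrated in within-community or across-community predictions.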

Application 2: image content analysis
•  Interactive whole-object segmentation
–  Inputs: an image and labeled pixels (seeds) for objects/background
–  Outputs: labels for all other pixels

(Ding & Yilmaz, 2010)

Overview
•  To segment whole objects from images given user-supplied seeds
–  Different from unsupervised segmentation from a single image,
which typically generates homogeneous regions
–  The challenge is to segment objects using a small number of seeds
•  In addressing this problem, we have proposed
–  Probabilistic hypergraph image model (PHIM)
–  Automatic label set augmentation using boundary features
–  Multiple view learning synthesizing features

Graphs vs. hypergraphs
•  Graph based approaches have been popular for interactive segmentation
–  Graph cut (Rother et al., 2004)
–  Random walk (Grady, 2006)
•  Hypergraphs vs. graphs for images
–  Higher order relations among pixels that tend to form a segment are
encoded as hyperedges, which are collections of vertices
–  Model long-range dependencies among the entities (known and unknown
labels)

Our model: PHIM
•  We propose a probabilistic hypergraph image model (PHIM)
–  The relation between a hyperedge and a vertex is probabilistic, based
on probabilities learned from image appearance characteristics
•  Vertices: superpixels
•  Hyperedges: pair-wise + higher-order (generated by mean-shift
weak segmentation with varying color bandwidths)

Our model: PHIM (cont’d)
•  Feature vector $F_s$ of a superpixel s contains average LUV color values
•  Incidences: kernel density estimator taking superpixel features as the input

•  Hyperedge weights: inhomogeneous hyperedges are down-weighted
–  Reduces to standard graph based edge weights when the hyperedge is of
size 2

Laplacians on PHIM
•  Normalized Laplacians on PHIM: induced quadratic form measures the
smoothness of a function with respect to the underlying edge system
–  We use probabilistic incidences $h(v, e)$ in defining Laplacians on PHIM

•  Notations
–  f: vector of function values on vertices (+1 for object; -1 for
background)
–  $H$: probabilistic incidence matrix; $W$: hyperedge weight matrix
–  $D_e$: hyperedge degree matrix; $D_v$: vertex degree matrix
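
The sketch below assembles a normalized hypergraph Laplacian from these quantities. It assumes the commonly used form I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), with vertex degrees d(v) = sum_e w(e) h(v, e) and hyperedge degrees delta(e) = sum_v h(v, e). Whether this matches the exact definition on the slide is an assumption, and the incidence values and weights are made up.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian from probabilistic incidences.

    H: |V| x |E| matrix of incidences h(v, e) in [0, 1].
    w: length-|E| vector of hyperedge weights.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    dv = H @ w          # vertex degrees d(v)
    de = H.sum(axis=0)  # hyperedge degrees delta(e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(H.shape[0]) - theta

# Three vertices, two hyperedges, hypothetical soft memberships.
H = np.array([[1.0, 0.2],
              [0.8, 0.9],
              [0.1, 1.0]])
w = np.array([1.0, 0.5])
delta = hypergraph_laplacian(H, w)

f = np.array([1.0, 1.0, -1.0])      # +1 object, -1 background
smoothness = float(f @ delta @ f)   # induced quadratic form
```

With binary incidences, unit weights, and hyperedges of size 2, this matrix is one half of the usual normalized graph Laplacian, so the ordinary graph case is recovered up to a constant factor.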

How to do segmentation
•  Constrained smoothness minimization
–  Essentially an interpolation, as we have confidence in user-supplied
segment labels

•  This interpolation can also be solved in an iterative manner
using the natural random walk
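
Both routes to the interpolation can be sketched on an ordinary graph. This is a simplification: the talk works with the normalized Laplacian on PHIM, whereas the snippet uses the unnormalized graph Laplacian on a made-up five-vertex path with one seed at each end. The direct version solves a linear system for the unlabeled vertices; the iterative version repeatedly replaces each unlabeled value by the weighted average of its neighbors while keeping the seeds fixed.

```python
import numpy as np

def harmonic_interpolation(W, seed_idx, seed_values):
    """Minimize f^T L f subject to f being fixed on the seed vertices."""
    n = W.shape[0]
    seed_idx = np.asarray(seed_idx)
    free_idx = np.setdiff1d(np.arange(n), seed_idx)
    L = np.diag(W.sum(axis=1)) - W
    f = np.zeros(n)
    f[seed_idx] = seed_values
    # Stationarity on the free vertices: L_uu f_u = -L_us f_s.
    rhs = -L[np.ix_(free_idx, seed_idx)] @ f[seed_idx]
    f[free_idx] = np.linalg.solve(L[np.ix_(free_idx, free_idx)], rhs)
    return f

def random_walk_interpolation(W, seed_idx, seed_values, n_iter=500):
    """Iterative version: average over neighbors, then re-clamp the seeds."""
    seed_idx = np.asarray(seed_idx)
    P = W / W.sum(axis=1, keepdims=True)  # random walk transition matrix
    f = np.zeros(W.shape[0])
    f[seed_idx] = seed_values
    for _ in range(n_iter):
        f = P @ f
        f[seed_idx] = seed_values
    return f

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 with seeds at both ends.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

f_direct = harmonic_interpolation(W, [0, 4], [1.0, -1.0])
f_iter = random_walk_interpolation(W, [0, 4], [1.0, -1.0])
labels = np.where(f_direct >= 0, 1, -1)  # +1 object, -1 background
```

On this path the interpolated values fall linearly from +1 to -1, and the two solvers agree up to numerical tolerance.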

Dataset
•  GrabCut dataset of 50 images (Rother et al., 2004)
•  Seed pixels are provided in the form of trimaps
•  Ground-truth segmentations are supplied

Results on segmentation
•  Error rates averaged over the GrabCut dataset of 50 images
–  PHIM performs better than a standard graph
–  Our error rate of 5.33% is lower than the 7.9% reported in (Blake et al.,
2006), and is comparable to state-of-the-art results from pixel-level
optimization

The end
•  Thanks!
•  References
–  Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2007
–  Charles Sutton and Andrew McCallum, An Introduction to Conditional Random Fields for
Relational Learning, 2007
–  Raif Rustamov, Laplace-Beltrami Eigenfunctions for Deformation Invariant Shape Representation,
2007
–  Mikhail Belkin and Partha Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data
Representation, 2003
–  Mikhail Belkin and Partha Niyogi, Semi-Supervised Learning on Riemannian Manifolds, 2004
–  Mikhail Belkin, Partha Niyogi and Vikas Sindhwani, Manifold Regularization: A Geometric
Framework for Learning from Labeled and Unlabeled Examples, 2006
–  Mikhail Belkin and Partha Niyogi, Convergence of Laplacian Eigenmaps, 2008
–  Lei Ding and Alper Yilmaz, Learning Relations Among Movie Characters: A Social Network
Perspective, 2010
–  Lei Ding and Alper Yilmaz, Interactive Image Segmentation Using Probabilistic Hypergraphs,
2010
–  Lei Ding and Alper Yilmaz, Inferring Social Relations from Visual Concepts, 2011
