Graph Based Machine Learning with Applications to Media Analytics

Summary: Graphs are structures commonly used in computer science to model interactions among entities. I will start by introducing the basic formulations of graph based machine learning, which has been a popular research topic over the past decade and has led to a powerful set of techniques. In particular, I will show examples of how it acts as a generic data mining and predictive analytics tool. In the second part, I will discuss applications of such learning techniques in media analytics: (1) image analysis, where visually coherent objects are isolated from images; (2) social analysis of videos, where actors' social properties are predicted from videos. Materials in this part are based on our recent publications in highly selective venues (papers at https://sites.google.com/site/leiding2010/).

Bio: Lei Ding is a researcher who makes sense of large amounts of data in all media types. He currently works at Intent Media as a scientist, focusing on data analytics and applied machine learning in online advertising. Previously, he worked at several research institutions, including Columbia University, UIUC and IBM Research, on digital / social media analysis and understanding. He received a Ph.D. in Computer Science and Engineering from The Ohio State University, where he was a Distinguished University Fellow.

Transcript:

  1. Graph based machine learning with applications to media analytics
     Lei Ding, PhD
     9-1-2011
     with collaborators at
  2. Outline
     • Graph based machine learning
       – Basic structures
       – Algorithms
       – Examples
     • Applications in media analytics
       – Social analysis of videos
       – Content analysis of images
  3. Outline
     • Graph based machine learning
       – Basic structures
       – Algorithms
       – Examples
     • Applications in media analytics
       – Social analysis of videos
       – Content analysis of images
  4. What is a graph
     Not the graph we are going to talk about
  5. What is a graph
     • A graph is composed of
       – Vertices (nodes): pixels, actors in videos, genes, ads, etc.
       – Edges: their relations
       – In machine learning, we are interested in predicting some quantity (a class label, or a continuous value) at each unlabeled vertex
  6. What is a graph
     • A graph is composed of
       – Vertices (nodes): pixels, actors in videos, genes, ads, etc.
       – Edges: their relations
       – In machine learning, we are interested in predicting some quantity (a class label, or a continuous value) at each unlabeled vertex
     • Broadly speaking, there are two kinds of graphs: undirected and directed
  7. Graph based machine learning for media analytics
     • Oftentimes, media content can be represented using graphs
     • Therefore, challenging inference problems with media content can be answered by learning on graphs
  8. Social content model
     • Content network: encodes content similarity (videos, audios, etc.)
     • Content generation process
     • Social network: encodes people's social connections
     Can be used for media genre classification, media recommendation, etc.
  9. Graph based machine learning
     • On undirected graphs
       – Optimization based approaches (e.g. energy minimization)
       – Probabilistic models (e.g. random fields)
     • On directed graphs
       – Optimization based approaches (e.g. directed energy minimization)
       – Probabilistic models (e.g. latent Dirichlet allocation, Bayesian networks)
  10. Relations
      • How are these methods related to traditional statistical learning (e.g. logistic regression)? (Sutton & McCallum, 2007)
  11. Graph based machine learning
      • On undirected graphs
        – Optimization based approaches (e.g. energy minimization)
        – Probabilistic models (e.g. random fields)
      • On directed graphs
        – Optimization based approaches (e.g. directed energy minimization)
        – Probabilistic models (e.g. latent Dirichlet allocation, Bayesian networks)
  12. Learning on undirected graphs
      • Classification methods
        – We have some labeled data, and want to predict labels for others
        – e.g. manifold regularization
      • Clustering methods
        – We would like to partition data into clusters
        – e.g. spectral clustering
  13. Constructing data graphs
      • How to transform a dataset ({x_i}, i = 1..m) into a graph
  14. Affinity matrix
      • A graph is usually represented using an affinity matrix W, where the corresponding entry is 1 if two vertices are connected, and 0 otherwise (see the sketch below)
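To make this concrete, here is a minimal sketch of constructing a k-nearest-neighbor affinity matrix with NumPy; the dataset X and the neighborhood size k are illustrative choices, not prescribed by the slides.

import numpy as np

def knn_affinity(X, k=5):
    """Binary affinity matrix W: W[i, j] = 1 if j is among the
    k nearest neighbors of i (symmetrized), 0 otherwise."""
    m = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    W = np.zeros((m, m))
    for i in range(m):
        # Skip self (index 0 after argsort) and take the k nearest
        nn = np.argsort(d2[i])[1:k + 1]
        W[i, nn] = 1
    # Make the graph undirected
    return np.maximum(W, W.T)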
  15. Graph Laplacians
      • L = D - W, where W is an affinity matrix and D is a diagonal matrix of row sums (a one-line implementation follows below)
      • Discretization of the Laplace-Beltrami operator on manifolds, which is the sum of second order derivatives on tangent space (more details later)
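The definition L = D - W translates directly into NumPy; a minimal sketch:

import numpy as np

def graph_laplacian(W):
    # Unnormalized Laplacian: L = D - W, with D the diagonal
    # matrix of row sums (vertex degrees) of the affinity matrix W.
    return np.diag(W.sum(axis=1)) - W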
  16. Function on graph
      • A vector can be used to represent a function over the graph
        – We can encode what we already know or what we want to predict in a label function
        – For example, in this graph a vertex can represent a person, and the function can represent whether he is a likely customer
      f = [1, 1, 0, 0, 1, 0]^T
  17. Eigenvectors reviewed
  18. Properties of graph Laplacians
      • Symmetric and positive semi-definite
      • Graph Laplacian induces a smoothness term
        – Transposed label function f × Laplacian matrix L × label function f (always non-negative)
        – The smoothness term (f^T L f) measures how much the function f varies with respect to the underlying graph (see the worked example below)
        – We have labels on some vertices, and want to predict labels on other vertices. A smooth function (small f^T L f) typically predicts well
      • Laplacian eigenvectors with small eigenvalues can be used for data clustering / classification, data set parametrization, image segmentation, etc.
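A worked example of the smoothness term, reusing the label function f = [1, 1, 0, 0, 1, 0]^T from slide 16; the chain-graph topology is a hypothetical choice for illustration:

import numpy as np

# Hypothetical 6-vertex chain graph and a label function on it
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1
L = np.diag(W.sum(axis=1)) - W

f = np.array([1, 1, 0, 0, 1, 0], dtype=float)
# f^T L f = sum over edges of (f_i - f_j)^2 >= 0
print(f @ L @ f)  # 3.0: the function changes across exactly 3 edges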
  19. Properties of graph Laplacians
      • Symmetric and positive semi-definite
      • Graph Laplacian induces a smoothness term
        – Transposed label function f × Laplacian matrix L × label function f (always non-negative)
        – The smoothness term (f^T L f) measures how much the function f varies with respect to the underlying graph
        – We have labels on some vertices, and want to predict labels on other vertices. A smooth function (small f^T L f) typically predicts well
      • Laplacian eigenvectors with small eigenvalues can be used for data clustering / classification, data set parametrization, image segmentation, etc.
      Now we are ready to see the algorithms, but let's take a little break to understand things even further
  20. Manifolds
  21. Manifold perspective of data modeling
  22. Why graphs encode underlying data geometry
      If we consider data as samples from an underlying manifold (which is a fairly weak assumption), and construct the corresponding adjacency graph, then eigenvectors of the graph Laplacian approximate eigenfunctions of the Laplace-Beltrami operator of the underlying data manifold (Belkin & Niyogi, 2008)
  23. Laplacian eigenvectors "understand" geometry (Rustamov, 2007)
  24. Spectral clustering
      More information in von Luxburg (2007)
  25. Spectral clustering explained
      • Why are the eigenvectors of L with small eigenvalues used as the new representation?
      • The minimizers f_i of the total smoothness term Σ_i f_i^T L f_i (subject to orthonormality constraints) are the eigenvectors of L with the smallest eigenvalues (see the sketch below)
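A minimal spectral-clustering sketch in the spirit of von Luxburg (2007), assuming an affinity matrix W as built earlier; the use of the unnormalized Laplacian and plain k-means on the embedded points are simplifying choices:

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k=2):
    """Cluster vertices using the k smallest eigenvectors of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order
    vals, vecs = eigh(L)
    U = vecs[:, :k]            # new k-dimensional representation
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)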
  26. Results
  27. Laplacian eigenmap
      • Using Laplacian eigenvectors with the smallest eigenvalues as the new representation (a minimal version follows below)
      • Can be seen as a non-linear extension of PCA (Belkin & Niyogi, 2003)
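The embedding step is just the eigenvector computation; a minimal sketch. Note that Belkin & Niyogi (2003) solve the generalized problem Lf = λDf, while this simplified version uses the unnormalized L:

import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W, dim=2):
    """Embed vertices using eigenvectors of L with the smallest
    non-zero eigenvalues (the constant eigenvector is skipped)."""
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = eigh(L)
    return vecs[:, 1:dim + 1]   # columns 1..dim: skip the trivial one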
  28. Results on real data
      • Transform data using the Laplacian eigenmap, and use linear regression on the new representation (Belkin & Niyogi, 2004)
  29. Manifold regularization
      • A comprehensive regularization framework
      • Through the representer theorem in functional analysis, the optimal solution takes the form f*(x) = Σ_i α_i K(x_i, x), a kernel expansion over both labeled and unlabeled examples (Belkin et al., 2006); a sketch follows below
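As a hedged sketch of the squared-loss instance of this framework (Laplacian RLS), following the closed-form solution for the expansion coefficients in Belkin et al. (2006); the regularization weights gamma_A and gamma_I are illustrative values:

import numpy as np

def lap_rls(K, L, y_labeled, l, gamma_A=1e-2, gamma_I=1e-2):
    """Laplacian RLS: coefficients alpha for f(x) = sum_i alpha_i K(x, x_i).
    K: (n, n) Gram matrix over all n = l + u points (first l labeled)
    L: (n, n) graph Laplacian over the same points
    y_labeled: (l,) labels for the first l points"""
    n = K.shape[0]
    J = np.zeros((n, n)); J[:l, :l] = np.eye(l)   # selects labeled points
    Y = np.zeros(n); Y[:l] = y_labeled
    # alpha* = (J K + gamma_A l I + gamma_I l / n^2 * L K)^{-1} J Y
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    return np.linalg.solve(A, Y)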
  30. Results on real data (Belkin et al., 2006)
  31. Summary
      • Learning on graphs provides a set of powerful techniques for data analysis and predictive analytics that "understand" the geometry of the underlying data
      • Spectral clustering addresses the limitations of traditional K-means
      • Laplacian eigenmap and manifold regularization learn a label function respecting the underlying data geometry, and hence provide benefits over standard methods like PCA and linear regression
      • Lots of other approaches as well; we will talk about graph based label propagation later in this presentation
  32. Outline
      • Graph based machine learning
        – Basic structures
        – Algorithms
        – Examples
      • Applications in media analytics
        – Social analysis of videos
        – Content analysis of images
  33. Applications in media analytics
      • High-level analysis: social relational inference (people to communities)
      • Mid-level analysis: event detection (visual features to events)
      • Low-level analysis: segmentation (pixels to semantic objects)
  34. Application 1: social analysis of multimedia data
      Friends or foes? Acquaintances or strangers? In same or different teams?
  35. Social network learning and analysis
  36. Social network learning and analysis
  37. Social network learning and analysis (Ding & Yilmaz, 2010; 2011)
  38. Application areas
      • Social content: given the growing popularity of social media, inferring relations among people is becoming important
      • Visual recognition: social context has been shown to improve recognition results on images (e.g. Wang et al., ECCV 2010)
      • Surveillance: social network learning and analysis for surveillance applications (e.g. Yu et al., CVPR 2009)
      • Sociology: a necessary step in building intelligent systems for aiding sociological discovery
  39. Basic video processing
      • Videos are segmented into semantic segments
        – Scenes, or visually coherent sets of shots, for movies and TV shows
        – Shot detection and merging based on key-frame similarity (Rasheed & Shah, 2003); a generic sketch follows below
      • Identifying the actors appearing in each segment
        – Using scripts and closed captions for movies
        – Face detection and recognition for other videos
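For illustration, a crude color-histogram shot-boundary sketch using OpenCV; this is a generic heuristic standing in for the key-frame-similarity method of Rasheed & Shah (2003), and the threshold value is an assumption:

import cv2

def shot_boundaries(video_path, threshold=0.5):
    """Flag frames whose color histogram differs sharply from the
    previous frame's (a crude shot-boundary heuristic)."""
    cap = cv2.VideoCapture(video_path)
    prev_hist, boundaries, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is not None:
            # Correlation near 1 means visually similar frames
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries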
  40. Actor appearance matrix
  41. Overall process
      • Social relations (video-level)
      • Grouping cues (scene-level): a number in [-1, +1] for each scene, positive if the actors in a scene are likely in the same community, negative otherwise
      • Event estimates: estimate the likely events in a scene
      • Scene models: dynamic systems represent scenes
      • Feature observations (frame-level)
  42. Key steps
  43. Visual features
      • Generic optical flow orientation histogram (a sketch follows below)
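A sketch of such a descriptor using OpenCV's dense Farneback flow; the Farneback parameters and bin count are illustrative defaults, not values from the paper:

import cv2
import numpy as np

def flow_orientation_histogram(prev_gray, curr_gray, bins=8):
    """Histogram of dense optical-flow orientations between two
    grayscale frames, weighted by flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang.ravel(), bins=bins,
                           range=(0, 2 * np.pi), weights=mag.ravel())
    return hist / (hist.sum() + 1e-8)   # normalize to a distribution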
  44. Auditory features
  45. Using visual concepts
      • Visual concept detection provides useful semantic features for inferring social relations
      • Using Columbia's 374 SVM concept detectors on color/texture/edge features, a concept score vector is generated for each scene
  46. Evidence synthesis by Gaussian processes
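The slides do not spell out the synthesis step, but as a hedged illustration of how a Gaussian process can map per-scene feature vectors to a grouping cue in [-1, +1] with an uncertainty estimate, here is a scikit-learn sketch; the feature dimensions and training data are entirely hypothetical:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# X_train: hypothetical per-scene feature vectors (e.g. concatenated
# visual / auditory / concept scores); y_train: grouping cues in [-1, +1]
X_train = np.random.randn(40, 10)
y_train = np.sign(np.random.randn(40))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_train, y_train)
cue, std = gp.predict(np.random.randn(5, 10), return_std=True)
# std can serve as a confidence estimate when synthesizing evidence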
  47. Learned social affinity
      • The learned social network is represented by the affinity matrix K
  48. Learned social networks
  49. RACOM dataset
      • Ten example movies: (1) G.I. Joe: The Rise of Cobra (2009); (2) Harry Potter and the Half-Blood Prince (2009); (3) Public Enemies (2009); (4) Troy (2004); (5) Braveheart (1995); (6) Year One (2009); (7) Coraline (2009); (8) True Lies (1994); (9) The Chronicles of Narnia: The Lion, the Witch and the Wardrobe (2005); (10) The Lord of the Rings: The Return of the King (2003)
  50. Analyzing social networks
      • We extend the max-min modularity principle to work with the learned social networks, in order to detect the two communities in each movie
      • We also identify the leaders of each community, who, interestingly, correspond to the hero/villain most of the time
  51. Max-min modularity
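The max-min extension itself is not detailed in the slides; for background, here is a sketch of the standard modularity-based two-way split via Newman's leading-eigenvector method, the construction that modularity-style community detection (including the max-min variant) builds on:

import numpy as np

def two_communities(A):
    """Split a graph (symmetric adjacency/affinity A) into two
    communities using the leading eigenvector of the modularity
    matrix B = A - k k^T / (2m)  (Newman, 2006)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)
    leading = vecs[:, np.argmax(vals)]
    return np.where(leading >= 0, 0, 1)   # community labels by sign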
  52. Visual maps
  53. Quantitative evaluation
  54. Detected social communities
  55. YouTube dataset
      • 10 videos of soccer games; 10 videos of demonstrations
      • The goal here is to predict a grouping cue for each scene. We evaluate against ground-truth labeling
  56. YouTube results
      • Event categories are considered and labeled in an intermediate step
        – Soccer: (chasing, confronting, hugging, others)
        – Demonstration: (marching, confronting, public speaking, others)
      • Precision (+) for within-community instances and Precision (-) for across-community instances are reported separately
  57. Application 2: image content analysis
      • Interactive whole-object segmentation
        – Inputs: an image and labeled pixels (seeds) for objects/background
        – Outputs: labels for all other pixels (Ding & Yilmaz, 2010)
  58. Overview
      • To segment whole objects from images given user-supplied seeds
        – Different from unsupervised segmentation from a single image, which typically generates homogeneous regions
        – The challenge is to segment objects using a small number of seeds
      • In addressing this problem, we have proposed
        – Probabilistic hypergraph image model (PHIM)
        – Automatic label set augmentation using boundary features
        – Multiple-view learning for synthesizing features
  59. Graphs vs. hypergraphs
      • Graph based approaches have been popular for interactive segmentation
        – Graph cut (Rother et al., 2004)
        – Random walk (Grady, 2006)
      • Hypergraphs vs. graphs for images
        – Higher order relations among pixels that tend to form a segment are encoded as hyperedges, which are collections of vertices
        – Model long-range dependencies among the entities (known and unknown labels)
  60. Our model: PHIM
      • We propose to use a probabilistic hypergraph image model (PHIM)
        – The relation between a hyperedge and a vertex is probabilistic, based on probabilities learned from image appearance characteristics
      • Vertices: superpixels
      • Hyperedges: pair-wise + higher-order (generated by mean-shift weak segmentation with varying color bandwidths)
  61. Our model: PHIM (cont'd)
      • The feature vector F_s of a superpixel s contains average LUV color values
      • Incidences: a kernel density estimator taking superpixel features as input
      • Hyperedge weights: inhomogeneous hyperedges are down-weighted
        – Reduces to standard graph based edge weights when the hyperedge is of size 2
  62. Laplacians on PHIM
      • Normalized Laplacians on PHIM: the induced quadratic form measures the smoothness of a function with respect to the underlying edge system (a sketch of the standard construction follows below)
        – We use probabilistic incidences h(v, e) in defining Laplacians on PHIM
      • Notation
        – f: vector of function values on vertices (+1 for object; -1 for background)
        – H: probabilistic incidence matrix; W: hyperedge weight matrix
        – De: hyperedge degree matrix; Dv: vertex degree matrix
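Given the notation on this slide, here is a sketch of the normalized hypergraph Laplacian in the style of Zhou et al. (2006), a standard construction these matrices plug into; whether PHIM uses exactly this normalization is an assumption of the sketch:

import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian
    Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    H: (n_vertices, n_edges) probabilistic incidence matrix h(v, e)
    w: (n_edges,) hyperedge weights"""
    W = np.diag(w)
    dv = H @ w          # vertex degrees: weights of incident hyperedges
    de = H.sum(axis=0)  # hyperedge degrees: sum of incidences
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    n = H.shape[0]
    return np.eye(n) - Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt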
  63. How to do segmentation
      • Constrained smoothness minimization (see the sketch below)
        – Essentially an interpolation, as we have confidence in user-supplied segment labels
      • This interpolation can also be solved in an iterative manner using the natural random walk
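A generic sketch of the constrained-smoothness (harmonic) interpolation, assuming a Laplacian Delta as above with seed values fixed on labeled vertices; solving the linear system directly is one standard alternative to the random-walk iteration mentioned on the slide:

import numpy as np

def interpolate_labels(Delta, seeds, seed_values):
    """Minimize f^T Delta f subject to f being fixed on seed vertices.
    The unlabeled block solves Delta_uu f_u = -Delta_ul f_l."""
    n = Delta.shape[0]
    labeled = np.zeros(n, dtype=bool); labeled[seeds] = True
    f = np.zeros(n); f[seeds] = seed_values   # +1 object, -1 background
    A = Delta[np.ix_(~labeled, ~labeled)]
    b = -Delta[np.ix_(~labeled, labeled)] @ f[labeled]
    f[~labeled] = np.linalg.solve(A, b)
    return np.sign(f)   # final object/background decision per vertex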
  64. Dataset
      • GrabCut dataset of 50 images (Rother et al., 2004)
      • Seed pixels are provided in the form of trimaps
      • Ground-truth segmentations are supplied
  65. Results on segmentation
      • Error rates averaged over the GrabCut dataset of 50 images
        – PHIM performs better than a standard graph
        – Our error rate of 5.33% is much better than the 7.9% achieved in (Blake et al., 2006), and is comparable to state-of-the-art results from pixel-level optimization
  66. Comparative results
  67. The end
      • Thanks!
      • References
        – Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2007
        – Charles Sutton and Andrew McCallum, An Introduction to Conditional Random Fields for Relational Learning, 2007
        – Raif Rustamov, Laplace-Beltrami Eigenfunctions for Deformation Invariant Shape Representation, 2007
        – Mikhail Belkin and Partha Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003
        – Mikhail Belkin and Partha Niyogi, Semi-Supervised Learning on Riemannian Manifolds, 2004
        – Mikhail Belkin, Partha Niyogi and Vikas Sindhwani, Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples, 2006
        – Mikhail Belkin and Partha Niyogi, Convergence of Laplacian Eigenmaps, 2008
        – Lei Ding and Alper Yilmaz, Learning Relations Among Movie Characters: A Social Network Perspective, 2010
        – Lei Ding and Alper Yilmaz, Interactive Image Segmentation Using Probabilistic Hypergraphs, 2010
        – Lei Ding and Alper Yilmaz, Inferring Social Relations from Visual Concepts, 2011
