
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL

Max-kernel search: How to search for just about anything?

Nearest neighbor search is a well-studied and widely used task in computer science and is quite pervasive in everyday applications. While search is not synonymous with learning, it is a crucial tool for the most nonparametric forms of learning. Nearest neighbor search can be used directly for all kinds of learning tasks: classification, regression, density estimation, and outlier detection. Search is also the computational bottleneck in various other learning tasks such as clustering and dimensionality reduction. Key to nearest neighbor search is the notion of "near"-ness, or similarity. Mercer kernels form a class of general nonlinear similarity functions and are widely used in machine learning. They can define a notion of similarity between pairs of objects of any arbitrary type, and have been successfully applied to a wide variety of object types: fixed-length data, images, text, time series, and graphs. I will present a technique to do nearest neighbor search with this class of similarity functions provably efficiently, hence facilitating faster learning on larger data.

  1. Max-kernel search: How to search for just about anything? Parikshit Ram
  2. Similarity search ● Set of objects R ● Query q ● Similarity function
  3. Finding similar images
  4. Drug discovery (image: http://fineartamerica.com)
  5. Movie recommendations
  6. Similarity search is ubiquitous ● Machine learning ● Computer vision ● Theory ● Databases ● Information retrieval ● Web applications ● Collaborative filtering ● Scientific computing
  7. Search-based classification
  8. Search-based classification (figure: the query appears as "?")
  9. Search-based classification: k-nearest-neighbor classification/regression (a minimal sketch follows slide 11 below)
  10. Search-based classification: "RomCom fan"
  11. Search-based classification: "Kids movie fanatic"
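
Slides 7-11 illustrate the k-nearest-neighbor rule pictorially. As a minimal sketch of that rule with a kernel as the similarity (all names here are illustrative, not from the talk):

    from collections import Counter

    def knn_classify(q, data, labels, kernel, k=3):
        """Classify q by a majority vote among the k reference objects
        most similar to it under the kernel. Nothing is trained; the
        data itself does the work."""
        ranked = sorted(range(len(data)),
                        key=lambda i: kernel(q, data[i]), reverse=True)
        votes = Counter(labels[i] for i in ranked[:k])
        return votes.most_common(1)[0][0]

With kernel(q, x) = numpy.dot(q, x) this is plain k-NN on inner products; swapping in any Mercer kernel from slide 25 changes the notion of "near" without changing the rule.
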
  12. Search-based outlier detection (sketch below)
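
A common search-based outlier score, sketched under the assumption that the slide uses the usual k-NN flavor of the idea: an object is flagged when even its k-th most similar neighbor is not very similar.

    def knn_outlier_scores(data, kernel, k=3):
        """Score each object by the (negated) similarity of its k-th
        most similar neighbor; higher scores mean more outlying.
        Brute force: O(n^2) kernel evaluations."""
        scores = []
        for i, x in enumerate(data):
            sims = sorted((kernel(x, y) for j, y in enumerate(data) if j != i),
                          reverse=True)
            scores.append(-sims[k - 1])
        return scores
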
  13. (figure only)
  14. Search-based ML. Advantages: ● nonparametric, lets the data speak ● no need to train complex models. Key ingredient: ● a notion of similarity (domain/data-specific). Main challenge is efficiency: ● sheer size of the data ● varied data types
  15. Properties of similarity functions ● symmetry (formula shown as an image: similarity OR dissimilarity form)
  16. (figure) The dissimilarity is the size of the set-theoretic difference
  17. Properties of similarity functions ● symmetry ● self-similarity
  18. (figure) We do not really care about this.
  19. Properties of similarity functions ● symmetry ● self-similarity (standard forms reconstructed below)
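
The formulas on slides 15-19 are images and did not survive extraction. The standard forms of the two properties, written for a similarity s and, after each OR, the equivalent dissimilarity d version (a reconstruction, not the slides' exact notation):

    \text{symmetry:} \quad s(x, y) = s(y, x) \quad \text{OR} \quad d(x, y) = d(y, x)

    \text{self-similarity:} \quad s(x, x) \ge s(x, y) \quad \text{OR} \quad d(x, x) = 0 \le d(x, y)
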
  20.–23. (figure, built up over four slides: the landscape of similarity functions, captioned "Metrics used everywhere")
  24. Bregman divergences: widely used for distributions. Mercer kernels: widely used in ML for a variety of objects and problems. ???: not quite explored in search or ML. Metrics: used everywhere.
  25. Breadth of kernel functions:
      Objects        Kernel functions
      Images         linear, polynomial, Gaussian, Pyramid match
      Documents      cosine
      Sequences      p-spectrum kernel, alignment score
      Trees          subtree, syntactic, partial tree
      Graphs         random walk
      Time series    cross-correlation, dynamic time-warping
      Natural lang.  convolution, decomposition, lexical semantic
  26. What is a Kernel Function? In words: a pairwise symmetric function ● correlation in a richer but hidden feature space ● cannot access the hidden space (figure: object space → hidden space, via a hidden mapping)
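
To make the "hidden feature space" concrete, a small sketch (illustrative, not from the talk): the degree-2 polynomial kernel on 2-D inputs equals an ordinary inner product after an explicit mapping phi. For kernels like the Gaussian, the corresponding phi is infinite-dimensional, which is why the hidden space cannot be accessed directly.

    import numpy as np

    def poly2_kernel(x, y):
        """Degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
        return np.dot(x, y) ** 2

    def phi(x):
        """Explicit hidden mapping for 2-D inputs, so that
        K(x, y) = <phi(x), phi(y)>."""
        x1, x2 = x
        return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

    x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
    assert np.isclose(poly2_kernel(x, y), np.dot(phi(x), phi(y)))
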
  27. Max-kernel search: find the object in R most similar to q with respect to a kernel
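
The problem statement as a one-line baseline (the function name is illustrative): one kernel evaluation per object in R, so O(|R|) work per query; the rest of the talk is about beating this.

    def max_kernel_search(q, R, kernel):
        """Brute-force max-kernel search: the object p in R
        maximizing K(q, p)."""
        return max(R, key=lambda p: kernel(q, p))
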
  28. Existing methods ● Brute-force (parallel/distributed) ○ domain-specific optimizations ● Coerce data to use metrics ○ only approximate. No standard search tools!
  29. Understanding kernels: if two objects are very similar to each other, then they are almost equally similar to the query q
  30. Understanding kernels: IF ... THEN ... (formulas shown as images; reconstruction below)
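
The IF/THEN formulas on these slides are images; a plausible reconstruction of the fact being used (the Cauchy-Schwarz inequality applied in the kernel-induced feature space with hidden mapping φ):

    |K(q, p) - K(q, p')| = |\langle \varphi(q), \varphi(p) - \varphi(p') \rangle|
                         \le \sqrt{K(q, q)} \sqrt{K(p, p) - 2 K(p, p') + K(p', p')}

So if p and p' are close to each other in the hidden space (the right-hand side is small), their similarities to any query q are nearly equal, which is what makes pruning possible.
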
  31. Indexing our collection
  32. Indexing our collection
  33. Indexing our collection: a multi-resolution index, the Cover Tree (BKL 2006), built in O(n log n) time
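
Slides 31-33 build the index pictorially. As a stand-in for the cover tree, which is more involved, here is a toy kernel-space ball tree exposing the two ingredients the search needs: a representative point per node and a radius bounding the hidden-space distance to every descendant. All names are illustrative; this is not the BKL 2006 construction.

    import math

    class Node:
        """A node of a toy kernel-space ball tree: a representative
        point, child nodes, and a radius covering all descendants."""
        def __init__(self, point, children, radius):
            self.point, self.children, self.radius = point, children, radius

    def induced_distance(p1, p2, kernel):
        """Hidden-space distance from kernel evaluations alone:
        ||phi(p1) - phi(p2)||^2 = K(p1,p1) - 2 K(p1,p2) + K(p2,p2)."""
        d2 = kernel(p1, p1) - 2 * kernel(p1, p2) + kernel(p2, p2)
        return math.sqrt(max(d2, 0.0))

    def build(points, kernel, leaf_size=4):
        """Index `points` recursively; the first point becomes the
        node's representative."""
        pivot, rest = points[0], points[1:]
        radius = max((induced_distance(pivot, p, kernel) for p in rest),
                     default=0.0)
        if len(rest) <= leaf_size:
            children = [Node(p, [], 0.0) for p in rest]
        else:
            mid = len(rest) // 2  # arbitrary split; a cover tree splits by scale
            children = [build(rest[:mid], kernel, leaf_size),
                        build(rest[mid:], kernel, leaf_size)]
        return Node(pivot, children, radius)
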
  34. How to Search with this Index? (figure: query q, node p)
  35. How to Search with this Index? (figure: q, p, and descendants p', p'')
  36. How to Search with this Index?
  37. How to Search with this Index?
  38. How to Search with this Index? Safely ignore a large chunk (potentially millions)
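
Using the toy index above, a schematic version of the pruning on slides 34-38: by the Cauchy-Schwarz bound from slide 30, no descendant of a node can score above K(q, p) + sqrt(K(q, q)) * radius, so once a better candidate is in hand the whole subtree (potentially millions of objects) is safely ignored. This is a sketch of the idea, not mlpack's implementation.

    import math

    def tree_search(q, node, kernel, best=(-math.inf, None)):
        """Branch-and-bound max-kernel search over the toy index."""
        best_val, best_arg = best
        k_qp = kernel(q, node.point)
        if k_qp > best_val:
            best_val, best_arg = k_qp, node.point
        # Upper bound on K(q, p') for every descendant p' of this node:
        cap = k_qp + math.sqrt(kernel(q, q)) * node.radius
        if cap > best_val:
            # most promising children first, to tighten best_val early
            for child in sorted(node.children,
                                key=lambda c: -kernel(q, c.point)):
                best_val, best_arg = tree_search(q, child, kernel,
                                                 (best_val, best_arg))
        return best_val, best_arg
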
  39. Results: Efficiency (figure: improvement)
  40. Results: Efficiency ● widely applicable algorithm ● performance is data/kernel-dependent (figure: improvements ranging from 10x to 10000x)
  41. Results: Sublinear Query Time (figure: improvement vs. object set size) Bigger data implies bigger efficiency gains
  42. Can We Prove it? What Makes Search Hard? Thm. For a set R of n objects, the query time is bounded (the exact bound appears as a formula on the slide) in terms of: ● the expansion constant ○ captures the distribution of the data ● the directional concentration constant ○ captures the distribution of a kernel-induced transformation of the data
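
For reference, the expansion constant has a standard definition in the cover-tree literature (not spelled out on the slide): it is the smallest c such that doubling the radius of any ball around a data point grows the number of points it contains by at most a factor of c,

    |B_S(p, 2r)| \le c \, |B_S(p, r)| \quad \text{for all } p \in S,\ r > 0,

where B_S(p, r) denotes the points of S within distance r of p. The directional concentration constant plays the analogous role for the kernel-induced transformation of the data; small values of both make the search provably fast.
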
  43. Endnote ● Search is an essential tool for ML ● Exploring different types of similarity functions increases the applicability and quality of search ● Kernels are widely applicable similarity functions ○ now we have provably fast max-kernel search. Code/tutorial for Fast Exact Max-Kernel Search in mlpack version 1.0.5: http://www.mlpack.org (with Ryan R. Curtin). Email: pari@skytree.net
