
Finding needles in haystacks with deep neural networks


PyData London 2016 talk on detecting duplicate products at Lyst



  1. Finding needles in haystacks with deep neural networks. Calvin Giles, calvin.giles@gmail.com, @calvingiles
  2. Who am I: Data scientist at Lyst. PyData organiser. Recovering nuclear physicist. Started learning ML 7 years ago; trained my first neural network 6 months ago.
  3. Needles: We have 30 million products in our database. We would like to find pairs of these products that are the same.
  4. Haystacks: A brute force search of our database would find one correct pair in every 1,000.
  5. Why bother? One product, eight retailers. Users like duplicate grouping. Machine learning tasks are easier on merged data.
  6. But first, a tangent into perspective… (or why some perspectives are better than others)
  7.–12. Get some perspective, the lazy way (built up one step per slide). Step 1: Train a neural network for classification. Step 2: Label one example per class. Step 3: Train a simple model, e.g. an SVM. Step 4: Label samples that confuse the model. Step 5: Repeat steps 3 and 4 until bored.
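A minimal sketch of that labelling loop, assuming the trained network's penultimate-layer activations are already available as a feature matrix; the names features, seed_indices and oracle_label are placeholders (not from the talk), and scikit-learn's SVC stands in for the "simple model e.g. SVM":

    # Sketch of the lazy-labelling loop from slides 8-12. `features` holds
    # penultimate-layer activations of a trained classifier, `seed_indices`
    # gives one hand-labelled example per class, and `oracle_label(i)`
    # stands in for a human labelling example i.
    import numpy as np
    from sklearn.svm import SVC

    def lazy_labelling(features, seed_indices, oracle_label, rounds=5, batch=20):
        labelled = {i: oracle_label(i) for i in seed_indices}
        for _ in range(rounds):
            idx = np.array(sorted(labelled))
            clf = SVC(kernel="linear")
            clf.fit(features[idx], [labelled[i] for i in idx])
            scores = clf.decision_function(features)
            # Distance to the decision boundary: small means "confusing".
            margins = np.abs(scores) if scores.ndim == 1 else np.abs(scores).min(axis=1)
            new = [i for i in np.argsort(margins) if i not in labelled][:batch]
            for i in new:
                labelled[i] = oracle_label(i)
        return labelled

Picking the points with the smallest margin is one common reading of "samples that confuse the model"; the talk does not say which criterion was actually used.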
  13.–17. (image-only slides)
  18.–26. Network structure (diagram built up across the slides): a single network produces an Image A representation and an Image B representation; a loss on the pair is back-propagated through the network.
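Slides 18-26 describe a siamese arrangement: one network applied to both images, with the pair of representations feeding a single loss that is back-propagated through the shared weights. A minimal sketch in Chainer (the framework whose documentation the next slides link to); the 9216 -> 1024 -> 128 sizes are the ones that appear later on slide 79:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class Siamese(chainer.Chain):
        """One network applied to two images; the output pair feeds the loss."""

        def __init__(self):
            super(Siamese, self).__init__()
            with self.init_scope():
                self.fc1 = L.Linear(9216, 1024)
                self.fc2 = L.Linear(1024, 128)

        def embed(self, x):
            # The same weights produce the Image A and Image B representations.
            return self.fc2(F.relu(self.fc1(x)))

        def __call__(self, image_a, image_b, same):
            rep_a = self.embed(image_a)
            rep_b = self.embed(image_b)
            # `same` is 1 for duplicate pairs, 0 otherwise; back-propagating
            # this loss updates the single shared network.
            return F.contrastive(rep_a, rep_b, same)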
  27.–30. Contrastive loss (figures from http://docs.chainer.org/en/stable/reference/functions.html#chainer.functions.contrastive)
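The contrastive loss shown on these slides is the one documented at the Chainer link above. Written out in NumPy, with d_n the Euclidean distance between the two representations of pair n and y_n = 1 for a duplicate pair:

    import numpy as np

    def contrastive_loss(x0, x1, y, margin=1.0):
        # Duplicate pairs (y=1) are pulled together; non-duplicates (y=0)
        # are pushed apart until they are at least `margin` away.
        d = np.linalg.norm(x0 - x1, axis=1)
        loss = y * d ** 2 + (1 - y) * np.maximum(margin - d, 0) ** 2
        return loss.sum() / (2 * len(x0))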
  31.–35. How to train a neural network for duplicate detection (built up one step per slide). Step 1: Read a paper on face verification [Chopra05]. Step 2: Implement a siamese network. Step 3: Watch the loss decrease. Step 4: Look at the results.
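A minimal training loop for the Siamese sketch above, covering steps 2 and 3. The variable batches (an iterable of (image_a, image_b, same) arrays) is assumed, and the optimiser settings are the ones listed later on slide 79:

    from chainer import optimizers

    model = Siamese()
    opt = optimizers.MomentumSGD(lr=0.01, momentum=0.9)
    opt.setup(model)

    # batches: an iterable of (image_a, image_b, same) minibatches (assumed)
    for image_a, image_b, same in batches:
        loss = model(image_a, image_b, same)
        model.cleargrads()
        loss.backward()
        opt.update()
        print(float(loss.data))  # step 3: watch the loss decrease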
  36.–37. (image-only slides)
  38.–41. Cleaning with phash and corroboration (built up across the slides). Match visually identical images with phash; phash alone gives high false positive and false negative rates, so for a pair of products, consider the corroboration between their multiple images.
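A sketch of that cleaning rule, assuming the imagehash package (the talk says only "phash", not which implementation) and that each product comes with a list of image file paths; the max_distance and min_matches thresholds are illustrative, not Lyst's values:

    from itertools import product as pairs
    from PIL import Image
    import imagehash

    def image_hashes(paths):
        return [imagehash.phash(Image.open(p)) for p in paths]

    def corroborated_duplicate(paths_a, paths_b, max_distance=4, min_matches=2):
        # Two products are merged only if several of their images agree,
        # which suppresses phash's individual false positives.
        hashes_a, hashes_b = image_hashes(paths_a), image_hashes(paths_b)
        matches = sum(1 for ha, hb in pairs(hashes_a, hashes_b)
                      if ha - hb <= max_distance)  # Hamming distance
        return matches >= min_matches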
  42.–43. Now with a clean dataset
  44.–45. Hyper-parameter tuning is important
  46.–50. That was an old paper (built up one step per slide). Step 1: Read two more recent papers, on image similarity and face recognition [Wang14, Schroff15]. Step 2: Implement a triplet loss network. Step 3: Watch the loss decrease. Step 4: Visualise the detected duplicates.
  51. That was an old paper (image-only slide)
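The triplet loss of [Wang14, Schroff15], written out in NumPy: each anchor image should be closer, in squared Euclidean distance, to a positive image of the same product than to a negative image of a different product, by at least a margin (0.2 is the FaceNet default, not necessarily the value used here):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distance
        d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distance
        return np.maximum(d_pos - d_neg + margin, 0).mean()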
  52.–53. (image-only slides)
  54. There must be a better way
  55.–57. 2 products, 2 dimensions (toy experiment; image-only slides)
  58. 2 products, 2 dimensions + gradient clipping
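Gradient clipping, the ingredient added on slide 58, rescales any gradient whose L2 norm exceeds a threshold. A NumPy sketch (the 0.01 threshold is the value that later appears on slide 79):

    import numpy as np

    def clip_gradient(grad, threshold=0.01):
        # Cap the L2 norm of an update so a single large step cannot
        # throw the embedding around.
        norm = np.linalg.norm(grad)
        return grad * (threshold / norm) if norm > threshold else grad

Chainer's GradientClipping hook applies the same rescaling, with the norm taken over all parameters at once.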
  59.–69. (image-only slides)
  70.–71. So the model can describe a trivial problem; can it handle more data?
  72.–73. Remove any pairs nearer than .001
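The slides don't show how the pair removal was done. One way to implement it, assuming the items live in some feature or embedding space, is a k-d tree query for every pair closer than 0.001, keeping one item from each such pair (scipy and all names here are assumptions, not the talk's code):

    import numpy as np
    from scipy.spatial import cKDTree

    def remove_near_pairs(points, radius=0.001):
        tree = cKDTree(points)
        dropped = set()
        for i, j in tree.query_pairs(r=radius):  # all pairs within `radius`
            if i not in dropped and j not in dropped:
                dropped.add(j)  # keep i, drop its near-twin j
        keep = [k for k in range(len(points)) if k not in dropped]
        return np.asarray(points)[keep]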
  74. This is great, but all the cool kids* are L2-norming their output. (* [Chopra05, Wang14, Schroff15])
  75.–76. Network structure (as before, with an L2 normalisation applied to each image representation before the loss)
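Adding that step to the Siamese sketch above, using Chainer's normalize function (which scales each row to unit L2 norm):

    import chainer.functions as F

    def normalised_loss(model, image_a, image_b, same):
        rep_a = F.normalize(model.embed(image_a))  # unit-length representation
        rep_b = F.normalize(model.embed(image_b))
        return F.contrastive(rep_a, rep_b, same)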
  77. (image-only slide)
  78. Convolutional layers from ZFNet, pre-trained by Eddie, with weights fixed; ReLU activations. Input 224x224x3 (zero mean); layers 96C7S3:BN, 256C5S2:BN, 384C3, 384C3, 256C3.
  79. Fully connected MLP trained with ReLU activations and contrastive loss: 9216 inputs (from ZFNet), FC1024, FC128. MomentumSGD(.01, 0.9), GradientClipping(.01), WeightDecay(.01).
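The optimiser settings on slide 79, wired up as Chainer hooks; model is assumed to be the fully connected head above, with the fixed ZFNet features computed separately. (Recent Chainer releases keep these hooks in chainer.optimizer_hooks rather than chainer.optimizer.)

    from chainer import optimizer, optimizers

    model = Siamese()  # the 9216 -> 1024 -> 128 head from the earlier sketch
    opt = optimizers.MomentumSGD(lr=0.01, momentum=0.9)
    opt.setup(model)
    opt.add_hook(optimizer.GradientClipping(0.01))
    opt.add_hook(optimizer.WeightDecay(0.01))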
  80.–82. (image-only slides)
  83. WIP
  84.–85. Neural networks can describe arbitrarily complex functions. They will cheat.
  86.–87. More data can be unreasonably effective. It will slow you down.
  88.–89. Neural networks can be difficult to interpret. This is not an excuse not to try.
  90. Neural networks are fun :-)
  91. Thank you
  92. References:
      S. Chopra, R. Hadsell, and Y. LeCun. Learning a Similarity Metric Discriminatively, with Application to Face Verification. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 539-546. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1467314
      Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jingbin Wang, James Philbin, Bo Chen, and Ying Wu. Learning Fine-grained Image Similarity with Deep Ranking. 2014. http://arxiv.org/abs/1404.4661v1
      Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. 2015. http://arxiv.org/abs/1503.03832v3
