rips-hk-lenovo (1)

  1. Creation and Optimization of a Logo Recognition System. Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao. Academic Mentor: Dr. Albert Ku. Industrial Mentor: Mr. Sun Lin. August 6, 2015. Qi, Richfield, Zeng, Zhao. RIPS-HK: Lenovo
  2. Problem Description. Problem: What if there were an app that could provide a smartphone user with information about a company just by recognizing that company’s logo in an image? Goal: Create this app.
  8. Outline: Model Introduction; Bag of Features Model; Convolutional Neural Network; Model Testing and Results; Application Demonstration; Conclusions and Future Work.
  14. Bag of Features Model
  15. Feature Extraction
  20. Feature Extraction and Description: SURF. Interest point detection: rotation- and scale-invariant features. Interest point description: a good representation of the image.
  24. SURF: Interest Point Detection. Use the determinant of the Hessian to detect blob-like structures. Use box filters to approximate the second-order derivatives of the Gaussian filter, taking advantage of integral images. Apply scale-space analysis to choose the appropriate scale for each point.
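The reason box filters pair well with integral images can be shown in a few lines. This is a minimal sketch, not the authors' code: the helper names `integral_image` and `box_sum` are hypothetical, and it only demonstrates that any rectangular sum costs four lookups regardless of the box size.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Toy image (assumption: any 2-D array of pixel intensities works the same way).
img = np.arange(20, dtype=np.float64).reshape(4, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 2, 2, 3), img[1:3, 2:5].sum())
```

A box-filter response is then a weighted combination of a few such rectangular sums, which is why its cost does not grow with the filter scale.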
  26. SURF: Interest Point Description. Calculate the dominant orientation based on Haar wavelet analysis. Build the descriptor over a 4×4 grid of subregions.
  27. BOW Training
  28. Feature Vector Clustering
  35. Basics of K-means: a clustering method in N-dimensional space. Algorithmic steps:
       • With a given set of data, choose k cluster centers.
       • Calculate the distance between each data point and each cluster center.
       • Assign each point to the cluster at minimum distance.
       • Recalculate the cluster centers: $v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$, where $v_i$ is the new cluster center, $c_i$ is the number of data points in the $i$th cluster, and $x_j$ is the $j$th data point in the $i$th cluster.
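The K-means steps listed on this slide can be sketched in NumPy as follows. This is an illustration under assumptions, not the project's implementation: the function name `kmeans`, the random initialization from the data points, and the fixed iteration count are all choices made here.

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k cluster centers (here: k distinct data points at random).
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Step 2: distance between each data point and each cluster center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to the cluster at minimum distance.
        labels = dists.argmin(axis=1)
        # Step 4: recalculate each center as the mean of its members.
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    # Final assignment against the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Two well-separated toy blobs (made-up data).
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(points, k=2)
print(labels)
```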
  36. K-means Clustering
  37. Hierarchical K-means
  43. Bag of Words and Hierarchical K-means. [Figure: feature vectors passed down a tree of cluster nodes (CL.); X marks accumulate at the clusters over the build steps.]
  44. Bag of Words and Hierarchical K-means. [Figure: bar chart of matches (axis 0 to 8) for word 1 through word 5; labeled values 3, 8, 2, 5, 1.]
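The per-word histogram on slide 44 can be illustrated with a small sketch. Note the simplification: this uses a flat nearest-word search rather than descending a hierarchical K-means tree, and the name `bow_histogram` and the toy vocabulary are hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word (nearest center)."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(vocabulary))

# Toy 2-D "descriptors"; real SURF descriptors have 64 dimensions.
vocabulary = np.array([[0, 0], [10, 0], [0, 10]], dtype=float)
descriptors = np.array([[1, 0], [9, 1], [0, 9], [1, 1], [0, 11]], dtype=float)
hist = bow_histogram(descriptors, vocabulary)
print(hist)
```

With a tree, the same word index would be found by picking the nearest child at each level, which needs far fewer distance computations than comparing against every leaf.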
  45. Bag of Features Model
  47. Inverted File Index:
       word 1: image 1, image 3, image 5, ...
       word 2: image 4, image 9, image 16, ...
       word 3: image 4, image 12, image 13, ...
       word 4: image 1, image 5, image 7, ...
       word 5: image 2, image 3, image 9, ...
       word 6: image 7, image 12, image 17, ...
       ...
  50. Classification: Inverted File Index. Benefit: retrieval via the inverted file is faster than searching every image. Drawback: lack of spatial accuracy, so additional verification is needed to re-rank the retrieved images.
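A minimal sketch of an inverted file index follows. The slides do not say how candidate images are scored, so the simple vote count below (one vote per shared visual word) is an assumption, as are the names `build_inverted_index` and `query`.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index

def query(index, query_words, top_n):
    """Return up to top_n images sharing the most words with the query."""
    votes = Counter()
    for word in set(query_words):
        # Only images listed under this word are touched, not the whole database.
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return [image_id for image_id, _ in votes.most_common(top_n)]

# Toy database: image id -> visual words found in that image.
image_words = {1: [1, 4], 3: [1, 5], 4: [2, 3], 5: [1, 4]}
index = build_inverted_index(image_words)
candidates = query(index, [1, 4], top_n=2)
print(candidates)
```

Because the votes ignore where each word appears in the image, two unrelated images can tie, which is the spatial-accuracy drawback the slide mentions.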
  51. Bag of Features Model
  58. Re-ranking of Returned Images. Match descriptors of the query image to descriptors of the images in the returned list. Simple algorithm:
       • Match each descriptor in the query image to its nearest-neighbor descriptor in the list image.
       • Compare the L2 distance of this pair to the L2 distance between the query descriptor and every other descriptor in the list image.
       • If the original distance is significantly smaller, count the pair as a “match”.
       • Sum the number of “matches” for each list image and divide by the total number of features.
       • Re-rank the returned list by this “match ratio” and return it to the user.
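The matching rule above can be sketched as a nearest-neighbor ratio test. Several details are assumptions made for illustration: the threshold `ratio=0.7`, dividing by the number of query descriptors, and the name `match_ratio`. Comparing against the second-nearest distance is enough, because a pair that beats the second-nearest descriptor by the given factor beats every other descriptor too.

```python
import numpy as np

def match_ratio(query_desc, list_desc, ratio=0.7):
    """Fraction of query descriptors with a distinctive nearest neighbor.

    Requires at least two descriptors in list_desc.
    """
    dists = np.linalg.norm(query_desc[:, None, :] - list_desc[None, :, :], axis=2)
    ordered = np.sort(dists, axis=1)
    nearest, second = ordered[:, 0], ordered[:, 1]
    # A "match" needs the nearest distance to be clearly below the runner-up.
    matches = np.count_nonzero(nearest < ratio * second)
    return matches / len(query_desc)

# Toy 2-D descriptors (made-up values).
query_desc = np.array([[0, 0], [5, 5]], dtype=float)
list_desc = np.array([[0, 0.1], [10, 10], [5, 4], [5, 6]], dtype=float)
print(match_ratio(query_desc, list_desc))
```

The first query descriptor has one clearly closest partner and counts as a match; the second is equally close to two list descriptors, so it is rejected as ambiguous.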
  59. Convolutional Neural Networks (CNNs)
  60. Neural Networks. Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png
  63. Convolutional Neural Networks. Convolutional neural networks are neural networks with an additional biological inspiration. Each layer is one of two basic types: convolution or pooling. Convolution is the process of convolving an image with a kernel. This idea comes from image processing, where it has been used for tasks such as edge detection. Here, we want to learn kernels specific to the data. Pooling refers to the process of providing a statistical summary of the outputs of several nearby “neurons”, e.g. by taking an average or max.
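The two layer types can be illustrated with a small NumPy sketch. It is not how Caffe implements them; the names `conv2d_valid` and `max_pool` are hypothetical, the kernel is fixed by hand rather than learned, and most CNN libraries actually compute cross-correlation (no kernel flip).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution without padding (output shrinks by kernel size - 1)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size block by its maximum."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # hand-picked diagonal-difference kernel
feature_map = conv2d_valid(image, kernel)
pooled = max_pool(feature_map)
print(feature_map.shape, pooled.shape)
```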
  64. Figure: Description of the convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.
  69. Implementation and Architecture. For the CNN implementation, we used Caffe [?]. We had only around 16,000 images, so we fine-tuned two pre-trained models: AlexNet [?], the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, and GoogLeNet [?], the winner of ILSVRC 2014. Both are provided in Caffe’s Model Zoo, along with a file that stores each model’s weights after training on ImageNet.
  70. AlexNet. Figure: Image of the AlexNet architecture (from [?]). This also illustrates how the original network was split to train on two GPUs.
  71. GoogLeNet. Figure: Image of the GoogLeNet architecture (from [?]). Deeper, and 12x fewer parameters than AlexNet.
  72. Filter/Layer Visualization. Let’s do some filter/layer visualization! 143.89.75.120/filayer.html
  73. Model Testing
  77. Dataset Construction. We gathered a data set of logo images for 167 brands using the Bing Search API (on average, 100 images per brand), searching for things like “<brand>”, “<brand> building”, “<brand> <product>”. One problem we faced was that some downloaded images were mislabeled or irrelevant. We filtered the dataset using two methods:
       • Compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and discard the image if it doesn’t meet some threshold.
       • import ManualLabor (i.e. filtering by hand).
  78. Testing the original pipeline: parameter tuning; cross validation.
  81. Parameter Tuning. BOW structure: how to choose the vocabulary size, $\text{words} = B^L$, where $B$ is the number of branches and $L$ is the number of levels. Too large: lack of generalization, overfitting. Too small: lack of discrimination, mismatches.
  84. Parameter Tuning. Vocabulary size. How to choose the number of images returned by the inverted file index search; considerations: accuracy; the computation time of re-ranking. How to choose the number of images shown on the client side; considerations: accuracy; the screen size of the mobile application.
  85. Parameter Tuning. Vocabulary size; the number of images returned by searching; the number of images shown. Re-ranking: how to determine the weight factor $w$ in the weighted function $\text{score} = w \cdot I + (1 - w) \cdot F$, where $I$ is the number of inliers and $F$ is the frequency of the brand in the returned images.
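The weighted re-ranking score is easy to try out numerically. In this sketch the function name `rerank_score`, the candidate brands, and their inlier and frequency values are all made up; the slides do not state how I and F are scaled, so raw counts are assumed.

```python
def rerank_score(inliers, frequency, w):
    """Weighted score w * I + (1 - w) * F from the slide."""
    return w * inliers + (1 - w) * frequency

# Hypothetical candidates: brand -> (number of inliers I, frequency F).
candidates = {"brand_a": (12, 3), "brand_b": (8, 9)}

def ranked(w):
    return sorted(candidates, key=lambda b: rerank_score(*candidates[b], w), reverse=True)

print(ranked(0.75))  # inliers dominate
print(ranked(0.25))  # frequency dominates
```

A larger w favors the brand with more geometric inliers, a smaller w the brand that appears more often in the returned list, which is why w has to be tuned.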
  86. Parameters for Evaluation: vocabulary size (number of branches, number of levels); the number of images returned by searching; the number of images shown; the weight factor $w$ in the weighted function. Calculation of accuracy: if one returned result is correct, accuracy = 1 for that query.
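One reading of this accuracy rule is a hit rate over queries: a query scores 1 if the correct brand appears anywhere among the results shown, else 0. The sketch below assumes that reading; the name `hit_accuracy` and the toy data are hypothetical.

```python
def hit_accuracy(returned_brands, true_brands):
    """Mean over queries of 1 if the true brand is among the returned ones, else 0."""
    hits = [
        1 if truth in returned else 0
        for returned, truth in zip(returned_brands, true_brands)
    ]
    return sum(hits) / len(hits)

# Three toy queries, each showing two results.
returned = [["a", "b"], ["c", "d"], ["e", "f"]]
truth = ["b", "x", "e"]
print(hit_accuracy(returned, truth))
```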
  87. Cross Validation: application (model selection, model assessment); procedure.
  88. Cross Validation. Randomly divide the data into K equal-sized parts. Leave out part k, fit the model to the other K−1 parts (combined), and then obtain predictions for the left-out kth part. This is done in turn for each part k = 1, 2, ..., K, and then the results are combined. We choose K = 5.
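The splitting procedure can be sketched as a generator of train/test index sets. This is an illustration only; the name `k_fold_indices` and the fixed random seed are assumptions, and model fitting is left out.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for each of the k folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # random division of the data
    folds = np.array_split(order, k)     # k (nearly) equal-sized parts
    for i in range(k):
        test_idx = folds[i]              # the left-out part
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(20, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample is used for testing exactly once, so the per-fold results can be averaged and their spread reported.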
  89. Testing Result
  90. Testing Result. Test on vocabulary size: the optimal number of words is 500,000 to 800,000; number of branches = 14 or 15; number of levels = 5.
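As a quick arithmetic check, assuming the slides’ “words = BL” denotes B raised to the power L (a tree with B branches per node and L levels), the reported branch and level counts land inside the reported vocabulary range. The function name `vocabulary_size` is hypothetical.

```python
def vocabulary_size(branches, levels):
    """Number of leaf words in a tree with `branches` children per node and `levels` levels."""
    return branches ** levels

for branches in (14, 15):
    print(branches, vocabulary_size(branches, 5))
# 14 -> 537824, 15 -> 759375
```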
  91. Testing Result. With the other parameters fixed, test on: weight factor; number of returned images; number of images shown on the client side.
  92. Testing Result. Optimal parameter setting: number of images shown = 6; number of returned images set to 15, saving about 0.3 s.
  93. Testing Summary. Optimal parameter setting: number of words 500,000 to 800,000; number of images returned 15; number of images shown 6. The stability of the system was also tested: the standard deviation of 5-fold cross-validation ranges from 0.005 to 0.007.
  94. Evaluation of Deep Learning Framework. Cross-validation for AlexNet (Top-5 Accuracy). [Figure: top-5 accuracy, y-axis 0.87 to 0.95, x-axis 1,000 to 196,000.] Cross Validation Example: 94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, 93.80%.
  95. Evaluation of Deep Learning Framework. Cross-validation for AlexNet. Final accuracy (AlexNet): Top-1 93.33%, Top-5 96.73%.
  96. Evaluation of Deep Learning Framework. Cross-validation for GoogLeNet (Top-5 Accuracy).
  97. Evaluation of Deep Learning Framework. Cross-validation for AlexNet; cross-validation for GoogLeNet. Final accuracy (GoogLeNet): Top-1 94.05%, Top-5 97.39%.
  98. Evaluation of Deep Learning Framework. Final Comparison:
                                      GoogLeNet   AlexNet   Visual Bag of Words
       Accuracy (Top-5)               97.39%      96.73%    87.6%
       Efficiency:
         Preprocess                   8.47 ms     7.5 ms    6 ms
         Classification               17.7 ms     6.94 ms   24 ms (SURF feature extraction)
         Total time (including some   129 ms      170 ms    281 ms
         system-level operations)
  99. Demonstration
  102. Future Development. There is still more we can do to improve the system: enlarge the data set (currently 167 classes and 16,000 images); test different deep learning frameworks; combine local hand-crafted features with global deep-learned features to achieve better accuracy.
  103. We would like to thank: Mr. Sun Lin and Lenovo-Hong Kong; Professor Shingyu Leung, Dr. Ku Yin Bon and Hong Kong University of Science and Technology; Professor Susanna Serna and the Institute for Pure and Applied Mathematics; the National Science Foundation for program funding (Grant DMS #0931852).
