
Nokia Augmented Reality


Radek Grzeszczuk's presentation from the Aug. 24, 2009 SDForum Virtual World SIG meeting in Palo Alto and online at http://www.virtualworldsig.com.


  1. Stanford-Nokia Collaboration: Mobile Augmented Reality, August 2009 Review
     Bernd Girod, Stanford University
     Radek Grzeszczuk, Nokia Research Center
  2. Mobile Augmented Reality Team
     Radek Grzeszczuk, Bernd Girod, Vijay Chandrasekhar, Gabriel Takacs, Wei-Chao Chen, Natasha Gelfand, Yingen Xiong, Kari Pulli, Sam Tsai, David Chen, Jana Kosecka, Ramakrishna Vedantham, Mina Makar
  3. Outline
     - Review: landmark recognition system
         - Architecture: location-based pre-fetching and matching on the phone
         - Computer vision: “Bag of Words” matching
     - Feature compression for server-side matching
         - Approaches explored: transform coding of features, patch compression
         - Compressible descriptor: CHoG (Compressed Histogram of Gradients)
     - Scalability for large databases
         - From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests”
         - Accuracy vs. database size
     - Towards 3D
         - Multi-view vocabulary trees
         - Matching against 3D models
     - Summary and future directions
  4. Outline (same as slide 3)
  5. Mobile Visual Search
     User takes picture … chooses action … confirms POI (point of interest)
  6. Mobile Visual Search Applications
     Museum guide, tourist guide, landmarks, wine labels, comparison shopping, ads/catalogs, CDs/DVDs/books, movie posters
  7. Landmark Recognition with Feature Matching on the Phone
     [Diagram: phone with GPS and server; example landmark: Memorial Church]
  8. “Bag of Words” Matching
     [Diagram: feature descriptors of the query image are matched against prefetched data (database images); the resulting feature correspondences pass through a geometric consistency check]
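The correspondence step on slide 8 can be sketched as nearest-neighbor matching with a ratio test. This is a minimal illustration, assuming Lowe's ratio test with a hypothetical threshold of 0.8 and brute-force search; the slides do not say which matching rule or search structure the phone client uses, and `match_features` is a hypothetical name.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(query, database, ratio=0.8):
    """Return (query_index, database_index) correspondences.

    Assumption of this sketch: a query descriptor is matched to its nearest
    database descriptor only if that neighbor is clearly closer than the
    second-nearest one (ratio test).
    """
    matches = []
    if len(database) < 2:
        return matches  # the ratio test needs two neighbors
    for qi, q in enumerate(query):
        # Rank database descriptors by distance to this query descriptor.
        ranked = sorted(range(len(database)), key=lambda di: l2(q, database[di]))
        best, second = ranked[0], ranked[1]
        if l2(q, database[best]) < ratio * l2(q, database[second]):
            matches.append((qi, best))
    return matches
```

For example, `match_features([[0, 0], [5, 5], [2.5, 2.5]], [[0.1, 0], [5, 5.2], [10, 10]])` keeps the first two query descriptors and rejects the third, whose two nearest database descriptors are almost equally far away.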
  9. Computing Visual Words
     [Diagram: SIFT descriptor and SURF descriptor pipelines. Color image → gray → filters $D_{xx}$, $D_{yy}$, $D_{xy}$ → blob response $D_{xx}D_{yy} - (0.9\,D_{xy})^2$ → maxima give keypoints (x, y, scale) → patch oriented along the dominant gradient → gradient field (dx, dy) → per-subregion sums Σdx, Σdy, Σ|dx|, Σ|dy|]
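The SURF branch of the diagram (per-subregion sums Σdx, Σdy, Σ|dx|, Σ|dy| over an oriented patch) can be sketched as below. This is a simplified illustration, not the SURF reference implementation: it assumes the patch is already gray-scale, square and rotated to its dominant orientation, uses central differences in place of Haar wavelet responses, and skips Gaussian weighting. `blob_response` evaluates the detector formula from the slide; both function names are hypothetical.

```python
import math

def blob_response(dxx, dyy, dxy):
    """Approximate Hessian determinant used to detect blobs (slide formula)."""
    return dxx * dyy - (0.9 * dxy) ** 2

def surf_like_descriptor(patch, grid=4):
    """SURF-style descriptor of a square, gray, already-oriented patch.

    For each of grid x grid subregions, accumulate sum(dx), sum(dy),
    sum(|dx|) and sum(|dy|), then L2-normalize the whole vector.
    Simplifications (assumptions of this sketch): central differences
    replace Haar wavelet responses and there is no Gaussian weighting.
    """
    n = len(patch)
    cell = n // grid
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            sdx = sdy = sadx = sady = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    # Central differences, clamped at the patch border.
                    dx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
                    dy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
                    sdx += dx
                    sdy += dy
                    sadx += abs(dx)
                    sady += abs(dy)
            desc.extend([sdx, sdy, sadx, sady])
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```

With a 4 × 4 grid the result has 4 · 4 · 4 = 64 entries, the usual SURF descriptor length.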
  10. Matching Performance
      [Figure: true matches vs. false matches for kernels of ~90 images (two plots) and ~1000 images]
  11. Timing Analysis (Q2 2008)
      Nokia N95: 332 MHz ARM, 64 MB RAM; 100 KByte JPEG; uplink 60 Kbps
      [Figure: execution time for three configurations, “Extract Features on Phone”, “All on Phone” and “All on Server”, broken down into downloads, upload, feature extraction, feature matching and geometric consistency check]
  12. Outline (same as slide 3)
  13. Advanced Feature Compression
      - Transform coding of SIFT/SURF descriptors [Chandrasekhar et al., VCIP 09]
      - Direct compression of oriented image patch [M. Makar et al., ICASSP 09]
      - Descriptor designed for compressibility: CHoG [Chandrasekhar et al., CVPR 09]
      - Tree-structured vector quantization, tree histogram coding [Chen et al., DCC 09]
      - Compression of location information [Tsai et al., MobiMedia 09]
  14. CHoG: Compressed Histogram of Gradients
      [Diagram: patch → gradients (dx, dy) → spatial binning → gradient distribution for each bin → histogram compression → CHoG descriptor (bitstream)]
  15. CHoG: Histogram Compression
      [Diagram: gradient binning → gradient distribution (0.46, 0.21, 0.16, 0.09, 0.08) → Huffman tree approximates the probabilities as (1/2, 1/4, 1/8, 1/16, 1/16)]
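The Huffman approximation on slide 15 can be reproduced with a standard Huffman construction: each bin's probability is replaced by 2^-depth, where depth is the bin's code length. The sketch below is illustrative and its function names are hypothetical; for the distribution shown on the slide it yields exactly the 1/2, 1/4, 1/8, 1/16, 1/16 approximation, and because a Huffman tree is full, the approximated values always sum to 1.

```python
import heapq
import itertools

def huffman_depths(probs):
    """Return the Huffman code length (tree depth) of each bin."""
    counter = itertools.count()  # tie-breaker so the heap never compares leaf lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf inside them one level deeper.
        for i in leaves1 + leaves2:
            depths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), leaves1 + leaves2))
    return depths

def huffman_approximation(probs):
    """Approximate each probability by 2**(-depth) from the Huffman tree."""
    return [2.0 ** -d for d in huffman_depths(probs)]
```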
  16. Enumerating Huffman Trees
      Rooted binary trees with n leaf nodes
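How many candidate trees are there for n bins? A minimal sketch, assuming a tree is identified by the depth it assigns to each labeled histogram bin (all a decoder needs to rebuild the 2^-depth probabilities), enumerates every depth vector that satisfies the Kraft equality Σ 2^-d_i = 1 and sizes a fixed-length index for it. Whether CHoG indexes the trees this way or entropy-codes them is not stated on the slide; the function names are hypothetical.

```python
from fractions import Fraction
import math

def tree_depth_vectors(n):
    """All depth vectors (d_1, ..., d_n), n >= 2, of full binary trees whose
    leaves are the labeled histogram bins (assumption: a tree is identified
    by the depth it gives each bin). Valid vectors satisfy the Kraft
    equality sum(2**-d_i) == 1, which guarantees a complete prefix code."""
    results = []
    smallest = Fraction(1, 2 ** (n - 1))  # a full tree with n leaves has depth <= n - 1

    def extend(prefix, budget):
        remaining = n - len(prefix)
        if remaining == 0:
            if budget == 0:
                results.append(tuple(prefix))
            return
        for d in range(1, n):
            share = Fraction(1, 2 ** d)
            # Leave at least the smallest possible share for each later leaf.
            if budget - share >= (remaining - 1) * smallest:
                extend(prefix + [d], budget - share)

    extend([], Fraction(1))
    return results

def index_bits(n):
    """Bits needed for a fixed-length index into the list of trees."""
    return math.ceil(math.log2(len(tree_depth_vectors(n))))
```

For n = 5 bins the enumeration finds 75 trees (60 with depths {1, 2, 3, 4, 4}, 10 with {2, 2, 2, 3, 3} and 5 with {1, 3, 3, 3, 3}), so a fixed-length index would need 7 bits.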
  17. Feature Matching Performance
      [Figure: matching performance vs. descriptor size (bits) on a ground-truth data set of matching patches; descriptors compared: tree-structured vector quantizer, SURF transform, random projections, BoostSSC, patch + SIFT, CHoG, SIFT transform] [Winder & Brown, CVPR ’07]
  18. Compressed Domain Matching
      [Diagram: gradient binning → gradient distribution → tree index (1–6) → distance Dist(·) read from a 6 × 6 look-up table indexed by pairs of tree indices]
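Because every spatial bin is represented by a tree index, two descriptors can be compared without decompressing them: distances between all pairs of trees are computed once and stored in a look-up table, as the slide indicates. A minimal sketch with a hypothetical three-bin codebook; the symmetric Kullback-Leibler divergence used as Dist(·) is an assumption, since the slide does not name the distance.

```python
import math

# Hypothetical codebook: depth vectors of the possible trees for 3 gradient bins.
TREES = [(1, 2, 2), (2, 1, 2), (2, 2, 1)]

def tree_probs(depths):
    """Probabilities implied by a tree: 2**-depth for each bin."""
    return [2.0 ** -d for d in depths]

def sym_kl(p, q):
    """Symmetric Kullback-Leibler divergence (assumed choice of Dist)."""
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q))

def build_lut(trees):
    """Precompute the distance between every pair of tree indices."""
    probs = [tree_probs(t) for t in trees]
    return [[sym_kl(p, q) for q in probs] for p in probs]

def descriptor_distance(a, b, lut):
    """Distance between two descriptors given as lists of tree indices
    (one index per spatial bin): only table look-ups, no decompression."""
    return sum(lut[i][j] for i, j in zip(a, b))
```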
  19. Nearest Neighbor Search
      [Figure: query time (sec) for exact and approximate (ANN, 0.3% errors) nearest-neighbor search with SIFT and CHoG; 10^6 database descriptors, 10^3 query descriptors; plotted values 372, 47 and 28 sec]
  20. Location Histogram Coding
      [Diagram: feature locations (x, y) → spatial binning → context-based arithmetic coding; a quantize step with difference (−) and sum (+) nodes produces refinement bits] [Tsai et al., MobiMedia 2009]
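The binning and refinement steps of slide 20 can be sketched as follows; the context-based arithmetic coder that would compress the histogram is omitted. This is an illustration under assumptions (square cells of `block` pixels, non-negative coordinates, uniform quantization of each coordinate's offset inside its cell) and does not reproduce Tsai et al.'s exact scheme; all names are hypothetical.

```python
def bin_locations(points, width, height, block):
    """Spatial binning: count features per block x block cell. This count
    map is what a context-based arithmetic coder would then compress."""
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    hist = [[0] * cols for _ in range(rows)]
    for x, y in points:
        hist[int(y // block)][int(x // block)] += 1
    return hist

def refine(coord, block, bits):
    """Encoder side: cell index plus `bits` refinement bits that uniformly
    quantize the coordinate's offset inside its cell (assumed scheme)."""
    cell = int(coord // block)
    offset = coord - cell * block  # in [0, block)
    levels = 2 ** bits
    code = min(int(offset / block * levels), levels - 1)
    return cell, code

def reconstruct(cell, code, block, bits):
    """Decoder side: place the coordinate at the centre of its sub-cell."""
    step = block / 2 ** bits
    return cell * block + (code + 0.5) * step
```

With `bits` refinement bits per coordinate, the reconstruction error is at most block / 2^(bits + 1); for example, an 8-pixel cell with 2 refinement bits bounds the error by 1 pixel.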
  21. Compressed Feature Vector
      [Figure: size (bits) per feature; plotted values 1024, 1088, 52, 84, 59]
      - SIFT + location (x, y): 1088 bits
      - CHoG + location (x, y): ~84 bits
      - CHoG + compressed (x, y): ~59 bits
      [Tsai et al., MobiMedia 2009]
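The sizes on slide 21 can be checked with a little arithmetic. The sketch assumes a SIFT descriptor of 128 dimensions at 8 bits each, which matches the 1024-bit bar; the remaining 64 of the 1088 bits would then carry the (x, y) location. The 500-feature query is hypothetical.

```python
def payload_bytes(num_features, bits_per_feature):
    """Query upload size in bytes for num_features features."""
    return num_features * bits_per_feature / 8

SIFT_DESCRIPTOR_BITS = 128 * 8   # assumed: 128 dimensions x 8 bits = 1024
SIFT_WITH_LOCATION_BITS = 1088   # from the slide
CHOG_COMPRESSED_XY_BITS = 59     # from the slide (approximate)

location_bits = SIFT_WITH_LOCATION_BITS - SIFT_DESCRIPTOR_BITS  # 64
ratio = SIFT_WITH_LOCATION_BITS / CHOG_COMPRESSED_XY_BITS       # about 18.4

# Hypothetical query with 500 features:
sift_bytes = payload_bytes(500, SIFT_WITH_LOCATION_BITS)  # 68000.0 bytes
chog_bytes = payload_bytes(500, CHOG_COMPRESSED_XY_BITS)  # 3687.5 bytes
```

Per feature, CHoG with coded locations is therefore roughly 18 times smaller than SIFT with raw locations.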
  22. Outline (same as slide 3)
  23. Pairwise Comparison
      “Bag of Words” matching & affine consistency check
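An affine consistency check is commonly implemented with RANSAC: fit an affine transform to three random correspondences, count how many other correspondences it explains, and keep the best hypothesis. The sketch below assumes exactly that; the slides do not give the estimator, iteration count or inlier tolerance, so `ransac_affine`, `iters` and `tol` (in pixels) are hypothetical.

```python
import random

def solve3(m, v):
    """Solve the 3x3 system m @ x = v with Cramer's rule (None if singular)."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        return None
    sol = []
    for col in range(3):
        # Replace column `col` with the right-hand side.
        mc = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        sol.append(det(mc) / d)
    return sol

def fit_affine(src, dst):
    """Affine map (a, b, c, d, e, f): x' = a x + b y + c, y' = d x + e y + f,
    from exactly three point pairs."""
    m = [[x, y, 1.0] for x, y in src]
    r1 = solve3(m, [p[0] for p in dst])
    r2 = solve3(m, [p[1] for p in dst])
    return None if r1 is None or r2 is None else r1 + r2

def apply_affine(t, p):
    a, b, c, d, e, f = t
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)

def ransac_affine(src, dst, iters=200, tol=3.0, seed=0):
    """Return (best_transform, inlier_indices) for matched points src[i] -> dst[i]."""
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(range(len(src)), 3)
        t = fit_affine([src[i] for i in sample], [dst[i] for i in sample])
        if t is None:
            continue  # collinear sample, no unique affine fit
        inliers = []
        for i in range(len(src)):
            px, py = apply_affine(t, src[i])
            if (px - dst[i][0]) ** 2 + (py - dst[i][1]) ** 2 <= tol ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

A database image would then be accepted only if the number of inliers exceeds some threshold, which is another free parameter of this sketch.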
  24–28. Growing Vocabulary Tree, k = 3 [Nistér and Stewénius, 2006]
  29. Querying Vocabulary Tree
      [Diagram: a query descriptor passed down the vocabulary tree]
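Slides 24–29 grow the tree by hierarchical k-means with branch factor k = 3, following Nistér and Stewénius, and classify a query descriptor by descending to the closest child at each level. A minimal pure-Python sketch, assuming deterministic farthest-point initialisation and a fixed number of Lloyd iterations (choices of this sketch, not necessarily of the original system); `grow_tree` and `quantize` are hypothetical names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, p):
    """Index of the centroid closest to descriptor p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with farthest-point initialisation (assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < min(k, len(points)):
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Move each centroid to its group mean (keep it if the group is empty).
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def grow_tree(points, k=3, depth=2):
    """Hierarchical k-means: split into k clusters, then recurse into each.
    A node is a list of (centroid, child) pairs; a leaf is None."""
    if depth == 0 or len(points) < k:
        return None
    centroids = kmeans(points, k)
    children = []
    for i, c in enumerate(centroids):
        members = [p for p in points if nearest(centroids, p) == i]
        children.append((c, grow_tree(members, k, depth - 1)))
    return children

def quantize(tree, descriptor):
    """Descend to the closest child at each level; the path of branch
    indices identifies the visual word (leaf) for the descriptor."""
    path = []
    node = tree
    while node is not None:
        best = nearest([c for c, _ in node], descriptor)
        path.append(best)
        node = node[best][1]
    return tuple(path)
```

Querying costs only k distance computations per level, which is what makes large vocabularies practical compared with a flat list of visual words.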
  30. Recognition Accuracy
      [Figure: recall (percent) vs. number of database images for a single vocabulary tree and a forest of 6 trees]
  31. Vocabulary Forest
      [Diagram: features → several SVTs, each followed by an IFS stage that lists images with their match counts → early termination → GCC (geometric consistency check) → combine matches]
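The counting and combining stages of the forest can be sketched with inverted files: each tree maps a visual word to the database images containing it, every query word votes for those images, and each tree's top candidates are merged across the forest. This assumes plain vote counting (no TF-IDF weighting) and leaves out early termination and the GCC stage; `forest_query` and `top_n` are hypothetical.

```python
from collections import Counter, defaultdict

def build_inverted_file(db_words):
    """db_words: {image_id: list of visual words}. Returns word -> set of images."""
    inv = defaultdict(set)
    for image_id, words in db_words.items():
        for w in words:
            inv[w].add(image_id)
    return inv

def vote(inv, query_words):
    """Count, for every database image, how many query words it shares."""
    counts = Counter()
    for w in query_words:
        for image_id in inv.get(w, ()):
            counts[image_id] += 1
    return counts

def forest_query(trees, top_n=2):
    """trees: list of (inverted_file, query_words) pairs, one per tree.
    Keep each tree's top_n candidates and combine them by summing votes."""
    combined = Counter()
    for inv, query_words in trees:
        for image_id, c in vote(inv, query_words).most_common(top_n):
            combined[image_id] += c
    return combined.most_common()
```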
  32. Real-time System: Send Image
      [Diagram: the client's camera captures an image, which is sent over the wireless network to the server; the server performs feature extraction and VocTree image matching and returns information to the client]
  33. Real-time System: Send Features
      [Diagram: the client performs feature extraction and coding on the camera image and sends the features over the wireless network; the server performs VocTree image matching and returns information to the client]
  34. Timing Analysis
      Nokia N95: 332 MHz ARM, 64 MB RAM
      [Figure: execution time (sec) for “Send Features” (extract features, upload features 2.2 kByte, server delay) vs. “Send Image” (upload image 40 kByte, server delay)]
  35. Timing Analysis (same content as slide 34)
  36. Timing Analysis
      Nokia N95: 332 MHz ARM, 64 MB RAM
      [Figure: execution time (sec) for “Send Features” (extract features, server delay) vs. “Send Image” (server delay)]
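The upload segments can be estimated from the payload sizes. This back-of-the-envelope sketch assumes the 60 kbps uplink quoted on slide 11 (Q2 2008) also applies here, takes 1 kByte as 8 kbit, and ignores protocol overhead and latency, so it need not match the measured bars.

```python
def upload_seconds(size_kbyte, uplink_kbps):
    """Transmission time in seconds (1 kByte = 8 kbit, no protocol overhead)."""
    return size_kbyte * 8 / uplink_kbps

UPLINK_KBPS = 60  # assumption: the uplink rate quoted for the Q2 2008 timing analysis

send_image = upload_seconds(40, UPLINK_KBPS)      # 40 kByte image: about 5.3 s
send_features = upload_seconds(2.2, UPLINK_KBPS)  # 2.2 kByte of features: about 0.29 s
```

Under these assumptions, sending features cuts the upload time by the ratio of the payloads, 40 / 2.2 ≈ 18.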
  37. Streaming MAR
      [Diagram over time: the client tracks camera pose and sends a query frame over the network; the server extracts features, searches a k-d tree, checks geometry and sends back the ID and geometry; the client compensates for camera pose, displays the ID and draws the boundary. Low-motion and high-motion phases are marked. Example: John Mayer, “Inside Wants Out”]
  38. Outline
      - Review: landmark recognition system
          - Architecture: location-based pre-fetching and matching on the phone
          - Computer vision: “Bag of Words” matching
      - Feature compression for server-side matching
          - Approaches explored: transform coding of features, patch compression
          - Compressible descriptor: CHoG (Compressed Histogram of Gradients)
      - Scalability for large databases
          - From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests”
          - Accuracy vs. database size
      - Towards 3D
          - Multi-view vocabulary trees
          - City-scale landmark recognition using view-invariant matching
      - Summary and future directions
  39. Multiview Database
      Front, top, bottom, right and left view images
  40. Multiview Vocabulary Trees
      [Diagram: the query image is matched against five vocabulary trees (left, front, top, bottom, right); the top matches from each tree go through a geometric consistency check, which yields the top match]
  41. Multiview Matching Performance
      [Figure: image recall and match rate per query view (front, left, right, top, bottom) for a front-view SVT vs. multiview SVTs]
  42. Compact Architectural Models from Geo-Registered Image Collections
      [Diagram: GPS-tagged images and building outline → camera pose estimation → robust map alignment → efficient view selection → 3D model of landmark]
      Unstructured image collections: Panoramio
      Structured image collections: Street View data (Navteq)
      [Grzeszczuk, 3DIM 2009]
  43. View-Invariant Matching Pipeline
      [Diagram: database side: image database → image rectification using 3D model → rectified database images → feature extraction → feature store. Query side: oblique query image → image rectification using vanishing points → rectified query image → feature extraction → matching against the feature store → results]
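The query-side rectification step can be illustrated with the textbook affine rectification from two vanishing points (Hartley and Zisserman): intersect image lines that are parallel in the scene to get vanishing points, join them into the vanishing line l = (l1, l2, l3), and apply the homography with rows (1, 0, 0), (0, 1, 0), (l1, l2, l3), which maps l back to the line at infinity so that scene-parallel lines become parallel again. This is a sketch of the general technique, not the authors' pipeline; it assumes the vanishing points come from two known pairs of line segments and does not remove the remaining affine distortion. All function names are hypothetical.

```python
def cross(a, b):
    """Cross product of homogeneous 3-vectors: the line through two points,
    or the intersection point of two lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vanishing_point(seg1, seg2):
    """Intersection of two image segments that are parallel in the scene."""
    l1 = cross((*seg1[0], 1.0), (*seg1[1], 1.0))
    l2 = cross((*seg2[0], 1.0), (*seg2[1], 1.0))
    return cross(l1, l2)

def affine_rectifier(v1, v2):
    """Homography mapping the vanishing line through v1 and v2 to the line
    at infinity (assumes the vanishing line does not pass through the origin)."""
    l = cross(v1, v2)
    s = l[2]
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [l[0] / s, l[1] / s, 1.0]]

def apply_h(h, p):
    """Apply a 3x3 homography to a 2D point."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For a view that differs from the fronto-parallel one only by a pure projective distortion, this rectifier recovers the frontal geometry exactly; in general a further affine (metric) correction would still be needed before matching.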
  44. Outline (same as slide 3)
  45. Research Directions
      - Research area: image features
          - Keypoint detection optimized for CHoG, prioritization
          - Comprehensive performance analysis of compressed feature matching
          - Next-generation CHoG: soft kernels vs. hard binning, embedded, refinable bitstream
          - Beyond RANSAC: advanced geometry matching and coding, incorporating scale and orientation
      - Research area: image database/vocabulary trees
          - Optimum tree/forest growing, CHoG trees, incremental database update
          - Fast query, early termination, distance metrics, scoring, nearest neighbor algorithms
          - Trees for phone implementation, inverted file caching, tree histogram coding
      - Research area: streaming mobile augmented reality
          - Camera pose estimation, feature tracking, temporally coherent feature extraction
          - Continuous recognition strategies, scheduling, latency minimization
          - Superposition of graphics information, motion compensation, occlusion handling
      - Research area: 3D modeling
          - Image matching pipeline using 3D models
          - Automatic image rectification, features from texture maps
          - Methods for integrating heterogeneous image sources
          - Demonstrate improved landmark recognition for large-scale urban scenes
          - Collaboration with Marc Pollefeys, ETH Zurich
