
Fast Object Instance Search From One Example


  1. FAST OBJECT INSTANCE SEARCH FROM ONE EXAMPLE NAYAN SETH JINGJING MENG, JUNSONG YUAN, YAP-PENG TAN, GANG WANG, "FAST OBJECT INSTANCE SEARCH IN VIDEOS FROM ONE EXAMPLE" ARIZONA STATE UNIVERSITY
  2. ARIZONA STATE UNIVERSITY OBJECTIVE ▸ To locate the object jointly across video frames using spatiotemporal search. Source: Object Localization Via Deep Neural Network
  3. ARIZONA STATE UNIVERSITY WHY? ▸ Surveillance ▸ Combining with AI technologies to help improve crop productivity [Video] ▸ Autonomous cars [Video] ▸ Robots that can catch moving objects
  4. ARIZONA STATE UNIVERSITY PREVIOUS WORK ▸ Bounding Box (made efficient using Branch & Bound) ▸ Max Path (uses Sliding Window) Tran, D., Yuan, J., & Forsyth, D. (2014). Video Event Detection: From Subvolume Localization to Spatiotemporal Path Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 404-416.
  5. ARIZONA STATE UNIVERSITY FEW MORE METHODS ▸ Video Google ▸ TRECVID
  6. ARIZONA STATE UNIVERSITY RANDOMISED VISUAL PHRASE ▸ Generates frame-wise confidence maps ▸ The method works on individual frames ▸ Why? ▸ Handles scale variations of the object and provides robustness ▸ No need to rely on image segmentation
  7. ARIZONA STATE UNIVERSITY RANDOMISED VISUAL PHRASE (CONTD…) ▸ Local invariant features are extracted ▸ The image is randomly partitioned into patches ▸ Each patch bundles its features into a visual phrase ▸ For each RVP, a similarity score is computed with respect to the query object independently ▸ This score is used as the voting weight for the corresponding patch ▸ The final confidence score of each pixel is computed from all the voting weights that pixel receives (see the sketch below)
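A minimal NumPy sketch of the RVP voting described on this slide, not the authors' implementation: the partition geometry, the default grid size, and the use of summed per-feature match scores as the patch similarity are all simplifying assumptions made here for illustration.

```python
import numpy as np

def rvp_confidence_map(keypoints, match_scores, img_h, img_w, rounds=200, grid=4):
    """Sketch of Randomized Visual Phrase voting (hypothetical helper).

    keypoints:    (N, 2) array of (x, y) feature locations in the frame
    match_scores: (N,) per-feature similarity to the query object
    Each round randomly partitions the frame into grid x grid patches;
    every patch (an RVP) votes for all of its pixels with the summed
    similarity of the features it bundles.
    """
    conf = np.zeros((img_h, img_w), dtype=np.float32)
    for _ in range(rounds):
        # Random partition: random cut positions along each axis.
        x_cuts = np.sort(np.random.randint(1, img_w, grid - 1))
        y_cuts = np.sort(np.random.randint(1, img_h, grid - 1))
        x_edges = np.concatenate(([0], x_cuts, [img_w]))
        y_edges = np.concatenate(([0], y_cuts, [img_h]))
        # Assign each keypoint to its patch under this partition.
        col = np.searchsorted(x_edges, keypoints[:, 0], side='right') - 1
        row = np.searchsorted(y_edges, keypoints[:, 1], side='right') - 1
        for r in range(grid):
            for c in range(grid):
                in_patch = (row == r) & (col == c)
                if not in_patch.any():
                    continue
                # Patch similarity serves as the voting weight for its pixels.
                vote = match_scores[in_patch].sum()
                conf[y_edges[r]:y_edges[r + 1], x_edges[c]:x_edges[c + 1]] += vote
    return conf / rounds  # average over partitions -> per-pixel confidence
```

Averaging over many random partitions is what makes the map robust: a pixel only scores highly if it keeps landing in high-similarity patches regardless of where the cuts fall.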
  8. ARIZONA STATE UNIVERSITY Yuning Jiang, Jingjing Meng, Junsong Yuan, and Jiebo Luo, "Randomized spatial context for object search," IEEE Transactions on Image Processing, to appear, 2015.
  9. ARIZONA STATE UNIVERSITY LOCALISATION ▸ Object localisation is done using RVP ▸ Drawbacks ▸ First, RVP depends on a heuristic segmentation coefficient α (alpha) to locate target objects in an image ▸ Second, its performance drops with insufficient rounds of partition, as the confidence map is then not salient enough ▸ Hence RVP is used along with Max-Path for better accuracy
  10. ARIZONA STATE UNIVERSITY ALGORITHM ▸ A video V = {F1, F2, …, Fn}, where Fi is the i-th frame ▸ V is divided into non-overlapping chunks V1, V2, …, Vm ▸ Assumption: trajectories do not overlap ▸ Ti = {Ti1, Ti2, …, Til}, where l is the total number of candidate object trajectories in chunk Vi ▸ Tij = {Bij1, Bij2, …, Bijk}, a sequence of bounding boxes, where k is the number of frames spanned by trajectory Tij ▸ The best trajectory is Ti* = argmax_{Tij ∈ Ti} s(Tij) [equation 1] ▸ s(Tij) = Σ_{B ∈ Tij} s(B) [equation 2] ▸ Once we have the best trajectory Ti* for each video chunk, we return the ranked results of all such trajectories (see the transcription below)
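As a reading aid, a literal Python transcription of equations 1 and 2, assuming a trajectory is given as a list of boxes and `box_score` is a hypothetical callable returning the per-box confidence s(B) from the RVP maps. This brute-force argmax is exactly what Max-Path (next slide) computes without enumerating candidates.

```python
# Illustrative transcription of equations 1 and 2 only; `box_score` is a
# hypothetical stand-in for the RVP confidence of a single bounding box.

def trajectory_score(trajectory, box_score):
    """Equation 2: s(Tij) = sum of s(B) over all boxes B in Tij."""
    return sum(box_score(box) for box in trajectory)

def best_trajectory(candidates, box_score):
    """Equation 1: Ti* = argmax over Tij in Ti of s(Tij)."""
    return max(candidates, key=lambda tij: trajectory_score(tij, box_score))
```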
  11. ARIZONA STATE UNIVERSITY WHERE DOES MAX PATH COME INTO PLAY? ▸ To solve equation 1 without enumerating every candidate trajectory (a sketch of the dynamic program follows below)
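A minimal dynamic-programming sketch of Max-Path search, not the authors' optimized implementation from reference [1]: each cell of a per-frame score grid extends the best predecessor path from a small spatial neighborhood in the previous frame, or starts a fresh path if all continuations are negative, so paths can start and end at any frame.

```python
import numpy as np

def max_path(scores, nbr=1):
    """Sketch of Max-Path DP. scores: (T, H, W) per-frame, per-location
    confidences, negative where the object is unlikely (cf. the negative
    coefficient beta on slide 15). nbr=1 gives the 3x3 neighborhood used
    in the slides. Returns the best accumulated score and its path."""
    T, H, W = scores.shape
    acc = np.empty((T, H, W), dtype=np.float64)
    acc[0] = scores[0]
    parent = np.full((T, H, W, 2), -1, dtype=np.int64)
    for t in range(1, T):
        for y in range(H):
            for x in range(W):
                y0, y1 = max(0, y - nbr), min(H, y + nbr + 1)
                x0, x1 = max(0, x - nbr), min(W, x + nbr + 1)
                window = acc[t - 1, y0:y1, x0:x1]
                k = int(np.argmax(window))
                best = window.flat[k]
                if best > 0:  # extend the best predecessor path...
                    py, px = np.unravel_index(k, window.shape)
                    acc[t, y, x] = scores[t, y, x] + best
                    parent[t, y, x] = (y0 + py, x0 + px)
                else:         # ...or start a fresh path at this cell
                    acc[t, y, x] = scores[t, y, x]
    # Paths may end at any frame: take the global maximum and backtrack.
    t, y, x = np.unravel_index(int(np.argmax(acc)), acc.shape)
    best_score = acc[t, y, x]
    path = [(int(t), int(y), int(x))]
    while parent[t, y, x, 0] >= 0:
        py, px = parent[t, y, x]
        t, y, x = t - 1, int(py), int(px)
        path.append((int(t), y, x))
    return best_score, path[::-1]
```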
  12. ARIZONA STATE UNIVERSITY THERE'S A CATCH
  13. ARIZONA STATE UNIVERSITY COARSE TO FINE SEARCH ▸ Build two file indexes ▸ Coarsely sampled frames (filters out low-confidence video chunks) ▸ Full dataset frames (computationally expensive) ▸ An initial ranking is generated from the confidence scores on the coarsely sampled frames ▸ For the top-ranked chunks, per-frame confidence maps are generated (see the sketch below)
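A hedged sketch of the two-pass driver implied by this slide; `coarse_score` and `fine_search` are hypothetical callables standing in for the cheap coarse-index lookup and the full per-frame voting plus Max-Path search, respectively.

```python
def coarse_to_fine(chunks, query, coarse_score, fine_search, top_k=100):
    """Rank all video chunks with a cheap score from the coarsely sampled
    frame index, then run the expensive fine search only on the survivors.

    chunks:       iterable of video chunks
    coarse_score: callable(chunk, query) -> cheap confidence estimate
    fine_search:  callable(chunk, query) -> best trajectory in the chunk
    """
    # Coarse pass: rank every chunk using the sparse frame index.
    ranked = sorted(chunks, key=lambda c: coarse_score(c, query), reverse=True)
    # Fine pass: per-frame confidence maps + Max-Path on the top chunks only.
    return [fine_search(c, query) for c in ranked[:top_k]]
```

The design point is that the coarse index absorbs most of the dataset cheaply, so the expensive per-frame work is paid only where a match is plausible.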
  14. ARIZONA STATE UNIVERSITY OTHER METHODS DEPLOYED ▸ Hessian Affine detectors [4] ▸ FLANN [5] (illustrated below)
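For reference [5], the snippet below shows FLANN-based feature matching with OpenCV. This is an illustration, not the paper's pipeline: OpenCV ships no Hessian-Affine detector, so SIFT stands in for reference [4], and the image file names are placeholders.

```python
import cv2

# SIFT as a stand-in detector/descriptor (OpenCV lacks Hessian-Affine).
sift = cv2.SIFT_create()
query_img = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)  # hypothetical paths
frame_img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
kp_q, des_q = sift.detectAndCompute(query_img, None)
kp_f, des_f = sift.detectAndCompute(frame_img, None)

# FLANN with randomized k-d trees, per Muja & Lowe [5]; algorithm=1 selects
# the k-d tree index, and `checks` bounds the search effort per query.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des_q, des_f, k=2)

# Lowe's ratio test keeps only distinctive correspondences.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(f'{len(good)} confident matches')
```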
  15. ARIZONA STATE UNIVERSITY RESULTS ▸ K = 200 rounds of partition for the coarse round, with α = 3 ▸ K = 50 rounds of partition for the fine round, with α = 3 ▸ Max-Path search performed at a 1:1 aspect ratio with a 3×3 local neighborhood, a spatial step size of 10, and a temporal step of 1 frame ▸ β = −2 (the negative coefficient)
  16. ARIZONA STATE UNIVERSITY EFFICIENCY ▸ Quad-core dual-processor machine at 2.3 GHz with 32 GB RAM (no GPU) ▸ Coarse ranking and filtering for 10 objects: 0.833 seconds on average ▸ 28.738 seconds to obtain the top 100 trajectories for each query (3.833 seconds for the frame-wise voting maps and 24.866 seconds for the Max-Path search) ▸ 29.57 seconds in total, excluding I/O, on the 5.5-hour video dataset
  17. ARIZONA STATE UNIVERSITY OBJECTS QUERIED Jingjing Meng, Junsong Yuan, Yap-Peng Tan, Gang Wang, "Fast Object Instance Search In Videos From One Example"
  18. ARIZONA STATE UNIVERSITY Jingjing Meng, Junsong Yuan, Yap-Peng Tan, Gang Wang, "Fast Object Instance Search In Videos From One Example"
  19. ARIZONA STATE UNIVERSITY ACCURACY Jingjing Meng, Junsong Yuan, Yap-Peng Tan, Gang Wang, "Fast Object Instance Search In Videos From One Example"
  20. ARIZONA STATE UNIVERSITY REFERENCES 1. Tran, D., Yuan, J., & Forsyth, D. (2014). Video Event Detection: From Subvolume Localization to Spatiotemporal Path Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 404-416. 2. Yuning Jiang, Jingjing Meng, Junsong Yuan, and Jiebo Luo, "Randomized spatial context for object search," IEEE Transactions on Image Processing, to appear, 2015. 3. Jingjing Meng, Junsong Yuan, Yap-Peng Tan, Gang Wang, "Fast Object Instance Search In Videos From One Example." 4. Michal Perd'och, Ondrej Chum, and Jiri Matas, "Efficient representation of local geometry for large scale object retrieval," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 9-16. 5. Marius Muja and David G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009, pp. 331-340, INSTICC Press.
