Summary • Proposed a two-step synthetic data generation: 1. computation of the camera movements; 2. ...
Outline: I. Introduction & Motivation; II. Scene Modeling & Acquisition; III. Query Processing & Vector Model; IV. Result Presenta...
Motivation (1) • Keyword search is still the primary input method for multimedia search.
Motivation (2) • Tech advances in geo-information systems: more comprehensive data; better usability; nic...
Motivation (3) • Tech advances in the manufacturing of smartphones: mobile OS, various sensors, large storage, long ...
Target (1) • Bridging the semantic gap: tagging. Marina Bay Sands, Marina Bay, ...
Target (2) • Bridging the semantic gap: tagging. Viewable scene model. Marina Bay Sands...
Framework (diagram).
Visible Object Computation (1) • Nice mapping between the real world and the simulated 3D world. Video snapshot vs. Google...
Visible Object Computation (2) • video → FoVs; real world → GIS ...
Visible Object Computation (3) • For each FoV, compute the visible objects and their visible angle ranges. • Three types of...
Visible Object Computation (4) • Horizontally split the FoV into a number of atomic sectors, each containing a unique list of ...
Visible Object Computation (4) • Repeat the process for every FoV of the video. • Extract textual information: ID, nam...
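The atomic-sector idea above can be sketched in a few lines. The slides only describe the idea, so the data layout (per-object visible angle ranges within the FoV) and all function names here are our assumptions, not the talk's implementation:

```python
def atomic_sectors(fov_range, object_ranges):
    """Split the FoV's horizontal angular extent into atomic sectors, each
    holding the unique set of objects visible in it. Angles are degrees
    within the FoV. Hypothetical sketch: sector boundaries are the
    breakpoints of the objects' visible angle ranges."""
    lo, hi = fov_range
    cuts = {lo, hi}
    for _, (a, b) in object_ranges.items():
        cuts.update(c for c in (a, b) if lo < c < hi)
    edges = sorted(cuts)
    sectors = []
    for a, b in zip(edges, edges[1:]):
        mid = (a + b) / 2  # any interior point identifies the sector's objects
        visible = sorted(o for o, (s, e) in object_ranges.items() if s <= mid <= e)
        sectors.append(((a, b), visible))
    return sectors
```

For example, a 60-degree FoV seeing object A over 0..30 degrees and object B over 20..50 degrees yields four atomic sectors: A only, A and B, B only, and an empty sector.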
Tag Ranking & Associating (1) • The visible objects are not equally relevant to the video. • More tags are generated with ou...
Tag Ranking & Associating (2) • Six basic visual criteria: closeness to the FoV center; distance to the c...
Tag Ranking & Associating (3) • Exploiting the temporal existence of the object. • Associating the tag with a specific segment...
Evaluation (1) • Implemented a prototype. • Collected sample videos. • Compared to YouTube auto-generated tags. • User study ...
Evaluation (2) • User study results.
Prototype • http://eiger.ddns.comp.nus.edu.sg/~zhijie/Query_v3.2.html
Outline: I. Introduction & Motivation; II. Scene Modeling & Acquisition; III. Query Processing & Vector Model; IV. Result Presenta...
Conclusions • Annotation using sensors can provide automatic and objective meta-data for indexing and searching. • Georefer...
Thank You • Further information at http://geovid.org • rogerz@comp.nus.edu.sg
Relevant Publications (1) • [ACM MM '08] Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim: Viewable Scene Modeling fo...
Relevant Publications (2) • [ACM MM '11] Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim, Roger Zimmermann: Automatic Tag...
Transcript of "CSTalks-Sensor-Rich Mobile Video Indexing and Search-17Aug"

1. GeoVid: Geo-referenced Video Management. Roger Zimmermann, Seon Ho Kim, Sakire Arslan Ay, Beomjoo Seo, Jia Hao, Guanfeng Wang, Ma He, Shunkai Fang, Lingyan Zhang, Zhijie Shen. National University of Singapore / University of Southern California. http://geovid.org (9/24/2011)
2. Outline: I. Introduction & Motivation; II. Scene Modeling & Acquisition; III. Query Processing & Vector Model; IV. Result Presentation; V. Power Management; VI. Synthetic Video Meta-Data Generation; VII. Textual Annotation; VIII. Conclusions
3. Outline (section divider).
4. Motivation (1) • Trends: user-generated video content is growing rapidly, and mobile devices make it easy to capture video. • Challenge: video is still difficult to manage and search. • Content-based image processing: extracting high-level semantic concepts from images and video is very desirable, but tremendously challenging.
5. Motivation (2) • User tagging: laborious and often ambiguous or subjective. • Complementary technique: automatically add sensor information to the video during its collection → sensor-rich video (also called geo-referenced video). Example: location and direction information can now be collected through a number of sensors (e.g., GPS, compass, accelerometer).
6. Motivation (3) • Recent progress in sensors and integration. Traditionally separate components: network interface + sensors + video capturing + handheld mobility; now: various sensors, WiFi, and video capture integrated in a single mobile device.
7. Challenges (1) • Capacity constraint of the battery. • Wireless bandwidth bottleneck. • Searchability of videos: open-domain video content is very difficult to search efficiently and accurately.
8. Challenges (2) • Video and sensor information storage and indexing. • Result ranking. • Result presentation.
9. Sensor-Rich Video • Characteristics: concurrently collect sensor-generated geospatial (and other) contextual data; automatic, with no user interaction required (real-time tagging); the data is objective (though it may be noisy and/or inaccurate) → generate a time series of meta-data tags. • Meta-data can be efficiently searched and processed and may allow us to deduce certain properties about the video content.
10. Overview of Approach: 1. viewable scene modeling; 2. video and meta-data acquisition; 3. indexing, querying, and presentation of results.
11. Outline (section divider).
12. Viewable Scene Modeling (1) • Describe the video stream more accurately through the camera field-of-view. • Data collection using sensors: camera location from GPS, camera direction from a digital compass, viewing angle from camera parameters. • Details can be found in [ACM MM'08] and [ACM MM'09].
13. Viewable Scene Modeling (2) • Circle scene coverage vs. FOV scene coverage (figure).
14. Modeling Parameters (DB) • filename: uploaded video file name. • <Plat, Plng>: <latitude, longitude> coordinate of the camera location (read from GPS). • altitude: altitude of the viewpoint (read from GPS). • alpha: camera heading relative to the ground (read from compass). • R: viewable distance. • theta: angular extent of the camera field-of-view. • tilt: camera pitch relative to the ground (read from compass). • roll: camera roll relative to the ground (read from compass). • ltime: local time of the FOV. • timecode: timecode of the FOV in the video (extracted from the video).
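The coverage test implied by these parameters (location, heading alpha, viewable distance R, angular extent theta) can be sketched as follows. This is a minimal flat-2D approximation with names of our choosing, not the project's actual code:

```python
import math

def point_in_fov(p, q, alpha, R, theta):
    """Check whether query point q falls inside the field-of-view anchored
    at camera location p, with heading alpha (degrees, 0 = north, clockwise),
    viewable distance R, and angular extent theta (degrees).
    Flat 2D sketch; ignores altitude, tilt, and roll."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist > R:
        return False          # beyond the viewable distance
    if dist == 0:
        return True           # the camera position itself
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    diff = (bearing - alpha + 180) % 360 - 180   # signed heading difference
    return abs(diff) <= theta / 2
```

With a north-facing camera (alpha = 0, theta = 60, R = 100), a point 50 units due north is visible, while a point 50 units due east or 200 units north is not.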
15. Viewable Scene Modeling (3) • Sensor values are sampled at different intervals: GPS, 1 per second; compass, 40 per second; video frames, 30 per second. • Each frame is associated with the temporally closest sensor values. • Interpolation can be used. • An optimization is implemented for GPS: the position is only measured if the movement exceeds 10 m.
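The frame-to-sensor association described above (each frame takes the temporally closest sensor sample) can be sketched with a binary search; the function names and tie-breaking rule are our assumptions:

```python
import bisect

def nearest_sample(timestamps, values, t):
    """Return the sensor value whose timestamp is temporally closest to t.
    timestamps must be sorted ascending; on an exact tie the earlier sample
    wins (an arbitrary choice, not specified in the talk)."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return values[i] if after - t < t - before else values[i - 1]

# Associate 30 fps frame times with 1 Hz GPS fixes:
gps_t = [0.0, 1.0, 2.0]
gps_v = ["fix0", "fix1", "fix2"]
frame_times = [k / 30.0 for k in range(60)]
tags = [nearest_sample(gps_t, gps_v, t) for t in frame_times]
```

The same lookup works for the 40 Hz compass stream; alternatively, linear interpolation between the two neighboring samples can replace the nearest-sample rule, as the slide notes.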
16. v0.1 Acquisition Prototype • Capture software for HD video, a GPS data stream, and a compass data stream.
17. v0.2 Acquisition Prototype • Setup for data collection: laptop computer; OceanServer OS5000-US compass; Canon VIXIA HV30 camera; Pharos iGPS-500 receiver.
18. Smartphone Acquisition (1) • iPhone app and Android app (screenshots).
19. Mobile App Implementation • Modules: video stream recorder, location receiver, orientation receiver, data storage and synchronization control, data uploader, battery status monitor. • The data format that stores the sensor data is JSON (JavaScript Object Notation).
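The slide names JSON as the storage format but does not show the schema. A hypothetical record for one FOV sample, reusing the field names from the Modeling Parameters slide (the values and exact layout are invented for illustration):

```python
import json

# Hypothetical FOV meta-data record; field names follow the Modeling
# Parameters slide, but the actual on-device schema is not shown in the talk.
sample = {
    "filename": "clip_0001.mp4",
    "Plat": 1.2966, "Plng": 103.7764,   # camera location (GPS)
    "altitude": 12.5,                    # meters (GPS)
    "alpha": 87.0,                       # heading in degrees (compass)
    "R": 250.0,                          # viewable distance in meters
    "theta": 60.0,                       # angular extent in degrees
    "tilt": -2.0, "roll": 0.5,           # pitch/roll (compass)
    "ltime": "2011-09-24T10:15:30",
    "timecode": 12.4,                    # seconds into the video
}
line = json.dumps(sample)    # one line per FOV sample when uploading
restored = json.loads(line)
```

One record per sampled FOV keeps the meta-data stream tiny compared to the video, which is what the later power-management section exploits.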
20. Smartphone Acquisition (2) • Android app: http://geovid.org/Android/index.html (available for download). • iPhone app: http://geovid.org/iphone/index.html (will be submitted to the App Store).
21. Outline (section divider).
22. Spatio-Temporal Search • Search for videos of the "Kibbie Dome" within a query region, e.g., the rectangle with corners <-117.010, 46.725>, <-117.013, 46.725>, <-117.013, 46.728>, <-117.010, 46.728>. • Search for videos that capture a given trajectory.
23. Query Execution • Moving cameras: find the video segments that are relevant to an object X over time, and omit the irrelevant segments.
24. Example: Spatial Range Query • Two videos intersecting a query area (figure).
25. Querying Geo-Referenced Videos • Run spatio-temporal range queries. • Extract videos that capture an area of interest: the overlapping region.
26. Approach: Search (1) • The FOV model converts the video search problem into a spatial object selection problem. • Search only the overlapping FOVs, not the entire clip. • A two-step approach, filter and refinement, is common in spatial databases: the filter step tests simple approximations on all n1 objects and yields n2 candidate positives; the refinement step tests the exact geometry on the n2 candidates and yields the n3 true positives. • Note that the refinement step can be very time-consuming for videos.
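The filter-and-refinement pattern for a point query can be sketched as below. This is a generic illustration of the two-step idea, assuming a flat 2D FOV test and per-FOV MBRs; it is not the system's actual implementation:

```python
import math

def mbr_contains(mbr, q):
    """Filter step: cheap containment test against the FOV's minimum
    bounding rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax

def fov_contains(fov, q):
    """Refinement step: exact (2D) field-of-view geometry test."""
    p, alpha, R, theta = fov["p"], fov["alpha"], fov["R"], fov["theta"]
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d > R:
        return False
    if d == 0:
        return True
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    return abs((bearing - alpha + 180) % 360 - 180) <= theta / 2

def point_query(fovs, q):
    """Two-step search sketch: filter on MBRs first (n1 -> n2), then run
    the expensive exact test only on the survivors (n2 -> n3)."""
    candidates = [f for f in fovs if mbr_contains(f["mbr"], q)]
    return [f for f in candidates if fov_contains(f, q)]
```

A point inside the MBR but outside the wedge survives the filter and is discarded by refinement, which is exactly the false-positive cost the vector model on the next slides tries to reduce.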
27. Approach: Search (2) • Filter step using the minimum bounding rectangle (MBR): the MBR carries no direction information; the filter step checks the overlap between the MBR and the query point, and the refinement step checks the overlap between the FOV and the query point. • Using MBRs, the meta-data cannot be fully utilized in the filter step.
28. Vector Model • Filter step using vectors: represent the FOV as a vector V at camera location p, and transform the data into the px-Vx and py-Vy spaces. • Camera location and direction can then be used in the filter step, with the potential to be more accurate in filtering.
29. Query Processing: Point Query • Point query: "For a given query point q(qx, qy) in 2D geo-space, find all video frames that overlap with q." • Only vectors inside the triangle-shaped area in both the px-Vx and py-Vy spaces remain after the filter step. • The maximum magnitude of any vector is limited to M.
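One plausible reading of the triangle-shaped filter region is sketched below: per axis, the query coordinate must lie between the camera coordinate and the vector tip, with the vector magnitude bounded by M. The exact inequalities are in the paper, so treat this predicate as an assumption:

```python
def axis_candidate(p, v, q, M):
    """Per-axis test in the transformed p-V space: keep the FOV if the
    query coordinate q lies between the camera coordinate p and the vector
    tip p + v, with |v| bounded by M. Hypothetical reading of the slide's
    triangle-shaped region, not the paper's exact predicate."""
    if abs(v) > M:
        return False
    lo, hi = min(p, p + v), max(p, p + v)
    return lo <= q <= hi

def vector_filter(fov, q, M):
    """Keep the FOV only if the test passes in both the px-Vx and the
    py-Vy spaces, as the slide describes."""
    (px, py), (vx, vy) = fov["p"], fov["V"]
    return axis_candidate(px, vx, q[0], M) and axis_candidate(py, vy, q[1], M)
```

Because the predicate uses both position and direction (the sign and length of V), it can discard far more non-matching FOVs in the filter step than a direction-blind MBR test.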
30. QP: Point Query with r • Point query with bounded distance r: "For a given query point q(qx, qy) in 2D geo-space, find all video frames that overlap with q and that were taken within distance r."
31. QP: Directional Point Query • Directional point query: "For a given query point q(qx, qy) in 2D geo-space, find all video frames taken with the camera pointing in the northwest direction and overlapping with q."
32. Vector Model: Implementation • So far, the FOV was represented as a single vector. • Problem: the single-vector model underestimates the coverage of the FOV.
33. Vector Model: Implementation • Solution: introduce an overestimation constant (ε) and expand the search space by ε along the V axis.
34. Experimental Results • Implemented smartphone apps with GPS and digital compass, plus software for recording video synchronized with the sensor inputs. • Recorded hundreds of real video clips on the street while driving. • Stored the geo-referenced video meta-data in a MySQL database. • Implemented user-defined functions (UDFs) for queries using the vector model. • Constructed a map-based user interface on the web.
35. Experimental Results • Purpose of the experiments: demonstrate proof of concept, feasibility, and applicability; no emphasis on performance issues. • Generate random queries and search for overlapping video segments (map of camera positions and query points).
36. ER: Point Query • Recall = (number of overlapping FOVs returned by the filter step) / (total number of actually overlapping FOVs). • Precision = (number of overlapping FOVs returned in the filter step) / (total number of all FOVs returned in the filter step).
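The two definitions above translate directly into code; a small helper (names ours) operating on sets of FOV ids:

```python
def filter_recall_precision(returned, relevant):
    """Recall and precision of the filter step, per the definitions above.
    `returned` is the set of FOV ids the filter kept; `relevant` is the set
    of FOVs that actually overlap the query. The empty-set conventions
    (1.0) are our choice."""
    hits = len(returned & relevant)
    recall = hits / len(relevant) if relevant else 1.0
    precision = hits / len(returned) if returned else 1.0
    return recall, precision
```

For example, a filter that returns {1, 2, 3, 4} when {1, 2, 5} actually overlap has recall 2/3 and precision 1/2.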
37. ER: Point Query with Distance r • Example: "Where is the Pizza Hut?"
38. ER: Filtering with the Vector Model • 1,000 random point queries against 10,652 FOVs in the database. • Results from the filter step for the point query with a bounded distance of 50 meters, for the vector model with different values of the overestimation constant (ε).
39. ER: Directional Point Query • Results from the filter step for the directional point query with viewing direction 45°±5°. • The MBR has no information about the direction, so it returns all 30,491 FOVs. • For ε ≥ 0.3M, the vector model returns 90% fewer FOVs in the filter step than the MBR (recall for ε = 0.3M is 0.948).
40. ER: Directional Range Query • For the directional range query with viewing direction 0°±5°: using the MBR (no direction) vs. using the vector (direction: north). • Our search portal: http://geovid.org
41. Outline (section divider).
42. Query Results: Video Segments • Example query: search for the "University of Idaho Kibbie Dome". • The query, processed based on the viewable scene model, returns more relevant video segments.
43. Search and Results: 2D • http://geovid.org/Query.html
44. 2D: Technologies • LAMP stack (Linux, Apache, MySQL, PHP) • Google Maps API • Ajax, XML • UDFs in MySQL • Flowplayer + Wowza Media Server.
45. Results of 2D Presentation • Challenge: the video is separate from the map and requires a "mental orientation" (rotation) that is not intuitive. • Proposed solution: use Google Earth (or other mirror worlds) as a backdrop to overlay the acquired video clips at the correct locations and viewing directions; therefore, present the results in 3D and follow the path of the camera trajectory.
46. Presentation of Videos • Sample screenshots [Ni et al. 2009].
47. Search and Results: 3D (screenshots).
48. 3D: Technologies • LAMP stack (Linux, Apache, MySQL, PHP) • Google Maps / Google Earth API • Ajax, XML, KML • UDFs in MySQL • IFRAME shim • HTML5 video techniques (time seeking) • 3D perspective videos (drawImage, canvas).
49. Outline (section divider).
50. Transmission of Meta-data and Video • Two simple approaches. Immediate transmission after capturing, through the wireless network: immediate availability of the data (+), but consumes a lot of energy and bandwidth (-). Delayed transmission when a faster network is available: low power consumption (+), but sacrifices real-time access (-).
51. Power-Efficient Method • A framework to support efficient mobile video capture and transmission. • Observation: not all collected videos have high priority. • Core idea: separate the small amount of sensor meta-data from the large video content; the meta-data is transmitted to a server in real time; the video content is searchable by the viewable scene properties established from the meta-data attached to each video; the video itself is transmitted on demand.
52. System Environment • Data acquisition → sensor meta-data upload → data storage and indexing → query processing, with a query request triggering a video request message (VRM) and the upload of the requested video segments. • Key idea: save considerable battery energy by delaying the costly transmission of the video segments that have not been requested.
53. Linear Regression-based Model • Parameters of the HTC G1 smartphone used in the power model; the overall system power consumption is modeled as a function of time t. [A. Shye, B. Sholbrock, and G. Memik. Into the Wild: Studying Real User Activity Patterns to Guide Power Optimization for Mobile Architectures. In MICRO, 2009.]
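The slide does not reproduce the regression itself. A generic linear power model of the kind cited can be sketched as below; the component names and coefficients are hypothetical placeholders, not the HTC G1 values from the talk:

```python
# Hypothetical per-component coefficients (watts at full utilization).
# The real values come from the linear regression on the slide, not shown here.
BETA = {"cpu": 0.40, "screen": 0.90, "wifi": 0.70, "gps": 0.35}

def system_power(utilization):
    """Linear regression-style power model: total power at one instant is
    a weighted sum of per-component utilizations in [0, 1]."""
    return sum(BETA[c] * u for c, u in utilization.items())

def energy_joules(trace, dt=1.0):
    """Integrate power over a utilization trace sampled every dt seconds,
    giving the energy a transmission or capture strategy would cost."""
    return sum(system_power(u) * dt for u in trace)
```

Feeding the simulator's per-second utilization traces through `energy_joules` is how a model like this turns strategy choices (Immediate vs. OnDemand uploads) into comparable energy numbers.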
54. Validation of Power Model • Screenshot of PowerTutor; comparison of the power model vs. PowerTutor. [B. Tiwana and L. Zhang. PowerTutor. http://powertutor.org, 2009.]
55. Simulator Architecture • Modules: AP topology generator (AP layout), node trajectory generator (trajectory plan), video+FOV generator (FOV scene plan), query generator (query list), and an execution engine with the power model, comparing the Immediate and OnDemand strategies. • Evaluation metrics: energy consumption, query response latency, transmitted data size. • Simulated area of roughly 14.3 km × 13.6 km. [Brinkhoff. A framework for generating network-based moving objects, 2002.]
56. Query Model • Query workload: a list of query rectangles that are mapped to specific locations. • The parameter h generates different distributions of queries; spatial query distributions are shown for three clustering parameter values: h = 0, h = 0.5, h = 1.
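A workload generator in this spirit can be sketched as follows. The interpretation of h (0 = uniform, 1 = fully clustered around hotspots) is an assumption: the talk only shows the resulting distributions, and all names and defaults here are ours:

```python
import random

def generate_queries(n, hotspots, h, extent=1000.0, spread=50.0, seed=0):
    """Sketch of a clustered query workload: with probability h a query
    center is drawn near a random hotspot (Gaussian with std dev `spread`),
    otherwise uniformly over an extent x extent map."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n):
        if rng.random() < h:
            hx, hy = rng.choice(hotspots)
            centers.append((rng.gauss(hx, spread), rng.gauss(hy, spread)))
        else:
            centers.append((rng.uniform(0, extent), rng.uniform(0, extent)))
    return centers
```

Sweeping h from 0 to 1 then reproduces the uniform-to-clustered spectrum the performance slides vary.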
57. Performance: Without Battery Recharging • Closed system where batteries cannot be recharged. • Number of nodes alive and query response latency: node lifetimes and query response latency with N = 2,000 nodes.
58. Performance: With Battery Recharging • The mobile node density will eventually reach a dynamic equilibrium. • Energy consumption and access latency with an increasing meta-data upload period.
59. Performance: With Battery Recharging • Energy consumption and average query response latency with a varying number of access points.
60. Performance: With Battery Recharging • Energy consumption and average query response latency with a varying query clustering parameter h; total transmitted data size as a function of h.
61. Hybrid Strategy • Overall energy consumption and query response latency when using a hybrid strategy with both Immediate and OnDemand, as a function of the switching threshold (h = 0.5).
62. Outline (section divider).
63. Real-World Video Collection • "Capture the sensor inputs and fuse them with the video streams." • Recorded 134 video clips using the recording prototype system in Moscow, ID (170 minutes of video in total). • The videos covered a 6 km by 5 km region quite uniformly. • The average camera movement speed was 27 km/h, and the average camera rotation was around 12 degrees/s. • The collected meta-data included 10,652 FOV scenes in total.
64. Real-World Video Collection: Challenges • The collected real-world video data has not been large enough to evaluate realistic applications at a large scale; collecting real-world data requires considerable time and effort. • A complementary solution is to synthetically generate geo-referenced video meta-data.
65. Synthetic Video Meta-data Generation • Input: camera template specification. • Camera movement computation from TIGER/Line files: the Brinkhoff algorithm (network-based movement), the GSTD algorithm (free movement), and merged trajectories (mixed movement). • Camera direction computation: calculate the moving direction, adjust directions on turns, randomize the direction angles. • Output: geo-referenced video meta-data.
66. Camera Movement Computation (1): Network-based Movement • Cameras move on a road network. • Adopted the Brinkhoff algorithm for camera trajectory generation. • Introduced stops and acceleration/deceleration events at some road crossings and transitions. • The camera accelerates at a constant rate (the user defines the acceleration rate). • In a deceleration event, the reduction in camera speed is simulated with the Binomial distribution B(n, p): v_next = v_prev · B(n, p) / n. With n = 20 and p = 0.5, the speed is reduced to half, in expectation, at every time instant.
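The deceleration rule can be simulated in a few lines. The exact formula v_next = v_prev · B(n, p) / n is reconstructed from the garbled slide (it matches the "reduced to half when n = 20, p = 0.5" remark in expectation), so treat it as an assumption:

```python
import random

def decelerate(v_prev, n=20, p=0.5, rng=random):
    """One deceleration step sketch: scale the previous speed by a
    Binomial(n, p) draw divided by n. With n = 20, p = 0.5 the expected
    factor is 0.5, i.e. the speed roughly halves each time instant."""
    k = sum(1 for _ in range(n) if rng.random() < p)  # one Binomial(n, p) draw
    return v_prev * k / n

# A short deceleration event starting from 50 km/h:
rng = random.Random(42)
speeds = [50.0]
for _ in range(5):
    speeds.append(decelerate(speeds[-1], rng=rng))
```

Since k / n is always between 0 and 1, the speed sequence is non-increasing, which matches a camera braking toward a crossing.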
67. Camera Movement Computation (2): Free Camera Movement • Cameras move freely. • Improved the GSTD algorithm to generate camera trajectories with unconstrained movement: added a speed-control mechanism, and the camera movement data is generated in the geographic coordinate system (i.e., as latitude/longitude coordinates).
68. Camera Movement Computation (3): Mixed Camera Movement • Cameras sometimes follow the network and sometimes move randomly on an unconstrained path: (i) generate a network-based trajectory T_init; (ii) randomly select n sub-segments S_1 through S_n on the trajectory, with 0 < |S_i| <= |T_init| / 4 and N_rand = (sum of the |S_i|) / |T_init| (the user defines N_rand); (iii) replace each S_i with a random trajectory T_rand(i); (iv) update the timestamps.
69. Camera Rotation Computation (1) • Assigning meaningful camera direction angles is one of the novel features of the proposed data generator. • Camera direction computation: calculate the moving direction, adjust directions on turns, randomize the direction angles. • Fixed camera: (1) calculate the moving direction; (2) adjust directions on turns. Randomly rotating camera: (1) calculate the moving direction; (2) adjust directions on turns; (3) randomize the direction angles.
70. Camera Rotation Computation (2): Fixed Camera • (1) Calculate the moving direction: the moving-direction vector at each time t along the trajectory T_k. • (2) Adjust directions on turns: when the rotation angle from t1 to t2 is larger than the rotation threshold, smooth the rotation down by distributing the rotation amount forwards and backwards.
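The turn-smoothing step can be sketched as below. The talk only states the idea of distributing the rotation forwards and backwards; the half-to-each-neighbor redistribution scheme and the iteration scheme are our assumptions:

```python
def smooth_turns(headings, max_step, passes=50):
    """Smooth a heading series (degrees) so that no single-step rotation
    exceeds max_step, by pushing half of each step's excess rotation onto
    the neighboring steps. Total rotation (and thus the final heading)
    is preserved; `passes` bounds the number of sweeps."""
    deltas = [b - a for a, b in zip(headings, headings[1:])]
    for _ in range(passes):
        changed = False
        for i, d in enumerate(deltas):
            excess = abs(d) - max_step
            if excess <= 1e-9:
                continue
            shift = (excess / 2.0) * (1.0 if d > 0 else -1.0)
            moved = 0.0
            if i > 0:
                deltas[i - 1] += shift   # distribute backwards
                moved += shift
            if i + 1 < len(deltas):
                deltas[i + 1] += shift   # distribute forwards
                moved += shift
            deltas[i] -= moved
            changed = True
        if not changed:
            break
    out = [headings[0]]
    for d in deltas:
        out.append(out[-1] + d)
    return out
```

A hard 90-degree turn at one sample, with a 45-degree threshold, gets spread across the surrounding samples while the camera still ends up facing the same way.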
  71. 71. Camera Rotation Computation (3) Fixed Camera Real-world data Synthetic data before Synthetic data after direction adjustment direction adjustment Illustration of camera direction adjustment for vehicle cameras9/24/2011 76 76
  72. 72. Camera Rotation Computation (3) Randomly Rotating Camera1) Calculate moving direction2) Adjust directions on turns3) Randomize direction angles  Randomly rotate the directions at each sample point towards left or right  Rotation amount is inversely proportional to the current camera speed level  The rotation amount is guaranteed to be less than rotation threshold max 9/24/2011 77
73. 73. Experimental Evaluation (1) Goal: Evaluate the effectiveness of the synthetic data generation approach through a high-level comparison between the real-world and synthetic data. Datasets:  Generated two groups of synthetic data: 1. Using vehicle camera template 2. Using passenger camera template Both synthetic data groups were created based on the road network of Moscow, ID. Methodology:  Analyze and compare the movements and rotations of real-world and synthetic datasets.  Report: 1. The average and maximum values for speed and rotation 2. Frequency distribution of different speed and rotation levels 9/24/2011 78
74. 74. Experimental Evaluation (2) Comparison of Camera Movement Speed

                                          Maximum speed (km/h)  Average speed (km/h)  StdDev of speed
Synthetic data with fixed camera          87.91                 27.14                 12.82
Synthetic data with free camera rotation  87.28                 27.32                 13.01
Real-world data                           0.564                 27.03                 13.68

Characteristics of the camera speed. [Figures: comparison of camera speed distributions for real-world data and synthetic data with fixed camera; illustration of camera movement speed on map] 9/24/2011 79
75. 75. Experimental Evaluation (3) Comparison of Camera Rotation

                                          Maximum rotation (degrees/s)  Average rotation (degrees/s)  StdDev of rotation
Synthetic data with fixed camera          32.33                         4.64                          7.24
Synthetic data with free camera rotation  55.27                         12.59                        9.35
Real-world data                           107.30                        11.53                        14.02

Characteristics of the camera rotation (θmax = 60 degrees). [Figures: comparison of camera rotation distributions for real-world data vs. synthetic data with fixed camera, and vs. synthetic data with random rotation camera; illustration of camera rotation on map] 9/24/2011 80
76. 76. Experimental Evaluation (4) Performance Issues: the measured data generation times for different types of datasets and parameter settings.

Camera Template    Trajectory Pattern  Rotation Pattern  Number of Videos  Trajectory Generation (s)  Direction Assignment (s)  Total Time (s)
Vehicle camera     Tnetwork            Fixed camera      2,980             124                        39                        163
Passenger camera   Tnetwork            Random rotation   2,980             115                        201                       316
Pedestrian camera  Tfree               Random rotation   2,970             32                         263                       255
Pedestrian camera  Tmixed              Random rotation   2,970             271                        215                       486

The generator can create synthetic datasets in a reasonable amount of time with off-the-shelf computational resources. 9/24/2011 81
77. 77. Summary Proposed a two-step synthetic data generation: 1. Computation of the camera movements 2. Computation of the camera rotations Compared the high-level properties of the synthetically generated data and those of real-world georeferenced video data. The synthetic meta-data exhibit characteristics equivalent to the real data, and hence can be used in a variety of mobile video management research. 9/24/2011 82
78. 78. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII. Textual Annotation VIII. Conclusions 9/24/2011 83
79. 79. Motivation (1)• Keywords are still the primary input method for multimedia search 9/24/2011 84
  80. 80. Motivation (2)• Tech advance in geo-information systems: – More comprehensive data – Better usability – Nicer visualization9/24/2011 85
  81. 81. Motivation (3)• Tech advance in the manufacturing of smart phones – Mobile OS, various sensors, large storage, long battery life*• Content-based methods – Still difficult to bridge the semantic gap – High computation cost9/24/2011 86
  82. 82. Target (1)• Bridging the semantic gap: tagging Marina Bay Sands, Marina Bay, Singapore, cloudy, etc.9/24/2011 87
83. 83. Target (2)• Bridging the semantic gap: tagging. The viewable scene model is combined with geo-information systems to produce tags such as Marina Bay Sands, Marina Bay, Singapore, cloudy, etc. 9/24/2011 88
  84. 84. Framework9/24/2011 89
85. 85. Visible Object Computation (1)• Nice mapping between the real world and the simulated 3D world (video snapshot vs. Google Earth view) 9/24/2011 90
86. 86. Visible Object Computation (2) [Figure: mapping video FoVs between the real world and the GIS model] 9/24/2011 91
  87. 87. Visible Object Computation (3)• For each FoV, compute the visible objects, and their visible angle ranges• Three types of objects: – Front – Vertically visible – Occluded9/24/2011 92
88. 88. Visible Object Computation (4)• Horizontally split the FoV into a number of atomic sectors, each containing a unique list of candidate objects• Vertically check the object visibility 9/24/2011 93
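A simplified sketch of the horizontal split: the FoV's angle range is cut at every object's boundary angle so that each atomic sector covers a fixed set of candidates, and within a sector the nearest candidate is the front object. Reducing object geometry to a (start angle, end angle, distance) triple and assuming non-wrapping angles are illustrative simplifications of the real FoV/GIS computation.

```python
def atomic_sectors(fov_range, objects):
    """Split the FoV (start_deg, end_deg) into atomic sectors.
    `objects` maps name -> ((start_deg, end_deg), distance_to_camera).
    Returns [(sector_range, candidate_names, front_object), ...]."""
    lo, hi = fov_range
    cuts = {lo, hi}
    for (a1, a2), _ in objects.values():
        if a1 < hi and a2 > lo:                     # object overlaps the FoV
            cuts.update(a for a in (a1, a2) if lo < a < hi)
    bounds = sorted(cuts)
    sectors = []
    for s, e in zip(bounds, bounds[1:]):
        mid = (s + e) / 2
        # Candidates covering this sector; the closest one is in front.
        cands = {name: d for name, ((a1, a2), d) in objects.items()
                 if a1 <= mid <= a2}
        front = min(cands, key=cands.get) if cands else None
        sectors.append(((s, e), sorted(cands), front))
    return sectors
```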
89. 89. Visible Object Computation (5)• Repeat the process for every FoV of the video• Extract textual information – ID, name, type, coordinates, center, address, description, websites (external links) – From OpenStreetMap, GeoDeck – Able to expand the sources (e.g., Wikipedia) 9/24/2011 94
90. 90. Tag Ranking & Associating (1)• The visible objects are not equally relevant to the video• More tags are generated with our method – SG dataset: 60 – USC dataset: 49 9/24/2011 95
  91. 91. Tag Ranking & Associating (2)• 6 basic visual criteria – Closeness to the FoV center – Distance to the camera location – Horizontally visible angle range of the object – Vertically visible angle range of the object – Horizontally visible percentage of the object – Vertically visible percentage of the object• Additional hints from GIS or external sources – Special property (e.g., attraction, landmark …) – Wikipedia entry9/24/2011 96
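The six visual criteria and the external hints listed above could be combined into a single relevance score as in the sketch below. The weights, normalization constants, and boost factors are hypothetical placeholders, not the values used by the actual system.

```python
def tag_score(closeness, distance_m, h_angle, v_angle, h_pct, v_pct,
              is_landmark=False, has_wiki=False,
              weights=(0.2, 0.1, 0.2, 0.1, 0.2, 0.2),
              max_distance_m=1000.0, fov_angle_deg=60.0):
    """Illustrative relevance score for one visible object: the six visual
    criteria from the slide, normalized to [0, 1], weighted and summed,
    then boosted by GIS/external hints (special property, Wikipedia entry)."""
    feats = (
        closeness,                                        # 1 = at FoV centre
        1.0 - min(distance_m, max_distance_m) / max_distance_m,
        min(h_angle, fov_angle_deg) / fov_angle_deg,      # horizontal range
        min(v_angle, fov_angle_deg) / fov_angle_deg,      # vertical range
        h_pct,                                            # horiz. visible %
        v_pct,                                            # vert. visible %
    )
    score = sum(w * f for w, f in zip(weights, feats))
    if is_landmark:
        score *= 1.5          # special-property boost (hypothetical value)
    if has_wiki:
        score *= 1.2          # Wikipedia-entry boost (hypothetical value)
    return score
```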
  92. 92. Tag Ranking & Associating (3)• Exploiting the temporal existence of the object• Associating the tag to a specific segment9/24/2011 97
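Associating a tag with a specific segment can be sketched by merging consecutive FoVs in which the object is visible into time intervals; the minimum-length filter for dropping fleeting appearances is a hypothetical detail.

```python
def tag_segments(visible_flags, fps=1.0, min_len=2):
    """Exploit the temporal existence of one object: scan its per-FoV
    visibility and merge consecutive visible samples into (start_s, end_s)
    intervals, dropping appearances shorter than `min_len` samples.
    `visible_flags[i]` is True if the object is visible in FoV i."""
    segments, start = [], None
    for i, vis in enumerate(visible_flags):
        if vis and start is None:
            start = i                              # segment opens
        elif not vis and start is not None:
            if i - start >= min_len:               # keep long-enough runs
                segments.append((start / fps, i / fps))
            start = None
    if start is not None and len(visible_flags) - start >= min_len:
        segments.append((start / fps, len(visible_flags) / fps))
    return segments
```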
93. 93. Evaluation (1)• Implement prototype• Collect sample videos• Compare to YouTube auto-generated tags• User study – Familiarity with the place – Relevance of our tags – Relevance of YouTube tags – Preference for our tags vs. YouTube tags – User ranking 9/24/2011 98
  94. 94. Evaluation (2)• User study results9/24/2011 99
  95. 95. Prototype• http://eiger.ddns.comp.nus.edu.sg/~zhijie/Qu ery_v3.2.html9/24/2011 100
96. 96. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII. Textual Annotation VIII. Conclusions 9/24/2011 101
  97. 97. Conclusions• Annotation using sensors can provide automatic and objective meta-data for indexing and searching.• Georeferenced video search has a great potential, especially in searching user generated videos.• Many open questions: – Standard format of meta-data – Standard way of embedding meta-data – Index structures of meta-data for fast searching – Supporting new query types – Combining with content based features – Relevance ranking for result presentation9/24/2011 102
98. 98. Thank You Further information at: http://geovid.org rogerz@comp.nus.edu.sg seonkim@usc.edu 9/24/2011 104
99. 99. Relevant Publications (1) [ACM MM ’08] Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim Viewable Scene Modeling for Geospatial Video Search ACM Multimedia Conference (ACM MM 2008), Oct. 2008. [ACM MM ’09] Sakire Arslan Ay, Lingyan Zhang, Seon Ho Kim, Ma He, Roger Zimmermann GRVS: A Georeferenced Video Search Engine ACM Multimedia Conference (ACM MM 2009), Technical Demo, Oct. 2009. [MMSJ ’10] Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim Relevance Ranking in Georeferenced Video Search Multimedia Systems Journal, Springer, 2010. [MMSys ’11] Seon Ho Kim, Hao Jia, Sakire Arslan Ay, Roger Zimmermann Energy-Efficient Mobile Video Management using Smartphones ACM Multimedia Systems Conference (ACM MMSys 2011), Feb. 2011. 9/24/2011 105
  100. 100. Relevant Publications (2) [ACM MM ’11] Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim, Roger Zimmermann Automatic Tag Generation and Ranking for Sensor-rich Outdoor Videos ACM Multimedia Conference (ACM MM 2011), Nov. 2011. [ACM MM ’11] Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim SRV-TAGS: An Automatic TAGging and Search System for Sensor-Rich Outdoor Videos ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011. [ACM MM ’11] Hao Jia, Guanfeng Wang, Beomjoo Seo, Roger Zimmermann Keyframe Presentation for Browsing of User-generated Videos on Map Interface ACM Multimedia Conference (ACM MM 2011), Short Paper, Nov. 2011. [ACM MM ’11] Beomjoo Seo, Jia Hao, Guanfeng Wang Sensor-rich Video Exploration on a Map Interface ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011.9/24/2011 106
