GeoVid
Geo-referenced Video Management

Roger Zimmermann, Seon Ho Kim, Sakire Arslan Ay,
Beomjoo Seo, Jia Hao, Guanfeng Wang, Ma He,
Shunkai Fang, Lingyan Zhang, Zhijie Shen

National University of Singapore / University of Southern California

http://geovid.org
9/24/2011
Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Motivation (1)
• Trends
      – User-generated video content is growing rapidly.
      – Mobile devices make it easy to capture video.

• Challenge
      – Video is still difficult to manage and search.

• Content-Based Image Processing
      – Extracting high-level semantic concepts from images and
        video is highly desirable.
      – However, this is tremendously challenging.
Motivation (2)
• User-Tagging
      – Laborious and often ambiguous or subjective

• Complementary Technique
      – Automatically add additional sensor information to the
        video during its collection

            → Sensor-rich video
            (we also call it geo-referenced video)

      – Ex.: location and direction
        information can now be collected
        through a number of sensors
        (e.g., GPS, compass, accelerometer).
Motivation (3)
• Recent progress in sensors and integration

• Traditionally: separate devices for video capture, sensors,
  and the network interface
• Now: a single handheld mobile device integrates video
  capturing, various sensors, and WiFi connectivity
Challenges (1)
• Capacity constraint of the battery


• Wireless bandwidth bottleneck


• Searchability of videos
    Open-domain video content is very difficult to search
    efficiently and accurately
Challenges (2)
• Video and sensor information storage and indexing


• Result ranking


• Result presentation




Sensor-Rich Video
• Characteristics:
      – Concurrently collect sensor generated geospatial (and
        other) contextual data
      – Automatic: no user-interaction required
        (real-time tagging)
      – Data is objective
        (however, it may be noisy and/or inaccurate)

            → Generate a time-series of meta-data tags.

• Meta-data can be efficiently searched and processed
  and may allow us to deduce certain properties about
  the video content.

Overview of Approach
 1. Viewable scene modeling
 2. Video and meta-data acquisition
 3. Indexing, querying, and presentation of results


Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Viewable Scene Modeling (1)
• More accurately describe the video stream through a camera
  field-of-view
• Data collection using sensors
   – Camera location from GPS, camera direction from digital
      compass, viewing angle from camera parameters
• Details can be found in [ACM MM’08], [ACM MM’09]




Viewable Scene Modeling (2)




        Circle Scene Coverage   FOV Scene Coverage

Modeling Parameters (DB)
Attributes        Explanation
filename          Uploaded video file name
<Plat,Plng>       <Latitude, longitude> coordinate for camera location
                  (read from GPS)
altitude          The altitude of view point (read from GPS)
alpha             Camera heading relative to north (read from compass)
R                 Viewable distance
theta             Angular extent of the camera field-of-view
tilt              Camera pitch relative to the ground (read from compass)
roll              Camera roll relative to the ground (read from compass)
ltime             Local time for the FOV
timecode          Timecode for the FOV in the video (extracted from video)
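The attributes above are enough for an exact viewable-scene test. A minimal planar sketch follows; the function and argument names are illustrative, and the heading is measured counter-clockwise from the x-axis for simplicity, whereas the compass value alpha (relative to north) would first need conversion:

```python
import math

def fov_covers(cam_x, cam_y, heading_deg, r, theta_deg, qx, qy):
    """Check whether query point (qx, qy) lies inside the camera's
    field-of-view sector: within viewable distance r and within
    theta/2 of the heading (angles in degrees, planar approximation)."""
    dx, dy = qx - cam_x, qy - cam_y
    if math.hypot(dx, dy) > r:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # smallest signed angular difference between bearing and heading
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= theta_deg / 2.0
```

Real latitude/longitude pairs would be projected to a local metric coordinate system before applying this test.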
Viewable Scene Modeling (3)
• Sensor values are sampled at different intervals
     – GPS: 1 per second
     – Compass: 40 per second
     – Video frames: 30 per second

• Each frame is associated with the temporally closest
  sensor values.
• Interpolation can be used.
• Optimization is implemented for GPS: position is only
  measured if movement is more than 10 m.


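The temporally-closest association described above can be sketched with a binary search over the sorted sensor timestamps (an illustrative helper, not the prototype's actual code):

```python
import bisect

def closest_sample(timestamps, values, t):
    """Return the sensor value whose timestamp is closest to frame
    time t. timestamps must be sorted in ascending order."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    # pick whichever neighbouring sample is nearer in time
    return values[i] if after - t < t - before else values[i - 1]
```

With compass samples at 40 Hz and GPS at 1 Hz, each 30 fps video frame would call this once per sensor stream; interpolation between the two neighbours is the obvious refinement.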
v0.1 Acquisition Prototype
• Capture software for HD video, GPS data stream, and compass
  data stream (figure: GPS receiver and compass hardware)
v0.2 Acquisition Prototype




                  Setup for data collection: laptop computer;
               OceanServer OS5000-US compass; Canon VIXIA
                     HV30 camera; Pharos iGPS-500 receiver.
Smartphone Acquisition (1)
             iPhone App     Android App




Mobile App Implementation
• Modules: Video Stream Recorder, Location Receiver,
  Orientation Receiver, Data Storage and Synchronization
  Control, Data Uploader, Battery Status Monitor
• Data format that stores sensor data: JSON (JavaScript
  Object Notation)
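One meta-data sample serialized as JSON might look like the following; the field names are assumptions mirroring the FOV attribute table earlier in the talk, not the app's actual schema:

```python
import json

# One illustrative meta-data sample (field names are assumptions,
# mirroring the FOV attributes: location, heading, tilt, roll, timecode).
sample = {
    "timecode": 12.466,   # seconds into the video
    "plat": 1.29355,      # GPS latitude
    "plng": 103.77577,    # GPS longitude
    "altitude": 23.0,
    "alpha": 312.5,       # compass heading, degrees
    "tilt": -4.2,
    "roll": 1.1,
}
line = json.dumps(sample, sort_keys=True)
print(line)
```

Appending one such line per sensor sample gives a compact, easily parsed stream that can be uploaded independently of the video content.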
Smartphone Acquisition (2)
• Android App
   – http://geovid.org/Android/index.html
            Available for
            download

• iPhone App
   – http://geovid.org/iphone/index.html

            Will be submitted
            to the App Store


Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Spatio-Temporal Search
• Search for the videos of “Kibbie Dome”
• Search for videos that capture a given trajectory

  Example query region (corner coordinates):
  <-117.010, 46.725>, <-117.013, 46.725>,
  <-117.013, 46.728>, <-117.010, 46.728>
Query Execution
• Moving cameras: find relevant video segments, but omit
  irrelevant segments

   Camera locations sampled from time t1 to t2; only some
   positions along the trajectory capture Object X
Ex.: Spatial Range Query

   Trajectories of Video 1 and Video 2 crossing the query area
Querying GeoRef Videos
• Run spatio-temporal range queries
• Extract videos that capture an area of interest:
  overlapping region




Approach – Search (1)
• The FOV model converts the video search problem into a
  spatial object selection problem
• Search only the overlapping FOVs (not the entire clip)
• A two-step approach, filter and refinement, is common in
  spatial DBs:
   – Filter step: test simple approximations of all n1 objects,
     yielding n2 candidate positives (the rest are negatives)
   – Refinement step: test the exact geometry of the n2
     candidates, yielding n3 true positives and discarding the
     false positives
• Note that the refinement step can be very time consuming
  for videos
Approach – Search (2)
• Filter step using the minimum bounding rectangle (MBR) of
  the FOV
• The filter step checks overlap between the MBR and the query
  point; the MBR has no direction info!
• The refinement step checks overlap between the FOV itself
  and the query point.

            Using the MBR, meta-data cannot be fully utilized in the filter step.
Vector Model
• Filter step using a vector: represent the FOV as a single
  vector V = (Vx, Vy) anchored at the camera location p.
• Space transformation: index each FOV as a point in the px–Vx
  space and a point in the py–Vy space.
• Camera location and direction can be used in the filter step!
  Potential to be more accurate in filtering.
Query Processing – Point Query
• Point query
   – “For a given query point q(qx, qy) in 2D geo-space, find all
     video frames that overlap with q.”
• Only vectors inside the triangle-shaped area in both the px–Vx
  and py–Vy spaces will remain after the filter step.
• The maximum magnitude of any vector is limited to M.
QP – Point Query with r
• Point query with bounded distance r
   – “For a given query point q(qx, qy) in 2D geo-space, find all
     video frames that overlap with q, and that were taken
     within distance r.”
QP – Directional Point Query
• Directional point query
   – “For a given query point q(qx, qy) in 2D geo-space, find all
     video frames taken with the camera pointing in the
     Northwest direction and overlapping with q.”
Vector Model – Implementation
   • So far, we represented FOV as a single vector
   • Problem: Single-vector model underestimates the
     coverage of the FOV.




Vector Model – Implementation
   • Solution: Introduce an overestimation constant (ε).
     Expand the search space by ε along the V axis.
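A much-simplified version of the ε-expanded filter test can be sketched as follows. The actual filter uses the triangle-shaped regions in the px–Vx and py–Vy spaces; this conservative axis-aligned variant only illustrates the role of ε:

```python
def vector_filter(px, py, vx, vy, qx, qy, eps):
    """Conservative filter-step test (simplified sketch): the query
    point must lie within the axis-aligned reach of the FOV vector,
    expanded by the overestimation constant eps in each component."""
    def within(p, v, q):
        lo, hi = min(0.0, v) - eps, max(0.0, v) + eps
        return lo <= q - p <= hi
    return within(px, vx, qx) and within(py, vy, qy)
```

Candidates passing this test would still go through the exact FOV refinement step; a larger eps raises recall at the cost of precision.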
Experimental Results
• Implemented smartphone apps with GPS and digital compass;
  software for recording video synchronized with sensor inputs.
• Recorded hundreds of real video clips on the street while driving.
• Stored georeferenced video meta-data in a MySQL database.
• Implemented User Defined Functions for queries using vector
  model.
• Constructed map-based user interface on the web.




Experimental Results
• Purpose of experiments
   – Demonstrate the proof-of-concept, feasibility, and
     applicability
   – No emphasis on performance issues
• Generate random queries and search for overlapping video
  segments

                   Camera positions and query points
ER – Point Query

• Recall = (number of overlapping FOVs returned by the filter step) /
           (total number of actually overlapping FOVs)

• Precision = (number of overlapping FOVs returned by the filter step) /
              (total number of all FOVs returned by the filter step)
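Computing these two ratios from the filter output is straightforward (an illustrative helper over sets of FOV identifiers):

```python
def filter_recall_precision(returned, actual):
    """Recall and precision of the filter step.
    returned: set of FOV ids returned by the filter step
    actual:   set of FOV ids that truly overlap the query"""
    true_pos = len(returned & actual)
    recall = true_pos / len(actual) if actual else 1.0
    precision = true_pos / len(returned) if returned else 1.0
    return recall, precision
```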
ER – Point Query with Distance r




            “Where is the Pizza Hut?”
ER – Filtering with Vector Model
• 1,000 random point queries with 10,652 FOVs in the database
• Results from the filter step for the point query with bounded
  distance of 50 meters




                vector model with different values of the
                     overestimation constant (ε)
ER – Directional Point Query
• Results from the filter step for the directional point query
  with viewing direction 45o±5o




• The MBR has no info about the direction, so it returns all
  30,491 FOVs.
• For ε ≥ 0.3M, the vector model returns 90% fewer FOVs in
  the filter step compared to the MBR.
  (Recall for ε = 0.3M is 0.948)
ER – Directional Range Query
• For the directional range query with viewing direction 0 o±5o




     Using MBR (no direction)          Using Vector (direction: North)

                 Our search portal - http://geovid.org            Skip

9/24/2011                                                           42
Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Query Results: Video Segments




• Example query: Search for the “University of Idaho
  Kibbie Dome”.
• The query processed with the viewable scene model returns
  more relevant video segments.
Search and Results: 2D
               http://geovid.org/Query.html




2D: Technologies
 •   LAMP stack (Linux, Apache, MySQL, PHP)
 •   Google Maps API
 •   Ajax, XML
 •   UDF + MySQL
 •   Flowplayer + Wowza Media Server




Results of 2D Presentation
 Challenge
 • Video is separate from map and requires “mental
   orientation” (rotation) that is not intuitive.


 Proposed Solution
 • Use Google Earth (or other mirror worlds) as a
   backdrop to overlay the acquired video clips in the
   correct locations and viewing directions.
 • Therefore, present the results in 3D.
 • Follow the path of the camera trajectory.

Presentation of Videos




             [Ni et al. 2009] Sample screenshots

Search and Results: 3D




3D: Technologies
 •   LAMP stack (Linux, Apache, MySQL, PHP)
 •   Google Maps / Google Earth API
 •   Ajax, XML, KML
 •   UDF + MySQL
 •   IFRAME Shim
 •   HTML5 Video Techniques (Time Seeking)
 •   3D Perspective Videos (DrawImage, Canvas)



Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Transmission of Meta-data and Video
Two simple approaches:
• Immediate transmission after capturing, through the wireless
  network
    + Immediate availability of the data
    – Consumes lots of energy and bandwidth
• Delayed transmission when a faster network is available
    – Sacrifices real-time access
    + Low power consumption
Power-Efficient Method
• Framework to support efficient mobile video capture and
  transmission.
• Observation: not all collected videos have high priority.
• Core idea: separate the small amount of sensor meta-data
  from the large video content.
• Meta-data is transmitted to a server in real time.
• Video content is searchable by viewable scene properties
  established from the meta-data attached to each video.
• Video is transmitted in an on-demand manner.
System Environment
• Data acquisition and upload (mobile node) → data storage and
  indexing (server) → query processing
• The mobile node uploads sensor meta-data in real time; video
  content is uploaded only after a Video Request Message (VRM).
• The query processor answers query requests with the matching
  video segments.

   Key idea: save considerable battery energy by delaying the
   costly transmission of the video segments that have not been
   requested.
Linear Regression-based Model
 Parameters of the HTC G1 smartphone used in the power model




  The overall system power consumption as a function of time t



            [A. Shye, B. Scholbrock, and G. Memik. Into The Wild: Studying Real User Activity
             Patterns to Guide Power Optimization for Mobile Architectures. In MICRO, 2009.]
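The regression equation itself appears only in the slide's figure. Its general form, total power as a baseline plus a weighted sum of component states, can be sketched as below; the coefficients are illustrative placeholders, not the measured HTC G1 parameters:

```python
def power_mw(cpu_util, wifi_tx_kbps, gps_on, screen_on, coeffs=None):
    """Linear-regression-style power model (sketch):
    total power = baseline + sum(coefficient * component state).
    The default coefficients are illustrative placeholders, NOT
    the HTC G1 parameters from the talk."""
    c = coeffs or {"base": 200.0, "cpu": 400.0,
                   "wifi_per_kbps": 0.05, "gps": 150.0, "screen": 300.0}
    return (c["base"]
            + c["cpu"] * cpu_util            # CPU utilisation in [0, 1]
            + c["wifi_per_kbps"] * wifi_tx_kbps
            + c["gps"] * gps_on              # 0 or 1
            + c["screen"] * screen_on)       # 0 or 1
```

Integrating this instantaneous power over time gives the overall system energy consumption as a function of t.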
Validation of Power Model
                              Screenshot of the
                                 PowerTutor

            Power model vs.
            PowerTutor.




                                 [B. Tiwana and L. Zhang. PowerTutor.
                                     http://powertutor.org, 2009.]

Simulator Architecture
• Modules feeding an Execution Engine that compares the
  Immediate and OnDemand strategies over a 14.3 km × 13.6 km
  area:
   – Network Topology Generator (N_AP, w) → AP layout
   – Node Trajectory Generator (N_node, ts, T) → trajectory plan
   – Video+FOV Generator (λc, Dc) → FOV scene plan
   – Query Generator (λq, Mq, h) → query list
• The Execution Engine applies the power model.
• Evaluation metrics: energy consumption, query response
  latency, transmitted data size

        [Brinkhoff. A framework for generating network-based moving objects. 2002]
Query Model
• Query workload: a list of query rectangles mapped to specific
  locations
• h: clustering parameter that generates different spatial
  distributions of queries
• Spatial query distributions for three different values of the
  clustering parameter: h = 0, h = 0.5, and h = 1
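One simple way to realize such a clustering parameter h, with probability h a query falls near a hotspot and otherwise uniformly in the region, is sketched below; the simulator's exact scheme may differ:

```python
import random

def gen_queries(num, hotspots, h, extent=1000.0, seed=42):
    """Generate query-centre locations with clustering parameter h:
    with probability h a query is placed near a random hotspot
    (Gaussian spread), otherwise uniformly in the square region.
    h=0 gives a uniform workload, h=1 a fully clustered one.
    (Sketch of the idea; the simulator's scheme may differ.)"""
    rng = random.Random(seed)
    pts = []
    for _ in range(num):
        if rng.random() < h and hotspots:
            cx, cy = rng.choice(hotspots)
            pts.append((rng.gauss(cx, extent * 0.02),
                        rng.gauss(cy, extent * 0.02)))
        else:
            pts.append((rng.uniform(0, extent), rng.uniform(0, extent)))
    return pts
```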
Performance:
              Without Battery Recharging
              Closed system where batteries cannot be recharged.




            Number of nodes alive.         Query response latency.
               Node lifetimes and query response latency with
                              N = 2,000 nodes.
Performance:
                 With Battery Recharging
  Mobile node density will eventually reach a dynamic equilibrium.




            Energy consumption and access latency with increasing
                         meta-data upload period.
Performance:
            With Battery Recharging




   Energy consumption and average query response latency with
                varying number of access points.
Performance:
           With Battery Recharging




• Energy consumption and average query response latency with
  varying query clustering parameter h.
• Total transmitted data size as a function of the query
  clustering parameter h.
Hybrid Strategy




Overall energy consumption and query response latency when
using a hybrid strategy with both Immediate and OnDemand, as
a function of the switching threshold (h = 0.5).
Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII. Textual Annotation
VIII. Conclusions
Real-World Video Collection
“Capture the sensor inputs and fuse them with the video streams”
• Recorded 134 video clips using the recording prototype system
  in Moscow, ID (170 minutes of video in total).
• Videos covered a 6 km by 5 km region quite uniformly.
• Average camera movement speed was 27 km/h, and average
  camera rotation was around 12 degrees/s.
• Collected meta-data included 10,652 FOV scenes in total.
Real-World Video Collection
                    Challenges
• The collected real-world video data has not been large enough
  to evaluate realistic applications at a large scale.
   – Collecting real-world data requires considerable time and
     effort.
• A complementary solution is to synthetically generate
  georeferenced video meta-data.
Synthetic Video Meta-data Generation
• Input: camera template specification
• Camera movement computation (using TIGER/Line road files):
   – The Brinkhoff algorithm → network-based movement
   – The GSTD algorithm → free movement
   – Merged trajectories → mixed movement
• Camera direction computation:
   – Calculate moving direction
   – Adjust directions on turns
   – Randomize direction angles
• Output: georeferenced video meta-data
Camera Movement Computation (1)
                 Network-based Movement
• Cameras move on a road network
• Adopted the Brinkhoff algorithm for camera trajectory
  generation
• Introduced stops and acceleration/deceleration events at some
  road crossings and transitions
• Camera accelerates at a constant rate (user defines the
  acceleration rate)
• In a deceleration event, the reduction in camera speed is
  simulated based on the Binomial distribution:

      v_next = v_prev × B(n, p) / n

  With n = 20 and p = 0.5, speed is reduced (on average) to
  half at every time instant.
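The deceleration rule above can be simulated directly (a sketch; the generator's actual sampling details may differ):

```python
import random

def decelerate(v_prev, n=20, p=0.5, rng=random):
    """One deceleration step: scale the previous speed by a
    Binomial(n, p) draw divided by n, i.e. v_next = v_prev * B(n, p) / n.
    With n=20 and p=0.5 the expected scaling factor is 0.5."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return v_prev * successes / n
```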
Camera Movement Computation (2)
                   Free Camera Movement
• Cameras move freely.
• Improved the GSTD algorithm to generate camera trajectories
  with unconstrained movement:
   – Added a speed control mechanism
   – Camera movement data is generated in the geographic
     coordinate system (i.e., as latitude/longitude coordinates)
Camera Movement Computation (3)
                  Mixed Camera Movement
• Cameras sometimes follow the network and sometimes move
  randomly on an unconstrained path.
   i.   Generate a network-based trajectory (T_init)
   ii.  Randomly select n sub-segments (S1 through Sn) on the
        trajectory, with 0 ≤ |Si| ≤ |T_init|/4 and
        N_rand = (Σ|Si|) / |T_init| (user defines N_rand)
   iii. Replace each Si with a random trajectory T_rand(i)
   iv.  Update timestamps
Camera Rotation Computation (1)
• Assigning meaningful camera direction angles is one of the
  novel features of the proposed data generator.

  Fixed camera:                    Random-rotation camera:
  1) Calculate moving direction    1) Calculate moving direction
  2) Adjust directions on turns    2) Adjust directions on turns
                                   3) Randomize direction angles
Camera Rotation Computation (2)
                    Fixed Camera
1) Calculate the moving direction along the trajectory T_k;
   the moving direction vector at time t gives the camera
   heading.
2) Adjust directions on turns: when the rotation angle from t1
   to t2 is larger than θ_max (i.e., the rotation threshold),
   smooth down the rotation by distributing the rotation
   amount forwards and backwards.
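The "distribute forwards and backwards" smoothing can be sketched as a repeated local redistribution of the excess rotation (a simplified version; the generator's exact scheme may differ):

```python
def smooth_directions(dirs, theta_max):
    """Smooth a sequence of direction angles (degrees per sample) so
    that no single step rotates more than theta_max, by spreading the
    excess rotation onto the neighbouring samples (simplified sketch
    of the 'distribute forwards and backwards' idea)."""
    out = list(dirs)
    for _ in range(100):            # bounded number of smoothing passes
        changed = False
        for i in range(1, len(out)):
            step = out[i] - out[i - 1]
            if abs(step) > theta_max:
                excess = step - (theta_max if step > 0 else -theta_max)
                # push half of the excess backwards, keep half here
                out[i - 1] += excess / 2.0
                out[i] -= excess / 2.0
                changed = True
        if not changed:
            break
    return out
```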
Camera Rotation
                       Computation (3)
                           Fixed Camera




     Real-world data        Synthetic data before       Synthetic data after
                            direction adjustment       direction adjustment

 Illustration of camera direction adjustment for vehicle cameras


Camera Rotation Computation (3)
              Randomly Rotating Camera
1) Calculate moving direction
2) Adjust directions on turns
3) Randomize direction angles
   • Randomly rotate the directions at each sample point
     towards the left or right
   • The rotation amount is inversely proportional to the
     current camera speed level
   • The rotation amount is guaranteed to be less than the
     rotation threshold θ_max
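These three properties can be sketched as a per-sample rotation rule; the exact inverse-proportional scaling used by the generator is an assumption here:

```python
import random

def random_rotation(speed, max_speed, theta_max, rng=random):
    """Random per-sample rotation (sketch): the amount shrinks as the
    current speed level grows (assumed scaling), is always below the
    rotation threshold theta_max, and the left/right sign is random."""
    level = max(speed / max_speed, 1e-6)       # normalised speed level
    amount = rng.random() * theta_max * min(1.0, 1.0 / (1.0 + level))
    return amount if rng.random() < 0.5 else -amount
```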
Experimental Evaluation (1)
• Goal: evaluate the effectiveness of the synthetic data
  generation approach through a high-level comparison between
  the real-world and synthetic data.
• Datasets:
   – Generated two groups of synthetic data:
        1. Using the vehicle camera template
        2. Using the passenger camera template
   – Both synthetic data groups were created based on the road
     network of Moscow, ID.
• Methodology:
   – Analyze and compare the movements and rotations of the
     real-world and synthetic datasets.
   – Report:
        1. The average and maximum values for speed and rotation
        2. The frequency distribution of different speed and
           rotation levels
Experimental Evaluation (2)
              Comparison of Camera Movement Speed
                                                Maximum speed (km/h)      Average speed (km/h)    StdDev of speed
  Synthetic data with fixed camera                     87.91                     27.14                12.82
  Synthetic data with free camera rotation             87.28                     27.32                13.01
  Real-world data                                      0.564                     27.03                13.68

                                        Characteristics of the camera speed




                    Real-world data                                                       Synthetic data
                              Illustration of camera movement speed on map
                              Comparison of camera speed distributions for
                              real-world data and synthetic data with fixed camera
9/24/2011                                                                                                           79
Experimental Evaluation (3)
                               Comparison of Camera Rotation
                                            Maximum rotation(degrees/s)       Average rotation(degrees/s)   StdDev of rotation
Synthetic data with fixed camera                       32.33                             4.64                     7.24
Synthetic data with free camera rotation               55.27                            12.59                     9.35
Real-world data                                        107.30                           11.53                     14.02

                                    Characteristics of the camera rotation (θmax = 60 degrees)




                Real-world data                                       Synthetic data
                              Illustration of camera rotation on map
          Comparison of camera rotation distributions for real-world data and synthetic data
          with fixed camera (left) and with random rotation camera (right)
     9/24/2011                                                                                                        80
Experimental Evaluation (4)
                                  Performance Issues

   The measured data generation times for different types of datasets and
    parameter settings
 Camera Template     Trajectory   Rotation Pattern   Number of   Time to Generate        Time to Assign   Total Time
                     Pattern                         Videos      camera trajectory (s)   Directions (s)   (s)
 Vehicle camera      Tnetwork     Fixed camera       2,980       124                     39               163
 Passenger camera    Tnetwork     Random rotation    2,980       115                     201              316
 Pedestrian camera   Tfree        Random rotation    2,970       32                      263              295
 Pedestrian camera   Tmixed       Random rotation    2,970       271                     215              486



   The generator can create synthetic datasets in a reasonable amount of time
    with off-the-shelf computational resources




9/24/2011                                                                                                      81
Summary
    Proposed a two-step synthetic data generation approach:
            1. Computation of the camera movements
            2. Computation of the camera rotations
    Compared the high-level properties of the
     synthetically generated data with those of real-
     world georeferenced video data.
    The synthetic meta-data exhibits characteristics
     equivalent to the real data, and hence can be
     used in a variety of mobile video management
     research.

9/24/2011                                                 82
Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII.Textual Annotation
VIII.Conclusions
9/24/2011                                   83
Motivation (1)
• Keywords are still the primary input method for
  multimedia search




9/24/2011                                         84
Motivation (2)
• Technological advances in geo-information systems:
      – More comprehensive data
      – Better usability
      – Nicer visualization




9/24/2011                                    85
Motivation (3)
• Technological advances in the manufacturing of
  smartphones
      – Mobile OS, various sensors, large storage, long
        battery life*
• Content-based methods
      – Still difficult to bridge the semantic gap
      – High computation cost



9/24/2011                                                 86
Target (1)
• Bridging the semantic gap: tagging




                            Marina Bay Sands, Marina Bay,
                            Singapore, cloudy, etc.



9/24/2011                                               87
Target (2)
• Bridging the semantic gap: tagging




   Viewable scene model



                                     Marina Bay Sands, Marina Bay,
                                     Singapore, cloudy, etc.


   Geo-information systems
9/24/2011                                                        88
Framework




9/24/2011               89
Visible Object Computation (1)
• Close mapping between the real world and a
  simulated 3D world




            Video snapshot   Google earth

9/24/2011                                   90
Visible Object Computation (2)


             [Diagram: the Real World is captured as video and modeled as FoVs,
              and is also represented in a GIS]

9/24/2011                               91
Visible Object Computation (3)
• For each FoV, compute the visible objects, and
  their visible angle ranges
• Three types of objects:
      – Front
      – Vertically visible
      – Occluded
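The distinction between the three categories can be illustrated with a minimal sketch. The function, its arguments, and the geometry are our own simplifications: the camera is assumed at ground level, aligned with the horizon, and only one nearer occluder is considered.

```python
import math

def classify_object(distance_m, height_m, occluder=None):
    """Classify one candidate object within an atomic sector.
    With no nearer object in the sector it is a 'front' object;
    behind a nearer object it is 'vertically visible' only if its
    top rises above the occluder's top as seen from the camera."""
    top = math.atan2(height_m, distance_m)      # elevation angle of the object's top
    if occluder is None:
        return 'front'
    occ_dist, occ_height = occluder
    occ_top = math.atan2(occ_height, occ_dist)  # elevation angle of the occluder's top
    return 'vertically visible' if top > occ_top else 'occluded'
```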




9/24/2011                                      92
Visible Object Computation (4)
• Horizontally split the FoV into a number of
  atomic sectors, each containing a unique list of
  candidate objects
• Vertically check the object visibility
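The horizontal split can be sketched as follows, assuming each candidate object's angular range has already been clipped to the FoV; the function name and the (start, end) representation in degrees are our own.

```python
def atomic_sectors(fov_start, fov_end, objects):
    """Split the horizontal FoV [fov_start, fov_end] (degrees) into
    atomic sectors, each covered by a fixed set of candidate objects.
    `objects` maps an object id to its visible angular range (start,
    end), already clipped to the FoV."""
    bounds = {fov_start, fov_end}
    for s, e in objects.values():
        bounds.update((s, e))
    cuts = sorted(bounds)
    sectors = []
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2.0  # any interior angle identifies the sector's objects
        ids = [oid for oid, (s, e) in objects.items() if s <= mid <= e]
        sectors.append(((a, b), ids))
    return sectors
```

Within each sector the candidate list is constant, so the vertical visibility check only has to be run once per sector.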




9/24/2011                                         93
Visible Object Computation (5)
• Repeat the process for every FoV of the video
• Extract textual information
      – ID, name, type, coordinates, center, address,
        description, websites (external links)
      – From OpenStreetMap, GeoDeck
      – Able to expand the sources (e.g., Wikipedia)




9/24/2011                                                 94
Tag Ranking & Associating (1)
• The visible objects are not equally relevant to
  the video
• More tags are generated with our method
      – SG dataset: 60
      – USC dataset: 49




9/24/2011                                           95
Tag Ranking & Associating (2)
• 6 basic visual criteria
      –     Closeness to the FoV center
      –     Distance to the camera location
      –     Horizontally visible angle range of the object
      –     Vertically visible angle range of the object
      –     Horizontally visible percentage of the object
      –     Vertically visible percentage of the object
• Additional hints from GIS or external sources
      – Special property (e.g., attraction, landmark …)
      – Wikipedia entry
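Ranking by these criteria can be sketched as a weighted sum. The criterion keys are hypothetical names of ours, each pre-normalized to [0, 1] with larger meaning more relevant (distance already inverted), and the equal weights are a placeholder, not the published setting.

```python
def rank_tags(objects, weights=(1, 1, 1, 1, 1, 1)):
    """Rank visible objects by a weighted sum of the six visual
    criteria.  Each object is a dict holding the six normalized
    criterion values plus a 'name'."""
    keys = ('center_closeness', 'inv_distance',
            'h_angle_range', 'v_angle_range',
            'h_visible_pct', 'v_visible_pct')

    def score(obj):
        return sum(w * obj[k] for w, k in zip(weights, keys))

    return sorted(objects, key=score, reverse=True)
```

Hints such as landmark status or a Wikipedia entry could be folded in as extra additive terms.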
9/24/2011                                                    96
Tag Ranking & Associating (3)
• Exploiting the temporal existence of the
  object
• Associating the tag with a specific video segment
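Segment association can be sketched as collapsing a per-FoV visibility sequence into index ranges; the function name, the boolean-per-FoV input, and the `min_len` filter are our own assumptions.

```python
def tag_segments(visible, min_len=1):
    """Collapse a per-FoV visibility sequence (one boolean per sampled
    FoV) into [start, end) index segments, so a tag is attached only
    to the parts of the video where its object actually appears.
    min_len drops spurious very short appearances."""
    segments, start = [], None
    for i, v in enumerate(visible):
        if v and start is None:
            start = i                      # object just came into view
        elif not v and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(visible) - start >= min_len:
        segments.append((start, len(visible)))  # visible through the end
    return segments
```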




9/24/2011                                     97
Evaluation (1)
•   Implemented a prototype
•   Collected sample videos
•   Compared against YouTube auto-generated tags
•   User study
      – Familiarity with the place
      – Relevance of our tags
      – Relevance of YouTube tags
      – Preference for our tags over YouTube tags
      – User ranking
9/24/2011                                    98
Evaluation (2)
• User study results




9/24/2011                       99
Prototype
• http://eiger.ddns.comp.nus.edu.sg/~zhijie/Query_v3.2.html




9/24/2011                                    100
Outline
I. Introduction & Motivation
II. Scene Modeling & Acquisition
III. Query Processing & Vector Model
IV. Result Presentation
V. Power Management
VI. Synthetic Video Meta-Data Generation
VII.Textual Annotation
VIII.Conclusions
9/24/2011                                  101
Conclusions
• Annotation using sensors can provide automatic and
  objective meta-data for indexing and searching.
• Georeferenced video search has a great potential,
  especially in searching user generated videos.
• Many open questions:
     –      Standard format of meta-data
     –      Standard way of embedding meta-data
     –      Index structures of meta-data for fast searching
     –      Supporting new query types
     –      Combining with content-based features
     –      Relevance ranking for result presentation
9/24/2011                                                      102
Thank You



            Further information at:
             http://geovid.org
            rogerz@comp.nus.edu.sg
             seonkim@usc.edu
9/24/2011                                    104
Relevant Publications (1)
    [ACM MM ’08]
    Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim
    Viewable Scene Modeling for Geospatial Video Search
    ACM Multimedia Conference (ACM MM 2008), Oct. 2008.

    [ACM MM ’09]
    Sakire Arslan Ay, Lingyan Zhang, Seon Ho Kim, Ma He, Roger Zimmermann
    GRVS: A Georeferenced Video Search Engine
    ACM Multimedia Conference (ACM MM 2009), Technical Demo, Oct. 2009.

    [MMSJ ’10]
    Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim
    Relevance Ranking in Georeferenced Video Search
    Multimedia Systems Journal, Springer, 2010.

    [MMSys ’11]
    Seon Ho Kim, Hao Jia, Sakire Arslan Ay, Roger Zimmermann
    Energy-Efficient Mobile Video Management using Smartphones
    ACM Multimedia Systems Conference (ACM MMSys 2011), Feb. 2011.
9/24/2011                                                                   105
Relevant Publications (2)
    [ACM MM ’11]
    Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim, Roger Zimmermann
    Automatic Tag Generation and Ranking for Sensor-rich Outdoor Videos
    ACM Multimedia Conference (ACM MM 2011), Nov. 2011.

    [ACM MM ’11]
     Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim
    SRV-TAGS: An Automatic TAGging and Search System for Sensor-Rich Outdoor Videos
     ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011.

    [ACM MM ’11]
    Hao Jia, Guanfeng Wang, Beomjoo Seo, Roger Zimmermann
    Keyframe Presentation for Browsing of User-generated Videos on Map Interface
     ACM Multimedia Conference (ACM MM 2011), Short Paper, Nov. 2011.

    [ACM MM ’11]
    Beomjoo Seo, Jia Hao, Guanfeng Wang
    Sensor-rich Video Exploration on a Map Interface
     ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011.
9/24/2011                                                                          106

CSTalks-Sensor-Rich Mobile Video Indexing and Search-17Aug

  • 1. GeoVid Geo-referenced Video Management Roger Zimmermann, Seon Ho Kim, Sakire Arslan Ay, Beomjoo Seo, Jia Hao, Guanfeng Wang, Ma He, Shunkai Fang, Lingyan Zhang, Zhijie Shen National University University of of Singapore Southern California http://geovid.org 9/24/2011 1
  • 2. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 2 VIII.Conclusions 9/24/2011 2
  • 3. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 3 VIII.Conclusions 9/24/2011 3
  • 4. Motivation (1) • Trends – User-generated video content is growing rapidly. – Mobile devices make it easy to capture video. • Challenge – Video is still difficult to manage and search. • Content-based Image Processing – It is very desirable to extract high-level semantic concepts from images and video – However, this is tremendously challenging 9/24/2011 4
  • 5. Motivation (2) • User-Tagging – Laborious and often ambiguous or subjective • Complementary Technique – Automatically add additional sensor information to the video during its collection → Sensor-rich video (we also call it geo-referenced video) – Ex.: location and direction information can now be collected through a number of sensors (e.g., GPS, compass, accelerometer). 9/24/2011 5
  • 6. Motivation (3) • Recent progress in sensors and integration Traditionally: Network Sensors interface + + + Now: Video capturing Various sensors WiFi Handheld mobility 9/24/2011 6
  • 7. Challenges (1) • Capacity constraint of the battery • Wireless bandwidth bottleneck • Searchability of videos Open-domain video content is very difficult to be efficiently and accurately searched 7 9/24/2011 7
  • 8. Challenges (2) • Video and sensor information storage and indexing • Result ranking • Result presentation 8 9/24/2011 8
  • 9. Sensor-Rich Video • Characteristics: – Concurrently collect sensor generated geospatial (and other) contextual data – Automatic: no user-interaction required (real-time tagging) – Data is objective (however, it may be noisy and/or inaccurate) → Generate a time-series of meta-data tags. • Meta-data can be efficiently searched and processed and may allow us to deduce certain properties about the video content. 9/24/2011 9
  • 10. Overview of Approach 1. Viewable scene modeling 2. Video and meta-data acquisition 3. Indexing, querying, and presentation of results 1) 2) 3) d  9/24/2011 10
  • 11. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 11 VIII.Conclusions 9/24/2011 11
  • 12. Viewable Scene Modeling (1) • More accurately describe the video stream through a camera field-of-view • Data collection using sensors – Camera location from GPS, camera direction from digital compass, viewing angle from camera parameters • Details can be found in *ACM MM’08+, *ACM MM’09+ 9/24/2011 13
  • 13. Viewable Scene Modeling (2) Circle Scene Coverage FOV Scene Coverage 9/24/2011 14
  • 14. Modeling Parameters (DB) Attributes Explanation filename Uploaded video file name <Plat,Plng> <Latitude, longitude> coordinate for camera location (read from GPS) altitude The altitude of view point (read from GPS) alpha Camera heading relative with the ground (read from compass) R Viewable distance theta Angular extent for camera field-of view tilt Camera pitch relative with the ground (read from compass) roll Camera roll relative with the ground (read from compass) ltime Local time for the FOV timecode 9/24/201115 9/24/2011 Timecode for the FOV in video (extracted from video) 15
• 15. Viewable Scene Modeling (3) • Sensor values are sampled at different intervals – GPS: 1 per second – Compass: 40 per second – Video frames: 30 per second • Each frame is associated with the temporally closest sensor values. • Interpolation can be used. • An optimization is implemented for GPS: the position is only measured if the movement is more than 10 m. 9/24/2011 16
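The "temporally closest sensor value" association above can be sketched with a binary search over sorted sample timestamps (a minimal illustration, not the GeoVid code; names are illustrative):

```python
import bisect

def closest_sample(timestamps, values, t):
    """Return the value whose timestamp is temporally closest to t.
    timestamps must be sorted in ascending order."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    # pick whichever neighbor is nearer in time
    return values[i] if after - t < t - before else values[i - 1]

# GPS at 1 Hz, video frames at 30 fps: the frame at t=2.49 s gets the
# t=2 s fix, the frame at t=2.51 s gets the t=3 s fix
gps_t = [0.0, 1.0, 2.0, 3.0]
gps_fix = ["fix0", "fix1", "fix2", "fix3"]
```

Interpolation between the two neighboring fixes would be a drop-in refinement of the same lookup.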
  • 16. v0.1 Acquisition Prototype  Capture software for ◦ HD video, GPS data stream, & compass data stream GPS Compass 9/24/2011 17
• 17. v0.2 Acquisition Prototype Setup for data collection: laptop computer; OceanServer OS5000-US compass; Canon VIXIA HV30 camera; Pharos iGPS-500 receiver. 9/24/2011 18
  • 18. Smartphone Acquisition (1) iPhone App Android App 9/24/2011 19
• 19. Mobile App Implementation Data format for storing sensor data: JSON (JavaScript Object Notation). Modules: Video Stream Recorder, Location Receiver, Orientation Receiver, Data Storage and Synchronization Control, Data Uploader, Battery Status Monitor. 9/24/2011 20
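A hypothetical shape of one JSON meta-data record, mirroring the modeling attributes from the earlier slide (the field names and values are illustrative, not the exact on-disk format used by the GeoVid apps):

```python
import json

# One FOV meta-data record, as it might be serialized by the app.
record = {
    "filename": "clip_0042.mp4",
    "Plat": 1.2834, "Plng": 103.8607,   # camera location from GPS
    "altitude": 12.5,                    # meters, from GPS
    "alpha": 245.0,                      # compass heading (degrees)
    "theta": 60.0,                       # angular extent of the FOV
    "R": 250.0,                          # viewable distance (meters)
    "tilt": -2.0, "roll": 0.5,           # from compass
    "ltime": "2011-09-24T10:15:30",      # local capture time
    "timecode": 12.366,                  # seconds into the video
}

payload = json.dumps(record)             # what the Data Uploader sends
restored = json.loads(payload)
```

Because the record is tiny compared to video frames, a stream of such records can be uploaded cheaply and in real time.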
  • 20. Smartphone Acquisition (2) • Android App – http://geovid.org/Android/index.html Available for download • iPhone App – http://geovid.org/iphone/index.html Will be submitted to the App Store 9/24/2011 21
  • 21. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 22 VIII.Conclusions 9/24/2011 22
• 22. Spatio-Temporal Search Search for the videos of “Kibbie Dome”: <-117.010, 46.725> <-117.013, 46.725> <-117.013, 46.728> <-117.010, 46.728> Search for videos that capture the given trajectory: <-117.010, 46.725> <-117.013, 46.725> <-117.013, 46.728> <-117.010, 46.728> … 9/24/2011 23
• 23. Query Execution • Moving cameras: find relevant video segments, but omit irrelevant segments (figure: camera locations sampled along a path from time t1 to t2, with only some FOVs covering object X) 9/24/2011 25
  • 24. Ex.: Spatial Range Query Video 1 Video 2 Query area 9/24/2011 26
  • 25. Querying GeoRef Videos • Run spatio-temporal range queries • Extract videos that capture an area of interest: overlapping region 9/24/2011 27
• 26. Approach – Search (1) • The FOV model converts the video search problem into a spatial object selection problem • Search only the overlapping FOVs (not the entire clip) • The two-step approach, filter and refinement, is common in spatial DBs • Note that the refinement step can be very time consuming for videos (diagram: all objects (n1) → filter step: test on simple approximations → positives (n2) and negatives → refinement step: test on exact geometry (n2 objects) → true positives (n3) and false positives) 9/24/2011 28
• 27. Approach – Search (2) Filter step using the MBR (minimum bounding rectangle): the filter step checks overlap between the MBR and the query point; the refinement step checks overlap between the FOV and the query point. The MBR has no direction info! Using the MBR, the meta-data cannot be fully utilized in the filter step. 9/24/2011 29
• 28. Vector Model Filter step using vectors: represent the FOV as a vector V anchored at the camera location p, and apply a space transformation into the px-Vx and py-Vy spaces. Camera location and direction can then be used in the filter step! Potential to be more accurate in filtering. 9/24/2011 30
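The space transformation can be sketched as follows: the FOV becomes one point (px, Vx) in the px-Vx space and one point (py, Vy) in the py-Vy space, where V is the viewing-direction vector (a minimal illustration; the function name is ours, not GeoVid's):

```python
import math

def fov_to_vector(px, py, alpha, magnitude):
    """Map an FOV to the two transformed spaces (px, Vx) and (py, Vy).
    alpha: compass heading in degrees, clockwise from north;
    magnitude: length |V| of the viewing-direction vector."""
    vx = magnitude * math.sin(math.radians(alpha))  # east component
    vy = magnitude * math.cos(math.radians(alpha))  # north component
    return (px, vx), (py, vy)

# camera at (10, 20) looking due east with |V| = 100
px_vx, py_vy = fov_to_vector(10.0, 20.0, alpha=90.0, magnitude=100.0)
```

An east-facing camera maps to Vx = |V| and Vy = 0, so the direction survives into the filter step, unlike with an MBR.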
• 29. Query Processing – Point Query • Point query – “For a given query point q(qx, qy) in 2D geo-space, find all video frames that overlap with q.” • Only vectors inside the triangle-shaped area in both the px-Vx and py-Vy spaces will remain after the filter step. (figure: query point in 2D geo-space and the corresponding regions in the px-Vx and py-Vy spaces; the maximum magnitude of any vector is limited to M) 9/24/2011 31
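The triangle-shaped filter region admits a simple per-axis reading: the query coordinate must lie between the camera position and the vector tip on each axis. The sketch below is an illustrative simplification of that filter test, not the UDF implementation:

```python
def passes_filter(px, py, vx, vy, qx, qy):
    """Filter step of the vector model for a point query q = (qx, qy):
    keep the FOV only if q lies between the camera position and the
    vector tip on BOTH axes (the triangle-shaped region in the px-Vx
    and py-Vy spaces). This is a conservative test; exact pie-slice
    geometry is left to the refinement step."""
    in_x = min(px, px + vx) <= qx <= max(px, px + vx)
    in_y = min(py, py + vy) <= qy <= max(py, py + vy)
    return in_x and in_y

# camera at the origin with V = (50, 50):
hit = passes_filter(0.0, 0.0, 50.0, 50.0, qx=30.0, qy=20.0)
miss = passes_filter(0.0, 0.0, 50.0, 50.0, qx=60.0, qy=10.0)
```

False positives survive this step by design; the refinement step removes them against the exact FOV geometry.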
• 30. QP – Point Query with r • Point query with bounded distance r – “For a given query point q(qx, qy) in 2D geo-space, find all video frames that overlap with q and that were taken within distance r.” (figure: 2D geo-space, px-Vx, py-Vy) 9/24/2011 32
• 31. QP – Directional Point Query • Directional point query – “For a given query point q(qx, qy) in 2D geo-space, find all video frames taken with the camera pointing in the Northwest direction and overlapping with q.” (figure: 2D geo-space, px-Vx, py-Vy) 9/24/2011 33
• 32. Vector Model – Implementation • So far, we represented the FOV as a single vector • Problem: the single-vector model underestimates the coverage of the FOV. (figure: 2D geo-space, px-Vx, py-Vy) 9/24/2011 34
• 33. Vector Model – Implementation • Solution: Introduce an overestimation constant (ε). Expand the search space by ε along the V axis. (figure: 2D geo-space, px-Vx, py-Vy) 9/24/2011 35
• 34. Experimental Results • Implemented smartphone apps with GPS and digital compass; software for recording video synchronized with sensor inputs. • Recorded hundreds of real video clips on the street while driving. • Stored the georeferenced video meta-data in a MySQL database. • Implemented User Defined Functions for queries using the vector model. • Constructed a map-based user interface on the web. 9/24/2011 36
  • 35. Experimental Results • Purpose of experiments – Demonstrate the proof-of-concept, feasibility, and applicability – No emphasis on performance issues • Generate random queries and search overlapping video segments camera position x query point 9/24/2011 Camera positions and query points 37
• 36. ER – Point Query • Recall: (the number of overlapping FOVs returned by the filter step) / (the total number of actually overlapping FOVs) • Precision: (the number of overlapping FOVs returned by the filter step) / (the total number of all FOVs returned by the filter step) 9/24/2011 38
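These two definitions can be computed directly from the FOV id sets (a minimal helper; names are ours):

```python
def filter_metrics(returned, actual):
    """Recall and precision of the filter step.
    returned: ids of FOVs returned by the filter step;
    actual:   ids of FOVs that truly overlap the query."""
    returned, actual = set(returned), set(actual)
    true_pos = len(returned & actual)
    recall = true_pos / len(actual)        # how many real hits we kept
    precision = true_pos / len(returned)   # how clean the output is
    return recall, precision

# filter returned {1,2,3,4}, but only {1,2,5} actually overlap:
r, p = filter_metrics(returned={1, 2, 3, 4}, actual={1, 2, 5})
# recall = 2/3, precision = 2/4
```

A conservative filter keeps recall at 1.0 and trades precision; the ε constant on the following slides tunes exactly that trade-off.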
  • 37. ER – Point Query with Distance r 9/24/2011 “Where is the Pizza Hut?” 39
• 38. ER – Filtering with Vector Model • 1,000 random point queries with 10,652 FOVs in the database • Results from the filter step for the point query with a bounded distance of 50 meters, using the vector model with different values of the overestimation constant (ε) 9/24/2011 40
• 39. ER – Directional Point Query • Results from the filter step for the directional point query with viewing direction 45°±5° • The MBR has no info about the direction, so it returns all 30,491 FOVs. • For ε ≥ 0.3M, the vector model returns 90% fewer FOVs in the filter step compared to the MBR. (Recall for ε = 0.3M is 0.948) 9/24/2011 41
• 40. ER – Directional Range Query • For the directional range query with viewing direction 0°±5°: using MBR (no direction) vs. using vector (direction: North) Our search portal – http://geovid.org 9/24/2011 42
  • 41. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 43 VIII.Conclusions 9/24/2011 43
• 42. Query Results: Video Segments • Example query: Search for the “University of Idaho Kibbie Dome”. • The query, processed based on the viewable scene model, returns more relevant video segments. 9/24/2011 44
  • 43. Search and Results: 2D http://geovid.org/Query.html 9/24/2011 45
  • 44. 2D: Technologies • LAMP stack (Linux, Apache, MySQL, PHP) • Google Maps API • Ajax, XML • UDF + MySQL • Flowplayer + Wowza Media Server 9/24/2011 46
  • 45. Results of 2D Presentation Challenge • Video is separate from map and requires “mental orientation” (rotation) that is not intuitive. Proposed Solution • Use Google Earth (or other mirror worlds) as a backdrop to overlay the acquired video clips in the correct locations and viewing directions. • Therefore, present the results in 3D. • Follow the path of the camera trajectory. 9/24/2011 47
  • 46. Presentation of Videos [Ni et al. 2009] Sample screenshots 9/24/2011 48
  • 47. Search and Results: 3D 9/24/2011 49
  • 48. 3D: Technologies • LAMP stack (Linux, Apache, MySQL, PHP) • Google Maps / Google Earth API • Ajax, XML, KML • UDF + MySQL • IFRAME Shim • HTML5 Video Techniques (Time Seeking) • 3D Perspective Videos (DrawImage, Canvas) 9/24/2011 50
  • 49. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 51 VIII.Conclusions 9/24/2011 51
• 50. Transmission of Meta-data and Video Two simple approaches: • Immediate transmission after capturing through the wireless network + Immediate availability of the data – Consumes lots of energy and bandwidth • Delayed transmission when a faster network is available – Sacrifices real-time access + Low power consumption 9/24/2011 53
• 51. Power-Efficient Method • A framework to support efficient mobile video capture and transmission. • Observation: not all collected videos have high priority. • Core idea: separate the small amount of sensor meta-data from the large video content. • Meta-data is transmitted to a server in real time. • Video content is searchable by the viewable scene properties established from the meta-data attached to each video. • Video is transmitted in an on-demand manner. 9/24/2011 54
• 52. System Environment (diagram: data acquisition → sensor meta-data upload → data storage and indexing → query processing; a query request triggers a Video Request Message (VRM), and the requested video segments are then uploaded) Key idea: save considerable battery energy by delaying the costly transmission of the video segments that have not been requested. 9/24/2011 55
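The OnDemand idea can be sketched as a tiny event loop: meta-data is sent at capture time, while video waits for a Video Request Message (a toy illustration under our own event encoding, not the system's protocol):

```python
def transmissions(events):
    """Sketch of the OnDemand strategy: sensor meta-data is uploaded
    as soon as a segment is captured, while the much larger video
    content is only sent after a Video Request Message (VRM) arrives.
    events: chronological list of ('capture', seg) / ('vrm', seg);
    returns the list of uploads actually performed."""
    captured, uploads = set(), []
    for kind, seg in events:
        if kind == "capture":
            captured.add(seg)
            uploads.append(("metadata", seg))  # tiny, sent immediately
        elif kind == "vrm" and seg in captured:
            uploads.append(("video", seg))     # costly, sent on demand
    return uploads

log = transmissions([("capture", "s1"), ("capture", "s2"), ("vrm", "s1")])
# segment s2 is never requested, so only its meta-data ever leaves
# the phone; its video transmission energy is saved entirely
```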
• 53. Linear Regression-based Model Parameters of the HTC G1 smartphone used in the power model. The overall system power consumption as a function of time t. [A. Shye, B. Scholbrock, and G. Memik. Into The Wild: Studying Real User Activity Patterns to Guide Power Optimization for Mobile Architectures. In MICRO, 2009.] 9/24/2011 56
• 54. Validation of Power Model Screenshot of the PowerTutor tool. Power model vs. PowerTutor. [B. Tiwana and L. Zhang. PowerTutor. http://powertutor.org, 2009.] 9/24/2011 57
• 55. Simulator Architecture Modules: topology generator (AP layout), trajectory generator (trajectory plan), video+FOV generator (FOV scene plan), and query generator (query list), feeding an execution engine with network and power models over a 14.3 km × 13.6 km area. Evaluation metrics: node energy consumption, query response latency, transmitted data size. Strategies compared: Immediate vs. OnDemand. [Brinkhoff. A framework for generating network-based moving objects. 2002] 9/24/2011 60
• 56. Query Model Query workload: a list of query rectangles that are mapped to specific locations. The parameter h generates different distributions of queries. (figure: spatial query distribution with three different clustering parameter values, h = 0, 0.5, and 1) 9/24/2011 61
• 57. Performance: Without Battery Recharging Closed system where batteries cannot be recharged. Number of nodes alive; query response latency. Node lifetimes and query response latency with N = 2,000 nodes. 9/24/2011 62
  • 58. Performance: With Battery Recharging Mobile node density will eventually reach a dynamic equilibrium. Energy consumption and access latency with increasing meta-data upload period. 9/24/2011 63
• 59. Performance: With Battery Recharging Energy consumption and average query response latency with a varying number of access points. 9/24/2011 64
• 60. Performance: With Battery Recharging Energy consumption and average query response latency with varying query clustering parameter h. Total transmitted data size as a function of the query clustering parameter h. 9/24/2011 65
• 61. Hybrid Strategy Overall energy consumption and query response latency when using a hybrid strategy with both Immediate and OnDemand, as a function of the switching threshold (h=0.5). 9/24/2011 66
  • 62. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 67 VIII.Conclusions 9/24/2011 67
• 63. Real-World Video Collection “Capture the sensor inputs and fuse them with the video streams” • Recorded 134 video clips using the recording prototype system in Moscow, ID (170 mins of video in total). • The videos covered a 6 km by 5 km region quite uniformly. • The average camera movement speed was 27 km/h, and the average camera rotation was around 12 degrees/s. • The collected meta-data included 10,652 FOV scenes in total. 9/24/2011 68
• 64. Real-World Video Collection Challenges The collected real-world video data has not been large enough to evaluate realistic applications at a large scale. • Collecting real-world data requires considerable time and effort. A complementary solution is to synthetically generate georeferenced video meta-data. 9/24/2011 69
• 65. Synthetic Video Meta-data Generation Input: camera template specification. Camera movement computation: network-based movement (the Brinkhoff algorithm, with TIGER/Line files), free movement (the GSTD algorithm), and mixed movement (merge trajectories). Camera direction computation: calculate the moving direction, adjust directions on turns, randomize the direction angles. Output: georeferenced video meta-data. 9/24/2011 70
• 66. Camera Movement Computation (1) Network-based Movement • Cameras move on a road network • Adopted the Brinkhoff algorithm for camera trajectory generation • Introduced stops and acceleration/deceleration events at some road crossings and transitions • The camera accelerates at a constant rate (the user defines the acceleration rate) • In a deceleration event, the reduction in camera speed is simulated based on the Binomial distribution B(n, p): v_next = v_prev · X/n with X ~ B(n, p). When n=20 and p=0.5, the speed is reduced to half on average at every time instant. 9/24/2011 71
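The Binomial deceleration model (speed scaled by X/n with X ~ B(n, p), halving the speed on average when n=20, p=0.5) can be sketched as follows; the function name and sampling style are ours:

```python
import random

def decelerate(v_prev, n=20, p=0.5, rng=random):
    """One deceleration event: the new speed is the old speed scaled
    by X/n, where X ~ B(n, p). With n=20 and p=0.5 the speed drops
    to half on average at each time instant."""
    x = sum(1 for _ in range(n) if rng.random() < p)  # Binomial draw
    return v_prev * x / n

random.seed(42)
speeds = [decelerate(60.0) for _ in range(1000)]
mean = sum(speeds) / len(speeds)   # close to 30 km/h, i.e. half of 60
```

Larger n tightens the spread of X/n around p, so n also acts as a variance knob for how erratic the braking looks.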
• 67. Camera Movement Computation (2) Free Camera Movement • Cameras move freely. • Improved the GSTD algorithm to generate camera trajectories with unconstrained movement: added a speed control mechanism • Camera movement data is generated in the geographic coordinate system (i.e., as latitude/longitude coordinates) 9/24/2011 72
• 68. Camera Movement Computation (3) Mixed Camera Movement • Cameras sometimes follow the network and sometimes move randomly on an unconstrained path. i. Generate a network-based trajectory (Tinit) ii. Randomly select n sub-segments (S1 through Sn) on the trajectory, with 0 < |Si| ≤ |Tinit|/4 and Nrand = Σ|Si| / |Tinit| (the user defines Nrand) iii. Replace each Si with a random trajectory Trand(i) iv. Update the timestamps 9/24/2011 73
• 69. Camera Rotation Computation (1) • Assigning meaningful camera direction angles is one of the novel features of the proposed data generator. Camera direction computation steps: Fixed camera: 1) calculate the moving direction, 2) adjust directions on turns. Random-rotation camera: 1) calculate the moving direction, 2) adjust directions on turns, 3) randomize the direction angles. 9/24/2011 74
• 70. Camera Rotation Computation (2) Fixed Camera 1) Calculate the moving direction (the moving direction vector at time t) 2) Adjust directions on turns: when the rotation angle from t1 to t2 along trajectory Tk is larger than θmax (the rotation threshold), smooth down the rotation by distributing the rotation amount forwards and backwards 9/24/2011 75
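The smoothing step, distributing the excess of a too-sharp turn over neighboring samples until no step exceeds the rotation threshold, can be illustrated with a toy iterative scheme (an assumption on our part, not the generator's exact algorithm):

```python
def smooth_rotation(headings, max_rot):
    """Cap the per-step heading change at max_rot (degrees per sample)
    by repeatedly moving half of the excess rotation onto the earlier
    sample, i.e. distributing the turn forwards and backwards."""
    h = list(headings)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(h)):
            step = h[i] - h[i - 1]
            if abs(step) > max_rot:
                excess = step - max_rot * (1 if step > 0 else -1)
                h[i - 1] += excess / 2   # shift part of the turn back
                h[i] -= excess / 2       # and part of it forward
                changed = True
    return h

# a 60-degree jump with a 30-degree threshold gets spread out:
sm = smooth_rotation([0.0, 0.0, 60.0, 60.0], max_rot=30.0)
# -> [0.0, 15.0, 45.0, 60.0]; every step is now at most 30 degrees
```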
• 71. Camera Rotation Computation (3) Fixed Camera Real-world data | Synthetic data before direction adjustment | Synthetic data after direction adjustment • Illustration of the camera direction adjustment for vehicle cameras 9/24/2011 76
• 72. Camera Rotation Computation (4) Randomly Rotating Camera 1) Calculate the moving direction 2) Adjust directions on turns 3) Randomize the direction angles • Randomly rotate the directions at each sample point towards the left or right • The rotation amount is inversely proportional to the current camera speed level • The rotation amount is guaranteed to be less than the rotation threshold θmax 9/24/2011 77
• 73. Experimental Evaluation (1) • Goal: Evaluate the effectiveness of the synthetic data generation approach through a high-level comparison between the real-world and synthetic data. • Datasets: Generated two groups of synthetic data: 1. Using the vehicle camera template 2. Using the passenger camera template Both synthetic data groups were created based on the road network of Moscow, ID. • Methodology: Analyze and compare the movements and rotations of the real-world and synthetic datasets. • Report: 1. The average and maximum values for speed and rotation 2. The frequency distribution of different speed and rotation levels 9/24/2011 78
• 74. Experimental Evaluation (2) Comparison of Camera Movement Speed
Dataset | Maximum speed (km/h) | Average speed (km/h) | StdDev of speed
Synthetic data with fixed camera | 87.91 | 27.14 | 12.82
Synthetic data with free camera rotation | 87.28 | 27.32 | 13.01
Real-world data | 0.564 | 27.03 | 13.68
Characteristics of the camera speed. Comparison of camera speed distributions for real-world data and synthetic data with fixed camera. Illustration of camera movement speed on the map. 9/24/2011 79
• 75. Experimental Evaluation (3) Comparison of Camera Rotation
Dataset | Maximum rotation (degrees/s) | Average rotation (degrees/s) | StdDev of rotation
Synthetic data with fixed camera | 32.33 | 4.64 | 7.24
Synthetic data with free camera rotation | 55.27 | 12.59 | 9.35
Real-world data | 107.30 | 11.53 | 14.02
Characteristics of the camera rotation (θmax = 60 degrees). Comparison of camera rotation distributions for real-world data and synthetic data with fixed camera, and for real-world data and synthetic data with random-rotation camera. Illustration of camera rotation on the map. 9/24/2011 80
• 76. Experimental Evaluation (4) Performance Issues • The measured data generation times for different types of datasets and parameter settings:
Camera Template | Trajectory Pattern | Rotation Pattern | Number of Videos | Time to Generate Camera Trajectory (s) | Time to Assign Directions (s) | Total Time (s)
Vehicle camera | Tnetwork | Fixed camera | 2,980 | 124 | 39 | 163
Passenger camera | Tnetwork | Random rotation | 2,980 | 115 | 201 | 316
Pedestrian camera | Tfree | Random rotation | 2,970 | 32 | 263 | 255
Pedestrian camera | Tmixed | Random rotation | 2,970 | 271 | 215 | 486
• The generator can create synthetic datasets in a reasonable amount of time with off-the-shelf computational resources 9/24/2011 81
• 77. Summary Proposed a two-step synthetic data generation: 1. Computation of the camera movements 2. Computation of the camera rotations Compared the high-level properties of the synthetically generated data with those of real-world georeferenced video data. The synthetic meta-data exhibits characteristics equivalent to the real data, and hence can be used in a variety of mobile video management research. 9/24/2011 82
  • 78. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 83 VIII.Conclusions 9/24/2011 83
• 79. Motivation (1) • Keywords are still the primary input method for multimedia search 9/24/2011 84
  • 80. Motivation (2) • Tech advance in geo-information systems: – More comprehensive data – Better usability – Nicer visualization 9/24/2011 85
  • 81. Motivation (3) • Tech advance in the manufacturing of smart phones – Mobile OS, various sensors, large storage, long battery life* • Content-based methods – Still difficult to bridge the semantic gap – High computation cost 9/24/2011 86
  • 82. Target (1) • Bridging the semantic gap: tagging Marina Bay Sands, Marina Bay, Singapore, cloudy, etc. 9/24/2011 87
  • 83. Target (2) • Bridging the semantic gap: tagging Viewable scene model Marina Bay Sands, Marina Bay, Singapore, cloudy, etc. Geo-information systems 9/24/2011 88
• 85. Visible Object Computation (1) • Nice mapping between the real world and the simulated 3D world Video snapshot | Google Earth 9/24/2011 90
• 86. Visible Object Computation (2) (diagram: the video's FoVs link the real world with the GIS world) 9/24/2011 91
  • 87. Visible Object Computation (3) • For each FoV, compute the visible objects, and their visible angle ranges • Three types of objects: – Front – Vertically visible – Occluded 9/24/2011 92
• 88. Visible Object Computation (4) • Horizontally split the FoV into a number of atomic sectors, each containing a unique list of candidate objects • Vertically check the object visibility 9/24/2011 93
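The horizontal splitting step can be sketched as follows: cut the FoV's angular range at every object boundary into atomic sectors, then mark the nearest object in each sector as "front". This is a simplification that omits the vertical visibility check (which lets taller occluded objects still be tagged); the function and labels are illustrative:

```python
def classify_objects(objects):
    """Sector-based horizontal visibility sketch.
    objects: list of (name, ang_start, ang_end, distance) tuples
    describing each candidate's angular extent inside the FoV.
    Returns {name: 'front' | 'occluded'}."""
    # atomic sector boundaries: every object's start/end angle
    bounds = sorted({b for _, s, e, _ in objects for b in (s, e)})
    status = {name: "occluded" for name, _, _, _ in objects}
    for lo, hi in zip(bounds, bounds[1:]):
        mid = (lo + hi) / 2              # probe the sector's middle
        in_sector = [(d, n) for n, s, e, d in objects if s <= mid <= e]
        if in_sector:
            status[min(in_sector)[1]] = "front"   # nearest object wins
    return status

objs = [("tower", 10.0, 40.0, 200.0),
        ("mall", 20.0, 60.0, 80.0),
        ("statue", 22.0, 38.0, 300.0)]
vis = classify_objects(objs)
# the statue sits behind the mall in every sector it appears in,
# so it never becomes a front object
```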
• 89. Visible Object Computation (5) • Repeat the process for every FoV of the video • Extract textual information – ID, name, type, coordinates, center, address, description, websites (external links) – From OpenStreetMap, GeoDeck – The sources can be expanded (e.g., Wikipedia) 9/24/2011 94
  • 90. Tag Ranking & Associating (1) • The visible objects are not equally relevant to the video • More tags are generated with our method – SG dataset: 60 – USC dataset: 49 9/24/2011 95
  • 91. Tag Ranking & Associating (2) • 6 basic visual criteria – Closeness to the FoV center – Distance to the camera location – Horizontally visible angle range of the object – Vertically visible angle range of the object – Horizontally visible percentage of the object – Vertically visible percentage of the object • Additional hints from GIS or external sources – Special property (e.g., attraction, landmark …) – Wikipedia entry 9/24/2011 96
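One way to combine the six visual criteria plus the GIS hints into a single relevance score is a weighted sum. The weights, the distance normalization, and the landmark bonus below are our assumptions for illustration, not the paper's exact ranking function:

```python
def tag_score(closeness, distance, h_angle, v_angle, h_pct, v_pct,
              is_landmark=False, max_distance=1000.0):
    """Illustrative weighted combination of the six visual criteria.
    All inputs are normalized to [0, 1] except distance (meters);
    is_landmark is the extra hint from the GIS or external sources."""
    proximity = 1.0 - min(distance / max_distance, 1.0)
    score = (0.2 * closeness + 0.2 * proximity +
             0.15 * h_angle + 0.15 * v_angle +
             0.15 * h_pct + 0.15 * v_pct)
    if is_landmark:
        score += 0.1         # assumed bonus for attractions/landmarks
    return score

# a nearby, centered landmark vs. a distant, peripheral object:
s1 = tag_score(0.9, 120.0, 0.8, 0.6, 1.0, 0.9, is_landmark=True)
s2 = tag_score(0.3, 700.0, 0.2, 0.2, 0.4, 0.3)
```

Under any reasonable weighting of this form, the prominent nearby object outranks the distant peripheral one, which is the behavior the user study on the next slides evaluates.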
  • 92. Tag Ranking & Associating (3) • Exploiting the temporal existence of the object • Associating the tag to a specific segment 9/24/2011 97
• 93. Evaluation (1) • Implemented a prototype • Collected sample videos • Compared to YouTube auto-generated tags • User study – Familiarity with the place – Relevance of our tags – Relevance of YouTube tags – Preference for our tags vs. YouTube tags – User ranking 9/24/2011 98
  • 94. Evaluation (2) • User study results 9/24/2011 99
  • 96. Outline I. Introduction & Motivation II. Scene Modeling & Acquisition III. Query Processing & Vector Model IV. Result Presentation V. Power Management VI. Synthetic Video Meta-Data Generation VII.Textual Annotation 101 VIII.Conclusions 9/24/2011 101
• 97. Conclusions • Annotation using sensors can provide automatic and objective meta-data for indexing and searching. • Georeferenced video search has great potential, especially for searching user-generated videos. • Many open questions: – Standard format of meta-data – Standard way of embedding meta-data – Index structures of meta-data for fast searching – Supporting new query types – Combining with content-based features – Relevance ranking for result presentation 9/24/2011 102
  • 98. Thank You Further information at: http://geovid.org rogerz@comp.nus.edu.sg 9/24/2011 seonkim@usc.edu 104
• 99. Relevant Publications (1) [ACM MM ’08] Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim Viewable Scene Modeling for Geospatial Video Search ACM Multimedia Conference (ACM MM 2008), Oct. 2008. [ACM MM ’09] Sakire Arslan Ay, Lingyan Zhang, Seon Ho Kim, Ma He, Roger Zimmermann GRVS: A Georeferenced Video Search Engine ACM Multimedia Conference (ACM MM 2009), Technical Demo, Oct. 2009. [MMSJ ’10] Sakire Arslan Ay, Roger Zimmermann, Seon Ho Kim Relevance Ranking in Georeferenced Video Search Multimedia Systems Journal, Springer, 2010. [MMSys ’11] Seon Ho Kim, Jia Hao, Sakire Arslan Ay, Roger Zimmermann Energy-Efficient Mobile Video Management using Smartphones ACM Multimedia Systems Conference (ACM MMSys 2011), Feb. 2011. 9/24/2011 105
  • 100. Relevant Publications (2) [ACM MM ’11] Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim, Roger Zimmermann Automatic Tag Generation and Ranking for Sensor-rich Outdoor Videos ACM Multimedia Conference (ACM MM 2011), Nov. 2011. [ACM MM ’11] Zhijie Shen, Sakire Arslan Ay, Seon Ho Kim SRV-TAGS: An Automatic TAGging and Search System for Sensor-Rich Outdoor Videos ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011. [ACM MM ’11] Hao Jia, Guanfeng Wang, Beomjoo Seo, Roger Zimmermann Keyframe Presentation for Browsing of User-generated Videos on Map Interface ACM Multimedia Conference (ACM MM 2011), Short Paper, Nov. 2011. [ACM MM ’11] Beomjoo Seo, Jia Hao, Guanfeng Wang Sensor-rich Video Exploration on a Map Interface ACM Multimedia Conference (ACM MM 2011), Technical Demo, Nov. 2011. 9/24/2011 106