ACIVS’12
                            Advanced Concepts for Intelligent Vision Systems
Sept. 4-7 2012, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic


  Angelo Cozzolino, Francesco Flammini, Valentina Galli, Mariangela Lamberti, Giovanni Poggi, Concetta Pragliola

   Evaluating the effects of MJPEG compression on
    Motion Tracking in metro railway surveillance
                                             presented by
                                         Francesco Flammini

                               Ansaldo STS – Innovation & Security Engineer
                                    francesco.flammini@ansaldo-sts.com
Video Content Analytics in transit systems

  •    Transit systems attractive targets for:
        –    Thieves
        –    Vandals
        –    Terrorists

  •    Video surveillance essential for:
        –    Deterrence
        –    Detection
        –    Response
        –    Prosecution

  •    VCA supports Safety & Security Surveillance, especially
       when there are:
        –    High number of cameras (hundreds to thousands)
        –    Low number of operators

  •    VS with VCA integrated in current PSIM (Physical Security
       Information Management) systems
        –    Pros: superior situation awareness
        –    Cons: possible issues with the number of false alarms

  •    Frequent requests to upgrade legacy CCTV to modern
       VCA systems

  •    VCA event detection and performance requirements in
       recent tenders are increasingly demanding

  Ref.: Francesco Flammini: Critical Infrastructure Security: Assessment,
  Prevention, Detection, Response, 2011 (WIT Press, Southampton, UK,
  ISBN: 978-1-84564-562-5)

ACIVS’12, Francesco Flammini
                                                                                                                    2
Performance evaluation of motion tracking
  • Ground Truth (GT) generation by annotation tools
  • For each frame, the GT includes the top-left and bottom-right
    coordinates of the so-called ‘bounding boxes’ surrounding the
    objects detected in the scene
  • MT metrics defined in the literature measure the temporal and
    spatial overlap by comparing the Ground Truth with the
    Algorithm Result (AR) produced by the Motion Tracker, using
    appropriate thresholds

  [Figure: a bounding box with its top, left, right and bottom edges labelled; example frames showing a False Negative and a False Positive]
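
A minimal sketch of how spatial overlap between a GT box and an AR box could be scored and thresholded. The slide does not say which overlap measure the tool uses, so this assumes intersection-over-union as an illustrative stand-in; `Box`, `spatial_overlap`, `boxes_match` and the 0.5 default threshold are all hypothetical names and values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def area(b: Box) -> float:
    # Degenerate or inverted boxes count as zero area.
    return max(0.0, b.right - b.left) * max(0.0, b.bottom - b.top)


def spatial_overlap(gt: Box, ar: Box) -> float:
    """Intersection-over-union of two boxes (assumed measure, not from the slides)."""
    inter = Box(
        max(gt.left, ar.left),
        max(gt.top, ar.top),
        min(gt.right, ar.right),
        min(gt.bottom, ar.bottom),
    )
    inter_area = area(inter)
    union_area = area(gt) + area(ar) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def boxes_match(gt: Box, ar: Box, threshold: float = 0.5) -> bool:
    # The threshold value is a placeholder; the slides only say "appropriate thresholds".
    return spatial_overlap(gt, ar) >= threshold
```

With this convention, a GT box that no AR box matches would contribute to the False Negatives, and an AR box that matches no GT box to the False Positives.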


Evaluation method


  Video Selection → GT Generation → Video Compression → AR Generation → Metrics Computation




• The videos were analyzed by a Motion Tracker identical to the one
  installed in the real metro railway, but with the alarm-generation
  filters disabled
• For each compression level, the Motion Tracker generated an AR text
  file listing the detected objects, structured consistently with the
  information in the GT

Video selection
  All clips: 4 CIF (720x576), 25 FPS, 60 s ➩ 1500 frames

  • Concourse – 7 objects
  • Platform – simulation of object left behind
  • Turnstiles – 7 objects
  • Tunnel portal – train passing, IR lamp




MJPEG video compression
  Compression ratio C     Quality Q
  C = 1                   100%
  C ≈ 5                   50%
  C ≈ 10                  20%
  C ≈ 15                  10%
  C ≈ 20                  5%
  C ≈ 25                  1%




Metrics computation




• To evaluate the metrics, we developed a Matlab program that automatically
  computes the FN and FP metrics. The tool organizes its input data (GT and AR) in
  arrays with one row per object and 5 columns:
      – The list of frames in which the object is present (i.e. the track), a vector whose
        length equals the number of frames of the track
      – The top-left and bottom-right coordinates of the bounding boxes (4 numbers, one column each)
• The tool is being extended to compute other metrics (e.g. ‘ID change’)
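
The layout described above (one row per object, a frame list plus four coordinate columns) can be sketched as follows. This is written in Python rather than Matlab and is not the authors' tool: the `Track` class, the matching rule (temporal overlap of the frame sets plus mean spatial overlap on the common frames) and both 0.5 thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Track:
    """One row of the GT or AR array: a frame list and four coordinate columns."""
    frames: List[int]
    left: List[float]
    top: List[float]
    right: List[float]
    bottom: List[float]

    def box_at(self, frame: int) -> Tuple[float, float, float, float]:
        i = self.frames.index(frame)
        return (self.left[i], self.top[i], self.right[i], self.bottom[i])


def iou(a, b) -> float:
    # Boxes are (left, top, right, bottom) tuples.
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tracks_match(gt: Track, ar: Track, t_thr: float = 0.5, s_thr: float = 0.5) -> bool:
    """Assumed rule: enough shared frames AND enough mean box overlap on them."""
    common = set(gt.frames) & set(ar.frames)
    if not common:
        return False
    temporal = len(common) / len(set(gt.frames) | set(ar.frames))
    spatial = sum(iou(gt.box_at(f), ar.box_at(f)) for f in common) / len(common)
    return temporal >= t_thr and spatial >= s_thr


def count_fn_fp(gt_tracks: List[Track], ar_tracks: List[Track]) -> Tuple[int, int]:
    # FN: GT tracks with no matching AR track; FP: AR tracks with no matching GT track.
    fn = sum(1 for g in gt_tracks if not any(tracks_match(g, a) for a in ar_tracks))
    fp = sum(1 for a in ar_tracks if not any(tracks_match(g, a) for g in gt_tracks))
    return fn, fp
```

Running `count_fn_fp` once per compression level on the same GT would give the per-level FN and FP counts that the evaluation compares.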

Evaluation of results

  [Figure: panels (a), (b), (c), (d)]

  • Results fluctuate because the algorithm’s adaptive thresholds
    depend on scene characteristics (e.g. object size, ambient
    light, etc.)
  • The ‘filtering’ effect of compression can counterbalance the
    negative effect of quality degradation by reducing the number
    of detectable objects
Evaluation of trends




  [Figure: panels (a) and (b)]

• As expected, tracking performance generally degrades with image quality. The impact
  is much greater at higher levels of compression, in particular when image quality
  falls below 20%, i.e. at compression ratios higher than 10
  (corresponding to approximately 4 Mbps of bandwidth occupation)
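
As a rough consistency check on the bandwidth figures, the arithmetic can be sketched as below. It assumes that C is measured relative to the Q = 100% stream (the compression slide gives C = 1 at Q = 100%), and it uses the two approximate operating points quoted in the slides: ≈ 4 Mbps at C ≈ 10 here, and ≈ 7 Mbps at C ≈ 5 in the conclusions. The function names are hypothetical, and the outputs are only as precise as those rounded inputs.

```python
FPS = 25  # frame rate of the test clips, from the video selection slide


def implied_full_quality_mbps(bitrate_mbps: float, compression_ratio: float) -> float:
    """Bitrate the Q = 100% stream would need if C is relative to it (assumption)."""
    return bitrate_mbps * compression_ratio


def mean_frame_kilobytes(bitrate_mbps: float, fps: int = FPS) -> float:
    """Average size of one MJPEG frame at the given bitrate, in kB (1 kB = 1000 bytes)."""
    return bitrate_mbps * 1e6 / 8 / fps / 1e3


# (bitrate in Mbps, compression ratio) as quoted, approximately, in the slides.
operating_points = {"Q = 50%": (7.0, 5.0), "Q = 20%": (4.0, 10.0)}

for label, (mbps, ratio) in operating_points.items():
    print(
        label,
        "full-quality stream ~", implied_full_quality_mbps(mbps, ratio), "Mbps;",
        "mean frame ~", mean_frame_kilobytes(mbps), "kB",
    )
```

Under that assumption both points imply a full-quality stream of roughly 35 to 40 Mbps, so the two quoted figures are mutually consistent to within their rounding.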

Main causes of False Negatives

  • Tiling (right) and occlusions (down) prevent the tracker from
    ‘hooking’ the objects in the scene and thus from tracking their
    trajectories: their IDs change frequently, as if they were
    different objects

  [Figures: panels (a) to (c) on the right; panels (a) and (b) below]
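
The ‘ID change’ effect described above can be counted with a short sketch like the following. It assumes that, for one GT object, each frame has already been associated with the ID of the AR track that best matches it, or `None` when the tracker lost the object; the function name and input format are hypothetical and are not taken from the authors' tool.

```python
from typing import Optional, Sequence


def count_id_changes(matched_ids: Sequence[Optional[int]]) -> int:
    """Count how often the AR ID assigned to one GT object changes over its track.

    Frames with no match (None) are skipped, so a gap followed by the same ID
    does not count as a change, while a gap followed by a new ID does.
    """
    changes = 0
    last_id = None
    for current in matched_ids:
        if current is None:
            continue
        if last_id is not None and current != last_id:
            changes += 1
        last_id = current
    return changes


# One person tracked as ID 1, lost behind an occlusion, then re-acquired as ID 2 and later ID 3.
print(count_id_changes([1, 1, None, 2, 2, 3]))  # prints 2
```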
Main causes of False Positives

                  Glare        Reflections   Camouflage   Large artefacts




Relevance of FP sources w.r.t. compression




  [Figure: panels (a), (b), (c)]

•    For the Concourse, all FP causes (especially glare) increase considerably with compression, while for the Platform
     and Turnstiles the effect of artefacts largely predominates over the other causes, which nevertheless
     remain relevant
•    Tunnel FPs are not reported: since no real object moves in the scene, they appear only when the train
     passes, due to the change of light in the scene; furthermore, IR cameras only provide greyscale images, and
     the absence of most chromatic components compared with standard cameras reduces the number of FP causes
     that vary with compression level
Conclusions and future developments
• Performance degradation becomes critical when going from a 20% down to a 1% quality level of
  the compressed videos, whereas a 50% image quality level represents a very
  acceptable trade-off (corresponding to ≈ 7 Mbps bandwidth occupation)

• Whenever it is required to go beyond that ‘conservative’ ratio, it is
  necessary to evaluate how the error sources affect the correct detection of
  the objects, according to the specific features of each scene (motion density, light
  sources, camera shots, type of background, etc.)

• The results can provide guidelines applicable to
  similar scenarios (technologies and contexts), e.g. when using more efficient codecs

• Using the same evaluation method, in any domain it is possible to:
      – support the design of surveillance systems by fine-tuning the video compression level
        of each camera against scene characteristics or other factors (especially useful in
        distributed wireless systems)
      – quantify the effect on VCA performance of other quality or noise factors, such as
          • sensitivity, resolution, frame rate, etc.
          • vibrations, electromagnetic interference, chromatic distortions, etc.
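
The per-camera tuning idea could be sketched as below: given an error rate measured at each quality level for one camera, pick the lowest quality that stays within a tolerance. Because the results fluctuate with compression, this sketch only accepts a quality level if every higher level is acceptable too. The function name, the tolerance and the numbers in the usage lines are hypothetical placeholders, not measurements from this study.

```python
from typing import Dict, Optional


def lowest_acceptable_quality(error_by_quality: Dict[int, float], max_error: float) -> Optional[int]:
    """Walk from the highest quality downward and stop at the first level that
    exceeds max_error; return the last acceptable level, or None if even the
    highest quality is out of tolerance."""
    chosen = None
    for quality in sorted(error_by_quality, reverse=True):
        if error_by_quality[quality] > max_error:
            break
        chosen = quality
    return chosen


# Placeholder error rates for one hypothetical camera (NOT results from the slides).
hypothetical_error_rates = {100: 0.10, 50: 0.12, 20: 0.11, 10: 0.30, 5: 0.20, 1: 0.60}
print(lowest_acceptable_quality(hypothetical_error_rates, max_error=0.15))  # prints 20
```

Stopping at the first violation, rather than taking the minimum acceptable level overall, avoids selecting a heavily compressed setting that only looks good because of the ‘filtering’ effect at that one level.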

Thank you for your kind attention

            Questions?
