Crowd characterization from computer vision
by Federico KARAGULIAN
GENERAL NOTES (Oct. 2023)
Referring to the published paper: https://www.mdpi.com/2413-8851/7/2/65
Target Square:
Piazza Duca d’Aosta (Milano, Italy)
[Figure: panels (a)–(c) showing the camera view and the aerial view of Piazza Duca d’Aosta, with the camera position, the subway access points and the railway Central Station marked]
The YOLO v3 Model
(pre-processing)
[Figure 2: YOLOv3 Darknet-53 backbone. A 256x256x3 input image passes through stacked (3x3) and (1x1) convolution + ReLU layers organized into residual blocks (repeated 1x, 2x, 8x, 8x, 4x) separated by down-sampling stages, producing feature maps of 128x128x3, 64x64x3, 32x32x3, 16x16x3 and 8x8x3 px. Weights are pre-trained on the COCO dataset. Up-sampled feature maps at 16x16x3 and 32x32x3 px feed the anchor-box and class assignment for small, medium and large objects.]
[Figure: YOLOv3 Darknet-53 backbone, detailed view. Same structure as Figure 2, with the convolutions inside each residual block followed by an identity (skip) connection instead of ReLU; class assignment is performed at the small, medium and large detection scales. Weights are pre-trained on the COCO dataset.]
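As an illustration of the detection step, the sketch below shows how a COCO-pretrained YOLOv3 model can be run on a single video frame with OpenCV's DNN module, keeping only detections of the COCO "person" class. This is not the authors' implementation; the file names (yolov3.cfg, yolov3.weights) and the thresholds are assumptions.

```python
# Minimal sketch (not the authors' code): COCO-pretrained YOLOv3 person
# detection on one frame using OpenCV's DNN module.
import cv2
import numpy as np

# file names are assumptions; use the paths of your own config/weights
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_persons(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    # YOLOv3 expects a square, normalized blob (416x416 is a common choice)
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences = [], []
    for output in outputs:              # one output per detection scale
        for det in output:              # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if class_id == 0 and conf >= conf_thresh:   # class 0 = "person" in COCO
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(conf)

    # non-maximum suppression to remove overlapping boxes of the same person
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(boxes[i], confidences[i]) for i in np.array(keep).flatten()]
```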
Postprocessing of the YOLOv3 model results
➢ From pixels to meters (from 3D to 2D view, using conversion factors that consider the view angle of the camera, the viewpoint depth of the image and the top-view dimensions of the Piazza)
➢ Trajectory estimation by tracking each unique identifier and building a time series of locations every minute
➢ Calculation of the mean speed along each trajectory (= distance / trajectory duration)
➢ Calculation of the “instantaneous” speed
➢ Filtering of speeds (> 0.3 and < 2.5 m/s)
➢ Calculation of the direction of the trajectory (in degrees) from ORIGIN --> DESTINATION for each ID (see the speed/direction sketch below)
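A minimal sketch of the speed and direction post-processing just described, assuming one table row per detection with positions already converted to meters; the column names (id, timestamp, x_m, y_m) and the angle convention are assumptions, not the authors' code.

```python
# Sketch (assumed column names): per-ID instantaneous speed, mean speed and
# ORIGIN -> DESTINATION direction, with positions already in meters.
import numpy as np
import pandas as pd

def add_instantaneous_speed(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["id", "timestamp"]).copy()
    g = df.groupby("id")
    dt = g["timestamp"].diff().dt.total_seconds()
    step = np.hypot(g["x_m"].diff(), g["y_m"].diff())      # spatial increment [m]
    df["speed"] = step / dt                                 # instantaneous speed [m/s]
    return df[(df["speed"] > 0.3) & (df["speed"] < 2.5)]    # keep plausible walking speeds

def trajectory_summary(df: pd.DataFrame) -> pd.DataFrame:
    def per_id(traj):
        duration = (traj["timestamp"].iloc[-1] - traj["timestamp"].iloc[0]).total_seconds()
        length = np.hypot(traj["x_m"].diff(), traj["y_m"].diff()).sum()
        dx = traj["x_m"].iloc[-1] - traj["x_m"].iloc[0]
        dy = traj["y_m"].iloc[-1] - traj["y_m"].iloc[0]
        return pd.Series({
            "mean_speed": length / duration if duration > 0 else np.nan,  # distance / duration
            "direction_deg": np.degrees(np.arctan2(dy, dx)) % 360,        # ORIGIN -> DESTINATION
        })
    return df.groupby("id").apply(per_id)
```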
To build heatmaps (see the gridding sketch after this list):
➢ Convert the position of each trajectory point into a projected coordinate system in meters (projection EPSG:32632, WGS 84 / UTM zone 32N)
➢ Definition of a regular grid of 2 x 2 m square cells
➢ Intersection of the trajectory data with the grid
➢ Average of speed and direction data within each grid cell at 15-minute intervals and during morning/evening hours
➢ Calculation of Voronoi areas for each point (not per trajectory) for the density calculation
➢ Intersection of the density data with the grid
➢ Average of the Voronoi density within each grid cell at 15-minute intervals and during morning/evening hours
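A minimal sketch of the gridding step for the speed heatmap, assuming the points are given in WGS 84 (lon, lat) and projected to EPSG:32632 with pyproj; the cell size follows the description above, while the column names are assumptions.

```python
# Sketch: project WGS 84 points to EPSG:32632 (UTM 32N), bin them on a
# 2x2 m grid and average the speed per cell and per 15-minute interval.
import numpy as np
import pandas as pd
from pyproj import Transformer

to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32632", always_xy=True)

def speed_heatmap(df: pd.DataFrame, cell_size: float = 2.0) -> pd.DataFrame:
    # df columns assumed: lon, lat, timestamp, speed
    x, y = to_utm.transform(df["lon"].to_numpy(), df["lat"].to_numpy())
    out = df.assign(
        cell_x=(np.floor(x / cell_size) * cell_size).astype(int),   # lower-left corner [m]
        cell_y=(np.floor(y / cell_size) * cell_size).astype(int),
        interval=df["timestamp"].dt.floor("15min"),
    )
    return (out.groupby(["interval", "cell_x", "cell_y"])["speed"]
               .mean()
               .reset_index(name="mean_speed"))
```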
Post-processing
Identification of single ID
(ID == pedestrian)
[Figure: sequence of frames showing the same tracked pedestrian (ID “Person - 1771”) moving across the square near the subway access points]
General considerations about trajectories and speed
Data cleaning criteria:
1) distance between two consecutive traces >= 1 m
2) distance from the previous point <= 1 m
3) distance from the two next points <= 1 m
4) timestamp difference (in seconds) between two consecutive traces < 1 second
Filtering angles to avoid erroneous directions:
5) if the absolute angle between two consecutive directions is >= 90°, the trace is ignored (see the sketch after this list)
Targeting constant directions:
6) filtering of data within a 10-30 second window according to the defined directions
7) group_by directions with the highest number of recurrences
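A minimal sketch of the direction filter in criterion 5 above (the distance and timestamp criteria follow the same pattern), assuming a per-ID table of positions in meters; the column names and the turning-angle definition are assumptions.

```python
# Sketch: ignore traces whose heading changes by 90 degrees or more between
# two consecutive steps (criterion 5 above).
import numpy as np
import pandas as pd

def filter_sharp_turns(traj: pd.DataFrame, max_turn_deg: float = 90.0) -> pd.DataFrame:
    traj = traj.sort_values("timestamp").copy()
    # heading of each step, in degrees
    heading = np.degrees(np.arctan2(traj["y_m"].diff(), traj["x_m"].diff()))
    turn = heading.diff().abs().to_numpy()
    turn = np.minimum(turn, 360.0 - turn)             # wrap to [0, 180] degrees
    keep = np.isnan(turn) | (turn < max_turn_deg)     # keep first points and gentle turns
    return traj[keep]
```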
Considerations
Most trajectories run from north to south
Most pedestrians come from the subway access located on the top-right side of Piazza Duca d’Aosta
METRICS
Accuracy: ratio between the number of DETECTED pedestrians and the effectively TRACKED persons in each frame
Confidence: how confident the Computer Vision model is that the DETECTED object is a person (pedestrian)
IoU: Intersection over Union; evaluates how similar a predicted bounding box is to the ground-truth box. IoU is known to be a good metric for measuring the overlap between two bounding boxes or masks (a short IoU sketch follows this list).
MOTA: Multiple Object Tracking Accuracy
MOTP: Multiple Object Tracking Precision (should be as close to zero as possible)
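For reference, a minimal sketch of the IoU computation for two axis-aligned bounding boxes; the (x, y, width, height) box format is an assumption.

```python
# Sketch: Intersection over Union for two boxes in (x, y, w, h) format.
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # intersection rectangle (zero if the boxes do not overlap)
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0.0, min(ax + aw, bx + bw) - ix)
    ih = max(0.0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# example: two partially overlapping 10x10 boxes
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))   # ~0.143
```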
Clustering directions and Speed (morning and evening)
Speed Heat-maps
Spatial representation of the pedestrians identified from CV:
a) Three main clusters of pedestrians were found (an illustrative clustering sketch follows)
b) The centroids of the clusters were located in the proximity of the subway access points (A) and in the middle of the square (B), Piazza Duca d’Aosta, in front of the central station
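The notes do not state here which clustering method produced the three clusters; purely as an illustration, the sketch below clusters pedestrian positions with k-means (k = 3, matching the number of clusters mentioned above) using scikit-learn. The column names and the choice of algorithm are assumptions, not the authors' method.

```python
# Illustrative sketch: k-means clustering of pedestrian positions (in meters)
# into three spatial clusters, returning the labelled points and the centroids.
import pandas as pd
from sklearn.cluster import KMeans

def cluster_positions(df, n_clusters=3):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    df = df.copy()
    df["cluster"] = km.fit_predict(df[["x_m", "y_m"]])
    centroids = pd.DataFrame(km.cluster_centers_, columns=["x_m", "y_m"])
    return df, centroids
```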
Mean SPEED heatmaps of pedestrians.
Speeds were computed for every trajectory of each pedestrian ID by considering the spatial increment within 1 s. Speed values were averaged every 15 minutes.
➢ Speeds were filtered to consider a minimum forward speed of 0.3 m/s and a maximum speed of 2.5 m/s.
➢ High speed values were observed at the centre of the square, while the subway access points showed homogeneous speeds.
Density Heat-maps (with Voronoi Assumption)
Mean DENSITY heatmaps of pedestrians.
Density was computed as the ratio between the number of pedestrians within a 2 x 2 m² cell and the area of the cell, weighted by the number of timestamps (see the sketch below).
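A minimal sketch of the cell-density calculation as described above (pedestrian count per 2 x 2 m cell, divided by the cell area and weighted by the number of timestamps); the exact weighting interpretation and the column names are assumptions.

```python
# Sketch: pedestrian density per 2x2 m grid cell, normalised by the cell area
# and by the number of distinct timestamps observed in the cell.
import numpy as np
import pandas as pd

def density_heatmap(df: pd.DataFrame, cell_size: float = 2.0) -> pd.DataFrame:
    cell_area = cell_size * cell_size                 # 4 m^2
    out = df.assign(
        cell_x=(np.floor(df["x_m"] / cell_size) * cell_size).astype(int),
        cell_y=(np.floor(df["y_m"] / cell_size) * cell_size).astype(int),
    )
    grouped = out.groupby(["cell_x", "cell_y"]).agg(
        n_pedestrians=("id", "nunique"),
        n_timestamps=("timestamp", "nunique"),
    )
    grouped["density"] = grouped["n_pedestrians"] / (cell_area * grouped["n_timestamps"])
    return grouped.reset_index()
```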
STATS
Density Heat-maps (with Voronoi Assumption)
For technical details contact: federico.karagulian@enea.it or karafede@hotmail.com
