Computer Vision
1. Crowd characterization from computer vision
by Federico KARAGULIAN
GENERAL NOTES (Oct. 2023)
Refers to the published paper: https://www.mdpi.com/2413-8851/7/2/65
2. Target square: Piazza Duca d'Aosta (Milano, Italy)
[Figure, panels (a)-(c): views of the square marking the three subway entrances, the railway Central Station, and the camera position]
3. Darknet-53 backbone (Figure 2)
[Figure: Darknet-53 architecture with pre-trained weights from the COCO dataset. A 256x256x3 input image passes through stacked (3x3) and (1x1) Conv + ReLU layers (64, 128, 256, 512, 1024 filters) grouped into residual blocks repeated 1x, 2x, 8x, 8x and 4x; each down-sampling stage halves the feature map (128x128 -> 64x64 -> 32x32 -> 16x16 -> 8x8). Up-sampling paths merge features so that detection runs at three scales (small, medium, large) with anchor-box and class assignment.]
4. YOLOv3 - Darknet-53 variant
[Figure: the same Darknet-53 diagram as in slide 3 (256x256x3 input, residual blocks repeated 1x, 2x, 8x, 8x, 4x, down-sampling from 128x128 to 8x8, pre-trained COCO weights), with the (1x1) and (3x3) convolutions inside the residual blocks using identity activations instead of ReLU; detection again at the small/medium/large scales with class assignment.]
5. Post-processing of YOLOv3 model results
➢ From pixels to meters: projection from the 3D camera view to a 2D top view, using conversion factors that account for the view angle of the camera, the viewpoint depth of the image, and the top-view dimensions of the Piazza.
➢ Trajectory estimation: each unique identifier is tracked and a time series of locations is built every minute.
➢ Calculation of the mean speed along each trajectory (= distance / trajectory duration)
➢ Calculation of the "instantaneous" speed
➢ Speed filtering (> 0.3 and < 2.5 m/s)
➢ Calculation of the direction of the trajectory (in degrees) from ORIGIN --> DESTINATION for each ID
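The per-trajectory steps above (mean speed = distance / duration, origin-to-destination direction) can be sketched as follows; the input format of `(time, x, y)` samples already converted to meters is an assumption, not the paper's actual data structure:

```python
import math

def trajectory_stats(points):
    """Mean speed (m/s) and ORIGIN -> DESTINATION bearing (degrees) for one
    pedestrian ID. `points` is a list of (t_seconds, x_m, y_m) tuples in
    meters (hypothetical input format; 0 deg = east, counter-clockwise)."""
    points = sorted(points)
    (t0, x0, y0), (t1, x1, y1) = points[0], points[-1]
    # Mean speed = total path length / trajectory duration
    dist = sum(
        math.hypot(xb - xa, yb - ya)
        for (_, xa, ya), (_, xb, yb) in zip(points, points[1:])
    )
    speed = dist / (t1 - t0)
    # Direction from the first to the last point of the trajectory
    direction = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360
    return speed, direction
```

The returned speed would then be kept only if it falls in the (0.3, 2.5) m/s window from the filtering step.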
To build heatmaps:
➢ Convert the position of each trajectory point into a projected coordinate system in meters (projection EPSG:32632, WGS 84 / UTM zone 32N)
➢ Define a regular grid of 2x2 m square cells
➢ Intersect the trajectory data with the grid
➢ Average the speed and direction data within each grid cell at 15-minute intervals and during morning/evening hours
➢ Calculate Voronoi areas for each point (not per trajectory), for the density calculation
➢ Intersect the density data with the grid
➢ Average the Voronoi density within each grid cell at 15-minute intervals and during morning/evening hours
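The grid-intersection and averaging steps can be sketched with simple integer binning; this is a minimal illustration, not the paper's pipeline, and it omits the 15-minute time dimension and the map projection (which a real implementation would handle with e.g. pyproj for EPSG:32632):

```python
from collections import defaultdict

CELL = 2.0  # 2x2 m grid cells, as in the heatmap definition above

def grid_average(samples):
    """Average a per-point quantity (e.g. speed) over a regular 2x2 m grid.
    `samples` is a list of (x_m, y_m, value) in projected meters."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, v in samples:
        cell = (int(x // CELL), int(y // CELL))  # which 2x2 m cell the point falls in
        sums[cell] += v
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```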
8. [Figure: consecutive video frames near the subway entrances in which the same pedestrian, tracked as "Person - 1771", is followed across the square]
9. [Figure]
10. General considerations about trajectories and speed:
Data cleaning
1) if the distance between two consecutive traces is >= 1 m,
2) if the distance from the previous point is <= 1 m,
3) if the distance from the two next points is <= 1 m,
4) if the timestamp difference (in seconds) between two consecutive traces is < 1 second
Filtering angles to avoid erroneous directions
5) if the absolute angle between two consecutive directions is >= 90°, the trace is ignored
Targeting constant directions
6) filtering data within 10-30 second windows according to the defined directions
7) group_by directions with the highest number of recurrences
Considerations
Most trajectories run from north to south.
Most pedestrians come from the subway entrance located on the top-right side of Piazza Duca d'Aosta.
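A minimal sketch of the cleaning pass, under one plausible reading of rules 1 and 4 above (traces closer than 1 s in time, or 1 m or more apart in space, are dropped); rules 2-3 and the angle filter are omitted here, and the `(time, x, y)` trace format is an assumption:

```python
import math

def clean_traces(traces):
    """Filter one pedestrian's traces, each given as (t_seconds, x_m, y_m).
    Thresholds follow the slide: 1 m spacing, >= 1 s between kept traces."""
    if not traces:
        return []
    kept = [traces[0]]
    for t, x, y in traces[1:]:
        pt, px, py = kept[-1]
        # rule 4: drop traces less than 1 s after the previous kept trace
        if t - pt < 1.0:
            continue
        # rule 1 (as read here): drop jumps of >= 1 m between consecutive traces
        if math.hypot(x - px, y - py) >= 1.0:
            continue
        kept.append((t, x, y))
    return kept
```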
11. METRICS
Accuracy: ratio between the number of DETECTED pedestrians and the effectively TRACKED persons in each frame
Confidence: how confident the Computer Vision model is that the DETECTED object is a person (pedestrian)
IoU: Intersection over Union; evaluates how similar a predicted bounding box is to the ground-truth box. IoU is a standard metric for measuring the overlap between two bounding boxes or masks.
MOTA: Multiple Object Tracking Accuracy
MOTP: Multiple Object Tracking Precision (should be as close to zero as possible)
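Of the metrics above, IoU is the one with a fully standard closed form, so a small sketch may help (axis-aligned `(x1, y1, x2, y2)` boxes assumed):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (empty if boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = sum of areas minus the doubly-counted intersection
    return inter / (area_a + area_b - inter)
```

For example, two 2x2 boxes overlapping on a 1x1 patch give IoU = 1/7.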
13. Speed heat-maps
Spatial representation of the pedestrians identified by CV:
a) Three main clusters of pedestrians were found.
b) The centroids of the clusters were localized in the proximity of the subway access points (A) and in the middle of the square (B), Piazza Duca d'Aosta in front of the Central Station.
Mean SPEED heatmaps of pedestrians. Speeds were computed for every trajectory of each pedestrian ID by considering the spatial increment within 1 s, and speed values were averaged every 15 minutes.
➢ Speeds were filtered to keep a minimum forward speed of 0.3 m/s and a maximum speed of 2.5 m/s.
➢ High speed values were observed at the centre of the square, while the subway access points showed homogeneous speeds.
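The combination of the 0.3-2.5 m/s filter and the 15-minute averaging can be sketched as below; the flat `(time, speed)` record format is an assumption for illustration:

```python
from collections import defaultdict

def speed_by_quarter_hour(records):
    """Mean pedestrian speed per 15-minute interval after the plausibility
    filter above. `records` is a list of (t_seconds, speed_mps) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, v in records:
        if not (0.3 < v < 2.5):
            continue  # discard implausible speeds (standing noise, vehicles)
        bucket = int(t // 900)  # 900 s = 15 min
        sums[bucket] += v
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sums}
```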
14. Density heat-maps (with Voronoi assumption)
Mean DENSITY heatmaps of pedestrians. Density was computed as the ratio between the number of pedestrians within each 2x2 m cell and the area of the cell, weighted by the number of timestamps.
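A simplified per-cell version of that density definition is sketched below; it replaces the per-point Voronoi areas (which a full implementation could obtain from `scipy.spatial.Voronoi`) with the fixed 2x2 m cell area, and the `(time, cell_id)` observation format is an assumption:

```python
from collections import defaultdict

CELL_AREA = 4.0  # 2 m x 2 m cell

def cell_density(points):
    """Pedestrian density (persons/m^2) per 2x2 m cell, normalized by the
    number of distinct timestamps observed in that cell.
    `points` is a list of (t_seconds, cell_id) observations."""
    per_cell = defaultdict(list)
    for t, cell in points:
        per_cell[cell].append(t)
    return {
        # count of observations / (cell area * number of timestamps)
        cell: len(ts) / (CELL_AREA * len(set(ts)))
        for cell, ts in per_cell.items()
    }
```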