Single image crowd counting using deep learning models
End semester evaluation of M.Tech. project
By
Pawar Shubham Rajebhau - 2102102013
Under the guidance of
Dr. Vivek Kanhangad
Department of Electrical Engineering
Indian Institute of Technology Indore
Table of contents
Introduction
Challenges
Overview of crowd counting methods
Density map estimation method
Crowd counting models
Loss functions
Dataset
Evaluation metrics
Frequency domain approach
Multi-scale model approach
Self-supervised pre-training approach
Future work
References
Introduction
- Single-image crowd counting estimates the number of objects (people, cars, cells, etc.) in an unconstrained scene.
- It has important applications in public safety, traffic management, consumer behavior analysis, cell counting, etc.
- Extensive research has been done in this area, particularly with deep learning.
Figure: Image-based crowd counting
Challenges
References:[1]
Overview of crowd counting methods
The three major crowd counting approaches are detection-based, regression-based, and density map estimation.
Detection-based
- Based on computer vision detection techniques.
- Detects individual objects, heads, or body parts and counts the total number in the image.
- Accuracy deteriorates in crowded scenes with severe occlusions.
- Requires full identification and outlining of each object, incurring the highest labeling cost.
Regression-based approach
- Estimates the count directly from the image, without detecting individual objects.
- Achieves higher accuracy than the detection-based approach in crowded scenes.
- Lacks spatial information and interpretability, limiting its use in localization studies.
- Does not require annotating individual objects, resulting in a lower annotation cost.
Density map estimation
- Recently emerged as a promising approach.
- Achieves high accuracy for crowded scenes.
- Preserves spatial information about how people are distributed.
- Requires marking only the heads of people, giving a labeling cost between those of the regression-based and detection-based approaches.
Density map estimation method
- The annotated density map is a sparse binary mask (a dot map).
- Each person is marked with a single dot on the head or forehead.
- The spatial extent of each person is not provided.
Figure: Shanghai-Tech image and its density map
- Sparse density maps are converted to dense maps using fixed or variable Gaussian kernels.
- This helps train the model better than using the sparse dot map directly.
Sigma value computation methods
- Fixed: the standard deviation (sigma) of the Gaussian kernel is a constant.
- Nearest neighbor: sigma equals the distance to the nearest neighbor.
- Three nearest neighbors: sigma is the average of the distances to the three nearest neighbors, divided by 10.
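The three sigma rules listed above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the project: the function names, the fixed sigma of 4.0, and the fallback for a lone annotation are assumptions, and the points are assumed to be distinct integer pixel coordinates (x, y). Each Gaussian is normalised on the image grid so that the map still sums to the head count.

```python
import math

def knn_distances(points, index, k):
    """Return the k smallest distances from points[index] to the other points."""
    px, py = points[index]
    dists = sorted(
        math.hypot(px - qx, py - qy)
        for j, (qx, qy) in enumerate(points)
        if j != index
    )
    return dists[:k]

def sigma_for_point(points, index, method, fixed_sigma=4.0):
    """Choose sigma for one head annotation (fixed_sigma is an assumed value)."""
    if method == "fixed":
        return fixed_sigma
    k = {"nearest": 1, "knn3": 3}.get(method)
    if k is None:
        raise ValueError(f"unknown method: {method}")
    dists = knn_distances(points, index, k)
    if not dists:  # a lone annotation has no neighbours: fall back (assumption)
        return fixed_sigma
    mean_dist = sum(dists) / len(dists)
    # "nearest" uses the distance itself; "knn3" divides the average by 10.
    return mean_dist if method == "nearest" else mean_dist / 10.0

def density_map(points, height, width, method):
    """Sum one Gaussian per head, each normalised to total mass 1 on the grid."""
    dmap = [[0.0] * width for _ in range(height)]
    for i, (cx, cy) in enumerate(points):
        sigma = sigma_for_point(points, i, method)
        weights = [
            [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap

heads = [(5, 5), (10, 5), (5, 12), (14, 14)]
maps = {m: density_map(heads, 20, 20, m) for m in ("fixed", "nearest", "knn3")}
```

Whichever rule is used, the dense map integrates to the number of annotated heads; only the spread of each blob changes.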
Figure: Density maps M1, M2, and M3 generated for Shanghai-Tech images (two slides)
Crowd counting models
- Single-column models
- Multi-column models
- Models with a pre-trained backbone
- Single-column models with multiple modules
References: [10]
Loss functions
- The predicted density map is a smooth heat map, whereas the available ground truth is a dot map.
- The choice of loss function depends on how the ground truth and predicted density maps are used.
- A suitable loss function improves performance by extracting proper supervisory information from the ground truth.
L2 loss
- Pixel-wise loss: the network adjusts each pixel value according to its L2 distance from the same pixel in the ground truth.
- The ground truth dot map is converted to a smooth heat map using a Gaussian kernel.
- L2 loss is sensitive to the choice of variance of the Gaussian kernel.
Optimal transport loss
- Treats the predicted density map and the dot map as probability distributions and uses balanced optimal transport (OT) to match the shapes of the two distributions.
- The Sinkhorn algorithm is used to obtain the optimal transport matrix.
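To make the Sinkhorn step concrete, here is a small sketch of balanced OT between two normalised 1-D distributions. It is an illustration under stated assumptions rather than the implementation used in the project: the squared-distance cost, the regularisation strength `eps`, the iteration count, and all names are hypothetical, and Sinkhorn returns an entropy-regularised approximation of the optimal transport matrix.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularised balanced OT between histograms a and b.

    a and b must each sum to the same total (here 1.0).
    Returns the transport plan and the transport cost <plan, cost>.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the two marginals.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    ot_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, ot_cost

# Predicted density (normalised) at pixel positions 0, 1, 2 ...
pred_positions, pred_mass = [0, 1, 2], [0.2, 0.5, 0.3]
# ... and two annotated dots at positions 0 and 2, each with mass 1/2.
dot_positions, dot_mass = [0, 2], [0.5, 0.5]

cost = [[(p - d) ** 2 for d in dot_positions] for p in pred_positions]
plan, ot_cost = sinkhorn(pred_mass, dot_mass, cost)
```

The row sums of `plan` reproduce the predicted mass and the column sums reproduce the dot mass, so the plan describes how predicted density is moved onto the annotations.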
Dataset
ShanghaiTech
- The ShanghaiTech dataset contains two parts: Part A and Part B.
- Part A contains 482 images (300 for training, 182 for testing) of high-density crowds collected from the Internet.
- Part B contains 716 images (400 for training, 316 for testing) captured on busy streets in urban areas of Shanghai.
- The average resolution of images in ShanghaiTech Part A is 589x868 pixels.
- The scenes in Part B are less crowded than those in Part A.
Figure: Sample Shanghai-Tech images
References: [11]
JHU crowd
- Contains 4,372 images with an average resolution of 1430x910 pixels.
- The dataset was collected from various geographical locations and under diverse conditions.
- Contains a total of 1.51 million dot annotations, with an average of 346 dots per image and a maximum of 25K dots.
- Includes images captured under adverse weather and various illumination conditions, which improves diversity.
- Provides head-level labels (dots, approximate bounding box, blur level, etc.) and image-level labels (scene type and weather condition).
Figure: Sample JHU-Crowd images
Evaluation metrics
Mean Absolute Error (MAE)
- Determines the accuracy of the estimates.

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| C_{I_i}^{pred} - C_{I_i}^{gt} \right| \]

Mean Square Error (MSE)
- Indicates the robustness of the estimates.

\[ \mathrm{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( C_{I_i}^{pred} - C_{I_i}^{gt} \right)^{2}} \]

Here N is the number of test images, C_{I_i}^{pred} is the predicted count for image I_i, and C_{I_i}^{gt} is its ground truth count.
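Both metrics operate on per-image counts, so they are easy to compute once each density map has been summed. The sketch below assumes the usual crowd-counting convention (as in the multi-column CNN paper [11]) in which the reported MSE is the square root of the mean squared count error; the function names and the sample counts are illustrative only.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute difference between predicted and ground truth counts."""
    n = len(gt_counts)
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n

def mse(pred_counts, gt_counts):
    """Root of the mean squared count error (assumed crowd-counting convention)."""
    n = len(gt_counts)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / n)

# Hypothetical counts for three test images.
predicted = [100.0, 250.0, 40.0]
ground_truth = [110.0, 240.0, 50.0]
mae_value = mae(predicted, ground_truth)
mse_value = mse(predicted, ground_truth)
```

Because the error is squared before averaging, a few images with very large errors raise MSE much more than MAE, which is why it is read as a robustness indicator.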
Frequency domain approach
- Fourier transforms are used to obtain the frequency response of the predicted density map and the ground truth dot map.
- The spatial information dispersed across the predicted and ground truth density maps is converted to compact information in the frequency domain.
Figure: DFT plots
Using CMTL model
Figure: image -> Cascaded-MTL model -> predicted density map. L is computed between the predicted and ground truth density maps; Lf is computed between the DFTs of the two maps.
Loss = L + βLf
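A toy version of the combined loss can be written with a naive 2-D DFT. The slides do not state the exact form of L and Lf, so this sketch assumes a pixel-wise squared error for L and an L1 distance between DFT coefficients for Lf; all function names are hypothetical, and the O(N^4) transform is only suitable for tiny maps.

```python
import cmath

def dft2(img):
    """Naive 2-D DFT of a small real-valued map (illustration only)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]

def spatial_l2(pred, gt):
    """Assumed form of L: sum of squared pixel differences."""
    return sum((p - g) ** 2 for rp, rg in zip(pred, gt) for p, g in zip(rp, rg))

def frequency_l1(pred, gt):
    """Assumed form of Lf: L1 distance between the DFT coefficients."""
    fp, fg = dft2(pred), dft2(gt)
    return sum(abs(a - b) for ra, rb in zip(fp, fg) for a, b in zip(ra, rb))

def total_loss(pred, gt, beta):
    """Loss = L + beta * Lf."""
    return spatial_l2(pred, gt) + beta * frequency_l1(pred, gt)

# 4x4 ground truth dot map with two heads, and a flat prediction with the same count.
gt_map = [[0.0] * 4 for _ in range(4)]
gt_map[1][1] = 1.0
gt_map[3][2] = 1.0
pred_map = [[0.125] * 4 for _ in range(4)]

loss_spatial_only = total_loss(pred_map, gt_map, beta=0.0)
loss_combined = total_loss(pred_map, gt_map, beta=0.1)
```

The zero-frequency coefficient of each DFT equals the sum of the map, that is, the crowd count, so the frequency term penalises count errors and shape errors together.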
Results on ShanghaiTech Part A (Sht A)

          β        MAE     MSE
Before    -        86.7    132.0
After     0        85.78   130.92
After     0.0001   86.02   132.73
After     0.001    85.00   133.07
After     0.1      86.81   131.90
Multi-scale model approach
Base model (DM-count)
Figure: upsampled output of VGG-16 -> 3x3 conv, 256 -> 3x3 conv, 128 (Reg-layer) -> 1x1 conv -> density map
DM-count+: DM-count plus one extra parallel layer
Figure: upsampled output of VGG-16 -> two parallel Reg-layers (3x3 conv, 256 -> 3x3 conv, 128; and 5x5 conv, 256 -> 5x5 conv, 128) -> 1x1 conv -> density map
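A rough sense of what the extra 5x5 branch costs comes from counting convolution weights. The sketch below assumes that the upsampled VGG-16 feature map has 512 channels and that every convolution has a bias term; neither detail is stated on the slides, and the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Number of learnable parameters in one 2-D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def branch_params(kernel_size, in_channels=512):
    """Parameters of one branch: conv(k, 256) followed by conv(k, 128).

    in_channels=512 is an assumption about the VGG-16 feature map.
    """
    return (conv2d_params(kernel_size, in_channels, 256)
            + conv2d_params(kernel_size, 256, 128))

params_3x3 = branch_params(3)  # original DM-count regression branch
params_5x5 = branch_params(5)  # extra parallel branch with larger kernels
ratio = params_5x5 / params_3x3
```

Under these assumptions the 5x5 branch holds close to 25/9 times as many parameters as the 3x3 branch, since the weight count grows with the square of the kernel size.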
Results on JHU-Full

          MAE     MSE      Epoch   Crop size
Before    54.03   204.38   335     384
After     58.63   191.01   265     384
After     52.69   194.65   410     384
After     52.68   204.69   905     256
Results on JHU-High (n ≥ 100)

          MAE      MSE      Epoch   Crop size
Before    110.63   297.34   245     384
After     93.96    282.63   325     384
After     99.55    284.99   135     384
DM-count+ with a pretrained VGG-16
Figure: pretrained VGG-16 -> upsampled output -> two parallel Reg-layers (3x3 conv, 256 -> 3x3 conv, 128; and 5x5 conv, 256 -> 5x5 conv, 128) -> 1x1 conv -> density map
Figure: the 13 layers of VGG-16, split into frozen layers and fine-tuned layers
Results on Sha-A

          MAE     MSE      Epoch   Crop size   Frozen layers
Before    65.60   103.13   435     256         -
After     62.11   99.37    290     256         9
After     65.09   98.38    230     256         9
After     61.25   102.22   440     256         5
DM-count++: DM-count with a dilated convolution layer
Figure: upsampled output of VGG-16 -> two parallel Reg-layers (3x3 conv, 256 -> 3x3 conv, 128; and 3x3 conv, 256, dilation rate 2 -> 3x3 conv, 128, dilation rate 2) -> 1x1 conv -> density map
Dilated convolution
- Dilated convolution uses sparse kernels.
- This enlarges the receptive field without increasing the number of parameters or the amount of computation.
References: [12]
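The effect of the dilation rate can be seen in one dimension: the kernel keeps the same number of weights, but its taps are spread out, so each output sees a wider span of the input. The following sketch is illustrative only; the function names are hypothetical, and the operation is written as cross-correlation with no padding, as is usual for CNN layers.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel whose taps are `dilation` samples apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated convolution without padding (cross-correlation form)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 1, 1]  # three weights in both cases

dense_out = dilated_conv1d(signal, kernel, dilation=1)    # each output spans 3 samples
dilated_out = dilated_conv1d(signal, kernel, dilation=2)  # each output spans 5 samples
```

With dilation rate 2, a 3-tap kernel covers the same span as a 5-tap kernel while still using three multiplications per output.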
Results on Sha-A

          MAE     MSE      Epoch   Crop size   Frozen layers
Before    65.60   103.13   435     256         -
After     64.58   97.80    390     256         9
After     67.13   96.67    270     256         9
After     67.78   100.06   125     256         9
Results on JHU-Full

          MAE     MSE      Epoch   Crop size
Before    54.03   204.38   335     384
After     53.87   190.63   45      384
Self-supervised pre-training approach
DM-count with self-supervised branch
Figure: both branches take the upsampled VGG-16 output for the input image. Density map estimation branch: 3x3 conv, 256 -> 3x3 conv, 128 -> 1x1 conv -> predicted density map, with loss L1 against the ground truth density map. Self-supervised pre-training branch: 1x1 conv, 512 -> upsampled output, with loss L2 against the input image.
L = L1 + β*L2
Figure: Supervised Masked Autoencoders, showing the self-supervised pre-training branch and the supervised branch
References: [9]
Results on JHU-High (n ≥ 100)

          MAE      MSE      Epoch   β
Before    110.63   297.34   245     -
After     116.44   255.35   145     0.1
After     112.83   275.26   125     0.01
Future work
- Adaptive dilation rate models
- Transformer-based self-supervised models
References
[1] J. Wan, Z. Liu, and A. B. Chan, "A Generalized Loss Function for Crowd Counting and Localization," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1974-1983, doi: 10.1109/CVPR46437.2021.00201.
[2] Z. Ma, X. Wei, X. Hong, and Y. Gong, "Bayesian loss for crowd count estimation with point supervision," in ICCV, 2019, pp. 6142-6151.
[3] B. Wang, H. Liu, D. Samaras, and M. H. Nguyen, "Distribution matching for crowd counting," Advances in Neural Information Processing Systems, vol. 33, pp. 1595-1607, 2020.
[4] V. Sindagi and V. Patel, "CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting," in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 2017, pp. 1-6, doi: 10.1109/AVSS.2017.8078491.
[5] H. Idrees et al., "Composition loss for counting, density map estimation and localization in dense crowds," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[6] L. Jiang, B. Dai, W. Wu, and C. C. Loy, "Focal frequency loss for image reconstruction and synthesis," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 13919-13929.
[7] W. Shu, J. Wan, K. C. Tan, S. Kwong, and A. B. Chan, "Crowd Counting in the Frequency Domain," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 19586-19595, doi: 10.1109/CVPR52688.2022.01900.
[8] J. Chen, K. Wang, W. Su, and Z. Wang, "SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing."
[9] F. Liang, Y. Li, and D. Marculescu, "SupMAE: Supervised masked autoencoders are efficient vision learners," arXiv preprint arXiv:2205.14540, 2022.
[10] M. A. Khan, H. Menouar, and R. Hamila, "Revisiting crowd counting: State-of-the-art, trends, and future perspectives," Image and Vision Computing, 104597, 2022.
[11] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, "Single-image crowd counting via multi-column convolutional neural network," in CVPR, 2016, pp. 589-597.
[12] Y. Li, X. Zhang, and D. Chen, "CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes," in CVPR, 2018, pp. 1091-1100.
Thank you

Β 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
Β 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
Β 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Β 
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCRβ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
Β 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
Β 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Β 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
Β 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Β 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Β 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
Β 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
Β 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
Β 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
Β 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
Β 
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
Β 

crowd counting.pptx

  • 1. Single image crowd counting using deep learning models End semester evaluation of M.Tech. project By Pawar Shubham Rajebhau - 2102102013 Under the guidance of Dr. Vivek Kanhangad Department of Electrical Engineering Indian Institute of Technology Indore
  • 2. Table of contents: Introduction; Challenges; Overview of crowd counting methods; Density map estimation method; Crowd counting models; Loss functions; Dataset; Evaluation metrics; Frequency domain approach; Multi-scale model approach; Self-supervised pre-training approach; Future work; References
  • 3. Introduction
      - Single-image crowd counting estimates the number of objects (people, cars, cells, etc.) in an unconstrained scene.
      - It has important applications in public safety, traffic management, consumer behavior analysis, cell counting, etc.
      - Extensive research has been done in this area, particularly with the use of deep learning.
      Figure: Image-based crowd counting.
  • 5. Overview of crowd counting methods: detection-based approach
      The three major crowd counting approaches are detection-based, regression-based, and density map estimation.
      - Based on computer vision techniques.
      - Detects individual objects, heads, or body parts and counts the total number in the image.
      - Accuracy deteriorates in crowded scenes with severe occlusions.
      - Requires full identification and outlining of each object, incurring the highest labeling cost.
  • 6. Overview of crowd counting methods: regression-based approach
      - Estimates the count directly from the image, without locating individual objects.
      - Achieves higher accuracy than the detection-based approach in crowded scenes.
      - Lacks spatial information and interpretability, which limits its use for localization.
      - Does not require annotating individual objects, resulting in a lower annotation cost.
  • 7. Overview of crowd counting methods: density map estimation
      - Recently emerged as a promising approach.
      - Achieves high accuracy for crowded scenes.
      - Preserves spatial information about the distribution of people.
      - Requires marking only the heads of people, resulting in a labeling cost between those of the detection-based and regression-based approaches.
  • 8. Density map estimation method
      - The annotated density map is a sparse binary mask.
      - Each individual person is marked with a single dot on their head or forehead.
      - The spatial extent of each person is not provided.
      Figure panels: ShanghaiTech image; density map.
  • 9. Density map estimation method
      - Sparse density maps are converted to dense maps using fixed or variable Gaussian kernels.
      - This helps train the model better than directly using a sparse dot map.
      Sigma value computation methods:
      - Fixed standard deviation (sigma) of the Gaussian kernel.
      - The sigma value is equal to the distance to the nearest neighbor.
      - The sigma value is computed as the average of the distances to the three nearest neighbors, divided by 10.
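The three sigma rules listed on this slide can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the project's code: the function names, the default fixed sigma of 4 pixels, the lower bound `min_sigma`, the kernel truncation at 3 sigma, and the renormalization of kernels clipped at the image border are all hypothetical choices. Points are assumed to be `(x, y)` pixel coordinates inside the image.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=3.0):
    """Return a normalized 2-D Gaussian kernel and its radius (assumed 3-sigma cut-off)."""
    radius = max(1, int(truncate * sigma + 0.5))
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum(), radius

def sigma_for_points(points, method="fixed", fixed_sigma=4.0, min_sigma=1.0):
    """Per-point sigma: 'fixed', 'nearest' (nearest-neighbor distance),
    or 'knn3' (mean distance to the three nearest neighbors, divided by 10)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if method == "fixed" or n < 2:
        return np.full(n, fixed_sigma)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)  # column 0 is the zero distance of each point to itself
    if method == "nearest":
        return np.maximum(d[:, 1], min_sigma)
    if method == "knn3":
        k = min(3, n - 1)
        return np.maximum(d[:, 1:k + 1].mean(axis=1) / 10.0, min_sigma)
    raise ValueError(f"unknown method: {method}")

def density_map(points, shape, method="fixed", fixed_sigma=4.0):
    """Spread each head annotation into a Gaussian blob; the map sums to the count."""
    h, w = shape
    dm = np.zeros((h, w), dtype=np.float64)
    sigmas = sigma_for_points(points, method, fixed_sigma)
    for (x, y), s in zip(points, sigmas):
        k, r = gaussian_kernel(s)
        cx = min(max(int(round(x)), 0), w - 1)
        cy = min(max(int(round(y)), 0), h - 1)
        y0, y1 = max(0, cy - r), min(h, cy + r + 1)
        x0, x1 = max(0, cx - r), min(w, cx + r + 1)
        patch = k[y0 - (cy - r): y1 - (cy - r), x0 - (cx - r): x1 - (cx - r)]
        dm[y0:y1, x0:x1] += patch / patch.sum()  # renormalize so clipped blobs keep unit mass
    return dm
```

Because each blob is renormalized to unit mass, summing the dense map recovers the annotated count regardless of which sigma rule is used.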
  • 10. Density map estimation method. Figure panels: ShanghaiTech images; M1; M2; M3.
  • 11. Density map estimation method. Figure panels: ShanghaiTech images; M1; M2; M3.
  • 12. Crowd counting models
      - Single-column models
      - Multi-column models
      - Models with a pre-trained backbone
      - Single-column models with multiple modules
      Reference: [10]
  • 13. Loss functions
      - The predicted density map is a smooth heat map, while the available ground truth is a dot map.
      - The choice of loss function depends on how the ground truth and predicted density maps are used.
      - A loss function improves performance by extracting proper supervisory information from the ground truth.
      L2 loss:
      - Pixel-wise loss: the network adjusts each pixel value according to the L2 loss against the same pixel in the ground truth.
      - The ground truth dot map is converted to a smooth heat map using a Gaussian kernel.
      - The L2 loss is sensitive to the choice of variance in the Gaussian kernel.
  • 14. Loss functions
      Optimal transport (OT) loss:
      - Treats the predicted density map and the dot map as probability distributions and uses balanced OT to match the shape of the two distributions.
      - The Sinkhorn algorithm is used to obtain the optimal transport matrix.
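As a rough illustration of the Sinkhorn step mentioned on this slide, the sketch below computes an entropy-regularized transport plan between two discrete distributions. It is not the implementation used in the project or in [3]: the function name, the regularization strength `reg`, the iteration count, and the squared-distance cost in the usage note are assumptions made for the example.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=1.0, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b  : non-negative vectors that each sum to 1 (balanced OT)
    cost  : matrix with cost[i, j] = cost of moving mass from bin i to bin j
    reg   : entropic regularization strength (assumed value)
    Returns the transport plan P and the transport cost <P, cost>.
    """
    K = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # scale rows toward marginal a
        v = b / (K.T @ u)             # scale columns toward marginal b
    P = u[:, None] * K * v[None, :]
    return P, float((P * cost).sum())
```

To apply this to density maps, one would flatten the predicted map and the dot map, divide each by its sum so both become probability distributions, and build `cost` from distances between pixel coordinates (for example, squared Euclidean distance).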
  • 15. Dataset: ShanghaiTech
      - The ShanghaiTech dataset contains two parts: Part A and Part B.
      - Part A contains 482 images (300 for training, 182 for testing) and includes high-density crowds collected from the Internet.
      - Part B contains 716 images (400 for training, 316 for testing) captured on busy streets in urban areas of Shanghai.
      - The average resolution of images in ShanghaiTech Part A is 589x868 pixels.
      - The scenes in Part B are less crowded than those in Part A.
  • 18. Dataset: JHU crowd
      - Contains 4,372 images with an average resolution of 1430x910 pixels.
      - The dataset was collected from various geographical locations and under diverse conditions.
      - Contains a total of 1.51 million dot annotations, with an average of 346 dots per image and a maximum of 25K dots.
      - The dataset includes images captured under adverse weather and various illumination conditions, ensuring improved diversity.
      - Provides head-level labels (dots, approximate bounding box, blur level, etc.) and image-level labels (scene type and weather condition).
  • 21. Evaluation metrics
      - Mean Absolute Error (MAE) determines the accuracy of the estimates:
        $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| C_{I_i}^{\mathrm{pred}} - C_{I_i}^{\mathrm{gt}} \right|$
      - Mean Square Error (MSE) indicates the robustness of the estimates:
        $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( C_{I_i}^{\mathrm{pred}} - C_{I_i}^{\mathrm{gt}} \right)^2$
      Here $N$ is the number of test images, $C_{I_i}^{\mathrm{pred}}$ is the predicted count for image $I_i$, and $C_{I_i}^{\mathrm{gt}}$ is the ground truth count.
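The two metrics on this slide can be computed from per-image counts as in the short sketch below (function names are illustrative). Crowd counting papers often report the square root of the mean squared error under the name MSE; whether the figures in this deck do so is not stated, so a separate `rmse` helper is included as an assumption rather than as the deck's definition.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground truth counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def mse(pred_counts, gt_counts):
    """Mean of squared count errors."""
    return sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def rmse(pred_counts, gt_counts):
    """Square root of the mean squared error (assumed reporting convention)."""
    return math.sqrt(mse(pred_counts, gt_counts))
```

For example, predicted counts `[10, 20, 30]` against ground truth `[12, 18, 33]` give absolute errors 2, 2, and 3, so the MAE is 7/3 and the mean squared error is 17/3.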
  • 22. Frequency domain approach
      - The Fourier transform is used to obtain the frequency-domain representation of the predicted density map and the ground truth dot map.
      - The dispersed spatial information in the predicted and ground truth density maps is converted to compact information.
  • 24. Frequency domain approach: using the CMTL model
      Diagram: image → Cascaded-MTL model → predicted density map → DFT of predicted density map; ground truth density map → DFT of ground truth density map.
      Loss terms: L and Lf, combined as Loss = L + βLf.
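A minimal NumPy sketch of the combined loss Loss = L + βLf follows. The slide does not spell out either term, so both are assumptions here: L is taken to be a pixel-wise mean squared error between the density maps, and Lf the mean absolute difference between their 2-D DFT coefficients. The function names and the default β are also hypothetical.

```python
import numpy as np

def frequency_loss(pred_dm, gt_dm):
    """Lf (assumed form): mean absolute difference between the 2-D DFTs."""
    f_pred = np.fft.fft2(pred_dm)
    f_gt = np.fft.fft2(gt_dm)
    return float(np.mean(np.abs(f_pred - f_gt)))

def total_loss(pred_dm, gt_dm, beta=0.001):
    """Loss = L + beta * Lf, with L assumed to be pixel-wise MSE."""
    spatial = float(np.mean((pred_dm - gt_dm) ** 2))
    return spatial + beta * frequency_loss(pred_dm, gt_dm)
```

One property that makes the frequency view compact: the zero-frequency DFT coefficient equals the sum of the map, which for a density map is the crowd count.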
  • 25. Frequency domain approach: results on ShanghaiTech Part A (Sht A)
      Before:
        MAE     MSE
        86.7    132.0
      After:
        β        MAE     MSE
        0        85.78   130.92
        0.0001   86.02   132.73
        0.001    85.00   133.07
        0.1      86.81   131.90
  • 27. Multi-scale model approach: DM-Count+ (DM-Count plus one extra parallel layer)
      Diagram: upsampled output of VGG-16 → regression layers → density map.
      Regression layers shown: 3x3 conv, 256; 3x3 conv, 128; extra parallel branch: 5x5 conv, 256; 5x5 conv, 128; 1x1 conv.
  • 28. Multi-scale model approach: results on JHU-Full
      Before:
        MAE     MSE      Epoch   Crop size
        54.03   204.38   335     384
      After:
        MAE     MSE      Epoch   Crop size
        58.63   191.01   265     384
        52.69   194.65   410     384
        52.68   204.69   905     256
  • 29. Multi-scale model approach: results on JHU-High (n ≥ 100)
      Before:
        MAE      MSE      Epoch   Crop size
        110.63   297.34   245     384
      After:
        MAE      MSE      Epoch   Crop size
        93.96    282.63   325     384
        99.55    284.99   135     384
  • 30. Multi-scale model approach: DM-Count+ (DM-Count plus one extra parallel layer) with pretrained VGG-16
      Diagram: upsampled output of pretrained VGG-16 → regression layers → density map.
      Regression layers shown: 3x3 conv, 256; 3x3 conv, 128; extra parallel branch: 5x5 conv, 256; 5x5 conv, 128; 1x1 conv.
  • 31. Multi-scale model approach. Diagram: the 13 layers of VGG-16, split into frozen layers and fine-tuned layers.
  • 32. Multi-scale model approach: results on Sha-A
      Before:
        MAE     MSE      Epoch   Crop size
        65.60   103.13   435     256
      After:
        MAE     MSE      Epoch   Crop size   Frozen layers
        62.11   99.37    290     256         9
        65.09   98.38    230     256         9
        61.25   102.22   440     256         5
  • 33. Multi-scale model approach: DM-Count++ (DM-Count with dilated convolution layers)
      Diagram: upsampled output of VGG-16 → regression layers → density map.
      Regression layers shown: 3x3 conv, 256; 3x3 conv, 128; 3x3 conv, 256, dilation rate (dr) 2; 3x3 conv, 128, dilation rate (dr) 2; 1x1 conv.
  • 34. Multi-scale model approach: dilated convolution
      - Dilated convolution uses sparse kernels.
      - This enlarges the receptive field without increasing the number of parameters or the amount of computation.
      Reference: [12]
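The receptive-field claim on this slide can be checked with a small NumPy sketch. A k x k kernel with dilation rate d covers k + (k - 1)(d - 1) pixels per side while keeping the same k x k weights. The function names and the naive single-channel, no-padding implementation are illustrative assumptions, not the layers used in the project.

```python
import numpy as np

def effective_kernel_size(k, dilation):
    """Side length of the area covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv2d(image, kernel, dilation=1):
    """Naive single-channel dilated cross-correlation with no padding (stride 1)."""
    k = kernel.shape[0]
    span = effective_kernel_size(k, dilation)
    out_h = image.shape[0] - span + 1
    out_w = image.shape[1] - span + 1
    out = np.zeros((out_h, out_w))
    for i in range(k):
        for j in range(k):
            # Each weight reads pixels spaced `dilation` apart, so the kernel is sparse.
            out += kernel[i, j] * image[i * dilation: i * dilation + out_h,
                                        j * dilation: j * dilation + out_w]
    return out
```

With a 3x3 kernel and dilation rate 2, each output value depends on a 5x5 window but still uses only nine weights and nine multiply-adds.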
  • 35. Multi-scale model approach: results on Sha-A
      Before:
        MAE     MSE      Epoch   Crop size
        65.60   103.13   435     256
      After:
        MAE     MSE      Epoch   Crop size   Frozen layers
        64.58   97.80    390     256         9
        67.13   96.67    270     256         9
        67.78   100.06   125     256         9
  • 36. Multi-scale model approach: results on JHU-Full
      Before:
        MAE     MSE      Epoch   Crop size
        54.03   204.38   335     384
      After:
        MAE     MSE      Epoch   Crop size
        53.87   190.63   45      384
  • 37. Self-supervised pre-training approach: DM-Count with a self-supervised branch
      Density map estimation branch: input image → upsampled output of VGG-16 → 3x3 conv, 256 → 3x3 conv, 128 → 1x1 conv → predicted density map, compared with the ground truth density map.
      Self-supervised pre-training branch: 1x1 conv, 512 → upsampled output, compared with the input image.
      Loss terms: L1 and L2, combined as L = L1 + β·L2.
  • 38. Self-supervised pre-training approach: Supervised Masked Autoencoders
      Diagram labels: self-supervised pre-training; supervised branch.
      Reference: [9]
  • 39. Self-supervised pre-training approach: results on JHU-High (n ≥ 100)
      Before:
        MAE      MSE      Epoch
        110.63   297.34   245
      After:
        MAE      MSE      Epoch   β
        116.44   255.35   145     0.1
        112.83   275.26   125     0.01
  • 40. Future work
      - Adaptive dilation rate models
      - Transformer-based self-supervised models
  • 41. References
      [1] J. Wan, Z. Liu, and A. B. Chan, "A generalized loss function for crowd counting and localization," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1974-1983, doi: 10.1109/CVPR46437.2021.00201.
      [2] Z. Ma, X. Wei, X. Hong, and Y. Gong, "Bayesian loss for crowd count estimation with point supervision," in ICCV, 2019, pp. 6142-6151.
      [3] B. Wang, H. Liu, D. Samaras, and M. H. Nguyen, "Distribution matching for crowd counting," Advances in Neural Information Processing Systems, vol. 33, pp. 1595-1607, 2020.
      [4] V. Sindagi and V. Patel, "CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting," in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 2017, pp. 1-6, doi: 10.1109/AVSS.2017.8078491.
      [5] H. Idrees et al., "Composition loss for counting, density map estimation and localization in dense crowds," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
      [6] L. Jiang, B. Dai, W. Wu, and C. C. Loy, "Focal frequency loss for image reconstruction and synthesis," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 13919-13929.
      [7] W. Shu, J. Wan, K. C. Tan, S. Kwong, and A. B. Chan, "Crowd counting in the frequency domain," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 19586-19595, doi: 10.1109/CVPR52688.2022.01900.
  • 42. References
      [8] J. Chen, K. Wang, W. Su, and Z. Wang, "SSR-HEF: Crowd counting with multi-scale semantic refining and hard example focusing."
      [9] F. Liang, Y. Li, and D. Marculescu, "SupMAE: Supervised masked autoencoders are efficient vision learners," arXiv preprint arXiv:2205.14540, 2022.
      [10] M. A. Khan, H. Menouar, and R. Hamila, "Revisiting crowd counting: State-of-the-art, trends, and future perspectives," Image and Vision Computing, 104597, 2022.
      [11] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, "Single-image crowd counting via multi-column convolutional neural network," in CVPR, 2016, pp. 589-597.
      [12] Y. Li, X. Zhang, and D. Chen, "CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes," in CVPR, 2018, pp. 1091-1100.