One-Shot Scene-Specific Crowd Counting
Mohammad Asiful Hossain, Mahesh Kumar K, Mehrdad Hosseinzadeh, Omit
Chanda and Yang Wang
Presented by
Hafsa Moontari Ali
ID: 7880835
Course Title: Research Methodology
Winter 2020, University of Manitoba
What is Crowd Counting?
●A technique used to count or estimate the number of people in a crowd.
●The most straightforward solution is to actually count the people in the crowd one by one.
●But this becomes difficult when the crowd images are captured in open areas such as streets or parks.
Sample Image
The problem becomes harder...
Where are the potential application areas?
●Urban planning
●Surveillance
●Traffic monitoring
●Geo-political analysis
Contribution of this paper
●Addressed a novel problem, “one-shot scene-specific crowd counting”.
●Developed a crowd counting model using deep learning.
●Significantly outperformed baseline methods.
Proposed Approach
●A density map is predicted from the input static image.
●Each pixel of the density map indicates the crowd density at the corresponding location in the image.
●The crowd count is obtained by summing the entries of the density map.
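The counting step can be illustrated with a minimal sketch. The density values below are made up for illustration, and `count_from_density` is a hypothetical helper name, not something from the paper.

```python
# Toy density map (hypothetical values): each entry is the crowd density
# predicted at the corresponding pixel location of the input image.
density_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.0, 0.4, 0.1, 0.0],
]


def count_from_density(density):
    """Estimate the crowd count by summing every entry of the density map."""
    return sum(sum(row) for row in density)


estimated_count = count_from_density(density_map)
print(round(estimated_count, 2))
```

Because the count is a sum of densities, it does not have to be an integer for a predicted map.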
Model Architecture
●CSRNet, a dilated convolutional neural network architecture, is used as the backbone.
●It employs a convolutional neural network to extract features.
●A dilated convolutional neural network generates the output from these features.
●The split into encoder and decoder is flexible and application-specific.
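To show what "dilated" means here, the following is a rough 1-D sketch in plain Python. It is not CSRNet itself (which works on 2-D feature maps); the function name `dilated_conv1d`, the signal, and the kernel are all illustrative assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with `dilation - 1` skipped samples
    between consecutive kernel taps."""
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        )
    return out


signal = [1, 2, 3, 4, 5, 6, 7]   # hypothetical input row
kernel = [1, 0, -1]              # hypothetical 3-tap kernel

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
print(dense, dilated)
```

With the same three kernel weights, a dilation of 2 widens the region each output sees from 3 to 5 samples, so the receptive field grows without extra parameters.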
Model Architecture
Figure 1: One-shot scene-specific adaptation using CSRNet
Model Learning
●During training, a collection of labeled training images is used.
●Each scene might correspond to a camera fixed at a particular location.
●It is assumed that each scene has the same number, N, of training images.
●The model can be generalized to the case where different scenes have different numbers of training images.
●During training, the model learns the parameters of the encoder network.
One-Shot Scene-Specific Adaptation
●During testing, the crowd counting algorithm is deployed in a specific target scene.
●In this paper, one-shot learning is applied by fine-tuning the decoder network.
●The distance between the predicted density map and the ground-truth density map is used as the loss function.
●Fine-tuning is done by computing the gradient of this distance and updating the decoder parameters accordingly.
●The model is effectively tuned to the target scene.
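The adaptation step can be sketched as plain gradient descent on the decoder alone. This is a toy illustration under several assumptions, not the authors' implementation: the decoder is reduced to a per-pixel weighted sum of feature channels, the distance is the squared Euclidean distance, and the features, target densities, learning rate, and step count are all made up.

```python
def decode(features, weights):
    """Decoder stand-in: per-pixel weighted sum of feature channels."""
    return [sum(w * f for w, f in zip(weights, pixel)) for pixel in features]


def squared_distance(pred, target):
    """Squared Euclidean distance between two flattened density maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def adapt_decoder(features, target, weights, lr=0.05, steps=200):
    """Fine-tune only the decoder weights on one labeled image."""
    weights = list(weights)
    for _ in range(steps):
        pred = decode(features, weights)
        # Gradient of the squared distance w.r.t. each decoder weight.
        grads = [
            sum(2 * (p - t) * pixel[c]
                for p, t, pixel in zip(pred, target, features))
            for c in range(len(weights))
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical frozen encoder output: 4 pixels, 2 feature channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
# Hypothetical ground-truth density for the single target-scene image.
target = [0.2, 0.6, 0.8, 0.4]

initial = [0.0, 0.0]
adapted = adapt_decoder(features, target, initial)
loss_before = squared_distance(decode(features, initial), target)
loss_after = squared_distance(decode(features, adapted), target)
print(loss_before, loss_after)
```

The encoder features are never modified; only the decoder weights move, which mirrors the idea of adapting a small part of the network to the target scene from a single labeled image.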
Experimental Results
Table 1: Comparison of the performance (MAE and MSE) of our approach and the baselines on the WorldExpo’10
dataset and the Trancos dataset. For “ours” and “simple fine-tuning”, two settings are considered: using either the last
layer or the last two layers of CSRNet as the decoder.
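For reference, the two metrics named in the caption can be computed from per-image counts as in the sketch below. The counts are invented, not results from the paper. Crowd counting papers often report the square root of the mean squared error under the name "MSE"; which convention the table follows is not stated in these slides, so both are shown.

```python
import math


def mean_absolute_error(pred, true):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mean_squared_error(pred, true):
    """Average squared difference between predicted and true counts."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


# Hypothetical per-image counts, for illustration only.
predicted_counts = [10, 22, 31]
true_counts = [12, 20, 35]

mae = mean_absolute_error(predicted_counts, true_counts)
mse = mean_squared_error(predicted_counts, true_counts)
rmse = math.sqrt(mse)  # the variant often labeled "MSE" in crowd counting
print(mae, mse, rmse)
```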
Cross Dataset Testing
Table 2: Performance in cross-dataset testing with the same (a, b) and different (c, d) object types. “W”, “U”, “M” and
“T” denote WorldExpo’10, UCSD, Mall and Trancos, respectively.
Future Work
●This paper attempts to deploy a crowd counting model in real-world applications.
●In the future, this approach can be extended to few-shot learning.
●Meta-learning and meta-auxiliary learning can also be employed.
●This approach can be extended to unsupervised learning.
Any questions?
