Optic Flow
Brightness Constancy Constraints
Aperture Problem
Regularization and Smoothness Constraints
Lucas-Kanade algorithm
Focus of Expansion (FOE)
Discrete Optimization for Optical Flow
Large Displacement Optical Flow: Descriptor Matching
DeepFlow: Large Displacement Optical Flow with Deep Matching
EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
Optical Flow with Piecewise Parametric Model
Flow Fields: Dense Correspondence Fields for Accurate Large Displacement Optical Flow Estimation
Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids
FlowNet: Learning Optical Flow with Convolutional Networks
Deep Discrete Flow
Optical Flow Estimation using a Spatial Pyramid Network
A Large Dataset to Train ConvNets for Disparity, Optical Flow, and Scene Flow Estimation
DeMoN: Depth and Motion Network for Learning Monocular Stereo
Unsupervised Learning of Depth and Ego-Motion from Video
Appendix A: A Database and Evaluation Methodology for Optical Flow
Appendix B: Learning and optimization
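Several items in the outline above (brightness constancy, the aperture problem, the Lucas-Kanade algorithm) fit in a few lines of NumPy. The sketch below is illustrative only; the patch size, test images, and function name are not from the lecture material:

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Estimate one (u, v) flow vector for a small patch via least squares.

    Solves the stacked brightness-constancy equations Ix*u + Iy*v = -It
    inside the patch. A single pixel is under-determined (the aperture
    problem), hence the window of equations."""
    Iy, Ix = np.gradient(I1.astype(float))   # spatial gradients
    It = I2.astype(float) - I1.astype(float) # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2 system
    b = -It.ravel()
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)      # min-norm solution
    return uv

# Tiny check: a horizontal ramp shifted right by 1 px should give u ~ 1, v ~ 0.
I1 = np.tile(np.arange(8.0), (8, 1))
I2 = I1 - 1.0   # shifting the ramp right by one pixel lowers each value by 1
u, v = lucas_kanade_patch(I1, I2)
```

Note that for this ramp the vertical gradient is zero everywhere, so the system is rank-deficient and only the component along the gradient is recoverable; `lstsq` returns the minimum-norm answer (v = 0), which is exactly the aperture problem the outline names.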
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Hyperspectral images can be efficiently compressed with a linear predictive model such as the one used in the SLSQ algorithm. In this paper we apply this predictive model to AVIRIS images by identifying, through an off-line approach, a common subset of bands that are not spectrally correlated with any other band. These bands are not useful as prediction references for the SLSQ 3-D predictive model, so they must be encoded with other prediction strategies that exploit only spatial correlation. We obtained this subset by clustering the AVIRIS bands with the clustering-by-compression approach. The main result of this paper is the list of bands, for AVIRIS images, that are unrelated to the others. The clustering trees obtained for AVIRIS, and the relationships among bands they depict, are also an interesting starting point for future research.
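The clustering-by-compression idea can be sketched with the normalized compression distance (NCD). Everything below is illustrative: zlib stands in for whichever compressor the authors used, and the "bands" are synthetic byte strings, not AVIRIS data:

```python
import zlib
import numpy as np

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size (zlib as a stand-in compressor).
    Related inputs compress well together, giving a small distance."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy "bands": band0 and band1 are identical (perfectly correlated),
# band2 is unrelated noise -- the analogue of a band with no spectral
# partner, which would be routed to a purely spatial predictor.
rng = np.random.default_rng(0)
band0 = bytes(rng.integers(0, 4, 4096, dtype=np.uint8))
band1 = band0
band2 = bytes(rng.integers(0, 256, 4096, dtype=np.uint8))
d01, d02 = ncd(band0, band1), ncd(band0, band2)
```

A band whose NCD to every other band stays near 1 would be flagged as spectrally unrelated, in the spirit of the subset the paper reports.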
This paper presents a trifocal Rotman lens design approach. The effects of focal ratio and element spacing on the performance of the Rotman lens are described. A three-beam prototype feeding a 4-element antenna array working in L-band has been simulated using the RLD v1.7 software. Simulated results show that the lens has a return loss of −12.4 dB at 1.8 GHz. The variation of beam-to-array-port phase error with changes in the focal ratio and element spacing has also been investigated.
WAVELET DECOMPOSITION AND ALPHA STABLE FUSION
This article presents a new method for fusing multifocal images that combines the Laplacian pyramid with the wavelet decomposition, using the alpha-stable distance as the selection rule. We start by decomposing the multifocal images into several pyramid levels, then apply the wavelet decomposition to each level. The originality of this work is the use of the alpha-stable distance to fuse the wavelet images at each level of the pyramid. To obtain the final fused image, we reconstruct the combined image at each level of the pyramid. We compare our method to other existing methods in the literature and find that it compares favorably.
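A minimal single-level sketch of wavelet-domain fusion: a hand-rolled Haar transform, with a maximum-magnitude selection rule standing in for the paper's alpha-stable rule (and no Laplacian pyramid, which the full method would add around this step):

```python
import numpy as np

def haar2(a):
    """One-level 2-D Haar transform: returns (LL, LH, HL, HH)."""
    a = a.astype(float)
    L = (a[:, 0::2] + a[:, 1::2]) / 2.0   # column averages
    H = (a[:, 0::2] - a[:, 1::2]) / 2.0   # column differences
    LL = (L[0::2] + L[1::2]) / 2.0
    LH = (L[0::2] - L[1::2]) / 2.0
    HL = (H[0::2] + H[1::2]) / 2.0
    HH = (H[0::2] - H[1::2]) / 2.0
    return LL, LH, HL, HH

def ihaar2(LL, LH, HL, HH):
    """Exact inverse of haar2 (perfect reconstruction)."""
    h, w = LL.shape
    out = np.empty((2 * h, 2 * w))
    L0, L1 = LL + LH, LL - LH
    H0, H1 = HL + HH, HL - HH
    out[0::2, 0::2] = L0 + H0
    out[0::2, 1::2] = L0 - H0
    out[1::2, 0::2] = L1 + H1
    out[1::2, 1::2] = L1 - H1
    return out

def fuse(a, b):
    """Fuse two registered multifocus images: average the approximation
    band, keep the larger-magnitude detail coefficient from either input."""
    (Aa, *da), (Ab, *db) = haar2(a), haar2(b)
    details = [np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(da, db)]
    return ihaar2((Aa + Ab) / 2.0, *details)

a = np.linspace(0.0, 1.0, 64).reshape(8, 8)
b = a[::-1].copy()
same = fuse(a, a)   # fusing an image with itself must return it unchanged
```

The choose-max rule keeps, at each location, whichever input holds the stronger edge energy, which is why in-focus regions survive the fusion; the alpha-stable distance in the paper plays that selection role more robustly.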
Image Restoration and Denoising By Using Nonlocally Centralized Sparse Representation
Noisy, blurred, or otherwise distorted images arise from the degradation of the observed image. For restoring image information, the sparse representations obtained by conventional models may not be accurate enough for a faithful reconstruction of the original image. To improve the performance of sparse-representation-based image restoration, this method models the sparse coding noise so that the sparse coefficients of the original image can be recovered. The resulting nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse representation model. For denoising, a histogram-clipping method based on histogram-driven sparse representation effectively reduces the noise, and a TMR filter is additionally applied to improve image quality. Experiments on various image restoration problems, including denoising, deblurring, and super-resolution, validate the generality and state-of-the-art performance of the proposed algorithm.
Compressive Light Field Photography using Overcomplete Dictionaries and Optimized Projections
In this paper, a design is proposed for a compressive light field camera that recovers light fields at higher resolution from a single image. Various other useful applications of light field atoms are also discussed, including 4D light field compression and denoising.
A Richardson-Lucy Algorithm Using a Varying Point Spread Function Along the Iterations
Image restoration with the Richardson-Lucy (RL) algorithm suffers the usual drawback imposed by the constraint of a constant point spread function (PSF) as the deconvolution kernel. Even when the image exhibits constant spatial resolution over its whole surface, as the iterations advance the overall resolution improves while the PSF remains constant. This work proposes an algorithm that restores images with the RL algorithm using a PSF that varies as the iterations proceed: the PSF width is reduced to match the last-achieved image resolution, and the next iteration is carried out with it. The process is repeated until the PSF no longer changes significantly. A key point in this procedure is how to evaluate the PSF tied to the image resolution. Here this is done on the grounds that global contrast increases as resolution improves, since many gray pixels migrate towards darker or brighter regions. Hence, when scanning a steadily increasing PSF width during deconvolution, a maximum global contrast is reached at some point, corresponding to the best PSF. Synthetic as well as experimental images deconvolved with the proposed technique outperform the final quality of the same images treated with the original Richardson-Lucy algorithm for any number of iterations. The algorithm and ancillary procedures have been embedded into an ad hoc Fortran 90 program capable of generating synthetic images and processing both them and real ones.
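For reference, the standard constant-PSF Richardson-Lucy iteration that the paper modifies looks like this in 1-D NumPy (the varying-PSF variant would re-estimate a narrower `psf` between iterations; that step is omitted here):

```python
import numpy as np

def richardson_lucy(observed, psf, iters=50):
    """Classic Richardson-Lucy deconvolution with a constant PSF.

    Multiplicative update: est <- est * (psf_flipped * (observed / (psf * est))),
    where * denotes convolution. All quantities stay non-negative."""
    eps = 1e-12
    est = np.full_like(observed, observed.mean(), dtype=float)
    psf_flip = psf[::-1]   # adjoint of convolution with psf
    for _ in range(iters):
        blurred = np.convolve(est, psf, mode="same")
        ratio = observed / (blurred + eps)
        est *= np.convolve(ratio, psf_flip, mode="same")
    return est

# Demo: deblur a single spike blurred by a small symmetric kernel.
signal = np.zeros(32)
signal[16] = 1.0
psf = np.array([0.25, 0.5, 0.25])
observed = np.convolve(signal, psf, mode="same")
est = richardson_lucy(observed, psf, iters=50)
```

On this noiseless example the iterations progressively re-concentrate the blurred energy back onto the spike, which is the resolution improvement the paper exploits when it shrinks the PSF between iterations.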
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
There is an increasing demand for identifying the sharp and the blurred photos in a burst series or a large collection. Subjective assessment of image blurriness takes account not only of pixel variation but also of the region of interest and the scene type, which makes measuring image sharpness in line with visual perception very challenging. In this paper, we devise a no-reference image sharpness metric that combines a set of gradient-based features adept at estimating Gaussian blur, out-of-focus blur, and motion blur, respectively. We propose a dataset-adaptive logistic regression to build the metric upon multiple datasets, where over half of the samples are realistic blurry photos. Cross-validation confirms that our metric outperforms the state-of-the-art methods on the datasets, with a total of 1577 images. Moreover, our metric is very fast, suitable for parallelization, and has the potential of running on mobile or embedded devices.
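One simple member of the family of gradient-based features such metrics combine is plain gradient-magnitude energy, which drops when an image is blurred. This is a generic sketch, not the paper's metric:

```python
import numpy as np

def gradient_sharpness(img):
    """Mean gradient-magnitude energy; higher means sharper edges.
    A basic no-reference cue: blur attenuates gradients."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(gx**2 + gy**2))

# A step edge scores higher than its box-blurred version.
x = np.zeros((16, 16))
x[:, 8:] = 1.0
k = np.ones(5) / 5.0  # 1-D box blur applied along rows
blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, x)
s_sharp, s_blur = gradient_sharpness(x), gradient_sharpness(blurred)
```

A practical metric like the paper's would combine several such features (tuned to Gaussian, out-of-focus, and motion blur) through a learned regressor rather than use any single one directly.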
Image Denoising Based On Sparse Representation In A Probabilistic Framework
Image denoising is an interesting inverse problem: by denoising we mean finding a clean image, given a noisy one. In this paper, we propose a novel image denoising technique based on the generalized k-density model as an extension of the probabilistic framework for solving the image denoising problem. The approach uses an overcomplete basis dictionary to sparsely represent the image of interest. To learn the overcomplete basis, we use ICA based on the generalized k-density model. The learned dictionary is then used for denoising speech signals and other images. Experimental results confirm the effectiveness of the proposed method for image denoising. A comparison with other denoising methods is also made, showing that the proposed method produces the best denoising effect.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Radar reflectance model for the extraction of height from shape from shading ...
Abstract
The shape-from-shading (SFS) technique deals with the recovery of the shape of an object from the gradual variation of shading encoded in the image. Most SFS approaches have assumed a Lambertian surface to extract a DEM from individual images. The quality of the DEM derived from radar SFS in particular depends on an appropriate radar reflectance model, which relates the radar backscatter to the surface normal. This paper focuses on a new reflectance model relating radar SAR backscatter coefficient values to surface normal orientation. An iterative minimization SFS algorithm was implemented using this radar reflectance model to derive the height measurements. Key to the derivation of surface height with this model are the forward and inverse Fast Fourier Transform (FFT). The model performance was evaluated on a RADARSAT-1 image using both graphical and statistical analysis, with root mean square error (RMSE) and the coefficient of determination (R2) as evaluation criteria. The model showed good performance in reconstructing surface heights from RADARSAT-1 imagery, giving 17.47 m RMSE and 97.2% R2.
Keywords: 3-D, SFS, Remote Sensing, Radar Remote Sensing, Satellite Images, SAR Imagery.
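The two evaluation criteria used above are standard and easy to state precisely. A sketch with made-up numbers (the 17.47 m / 97.2% figures are the paper's results, not reproduced here):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted heights."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical height samples with a fixed +/-1 m error.
y = np.array([10.0, 12.0, 15.0, 20.0])
pred = y + np.array([1.0, -1.0, 1.0, -1.0])
e = rmse(y, pred)
r2 = r_squared(y, pred)
```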
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight Portraits for Background Replacement
Abstract:
Given a portrait and an arbitrary high dynamic range lighting environment, our framework uses machine learning to composite the subject into a new scene, while accurately modeling their appearance in the target illumination condition. We estimate a high quality alpha matte, foreground element, albedo map, and surface normals, and we propose a novel, per-pixel lighting representation within a deep learning framework.
Research 101 - Paper Writing with LaTeX (Jia-Bin Huang)
Paper Writing with LaTeX
PDF: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pdf
PPTX: https://filebox.ece.vt.edu/~jbhuang/slides/Research%20101%20-%20Paper%20Writing%20with%20LaTeX.pptx
Computer vision techniques appear in many aspects of our daily lives, with tremendous impact. These slides aim to introduce basic concepts of computer vision and its applications to the general public.
Download link: https://uofi.box.com/shared/static/24vy7aule67o4g6djr83hzurf5a9lfp6.pptx
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015), Jia-Bin Huang
Self-similarity based super-resolution (SR) algorithms are able to produce visually pleasing results without extensive training on external databases. Such algorithms exploit the statistical prior that patches in a natural image tend to recur within and across scales of the same image. However, the internal dictionary obtained from the given image may not always be sufficiently expressive to cover the textural appearance variations in the scene. In this paper, we extend self-similarity based SR to overcome this drawback. We expand the internal patch search space by allowing geometric variations. We do so by explicitly localizing planes in the scene and using the detected perspective geometry to guide the patch search process. We also incorporate additional affine transformations to accommodate local shape variations. We propose a compositional model to simultaneously handle both types of transformations. We extensively evaluate the performance in both urban and natural scenes. Even without using any external training databases, we achieve significantly superior results on urban scenes, while maintaining comparable performance on natural scenes as other state-of-the-art SR algorithms.
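The internal-dictionary idea rests on patches recurring within the same image. A brute-force toy version of that search is easy to write; the paper's method additionally searches over perspective and affine transformations of the patches, which this sketch omits:

```python
import numpy as np

def best_internal_match(img, y, x, p=5):
    """Find the best-matching p x p patch (by SSD) for the patch at (y, x)
    elsewhere in the same image -- the internal patch search at the core of
    self-similarity based SR, without any geometric transformations."""
    ref = img[y:y + p, x:x + p]
    best, best_pos = np.inf, None
    H, W = img.shape
    for yy in range(H - p + 1):
        for xx in range(W - p + 1):
            if (yy, xx) == (y, x):
                continue  # skip the trivial self-match
            d = np.sum((img[yy:yy + p, xx:xx + p] - ref) ** 2)
            if d < best:
                best, best_pos = d, (yy, xx)
    return best_pos, best

# A texture placed twice: the patch at (0, 0) recurs exactly at (0, 8).
img = np.zeros((8, 16))
block = np.arange(25.0).reshape(5, 5)
img[0:5, 0:5] = block
img[0:5, 8:13] = block
pos, dist = best_internal_match(img, 0, 0, 5)
```

Localizing planes and allowing affine/perspective warps, as the paper does, widens this search space so that recurring structure is found even when it appears foreshortened or rotated.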
http://bit.ly/selfexemplarsr
General principles and tricks for writing fast MATLAB code.
Powerpoint slides: https://uofi.box.com/shared/static/yg4ry6s1c9qamsvk6sk7cdbzbmn2z7b8.pptx
Image Completion using Planar Structure Guidance (SIGGRAPH 2014), Jia-Bin Huang
We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.
Project page: https://sites.google.com/site/jbhuang0604/publications/struct_completion
Jia-Bin Huang, Qin Cai, Zicheng Liu, Narendra Ahuja, and Zhengyou Zhang
Towards Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning From Simulation
Proceedings of ACM Symposium on Eye Tracking Research & Applications (ETRA), 2014
ETRA 2014 Best Paper Award
In this paper, we describe a new interactive image completion system that allows users to easily specify various forms of mid-level structure in the image. Our system supports the specification of four basic symmetry types: reflection, translation, rotation, and glide. The user inputs are automatically converted into guidance maps that encode possible candidate shifts and, indirectly, local transformations of rotation and scale. These guidance maps are used in conjunction with a color-matching cost for image completion. We show that our system is capable of handling a variety of challenging examples.
http://www.jiabinhuang.com/
Here is my updated CV using the ModernCV template (http://www.latextemplates.com/template/moderncv-cv-and-cover-letter).
You can find the TeX source file at https://dl.dropbox.com/u/2810224/Homepage/resume/modern%20style.rar
Real-Time Face Detection, Tracking, and Attributes Recognition
Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
Moving Cast Shadow Detection using Physics-based Features
Jia-Bin Huang and Chu-Song Chen
Institute of Information Science, Academia Sinica, Taipei, Taiwan
jbhuang0604@gmail.com, song@iis.sinica.edu.tw
Abstract

Cast shadows induced by moving objects often cause serious problems to many vision applications. We present in this paper an online statistical learning approach to model the background appearance variations under cast shadows. Based on the bi-illuminant (i.e. direct light sources and ambient illumination) dichromatic reflection model, we derive physics-based color features under the assumptions of constant ambient illumination and light sources with common spectral power distributions. We first use one Gaussian Mixture Model (GMM) to learn the color features, which are constant regardless of the background surfaces or illuminant colors in a scene. Then, we build up one pixel-based GMM for each pixel to learn the local shadow features. To overcome the slow convergence rate in the conventional GMM learning, we update the pixel-based GMMs through confidence-rated learning. The proposed method can rapidly learn model parameters in an unsupervised way and adapt to illumination conditions or environment changes. Furthermore, we demonstrate that our method is robust to scenes with few foreground activities and videos captured at low or unsteady frame rates.

1. Introduction

Extracting moving objects from video sequences is at the core of various vision applications, including visual surveillance, content-based video coding, and human-computer interaction. One of the most challenging problems of extracting moving objects is detecting and removing moving cast shadows. When performing background subtraction, cast shadows are often misclassified as parts of foreground objects, distorting the estimation of shape and color properties of target objects. The distortion caused by cast shadows may hinder subsequent vision algorithms, such as tracking and recognition.

Cast shadows are caused by the occlusion of light sources. When foreground objects cast shadows on background surfaces, the light sources are partially or entirely blocked, and thus the total energy incident at the background regions is reduced. Hence, shadow points are expected to have lower luminance but similar chromaticity values.

There have been many works dedicated to detecting cast shadows. Most of them are based on the assumption that shadow pixels should have lower luminance and the same chrominance as the corresponding background (i.e. the RGB values of shadow pixels will fall on the line between the illuminated value and the origin in the RGB color space). This linear attenuation property has been employed in different color spaces like RGB [2], HSV [1], YUV [12], and c1c2c3 [11]. Besides, other shadow-induced features like edge or gradient information extracted from the spatial domain have also been used to detect cast shadows [14, 16]. The major limitation of these algorithms is that they often require explicit tuning of a large set of parameters for each new scene. Thus, they are inappropriate for on-line applications.

To adapt to environment changes, statistical learning-based approaches have been developed to learn and remove cast shadows [9, 5, 4]. However, the linear proportionality assumption may not always hold in a real-world environment. For instance, in an outdoor scene, the light sources may consist of direct sunlight, diffused light scattered by the sky, and other colored light from nearby surfaces (i.e. color bleeding). These light sources may have different spectral power distributions (SPDs). Therefore, the RGB values of a shadow pixel may not attenuate linearly.

Little attention has been paid to the non-proportional attenuation problem before. Nadimi and Bhanu [8] addressed the non-linearity by using a dichromatic reflection model to account for both the sun and the sky illuminations in an outdoor environment. Recently, a more general shadow model was presented in [6], which introduced an ambient illumination term that determines the direction in the RGB color space along which the shaded background values can be found. Since the ambient term may have a different SPD from the incident light sources, the values of shadow pixels may not decrease proportionally. Nonparametric density estimation was used to model surface variation under cast shadows in an unsupervised way. By providing a better description of cast shadows, the shadow model in [6] provided improved performance over the previous approaches which used linear models.

However, these learning-based approaches [9, 5, 4, 6] may suffer from insufficient training samples since the statistical models are learned from background surface variation under cast shadows. Unlike obtaining samples in every frame in background modeling, shadows may not appear at the same pixel in each frame. A single pixel should be shadowed many times till its estimated parameters converge, while the illumination conditions should remain stable. Therefore, this kind of pixel-based shadow model requires a longer period of training time when foreground activities are rare. This problem becomes more serious when video sequences are captured at a low or unsteady frame rate that depends on the transmission conditions.

In this paper, we characterize cast shadows with "global" parameters for a scene. Based on the bi-illuminant dichromatic reflection model (BIDR) [7], we first derive the normalized spectral ratio as our color feature under the assumptions of constant ambient illumination and direct light sources with a common SPD. The normalized spectral ratio remains constant regardless of different background surfaces and illumination conditions. We then model the color features extracted from all moving pixels using a single Gaussian Mixture Model (GMM). To further improve the differentiating ability for cast shadows having similar colors to the background, we use a pixel-based GMM to describe the gradient intensity distortion for each pixel. We update the pixel-based GMMs using the confidence predicted from the global GMM through confidence-rated learning to accelerate convergence rates. Contributions are presented in two key aspects. Firstly, with the global shadow model learned from physics-based features, our approach does not require numerous foreground activities or high frame rates to learn the shadow model parameters. This makes the proposed method more practical than existing works using only pixel-based models. Secondly, the proposed confidence-rated learning can be used for fast learning of local features in pixel-based models. We provide a principled scheme for the local and global features to collaborate with each other.

The remainder of this paper is organized as follows. We briefly describe in Section 2 the dichromatic reflection model [13] and its extension, BIDR. In Section 3, we present the proposed learning approach. The posterior probabilities of cast shadows and foreground are developed in Section 4. Both visual and quantitative results are shown in Section 5 to verify the performance of our method and its robustness to few foreground activities. Section 6 concludes this paper.

2. Physics-Based Shadow Model

2.1. Bi-illuminant Dichromatic Reflection Model

There are three terms in Shafer's model [13]: body reflection, surface reflection, and a constant ambient term. Each of the two reflection types can be decomposed into chromatic and achromatic parts: 1) composition: a relative SPD c_b or c_s which depends only on wavelength; 2) magnitude: a geometric scale factor m_b or m_s which depends only on geometry. Given a scene geometry, the radiance in the direction (θ_e, φ_e) can be expressed as

  I(θ_e, φ_e, λ) = m_b(θ_e, φ_e) c_b(λ) + m_s(θ_e, φ_e) c_s(λ) + c_d(λ),   (1)

where c_d is the constant ambient term.

While this model included a term to account for ambient illumination, it did not separate that term into body and surface reflection. Recently, Maxwell et al. [7] proposed the BIDR model, which contains four terms: the two types of reflection for both the direct light sources and the ambient illumination. The BIDR model is of the form

  I(θ_e, φ_e, λ) = m_b(θ_e, φ_e, θ_i, φ_i) c_b(λ) l_d(θ_L, φ_L, λ)
      + m_s(θ_e, φ_e, θ_i, φ_i) c_s(λ) l_d(θ_L, φ_L, λ)
      + c_b(λ) ∫_{θ_i, φ_i} m_b(θ_e, φ_e, θ_i, φ_i) l_a(θ_L, φ_L, λ) dθ_i dφ_i
      + c_s(λ) ∫_{θ_i, φ_i} m_s(θ_e, φ_e, θ_i, φ_i) l_a(θ_L, φ_L, λ) dθ_i dφ_i,   (2)

where (θ_L, φ_L) is the direction of the direct light source relative to the local surface normal, and (θ_e, φ_e) and (θ_i, φ_i) are the angles of emittance and incidence, respectively. In this reflection model, m_b, m_s, c_b, and c_s all lie in [0, 1] (we refer the reader to [7] for further details of the derivation of the BIDR model).

With a specific geometry, and representing the two ambient integrals as M_ab(λ) and M_as(λ), we can simplify the BIDR model to

  I(λ) = c_b(λ)[m_b l_d(λ) + M_ab(λ)] + c_s(λ)[m_s l_d(λ) + M_as(λ)].   (3)

Considering only matte surfaces, we can ignore the latter part of (3). To describe the appearance of cast shadows on a background surface, we multiply the direct illumination by an attenuation factor α ∈ [0, 1], which indicates the unoccluded proportion of the direct light. We assume that all direct light sources have a common SPD with different power factors, and that the ambient illumination is constant over lit and shaded regions (see Fig. 1). This gives us the simplified form of the BIDR model

  I(λ) = α m_b c_b(λ) l_d(λ) + c_b(λ) M_ab(λ).   (4)
Figure 1. The contribution of all direct light sources and ambient illuminance. The shadow values SD are expected to be observed along the line between the background value BG and the constant ambient term BG_A. Note that the shadow values are not necessarily proportional to the direction of the background values.

The camera sensor response g_i at the pixel level can be obtained through the spectral projection g_i = ∫ F_i(λ) I(λ) dλ, where F_i(λ), i ∈ {R, G, B}, are the sensor spectral sensitivities and λ denotes the wavelength. By applying the linearity of the spectral projection, we have

g_i = α F_i m_b c_b^i l_d^i + F_i c_b^i M_ab^i,  i ∈ {R, G, B}.  (5)

The formulation of g_i defines a line segment in the RGB color space varying between two ends: from the shadowed pixel (α = 0) to the fully lit pixel (α = 1).
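Because (5) is linear in α, each background surface traces a line segment in RGB space between its ambient-only color (α = 0) and its fully lit color (α = 1). A minimal numeric sketch of this property; the function and color values below are illustrative, not taken from the paper:

```python
# Sketch: shadow colors lie on the segment between the ambient-only
# color (alpha = 0) and the fully lit color (alpha = 1), per Eq. (5).

def shadow_color(bg_lit, bg_ambient, alpha):
    """Pixel color under partial occlusion of the direct light.

    bg_lit     : fully lit background color (alpha = 1)
    bg_ambient : ambient-only color (alpha = 0)
    alpha      : unoccluded proportion of direct light, in [0, 1]
    """
    return [a + alpha * (l - a) for l, a in zip(bg_lit, bg_ambient)]

bg_lit = [200.0, 180.0, 160.0]   # hypothetical lit surface color
bg_ambient = [60.0, 70.0, 80.0]  # hypothetical ambient-only color

half = shadow_color(bg_lit, bg_ambient, 0.5)
print(half)  # midpoint of the segment: [130.0, 125.0, 120.0]
```

Any observed shadow value of this surface is therefore a convex combination of the two endpoint colors, which is the geometric picture behind Fig. 1.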
2.2. Extracting Useful Features

2.2.1 Spectral Ratio

To extract color features that are constant and independent of different background surfaces, we need to identify measurements that are invariant to the illumination attenuation factor α, the geometry shading factor m_b, and the chromatic aspect of body reflection c_b^i. We calculate the ratio of the illuminants as the spectral ratio S = [S_R, S_G, S_B]^T using

S_i = SD_i / (BG_i − SD_i)
    = (α F_i m_b c_b^i l_d^i + F_i c_b^i M_ab^i) / ((1 − α) F_i m_b c_b^i l_d^i)
    = α/(1 − α) + M_ab^i / ((1 − α) m_b l_d^i),  i ∈ {R, G, B}.  (6)

If the shaded regions received only the ambient illumination (i.e., all direct light sources are blocked: α = 0), the first term in (6) disappears and S_i = M_ab^i / (m_b l_d^i). We could then derive features invariant to m_b by normalizing S_i with its length |S|, since the m_b term can be extracted from the normalization constant. However, this assumption does not hold in real-world environments. Take an indoor scene as an example, where there are usually multiple light sources. When a foreground object occludes one or some of the light sources, there is still energy from the remaining light sources incident on the surface. Consequently, assuming the attenuation factor to be zero will induce bias in estimating the ratio between the two illuminants.

Figure 2. The color feature value distribution in various environments and illumination conditions. (a)-(d) Frames with cast shadows. (e)-(h) The corresponding feature value distributions in the S1S2S3 space. Note that the feature values extracted from different background surfaces generally follow a line.

To address this problem, we introduce γ = [γ_R, γ_G, γ_B]^T by subtracting α/(1 − α) from each element of S:

γ_i = S_i − α/(1 − α) = M_ab^i / ((1 − α) m_b l_d^i).  (7)

Similarly, we can obtain the normalized spectral ratio γ̂ with higher accuracy by factoring out (1 − α) m_b through normalization:

γ̂_i = M_ab^i / ((1 − α) m_b l_d^i) · 1/|γ|,  (8)

|γ| = 1/((1 − α) m_b) · sqrt( (M_ab^R / l_d^R)^2 + (M_ab^G / l_d^G)^2 + (M_ab^B / l_d^B)^2 ).  (9)

We validate the proposed physics-based color feature by observing its distributions from cast shadows in various environments and illumination conditions. Fig. 2(a) and its reference background images are courtesy of Maxwell et al. [7]. Fig. 2(b)-(d) show frames from the benchmark sequences provided in Prati et al.'s survey paper [10]. We manually label shadow pixels in the given images, and then extract the spectral ratio S_i, i ∈ {R, G, B}, for each
pixel in shadow regions using the given image and the reference background models. The resultant feature distributions are presented in Fig. 2(e)-(h). The feature values extracted from different background surfaces roughly follow a straight line in the S_R S_G S_B space. Therefore, we can use the direction of the line as our color feature, which is roughly the same for all shadow pixels. The direction of the line in the S_R S_G S_B space can be characterized by two angles, the zenith and azimuth in spherical coordinates, which correspond to the normalized spectral ratio. From the feature distributions in Fig. 2(e)-(h), we also observe that larger feature values tend to be unstable and deviate from the major direction of most feature values. This is because there is not sufficient difference between the shaded and lit pixel values to robustly measure the orientation. In addition, the value of α/(1 − α) in the scene can be easily estimated by intersecting the fitted line with the line passing through (1,1,1) and the origin.

2.2.2 Gradient Intensity Distortion

Now we have derived color features that are invariant to different background surfaces. These low-dimensional (2D) features, however, might fail to distinguish foreground with colors similar to the background from cast shadows. Thus, other shadow-induced properties like edge or gradient information may be used to further describe the background appearance variation under cast shadows. In this paper, we use a simple gradient intensity distortion as our local feature to demonstrate the improvement gained by incorporating additional local features.

For a given pixel p, we define the gradient intensity distortion ω_p as

ω_p = |∇(BG)_p| − |∇(F)_p|,  (10)

where BG and F are the luminance channels of the background image and the current frame, and ∇(·) is the gradient operator.

3. Learning Cast Shadows

In this section, we show how to build models for cast shadows in an unsupervised way. Here, we use GMMs to learn the background surface variation over time. It is also possible to use other statistical learning methods such as kernel density estimation.

3.1. Weak Shadow Detector

To model the cast shadows, impossible shadow samples that belong to the background or foreground (e.g., color values that are the same as or brighter than the background values) should be excluded. Therefore, we apply a weak shadow detector that evaluates every moving pixel to filter out such impossible samples. Since cast shadows reduce the luminance values, the potential shadow values should fall into the conic volume around the corresponding background color. The weak shadow detector is illustrated in Fig. 3, where values of cast shadows are expected to fall into the gray conic region. Pixel values that fall into the gray conic region are considered potential shadow samples. These samples are then used to learn the global shadow model for the scene and the local shadow model for each pixel.

Figure 3. The weak shadow detector. An observation is considered a potential shadow point if it falls into the gray area. The weak shadow detector contains three parameters: the maximum allowed color shift, and the minimal and maximal illumination attenuation.

3.2. Global Shadow Model

Using the background-surface-invariant color features, a global shadow model is learned for the whole scene. Here, we model the background color information by the well-known GMM [15] in the RGB color space. For every frame, we obtain potential shadow points by applying the weak shadow detector on moving pixels, which are identified via background subtraction. We then use one GMM to learn the normalized spectral ratio r̂ = [γ̂_R, γ̂_G]^T in the scene. Note that the reason why we only use two of the three feature dimensions is that the third component is redundant, since γ̂_R^2 + γ̂_G^2 + γ̂_B^2 = 1. The normalized spectral ratio in the whole scene is modeled by K Gaussian distributions with mean vectors µ_k and full covariance matrices Σ_k. Then, the probability of the normalized spectral ratio r̂ is given by

p(r̂ | µ, Σ) = Σ_{k=1}^{K} π_k G_k(r̂, µ_k, Σ_k),  (11)

where µ, Σ denote all parameters of the K Gaussians, π_k is the mixing weight, and G_k is the kth Gaussian probability distribution. We use the Expectation-Maximization (EM) algorithm to estimate the parameters of the GMM. The estimated parameters in the current frame are propagated to the next frame, so that the EM algorithm can converge quickly. Since the light sources are usually stable in the scene, we
find that it is sufficient for the estimated parameters to converge with a single EM iteration at each frame.

3.3. Local Shadow Model

Besides using color features in the global shadow model, we build a GMM for each pixel to learn the gradient intensity distortion under cast shadows, similar to background modeling [15]. For a given pixel p, its gradient feature value ω_p is sampled and learned whenever it is a potential shadow pixel. However, as mentioned before, pixel-based models often suffer from insufficient training data because samples are not available at every pixel in every frame.

To address this problem, we adopt confidence-rated learning to improve the convergence rate of the local model parameters. The basic idea is that each sample is weighted with a different importance computed from the global shadow model. For example, if we update the model using a potential shadow point whose color features match the global shadow model, then we consider this sample relatively more important than others. In this way, the learning process of the local shadow model is guided by the global shadow model.

3.4. Confidence-Rated Gaussian Mixture Learning

We present an effective Gaussian mixture learning algorithm to overcome some drawbacks of the conventional GMM learning approach. Let ρ_π and ρ_G be the learning rates for the mixing weights and the Gaussian parameters (means and covariances) in the local shadow model, respectively. The updating scheme follows the formulation of the combination of incremental EM learning and recursive filtering [3]:

ρ_π = C(γ̂) · (1 − ρ_default) / (Σ_{j=1}^{K} c_j) + ρ_default,  (12)

ρ_G = C(γ̂) · (1 − ρ_default) / c_k + ρ_default,  (13)

where c_k is the number of matches of the kth Gaussian state, and ρ_default is a small constant, which is 0.005 in our experiments. The two types of learning rates are controlled by a confidence value C(γ̂), which indicates how confident we are that the sample belongs to shadow. Observations with higher confidence will then converge faster than those with lower confidence.

3.5. Attenuation Factor Estimation

From the RGB values observed in the current frame and the reference background image, we can only compute the value of S, which may introduce bias in estimating normalized spectral ratios. Therefore, estimation of the attenuation factor α is required for accurate shadow modeling. We can see in (7) that the value of γ_i, i ∈ {R, G, B}, is obtained by subtracting α/(1 − α) from S_i. From the feature distribution of S, our aim is to find the location of the point t at which the line passing through the origin with direction vector (1,1,1) intersects the line that fits the observations S. This estimation can be achieved using a robust fitting method that is less sensitive to large deviations (outliers) than ordinary least squares. In addition, we perform recursive linear regression to update the estimated α/(1 − α) value adaptively. For simplicity, the attenuation factor α is assumed to be the same for every pixel.

4. Cast Shadow and Foreground Posterior Probabilities

In this section, we present how to derive the posterior probabilities of cast shadows and foreground given the observed sample x_p in the RGB color space, using the proposed global and local shadow models.

4.1. Cast Shadow Posterior

The shadow posterior is first computed by decomposing P(SD | x_p) over the (BG, FS) domain, where FS indicates moving pixels (real foreground and cast shadows). Since P(SD | x_p, BG) = 0, the decomposition gives

P(SD | x_p) = P(SD | x_p, FS) P(FS | x_p),  (14)

where P(FS | x_p) = 1 − P(BG | x_p) can be directly computed from the background model. Second, we remove pixels that are definitely foreground (i.e., pixels that are rejected by the weak shadow detector) and consider only potential shadow points (PS): P(SD | x_p, FS) = P(SD | x_p, FS, PS). Then, we decompose P(SD | x_p, FS, PS) into two parts, N_a and N_na, which stand for color features that are associated with the normalized spectral ratio or not, respectively. If the color features are not associated with the working states of the GMM, then the probability of belonging to shadow equals zero. Therefore, we have

P(SD | x_p, FS, PS) = P(SD | x_p, FS, PS, N_a) · P(N_a | x_p, FS, PS).  (15)

Here, the gradient intensity distortion ω_p and the color feature γ̂_p are the sufficient statistics for x_p in the first and second parts of (15), respectively. The posterior probability of cast shadow can thus be computed.

4.2. Foreground Posterior

Computing the foreground posterior probability is much easier. Given a pixel p, we first compute the background posterior P(BG | x_p) from the background model. Then, we
can obtain the shadow posterior probability P(SD | x_p) using the learned shadow models. Based on probability theory, the foreground posterior can be obtained as

P(FG | x_p) = 1 − P(BG | x_p) − P(SD | x_p).  (16)

4.3. Summary

Algorithms 1 and 2 summarize in pseudocode the learning and detection processes of the proposed algorithm. Our method can be attached to other moving object detection programs as an independent module. The shadow detection process is applied only to moving pixels detected by the background model, and the learning process occurs only when these moving pixels are considered shadow candidates. Consequently, the proposed algorithm is practical: it does not introduce a heavy computational burden and can work effectively to detect shadows.

Algorithm 1: Learning Process
At time t,
for each pixel p in the frame do
  if P(BG | x_t(p)) < 0.5 then
    if pixel p satisfies the shadow property then
      - Compute the normalized spectral ratio γ̂_p
      - Compute the gradient intensity distortion ω_p
      - Update the local shadow model at pixel p using the confidence value C(γ̂) through confidence-rated learning
    end
  end
end
Run one EM iteration to estimate the parameters of the global shadow model using the collected color features.

Algorithm 2: Detection Process
At time t,
for each pixel p ∈ P do
  - Obtain the background posterior P(BG | x_p) from background modeling
  - Compute the shadow posterior P(SD | x_p) (Eq. 14)
  - Compute the foreground posterior P(FG | x_p) (Eq. 16)
  if P(FG | x_p) > P(SD | x_p) and P(FG | x_p) > P(BG | x_p) then
    Label pixel p as foreground
  else
    Label pixel p as background
  end
end

5. Experimental Results

We present visual results from challenging video sequences captured in various environments, including both indoor and outdoor scenes. We also compare the quantitative accuracy of the proposed method with that of other approaches on several videos for which their results are available. Previous approaches using statistical models have higher success rates in detecting cast shadows when numerous foreground activities are present. However, we show that our method can also deal with situations where cast shadows first appear in complex scenes under unknown illumination conditions, as well as with rare foreground activity.

5.1. Qualitative Results

In Figure 4, we show sample cast shadow detection results from four video sequences. The first three sequences, Laboratory, Intelligent Room, and Highway I, are part of the benchmark sequences for validating shadow detection algorithms. The last one, Hallway, is taken from [6]. Figure 4(a) shows one frame selected from each video, where cast shadows are present in the scene. The background posterior probability is presented in Figure 4(b), where darker regions indicate a lower probability of belonging to the background. In Figure 4(c)(d), we show the confidence map of the global shadow model and the posterior probability of cast shadows, respectively. In Figure 4(e) we show the probability values of belonging to foreground objects. We can see that in these video sequences, the proposed algorithm is capable of detecting cast shadows without misclassifying foreground as shadows. Note that in the second video, Intelligent Room, the man walks through the room only once. Thus, there is no chance for the pixel-based shadow model to learn its parameters. The use of the global shadow model enables us to detect shadows when they first appear in the scene.

To verify the effectiveness of the proposed method, the results presented here are raw data without any post-processing. We can obtain binary results simply by thresholding the foreground posterior values P(FG | x_p). The posterior probabilities can also be incorporated with context models that use spatial and temporal coherence to improve segmentation accuracy.

5.2. Quantitative Results

The quantitative evaluation follows the method proposed by Prati et al. [10]. Two metrics are defined for evaluating the performance of a cast shadow detection algorithm: the shadow detection rate η and the shadow discrimination rate ξ. The formulations of the two metrics are as follows:

η = TP_S / (TP_S + FN_S);  ξ = TP_F / (TP_F + FN_F),  (17)
Figure 4. Sample visual results of detecting cast shadows in various environments. (a) Frame from the video sequence. (b) Background posterior probability P(BG | x_p). (c) Confidence map predicted by the global shadow model. (d) The shadow posterior probability P(SD | x_p). (e) The foreground posterior probability P(FG | x_p).
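The per-pixel quantities shown in Figure 4 follow from Eqs. (14)-(16) and the decision rule of Algorithm 2. A simplified sketch: for brevity it folds the decomposition of Eq. (15) into a single shadow-likelihood number, and all probability values below are hypothetical:

```python
def posteriors(p_bg, p_sd_given_fs_ps, passes_weak_detector):
    """Per-pixel posteriors following Eqs. (14)-(16).

    p_bg                 : P(BG | x_p) from the background model
    p_sd_given_fs_ps     : P(SD | x_p, FS, PS), here collapsed into one
                           number instead of the two factors of Eq. (15)
    passes_weak_detector : whether x_p is a potential shadow point (PS)
    """
    p_fs = 1.0 - p_bg                                   # P(FS | x_p)
    p_sd = p_sd_given_fs_ps * p_fs if passes_weak_detector else 0.0  # Eq. (14)
    p_fg = 1.0 - p_bg - p_sd                            # Eq. (16)
    return p_bg, p_sd, p_fg

def label(p_bg, p_sd, p_fg):
    """Decision rule of Algorithm 2: foreground only if it dominates."""
    if p_fg > p_sd and p_fg > p_bg:
        return "foreground"
    return "background"  # shadow pixels fold into the background label

# Hypothetical moving pixel with strong shadow evidence:
p_bg, p_sd, p_fg = posteriors(0.1, 0.8, True)
print(p_bg, p_sd, p_fg, label(p_bg, p_sd, p_fg))
```

Note that a pixel rejected by the weak shadow detector gets P(SD | x_p) = 0, so its foreground posterior absorbs the full moving-pixel probability, which mirrors the paper's pipeline.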
Figure 5. The effect of confidence-rated Gaussian mixture learning. The mean maps of the local shadow model are taken at the 100th frame (first row) and the 1000th frame (second row). (a)(d) The background image. (b)(e) The mean map of the most important Gaussian in the mixture with confidence-rated learning. (c)(f) The mean map without using confidence-rated learning.

Figure 6. Quantitative results of the Intelligent Room sequence. Shadow detection and shadow discrimination rates are calculated under different frame rate settings.
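The contrast shown in Figure 5 rests on the confidence-rated learning rates of Eqs. (12) and (13). A direct transcription of those two formulas; the match counts and confidence values below are illustrative, not from the paper:

```python
def learning_rates(confidence, match_counts, k, rho_default=0.005):
    """Confidence-rated learning rates of Eqs. (12)-(13).

    confidence   : C(gamma_hat) in [0, 1], predicted by the global model
    match_counts : c_j, number of matches of each Gaussian state
    k            : index of the matched Gaussian state
    """
    rho_pi = confidence * (1 - rho_default) / sum(match_counts) + rho_default
    rho_g = confidence * (1 - rho_default) / match_counts[k] + rho_default
    return rho_pi, rho_g

# A confident sample adapts the matched Gaussian much faster than a
# low-confidence one, but never slower than rho_default.
counts = [40, 25, 10]
print(learning_rates(0.9, counts, k=2))   # high confidence: large rates
print(learning_rates(0.05, counts, k=2))  # low confidence: near rho_default
```

The floor at ρ_default keeps even zero-confidence samples contributing a small recursive-filter update, which is why the conventional-learning maps in Fig. 5(c)(f) correspond to the confidence term being absent rather than the update being skipped.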
where the subscript S stands for shadow and F for foreground, and TP and FN denote true positive and false negative, respectively. TP_F is the number of ground-truth points of the foreground objects minus the number of points detected as shadows but belonging to foreground objects.

We show the quantitative results in Table 1. Note that the results of the other approaches are taken directly from [6][4].

5.3. Fast Learning of the Local Shadow Model

We demonstrate the effect of using confidence-rated learning in Figure 5. In this experiment, we learn the local shadow model in two traffic scenes: Highway I and Highway II. Figures 5(a)(d) show the background models of these two outdoor scenes. With confidence-rated learning, we obtain the mean value of the most important Gaussian (i.e., the one with the highest mixing weight and smallest variance) in Figures 5(b)(e). We can see that the local models of gradient intensity distortion under cast shadows are well constructed. On the other hand, if we learn the local shadow model following the conventional Gaussian mixture learning method, then we obtain the results in Figures 5(c)(f), in which the
models are still not built due to the long training time and the disturbance caused by foreground objects.

Table 1. Quantitative results on surveillance sequences

Sequence     Highway I        Highway II       Hallway
Method       η%      ξ%       η%      ξ%       η%      ξ%
Proposed     70.83   82.37    76.50   74.51    82.05   90.47
Kernel [6]   70.50   84.40    68.40   71.20    72.40   86.70
LGf [4]      72.10   79.70    -       -        -       -
GMSM [5]     63.30   71.30    58.51   44.40    60.50   87.00

5.4. Handling Scenes with Few Foreground Activities

Here we use the benchmark sequence Intelligent Room to demonstrate the robustness of our approach to videos captured at low frame rates and to scenes with few foreground activities. Downsampled sequences with lower frame rates are obtained by taking one sample from the original image sequence for every M ∈ {2, 3, 4} samples. Thus, we have sequences with frame rates of 10, 5, 3.33, and 2.5, respectively. Figure 6 shows the quantitative results on these sequences. We can see that the performance on the sequences with lower frame rates degrades only slightly, demonstrating that our approach can still learn and remove cast shadows even at such low frame rates.

6. Conclusion

In this paper, we have presented a novel algorithm capable of detecting cast shadows in various scenes. Qualitative and quantitative evaluation of the physics-based shadow model validated that our approach is more effective in describing background surface variation under cast shadows. The physics-based color features can be used to learn a global shadow model for a scene. Therefore, our method does not suffer from the problem of insufficient training data as pixel-based shadow models do. Moreover, with the aid of the global shadow model, we can update the local shadow models through confidence-rated learning, which is significantly faster than conventional online updating. To further improve the detection accuracy, more discriminative features or spatial and temporal smoothness constraints can also be incorporated into the detection process in the future.

Acknowledgement

This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC95-2221-E-001-028-MY3.

References

[1] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. PAMI, 25(10):1337–1342, 2003.
[2] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In ICCV Frame-rate Workshop, 1999.
[3] D. S. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Trans. PAMI, 27(5):827–832, 2005.
[4] Z. Liu, K. Huang, T. Tan, and L. Wang. Cast shadow removal combining local and global features. In CVPR, pages 1–8, 2007.
[5] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. IEEE Trans. PAMI, 29(7):1133–1146, 2007.
[6] N. Martel-Brisson and A. Zaccarin. Kernel-based learning of cast shadows from a physical model of light sources and surfaces for low-level segmentation. In CVPR, June 2008.
[7] B. Maxwell, R. Friedhoff, and C. Smith. A bi-illuminant dichromatic reflection model for understanding images. In CVPR, pages 1–8, June 2008.
[8] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. IEEE Trans. PAMI, 26(8):1079–1087, 2004.
[9] F. Porikli and J. Thornton. Shadow flow: a recursive method to learn moving cast shadows. In ICCV, pages 891–898, Oct. 2005.
[10] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: algorithms and evaluation. IEEE Trans. PAMI, 25(7):918–923, 2003.
[11] E. Salvador, A. Cavallaro, and T. Ebrahimi. Cast shadow segmentation using invariant color features. CVIU, 95(2):238–259, 2004.
[12] O. Schreer, I. Feldmann, U. Golz, and P. A. Kauff. Fast and robust shadow detection in videoconference applications. In IEEE Int'l Symp. Video/Image Processing and Multimedia Communications, pages 371–375, 2002.
[13] S. Shafer. Using color to separate reflection components. Color Research & Application, pages 210–218, 1985.
[14] J. Stander, R. Mech, and J. Ostermann. Detection of moving cast shadows for object segmentation. IEEE Trans. Multimedia, 1(1):65–76, Mar. 1999.
[15] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. CVPR, volume 2, pages –252, 1999.
[16] W. Zhang, X. Fang, X. Yang, and Q. M. J. Wu. Moving cast shadows detection using ratio edge. IEEE Trans. Multimedia, 9(6):1202–1214, 2007.