Talk slides imsct2016

Nonparametric smooth estimators for probability density
function for circular data
Yogendra P. Chaubey
Department of Mathematics and Statistics
Concordia University, Montreal, Canada H3G 1M8
E-mail:yogen.chaubey@concordia.ca
Talk to be presented at the International Conference on
Interdisciplinary Mathematics, Statistics and
Computational Techniques (IMSCT2016-FIMXXV), Manipal
University, Jaipur (India), December 22-24, 2016
Yogendra Chaubey (Concordia University) Department of Mathematics & Statistics December 22-24, 2016 1 / 74

Abstract
In this talk we provide a short review for smooth estimation of density and
distribution functions for circular data. It has been shown that the usual
kernel density estimator used for linear data may not be appropriate in the
context of circular data. Fisher (1989: J. Structural Geology, 11, 775-778)
presents an adaptation of the linear kernel estimator, however, better
alternatives are now available based on circular kernels; see e.g. Di Marzio,
Panzera, and Taylor, 2009: Statistics & Probability Letters, 79(19),
2066-2075. In this talk I use a simple approximation theory to motivate
the circular kernel density estimation and further explore the usefulness of
the wrapped Cauchy kernel in this context. It is seen that the wrapped
Cauchy kernel appears as a natural candidate in connection to orthogonal
series density estimation on a unit circle. In the literature the use of von
Mises circular kernel is investigated (see Taylor, 2008: Computational
Statistics & Data Analysis, 52(7), 3493-3500), that requires numerical
computation of Bessel function. On the other hand, the wrapped Cauchy
kernel is much simpler to use. This adds further weight to the considerable
role of the wrapped Cauchy distribution in circular statistics.

1 Introduction
2 Motivation for the Circular Kernel Density Estimator
3 Alternative Circular Density Estimators
Transformation Based Kernel Density Estimators
Approximation Methods by Orthogonal Functions
Orthogonal Series of Cosine Functions
Fourier Series Expansion
Orthogonal Polynomials on Sub-intervals
Chebyshev Polynomials
Legendre Polynomials
4 Density Estimator derived from Smooth Estimator of the Distribution Function
Inverse Stereographic Projection of Bernstein Polynomial Estimator
5 A Connection Between the Circular Kernel Density Estimator and the Orthogonal
Series
Some Preliminary Results from Complex Analysis
Orthogonal Series on Circle
6 Examples
Example 1 - Turtle Directions
Example 2 - Movements of Ants

Introduction
Given an i.i.d. d−dimensional random sample {X1, ..., Xn} from a
continuous DF F with density f, the Parzen-Rosenblatt kernel density
estimator is given by
$$\tilde f(x; h) \equiv n^{-1} h^{-d} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right). \tag{1.1}$$
h is known as the window-width or band-width and K is called the
kernel function. The band-width h typically tends to 0 as the sample
size n tends to inﬁnity and K is typically a symmetric density
function centered around zero with unit variance.
The motivation for this estimator as originally put forward by
Rosenblatt (1956) comes from the approximation
$$f(x) \approx \frac{\Pr[X \in \Delta_x]}{V_x}. \tag{1.2}$$

∆x : A hypercube centered around x and Vx is the volume of the
region ∆x.
The region ∆x is moved around all the observations and the
probabilities are computed by the proportions, subsequently the
approximated values on the right hand side are averaged over all the
observations.
The generalization of this idea, which effectively uses a uniform kernel,
to symmetric probability distributions is in the background of
Rosenblatt's (1956) proposal.
The proposal was further, independently, studied by Parzen (1962)
and popularized in many subsequent papers, and it is now well known
as the Parzen-Rosenblatt kernel density estimator [see Prakasa Rao
(1983, Ch. 3) or Silverman (1986)].

An alternative motivation, based on approximation theory, was provided
by Chaubey and Sen (2002); it legitimizes the use of asymmetric kernels
when estimating densities of non-negative random variables.
Lemma 2.1 Consider a sequence $\{\Phi_n(y, t)\}_{n=1}^{\infty}$ of distribution functions
in $\mathbb{R}^d$ for every fixed $t \in \mathbb{R}^d$, such that for $Y_n \sim \Phi_n(\cdot, t)$
(i) $E\,Y_n = t$;
(ii) $v_n(t) = \max_{1 \le i \le d} \mathrm{Var}(Y_{in}) \to 0$ as $n \to \infty$, for every fixed t;
(iii) $\Phi_n(y, t)$ is continuous in t.
Define for any bounded continuous multivariate function u(t)
$$\tilde u_n(t) = \int_{\mathbb{R}^d} u(x)\,d\Phi_n(x, t). \tag{2.11}$$
Then $\tilde u_n(t) \to u(t)$ as $n \to \infty$ for t in any compact subset of $\mathbb{R}^d$, the
convergence being uniform over any subset over which u(t) is uniformly
continuous. Furthermore, if the function u(t) is monotone, the
convergence holds uniformly over the entire $\mathbb{R}^d$.

In cases where multimodal and/or asymmetric models may be
appropriate, semiparametric or nonparametric modelling may be
more suitable. [18] and [35] considered semi-parametric analysis
based on mixtures of circular normal and von Mises distributions,
and [23], [2], [19], and [29] have considered kernel
density estimators for spherical data.
In what follows we consider estimation of the density for circular data,
i.e. an absolutely continuous (with respect to the Lebesgue measure)
circular density f(θ), θ ∈ [−π, π], i.e. f(θ) is 2π-periodic,
$$f(\theta) \ge 0 \ \text{for } \theta \in \mathbb{R} \quad \text{and} \quad \int_{-\pi}^{\pi} f(\theta)\,d\theta = 1. \tag{1.3}$$
Given a random sample {θ1, ..., θn} from the above density, the kernel
density estimator may be written as
$$\tilde f(\theta; h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{\theta - \theta_i}{h}\right). \tag{1.4}$$

Fisher (1989) proposed non-parametric density estimation for circular
data by adapting the linear kernel density estimator (1.1) with a
quartic kernel [see also Fisher (1993), §2.2 (iv), where an improvement
is suggested], defined on [−1, 1], that is given by
$$K(\theta) = \begin{cases} 0.9375\,(1 - \theta^2)^2 & \text{for } -1 \le \theta \le 1; \\ 0 & \text{otherwise.} \end{cases} \tag{1.5}$$
The factor .9375 ensures that K(θ) is a density function. The data
must be transformed to the interval [−1, 1]. The values of the
smoothing constant h are proposed to be explored in the interval
(.25h0, 1.5h0), h0 being given by
$$h_0 = \frac{7^{1/2}\,\hat\zeta}{n^{1/5}}, \tag{1.6}$$
where
$$\hat\zeta = 1/\hat\kappa, \tag{1.7}$$
$\hat\kappa$ being the modified maximum likelihood estimate of the
concentration coefficient of the von Mises circular distribution
$$K_{vM}(\theta) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos\theta\}, \quad -\pi \le \theta \le \pi, \tag{1.8}$$
(see Eqs. (4.40) and (4.41) of Fisher (1993)).
Assuming that the sample values of θ are in the interval [0, 2π), only
those values of θi contribute to the sum in (??) for which
$|(\theta_i - \theta)/h| < 1$. We may remark here that in the algorithm described in
Fisher (1993), it is explicitly assumed that the data lie in the interval
[−1, 1]. Further, in general this method of smoothing does not
produce a periodic estimator, a property that is essential for a circular
distribution. Fisher (1993) suggested performing the smoothing on
data replicated over 3 to 4 cycles and retaining the part in the
interval [−π, π]. This problem is easily circumvented by using circular
kernels.
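The periodicity issue can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the quartic kernel (1.5) is applied directly to angles in radians with a hand-picked bandwidth; the helper names, the toy sample and the value of h are illustrative assumptions and this is not Fisher's algorithm. The plain linear estimator takes different values at 0 and 2π, while replicating the data over three cycles restores agreement.

```python
import math

TWO_PI = 2.0 * math.pi

def quartic_kernel(u):
    """Quartic kernel (1.5): 0.9375 * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 0.9375 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def linear_kde(theta, sample, h):
    """Linear estimator (1.4) applied naively to angles; not periodic in general."""
    return sum(quartic_kernel((theta - t) / h) for t in sample) / (len(sample) * h)

def replicated_kde(theta, sample, h):
    """Same estimator after copying the data to the previous and next cycle."""
    copies = [t + shift for t in sample for shift in (-TWO_PI, 0.0, TWO_PI)]
    return sum(quartic_kernel((theta - t) / h) for t in copies) / (len(sample) * h)

# Hypothetical sample: two angles in [0, 2*pi) that are neighbours on the circle.
sample = [0.1, 6.2]
h = 0.5  # hypothetical bandwidth in radians

# Mismatch between the two ends of the interval, without and with replication.
gap_linear = abs(linear_kde(0.0, sample, h) - linear_kde(TWO_PI, sample, h))
gap_replicated = abs(replicated_kde(0.0, sample, h) - replicated_kde(TWO_PI, sample, h))
```

Replication works here only because the kernel support (of half-width h) is much shorter than one cycle; a circular kernel avoids the device altogether.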

Di Marzio et al. (2011) considered nonparametric estimation of a
toroidal density using toroidal kernels defined on the d-dimensional
torus $T^d = [-\pi, \pi]^d$, which applies to the present case with d = 1.
The toroidal kernels are defined as the d-fold product
$$K_C = \prod_{s=1}^{d} K_{\kappa_s},$$
where C represents the set of smoothing parameters {κ1, ..., κd} and Kκ is
a real-valued function defined on T = [−π, π] with the following properties:
T1. It admits a uniformly convergent Fourier series
$\{1 + 2\sum_{j=1}^{\infty} \gamma_j(\kappa) \cos(j\theta)\}/(2\pi)$, θ ∈ [−π, π], where γj(κ) is strictly
increasing.
T2. $\int_T K_\kappa(\theta)\,d\theta = 1$, and if Kκ takes negative values, there exists
0 < M < ∞ such that for all κ > 0
$$\int_T |K_\kappa(\theta)|\,d\theta \le M.$$
T3. For all 0 < δ < π, $\lim_{\kappa \to \infty} \int_{\delta \le |\theta| \le \pi} |K_\kappa(\theta)|\,d\theta = 0$.

The candidates for such a function include many well-known families
of circular distributions. The corresponding form of the circular kernel
density estimator is given by
$$\tilde f(\theta; \kappa) = \frac{1}{n} \sum_{i=1}^{n} J(\theta - \theta_i; \kappa), \tag{1.9}$$
where J is a circular density with center 0 and concentration
parameter κ.
Taylor (2008) considered the von Mises (circular normal) distribution
with concentration parameter κ for J, which gives the estimator for f as
$$\hat f_{vM}(\theta; \kappa) = \frac{1}{n} \sum_{i=1}^{n} K_{vM}(\theta; \theta_i, \kappa), \tag{1.10}$$
and discussed determination of the optimal data-based choice for κ.

In Section 2, I present a simple approximation theory motivation for
considering the circular kernel density estimator given in (1.9).
Next I demonstrate that the wrapped Cauchy kernel presents itself as
the kernel of choice by considering an estimation problem on the unit
circle. We also show that this approach leads to orthogonal series
density estimation in which no truncation of the series is required.
The wrapped Cauchy distribution with location parameter µ and
concentration parameter ρ is given by
$$K_{WC}(\theta; \mu, \rho) = \frac{1}{2\pi}\,\frac{1 - \rho^2}{1 + \rho^2 - 2\rho \cos(\theta - \mu)}, \quad -\pi \le \theta < \pi, \tag{1.11}$$
which becomes degenerate at θ = µ as ρ → 1. The estimator of f(θ)
based on the above kernel is given by
$$\hat f_{WC}(\theta; \rho) = \frac{1}{n} \sum_{i=1}^{n} K_{WC}(\theta; \theta_i, \rho). \tag{1.12}$$
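A minimal Python sketch of the wrapped Cauchy kernel (1.11) and the estimator (1.12) may help fix ideas. The function names, the toy sample and the value ρ = 0.7 are assumptions made for illustration only; the final lines check numerically that the estimate integrates to one over a full period.

```python
import math

def wrapped_cauchy_kernel(theta, mu, rho):
    """Wrapped Cauchy density (1.11) with location mu and concentration rho in [0, 1)."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    )

def wc_density_estimate(theta, sample, rho):
    """Estimator (1.12): average of wrapped Cauchy kernels centred at the data."""
    return sum(wrapped_cauchy_kernel(theta, t, rho) for t in sample) / len(sample)

# Hypothetical toy sample of angles in radians (not data from the talk).
sample = [0.3, 0.5, 1.1, 2.8, -2.9]
rho = 0.7

# Riemann sum over one period on an equally spaced grid.
m = 2000
step = 2.0 * math.pi / m
total_mass = sum(wc_density_estimate(-math.pi + k * step, sample, rho) for k in range(m)) * step
```

Because the kernel depends on θ only through cos(θ − µ), the estimate is automatically 2π-periodic and no data replication is needed.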

In Section 3, ﬁrst we present some basic results from the literature on
orthogonal polynomials on the unit circle and then introduce the
strategy of estimating f(θ) by estimating an expectation of a speciﬁc
complex function, that in turn produces the non-parametric circular
kernel density estimator in (1.12).
Section 4 shows that the circular kernel density estimator is
equivalent to the orthogonal series estimator in a limiting sense.
This equivalence establishes a kind of qualitative superiority of the
kernel estimator over the orthogonal series estimator: the latter
requires the series to be truncated, whereas the kernel estimator has
no such restriction.
In Section 5, we present some alternative approaches based on
transformations and provide a critical assessment. The last section
provides some examples.

Motivation for the Circular Kernel Density Estimator
The starting point of the nonparametric density estimation is the
theorem given below from approximation theory (see [33]). Before
giving the theorem we will need the following deﬁnition:
Deﬁnition
Let {Kn} ⊂ C∗ where C∗ denotes the set of periodic analytic functions
with period 2π. We say that {Kn} is an approximate identity if
A. $K_n(\theta) \ge 0 \ \forall\, \theta \in [-\pi, \pi]$;
B. $\int_{-\pi}^{\pi} K_n(\theta)\,d\theta = 1$;
C. $\lim_{n \to \infty} \max_{|\theta| \ge \delta} K_n(\theta) = 0$ for every δ > 0.

The definition above is motivated by the following theorem, which is
similar to the one used in the theory of linear kernel estimation (see
[37]). Also, note that we have replaced Kn of [33] by 2πKn without
changing the result of the theorem.
Theorem
Let f ∈ C∗, let {Kn} be an approximate identity and for n = 1, 2, ... set
$$f^*(\theta) = \int_{-\pi}^{\pi} f(\eta) K_n(\eta - \theta)\,d\eta. \tag{2.1}$$
Then we have
$$\lim_{n \to \infty} \sup_{\theta \in [-\pi, \pi]} |f^*(\theta) - f(\theta)| = 0. \tag{2.2}$$

Note that taking the sequence of concentration coefficients ρ ≡ ρn
such that ρn → 1, the density function of the wrapped Cauchy
distribution satisfies the conditions of the definition in place of Kn.
In general Kn, appearing in the above theorem, may be replaced by a
sequence of periodic densities on [−π, π] that converge to a degenerate
distribution at θ = 0.
For a given random sample θ1, ..., θn from the circular density f,
the Monte-Carlo estimate of f∗ is given by
$$\tilde f(\theta) = \frac{1}{n} \sum_{i=1}^{n} K_n(\theta_i - \theta). \tag{2.3}$$
The kernel given by the wrapped Cauchy density satisfies the
assumptions of the above theorem, which yields the estimator
proposed in (1.12).

This gives the motivation for considering circular kernels for
nonparametric density estimation for circular data, as discussed in
more detail by Di Marzio et al. (2009). However, their development
considers circular kernels of order r = 2, which further requires
$$\int_{-\pi}^{\pi} \sin^j(\theta)\, K_n(\theta)\,d\theta = 0 \quad \text{for } 0 < j < 2.$$
The circular kernel density estimator based on the wrapped Cauchy
weights is given by
$$\tilde f_{WC}(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_{WC}(\theta_i - \theta), \tag{2.4}$$
which may be considered more convenient than the von Mises
kernel because it does not require computation of the Bessel
function I0(κ), which is defined through an integral.
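The computational contrast can be made concrete with a short Python sketch. Here I0(κ) is evaluated by its standard power series, which is one possible choice and an assumption of this sketch; the function names and the parameter values κ = 4 and ρ = 0.7 are illustrative. The wrapped Cauchy density needs only elementary arithmetic and one cosine, whereas the von Mises density needs the extra Bessel evaluation for its normalising constant.

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0 via the series sum_k (kappa/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / (k + 1) ** 2
    return total

def von_mises_density(theta, mu, kappa):
    """von Mises density (1.8) shifted to location mu."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def wrapped_cauchy_density(theta, mu, rho):
    """Wrapped Cauchy density (1.11); closed form, no special functions."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    )

def integrate_over_circle(f, m=2000):
    """Equally spaced Riemann sum of f over [-pi, pi)."""
    step = 2.0 * math.pi / m
    return sum(f(-math.pi + k * step) for k in range(m)) * step

vm_mass = integrate_over_circle(lambda t: von_mises_density(t, 0.5, 4.0))
wc_mass = integrate_over_circle(lambda t: wrapped_cauchy_density(t, 0.5, 0.7))
```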

Circular kernel estimators are also implicit in the estimator
proposed in Jammalamadaka and SenGupta (2001, p. 243), where the
linear distance θ − θi in the linear kernel density estimator is replaced
by the angular distance 1 − cos(θ − θi), resulting in the estimator
$$\tilde f_{JS}(\theta) = \frac{1}{nh} \sum_{i=1}^{n} k_n\left(\frac{1 - \cos(\theta_i - \theta)}{h}\right). \tag{2.5}$$
Remark
One may ask whether the simple Cardioid distribution given by
$$f_C(\theta) = \frac{1}{2\pi}\,(1 + \rho \cos\theta), \quad 0 \le \theta < 2\pi, \tag{2.6}$$
is a candidate for a circular weight function. Note that in the
extreme case ρ → 0 it becomes a uniform distribution, so the
quantity ρ is not really a concentration parameter.

Remark
There are various extensions of the Cardioid distribution discussed by
Abe et al. (2009, 2010, 2013); however, they lose the simplicity of the
original Cardioid distribution.
These extensions involve replacing cos(θ − µ) by
cos(θ − µ + ν sin(θ − µ)). These are unimodal only for |ν| ≤ 1 and
thus we may not get a degenerate distribution as |ν| → 1.
For ν = 0 we get the original Cardioid, and therefore even this
extension is not of much use. Also, it involves computation of a Bessel
function, so it is no better than the von Mises distribution.

Remark
The Cardioid density can, however, be modified to a density
$$f_{EC}(\theta) = \frac{2^{m-1}}{\pi \binom{2m}{m}}\,(1 + \cos\theta)^m, \quad 0 \le \theta < 2\pi, \tag{2.7}$$
that gets concentrated at zero as m → ∞. This has been considered
by Eğecioğlu and Srinivasan (2000) in the form
$$w(x) = c_m(x) = \frac{1}{A_m} \cos^{2m}(x/2), \tag{2.8}$$
where
$$A_m = \frac{\pi}{2^{2m-1}} \binom{2m}{m}. \tag{2.9}$$
This can be seen to be of the form (2.7) by noting that
$$\cos^2(x/2) = \frac{1}{2}\,(1 + \cos x). \tag{2.10}$$

Remark
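Di Marzio et al. (2009) point out the applicability of the uniform kernel
on $[-\frac{\pi}{\kappa+1}, \frac{\pi}{\kappa+1})$, and of the Dirichlet and Fejér kernels given by
$$D_\kappa(\theta) = \frac{\sin\{(\kappa + \frac{1}{2})\theta\}}{2\pi \sin(\theta/2)}, \qquad F_\kappa(\theta) = \frac{1}{2\pi(\kappa + 1)} \left(\frac{\sin(\{\kappa + 1\}\theta/2)}{\sin(\theta/2)}\right)^2, \quad \kappa \in \mathbb{N}. \tag{2.11}$$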

Transformation Based Kernel Density Estimators
An alternative approach to adapting the linear kernel density
estimator (1.1) to circular data is to transform the interval [−π, π] to
(−∞, ∞). This may be achieved by the stereographic projection, a
technique that has been used in the literature to introduce new
families of circular distributions (see e.g. Abe et al. (2010)), which is
given by the transformation θ → x = tan(θ/2) ∈ (−∞, ∞). Let the
kernel density estimator based on the transformed data be denoted by
ĝ(x; h); then the corresponding f̂(θ) is given by
$$\hat f(\theta; h) = \frac{1}{1 + \cos\theta}\; \hat g\left(\frac{\sin\theta}{1 + \cos\theta}; h\right). \tag{3.1}$$
An attractive feature of the above procedure, in contrast to Fisher's
adaptation of the linear method, is that the former gives a
periodic estimator, whereas the latter does not.
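The back-transformation in (3.1) can be sketched in a few lines of Python. The sketch assumes a standard normal kernel for the linear estimator ĝ, a hand-picked bandwidth h = 0.3 and a toy sample; all of these, and the function names, are illustrative assumptions. The Jacobian 1/(1 + cos θ) is the derivative of tan(θ/2), and sin θ/(1 + cos θ) equals tan(θ/2).

```python
import math

def gaussian_kde(x, data, h):
    """Linear estimator (1.1) with d = 1 and a standard normal kernel."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

def stereographic_density(theta, sample, h):
    """Estimator (3.1): linear KDE of x_i = tan(theta_i / 2), mapped back to the circle.

    Not defined at theta = +/- pi, where 1 + cos(theta) vanishes.
    """
    data = [math.tan(t / 2.0) for t in sample]
    x = math.sin(theta) / (1.0 + math.cos(theta))  # equals tan(theta / 2)
    return gaussian_kde(x, data, h) / (1.0 + math.cos(theta))

# Hypothetical toy sample of angles in (-pi, pi), in radians.
sample = [0.3, 0.5, 1.1, -0.8, -1.5]
h = 0.3

# Midpoint rule over (-pi, pi); the midpoints never hit +/- pi exactly.
m = 4000
step = 2.0 * math.pi / m
mass = sum(stereographic_density(-math.pi + (k + 0.5) * step, sample, h) for k in range(m)) * step
```

Note that a fixed bandwidth on the x scale corresponds to a varying amount of smoothing on the θ scale, since the Jacobian shrinks towards ±π.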

Here we will consider approximating continuous bounded functions
f(x) on a compact interval I = [a, b] ⊂ R. For a given nonnegative
function w(x) defined on I, the weighted L2 norm of f(x) is defined
as
$$\|f\|_w^2 = \int_a^b |f(x)|^2\, w(x)\,dx. \tag{3.2}$$
The space of such functions will be denoted by $L_2^w$. The general
method of approximation of functions $f \in L_2^w$ involves a set of
basis functions $\{\varphi_k(x)\}_{k=0}^{\infty}$ and a non-negative weight function w(x)
such that
$$\langle \varphi_k, \varphi_{k'} \rangle_w = \int_a^b \varphi_k(x)\,\varphi_{k'}(x)\,w(x)\,dx \;\begin{cases} = 0 & \text{for } k \ne k', \\ > 0 & \text{for } k = k'. \end{cases} \tag{3.3}$$

Then for $f \in L_2^w$ the partial sum
$$f_N(x) = \sum_{k=0}^{N} g_k \varphi_k(x), \tag{3.4}$$
where
$$g_k = \int_a^b f(x)\,\varphi_k(x)\,w(x)\,dx, \tag{3.5}$$
is considered to be the 'best' approximation in the sense that, among
partial sums with coefficients ak, the choice ak = gk minimises
$$\|f - f_N\|_w^2 = \int_a^b |f(x) - f_N(x)|^2\, w(x)\,dx. \tag{3.6}$$
The original idea is attributed to Čencov (1962), who considered the
cosine basis
$$\{\varphi_0(x) = 1,\ \varphi_j(x) = \sqrt{2} \cos(\pi j x),\ j = 1, 2, ...\}$$
and w(x) = 1.

In recent literature, many other types of basis functions, including
trigonometric, polynomial, spline, wavelet and others, have been
considered. The reader may refer to Devroye and Györfi (1985),
Efromovich (1999), Hart (1997), Walter (1994) for a discussion of
different bases and their properties.
Efromovich (2010) presents an extensive overview of density estimation
by orthogonal series concentrated on the interval [0, 1]. As mentioned
in Efromovich (2010), the choice of the basis functions primarily depends
on the support of the function. Thus for densities on (−∞, ∞) or
on [0, ∞), Hermite and Laguerre series are recommended; see Devroye
and Györfi (2001), Walter (1994), Hall (1980) and Walter (1977).
For compact intervals, trigonometric (or Fourier) series are
recommended; discussion about these can be found in Čencov (1980),
Devroye and Györfi (1985), Efromovich (1999), Hart (1997), Silverman
(1986), Hall (1981), Tarter and Lock (1993).
Classical orthogonal polynomials such as Chebyshev, Jacobi, Legendre
and Gegenbauer are also popular; see Trefethen (2013), Rudzkis and

Once the basis functions are chosen, the density f(x) for a random
sample {x1, ..., xn} may be estimated by
$$\hat f_N(x) = \sum_{k=0}^{N} \hat g_k \varphi_k(x), \tag{3.7}$$
where
$$\hat g_k = \frac{1}{n} \sum_{i=1}^{n} \varphi_k(x_i). \tag{3.8}$$
Efromovich (2010) discusses in detail various strategies of selecting N,
albeit in a more general setting, by considering density estimators
of the form
$$\hat f(x) = \hat f(x, \{\hat w_k\}) = \sum_{k=0}^{\infty} \hat w_k\, \hat g_k\, \varphi_k(x), \tag{3.9}$$
which includes the truncated estimator $\hat f_N$ as well as hard-thresholding
and block-thresholding estimators, commonly studied in the wavelet
literature. However, this modification will not be pursued in further
discussion.

It can be shown that $\{1, \cos(x), \cos(2x), ...\} = \{\cos(kx)\}_{k=0}^{\infty}$ is an
orthogonal system on [−π, π]. Assume that for the circular data
the support of the density is the interval [−π, π] (otherwise we transform
the sample θ1, ..., θn to x1, ..., xn). The density estimator is,
therefore, given by
$$\hat f_{OC}(\theta) = \frac{1}{2\pi} + \sum_{k=1}^{N} \hat g_k \cos(k\theta), \tag{3.10}$$
where
$$\hat g_k = \frac{1}{n\pi} \sum_{i=1}^{n} \cos(k\theta_i).$$
N is considered a smoothing parameter and may be determined using
the cross-validation method described in Efromovich (2010).

A common problem with the truncated cosine series estimator is that
it may not produce a true density. To alleviate this problem,
Efromovich (1999) considers the L2 projection of $\hat f$ onto the class of
non-negative densities, given by
$$\breve f(x) = \max(0, \hat f(x) - c), \tag{3.11}$$
where c is chosen to make $\breve f$ a proper density.
A 2π-periodic function f(x) may be approximated by a truncated
Fourier series as
$$f(x) \approx \frac{1}{2} a_0 + \sum_{k=1}^{N} \{a_k \cos(kx) + b_k \sin(kx)\}, \tag{3.12}$$
where
$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(kx)\,dx, \quad k = 0, 1, 2, ..., N; \tag{3.13}$$
$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(kx)\,dx, \quad k = 1, 2, ..., N. \tag{3.14}$$

Fourier Series
In the context of circular data {θ1, ..., θn} on the support [−π, π],
a0 = 1/π, and the unknown coefficients can be estimated by
$$\hat a_k = \frac{1}{n\pi} \sum_{i=1}^{n} \cos(k\theta_i), \quad \hat b_k = \frac{1}{n\pi} \sum_{i=1}^{n} \sin(k\theta_i); \quad k = 1, 2, .... \tag{3.15}$$
Thus, the Fourier series density estimator is given by
$$\hat f_{FS}(\theta) = \frac{1}{2} \hat a_0 + \sum_{k=1}^{N} \{\hat a_k \cos(k\theta) + \hat b_k \sin(k\theta)\}. \tag{3.16}$$
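The estimator (3.15)-(3.16) can be sketched in Python as follows, taking the leading term as 1/(2π) since a0 = 1/π is known. The function name, the toy sample and the truncation point N = 5 are illustrative assumptions. The last line evaluates the estimate for a single observation at 0 at the antipodal point π, which shows that a truncated series can take negative values and hence need not be a true density.

```python
import math

def fourier_series_density(theta, sample, n_terms):
    """Fourier series estimator (3.16) with coefficients estimated as in (3.15)."""
    n = len(sample)
    estimate = 1.0 / (2.0 * math.pi)  # (1/2) * a_0 with a_0 = 1/pi
    for k in range(1, n_terms + 1):
        a_k = sum(math.cos(k * t) for t in sample) / (n * math.pi)
        b_k = sum(math.sin(k * t) for t in sample) / (n * math.pi)
        estimate += a_k * math.cos(k * theta) + b_k * math.sin(k * theta)
    return estimate

# Hypothetical toy sample of angles in radians.
sample = [0.3, 0.5, 1.1, 2.8, -2.9]

# The estimate integrates to one: every cosine and sine term integrates to zero.
m = 2000
step = 2.0 * math.pi / m
mass = sum(fourier_series_density(-math.pi + k * step, sample, 5) for k in range(m)) * step

# One observation at 0, N = 5, evaluated at pi: 1/(2 pi) + (1/pi)(-1 + 1 - 1 + 1 - 1).
value_opposite = fourier_series_density(math.pi, [0.0], 5)
```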

Orthogonal polynomials of the Chebyshev class on [−1, 1] can be
converted to orthogonal polynomials on the circle C = {z : |z| = 1}
through the transformation
$$x = \frac{1}{2}\,(z + z^{-1}).$$
This has been quite popular in numerical approximation of functions
(see for example Trefethen (2013), Chapter 3). The kth Chebyshev
polynomial can be defined as the real part of the function $z^k$ on the
unit circle:
$$x = \frac{1}{2}\,(z + z^{-1}) = \cos\theta, \quad \theta = \cos^{-1} x, \tag{3.17}$$
$$T_k(x) = \frac{1}{2}\,(z^k + z^{-k}) = \cos(k\theta). \tag{3.18}$$
The following theorems justify the use of orthogonal polynomial
estimators (see Rudin (1976)).
Theorem
If h is Lipschitz continuous on [−1, 1], it has a unique representation as

Use of Chebyshev polynomials for density estimation for the circular
data requires the correspondence
f(θ) = h(x) : x = cos θ, θ ∈ (−π, π). (3.21)
The density estimator then becomes
$$\hat f_J(\theta) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^{N} \hat g_k \cos(k\theta), \quad -\pi \le \theta \le \pi, \tag{3.22}$$
where
$$\hat g_k = \frac{1}{n} \sum_{i=1}^{n} \cos(k\theta_i); \quad k = 1, 2, ...,$$
which is the same estimator as the one obtained from the
orthogonal functions $\{\cos(kx)\}_{k=0}^{\infty}$ discussed earlier.

The Chebyshev weight function is singular at the extremes of the
interval of support. Arbitrary power singularities may be assigned to
each extreme, giving a general weight function
$$w(x) = (1 - x)^{\alpha} (1 + x)^{\beta}, \tag{3.23}$$
where α, β > −1 are parameters. The associated polynomials are
known as Jacobi polynomials, usually denoted by $\{P_n^{(\alpha,\beta)}\}$.
The special case α = β gives orthogonal polynomials that are known
as Gegenbauer or ultraspherical polynomials and are the subject of much
discussion in numerical analysis; see Koornwinder et al. (2010).
The most special case of all, α = β = 0, gives a constant weight
function and produces what are known as Legendre polynomials,
denoted by Pn(x), n = 0, 1, 2, ..., which define an orthogonal system on
the interval [−1, 1].

They may be simply described as
P0(x) = 1, P1(x) = x (3.24)
and the recurrence relation
(n + 1)Pn+1(x) = (2n + 1)xPn(x) − nPn−1(x).
An explicit representation may be given by the following formula:
$$P_n(x) = 2^n \sum_{k=0}^{n} \binom{n}{k} \binom{\frac{n+k-1}{2}}{n} x^k. \tag{3.25}$$
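As a quick consistency check, the recurrence relation and the explicit formula (3.25) can both be coded in Python and compared. The function names are illustrative assumptions; the generalized binomial coefficient with a real upper argument is computed as the product of (a − i)/(i + 1) for i = 0, ..., n − 1.

```python
import math

def legendre_recurrence(n, x):
    """P_n(x) from P_0 = 1, P_1 = x and (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
    return p

def generalized_binomial(a, n):
    """Binomial coefficient 'a choose n' for real a and non-negative integer n."""
    result = 1.0
    for i in range(n):
        result *= (a - i) / (i + 1)
    return result

def legendre_explicit(n, x):
    """P_n(x) from the explicit representation (3.25)."""
    return 2 ** n * sum(
        math.comb(n, k) * generalized_binomial((n + k - 1) / 2.0, n) * x ** k
        for k in range(n + 1)
    )

# Largest disagreement between the two formulas for small degrees at x = 0.3.
max_gap = max(abs(legendre_recurrence(n, 0.3) - legendre_explicit(n, 0.3)) for n in range(8))
```

In practice the recurrence is the more economical route, since it avoids the alternating sums in the explicit formula.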

This avoids the possible numerical problem in computing the
coefficients due to the singularity at the extremes. Hence this will be a
preferred alternative to the Chebyshev polynomials. In this case the
density estimator in the original scale is given by
$$\hat f_{LP}(\theta) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^{N} \hat g_k P_k(\cos\theta), \tag{3.26}$$
where
$$\hat g_k = \frac{1}{n} \sum_{i=1}^{n} P_k(\cos\theta_i). \tag{3.27}$$

Derivative of DF Estimator
The justification of the circular kernel method for smoothing the
density function may be carried over to finding a smooth estimator of
the distribution function. Babu and Chaubey (2006) consider
estimating distributions defined on a hypercube, extending the
univariate Bernstein polynomials (Babu, Canty and Chaubey (2002),
Vitale (1973)). Denoting by Fn the empirical distribution function of a
random sample of n values of θ, the Bernstein polynomial estimator of
the distribution function is defined as
$$B_m(x) = \sum_{j=0}^{m} F_n\left(\frac{j}{m}\right) \binom{m}{j} x^j (1 - x)^{m-j}, \quad x \in [0, 1]. \tag{4.1}$$

Bernstein Polynomials
The derivative of Bm(x) is proposed as the Bernstein polynomial
density estimator for x ∈ [0, 1], which is given by
$$\hat B_m(x) = \sum_{j=1}^{m} \left[F_n\left(\frac{j}{m}\right) - F_n\left(\frac{j-1}{m}\right)\right] \beta(x; j, m - j + 1), \quad x \in [0, 1], \tag{4.2}$$
where β(x; a, b) is given by
$$\beta(x; a, b) = \frac{1}{B(a, b)}\, x^{a-1} (1 - x)^{b-1}, \tag{4.3}$$
and B(a, b) = (a − 1)!(b − 1)!/(a + b − 1)!.
Transforming the interval [0, 2π] to [0, 1] to get the density estimator
of Θ/(2π), and transforming back, we get the density estimator of
f(θ) as
$$\hat f_B(\theta) = \frac{1}{2\pi} \sum_{j=1}^{m} \left[F_n\left(\frac{2\pi j}{m}\right) - F_n\left(\frac{2\pi (j-1)}{m}\right)\right] \beta\left(\frac{\theta}{2\pi}; j, m - j + 1\right). \tag{4.4}$$
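A minimal Python sketch of the estimator (4.4), assuming angles recorded in [0, 2π); the function names, the toy sample and the degree m = 10 are illustrative assumptions. The last two lines evaluate the estimate at both ends of the interval; for this sample the two values differ, so the estimate is not periodic.

```python
import math

def empirical_cdf(x, sample):
    """Empirical distribution function F_n(x)."""
    return sum(1 for t in sample if t <= x) / len(sample)

def beta_density(x, a, b):
    """Beta density (4.3) for positive integers a and b."""
    inv_beta = math.factorial(a + b - 1) / (math.factorial(a - 1) * math.factorial(b - 1))
    return inv_beta * x ** (a - 1) * (1.0 - x) ** (b - 1)

def bernstein_circular_density(theta, sample, m):
    """Estimator (4.4): Bernstein polynomial density for angles in [0, 2*pi]."""
    x = theta / (2.0 * math.pi)
    total = 0.0
    for j in range(1, m + 1):
        upper = empirical_cdf(2.0 * math.pi * j / m, sample)
        lower = empirical_cdf(2.0 * math.pi * (j - 1) / m, sample)
        total += (upper - lower) * beta_density(x, j, m - j + 1)
    return total / (2.0 * math.pi)

# Hypothetical toy sample of angles in [0, 2*pi), in radians.
sample = [0.2, 0.4, 0.9, 3.0, 5.5]
m = 10

value_at_zero = bernstein_circular_density(0.0, sample, m)
value_at_two_pi = bernstein_circular_density(2.0 * math.pi, sample, m)
```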

Carnicero et al. (2010) note that this does not provide a periodic
estimator of the density and propose a modification that requires
defining the distribution function from an origin ν, denoted by $F^{(\nu)}$,
such that
$$F_n^{(\nu)}\left(\frac{2\pi}{m}\right) = 1 - F_n^{(\nu)}\left(\frac{2\pi (m-1)}{m}\right).$$
In practice this requires estimating ν from the data; maximum
likelihood is recommended; however, this does not guarantee that the
equation will be satisfied. In this case it is recommended to average
the two values on both sides of the above and replace the first and
the last weight of the beta function by this value.
Below we show that a periodic density estimator is obtained if we
follow the idea behind Theorem 1.1 in approximating F(θ). We quote
the following result from Feller (1965, §4.2) as used in Babu, Canty
and Chaubey (2002).

Theorem
Let u be any bounded and continuous function and Ψx,n, n = 1, 2, ..., be a
family of distributions with mean µn(x) and variance $h_n^2(x)$ such that
µn(x) → x and $h_n^2(x) \to 0$. Then
$$u^*(x) = \int_{-\infty}^{\infty} u(t)\,d\Psi_{x,n}(t) \to u(x). \tag{4.5}$$
The convergence is uniform in every subinterval in which hn(x) → 0
uniformly and u is uniformly continuous.
Replacing u(t) by F(t) we get a uniformly convergent approximation
of F given by
$$F^*(\theta) = \int_0^{2\pi} F(\eta)\,d\Psi_{\theta,n}(\eta), \quad 0 \le \theta < 2\pi. \tag{4.6}$$

The Bernstein polynomial estimator is obtained by choosing Ψx,n
defined on the support [0, 1] by attaching the binomial weight
$\binom{m}{j} x^j (1 - x)^{m-j}$ to the point j/m; with m in the role of n, this
makes the mean equal to x and the variance equal to x(1 − x)/m, and
these satisfy the conditions in the above theorem.
However, if the support is [0, 2π), a prudent choice of Ψθ,n may be
a circular distribution itself, e.g. the von Mises distribution (see (??))
vM(θ, κn), with mean θ and concentration parameter
κn → ∞ as n → ∞. Note that we can write (4.6) as
$$F^*(\theta) = F(2\pi)\,\Psi_{\theta,n}(2\pi) - \int_0^{2\pi} \Psi_{\theta,n}(\eta) f(\eta)\,d\eta = 1 - \int_0^{2\pi} \Psi_{\theta,n}(\eta)\,dF(\eta). \tag{4.7}$$

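Since F is unknown, in order to estimate F∗ we use a plug-in
estimate, resulting in a smooth estimator of F given by
$$\tilde F(\theta) = 1 - \int_0^{2\pi} \Psi_{\theta,n}(\eta)\,dF_n(\eta) = 1 - \frac{1}{n} \sum_{i=1}^{n} \Psi_{\theta,n}(\theta_i). \tag{4.8}$$
Let
$$d_n(\theta; \eta) = \frac{d\Psi_{\theta,n}(\eta)}{d\theta}; \tag{4.9}$$
then the proposed smooth estimator of f(θ) is given by
$$\tilde f(\theta) = -\frac{1}{n} \sum_{i=1}^{n} d_n(\theta; \theta_i). \tag{4.10}$$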

Considering circular distributions with mean µ and concentration
parameter κn → ∞ as n → ∞, let the density function corresponding
to Ψθ,n belong to a location family given by ψ(θ − µ; κ) that has
mean µ and concentration parameter κ; then
$$d_n(\theta; \eta) = -\psi(\eta - \theta; \kappa) \tag{4.11}$$
and the density estimator $\tilde f(\theta)$ becomes
$$\tilde f(\theta) = \frac{1}{n} \sum_{i=1}^{n} \psi(\theta_i - \theta; \kappa), \tag{4.12}$$
which is of the same form as the circular kernel density estimator
given in (2.7).

Candidates for symmetric densities concentrated at zero may be
constructed from the stereographic projection of the interval [−1, 1],
say, to the circle. For example, consider the beta-type distribution given
by Seshadri (1991), which is obtained from a symmetric
beta distribution by the real Möbius transformation,
$$w(x; \alpha, \gamma) = \frac{(1 - \alpha^2)^{\gamma}}{B(\gamma, \frac{1}{2})}\, \frac{(1 - x^2)^{\gamma - 1}}{(1 - \alpha x)^{2\gamma}}, \quad -1 < x < 1. \tag{4.13}$$
Considering α = 0, we get a symmetric distribution over x ∈ [−1, 1].
The stereographic projection of this interval over [0, 2π) is given by
x = sin θ. This gives
$$\psi(\eta; \theta, n) = \frac{1}{2}\, \frac{1}{B(\frac{n+1}{2}, \frac{1}{2})} \cos^{n}\left(\frac{\eta - \theta}{2}\right), \quad 0 \le \eta < 2\pi. \tag{4.14}$$
This gives a circular symmetric (around η = π) distribution that gets
more and more concentrated as n → ∞, and yields an estimator of a
similar type to the one given by the circular kernel in (2.7).

ISP of Bernstein Polynomial Estimator
The interval [0, π/2] can be mapped to the interval [0, 1] by the
transformation
$$x \to \theta = \sin^{-1} x : \theta \in [0, \pi/2]. \tag{4.15}$$
Hence the angles θ ∈ [0, 2π) may be converted to [0, 1] by the
transformation
$$x = \sin(\theta/4).$$
This transforms the Bernstein polynomial to a periodic function given
by
$$\hat f_{BS}(\theta) = \frac{1}{4} \cos\left(\frac{\theta}{4}\right) \hat B_m\left(\sin\frac{\theta}{4}\right) \tag{4.16}$$
$$= \frac{1}{4} \cos\left(\frac{\theta}{4}\right) \sum_{j=1}^{m} \left[G_n\left(\frac{j}{m}\right) - G_n\left(\frac{j-1}{m}\right)\right] w_j(\theta) \tag{4.17}$$
$$= \frac{1}{4} \cos\left(\frac{\theta}{4}\right) \sum_{j=1}^{m} \left[F_n\left(\sin\frac{j}{4m}\right) - F_n\left(\sin\frac{j-1}{4m}\right)\right] w_j(\theta), \tag{4.18}$$
where

A Connection Between the Circular Kernel Density
Estimator and the Orthogonal Series
Let D be the open unit disk, {z | |z| < 1}, in the complex plane, and
let µ be a continuous measure defined on the boundary ∂D, i.e. the circle
{z | |z| = 1}. The point z ∈ D will be represented by $z = re^{i\theta}$ for
r ∈ [0, 1), θ ∈ [0, 2π) and $i = \sqrt{-1}$.
A standard result in complex analysis is the Poisson
representation, which involves the real and complex Poisson kernels,
defined as
$$P_r(\theta, \varphi) = \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - \varphi)} \tag{5.1}$$
for θ, ϕ ∈ [0, 2π) and r ∈ [0, 1), and by
$$C(z, \omega) = \frac{\omega + z}{\omega - z} \tag{5.2}$$
for ω ∈ ∂D and z ∈ D.

The connection between these kernels is given by the fact that
$$P_r(\theta, \varphi) = \mathrm{Re}\, C(re^{i\theta}, e^{i\varphi}) = (2\pi)\, f_{WC}(\theta; \varphi, r). \tag{5.3}$$
The Poisson representation says that if g is analytic in a
neighborhood of $\bar D$ with g(0) real, then for z ∈ D,
$$g(z) = \int \frac{e^{i\theta} + z}{e^{i\theta} - z}\, \mathrm{Re}(g(e^{i\theta}))\, \frac{d\theta}{2\pi} \tag{5.4}$$
(see [42, p. 27]).
This representation leads to the result (see (ii) in §5 of [42]) that for
Lebesgue a.e. θ,
$$\lim_{r \uparrow 1} W(re^{i\theta}) \equiv W(e^{i\theta}) \tag{5.5}$$
exists, and if $d\mu = w(\theta)\frac{d\theta}{2\pi} + d\mu_s$ with $d\mu_s$ singular, then
$$w(\theta) = \mathrm{Re}\, W(e^{i\theta}), \tag{5.6}$$
where
$$W(z) = \int \frac{e^{i\theta} + z}{e^{i\theta} - z}\, d\mu(\theta). \tag{5.7}$$

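Our strategy for smooth estimation rests on the fact that for dµs = 0 we
have
$$f(\theta) = \frac{1}{2\pi} \lim_{r \uparrow 1} \mathrm{Re}\, W(re^{i\theta}), \tag{5.8}$$
where now W(z) is defined as
$$W(z) = \int \frac{e^{i\theta} + z}{e^{i\theta} - z}\, f(\theta)\,d\theta. \tag{5.9}$$
We define the estimator of f(θ), motivated by considering an
estimator of W(z) and the identities (5.6) and (5.8), as
$$\hat f_r(\theta) = \frac{1}{2\pi}\, \mathrm{Re}\, W_n(re^{i\theta}), \tag{5.10}$$
where
$$W_n(re^{i\theta}) = \frac{1}{n} \sum_{j=1}^{n} \frac{e^{i\theta_j} + re^{i\theta}}{e^{i\theta_j} - re^{i\theta}}. \tag{5.11}$$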

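Then using (5.3), we have
$$\mathrm{Re}\, W_n(re^{i\theta}) = \frac{1}{n} \sum_{j=1}^{n} P_r(\theta, \theta_j), \tag{5.13}$$
and therefore
$$\hat f_r(\theta) = \frac{1}{(2\pi) n} \sum_{j=1}^{n} P_r(\theta, \theta_j) = \frac{1}{n} \sum_{j=1}^{n} f_{WC}(\theta; \theta_j, r), \tag{5.14}$$
which is of the same form as in (1.12).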
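The identities used here can be checked numerically with Python's complex arithmetic. In the sketch below the function names, the toy sample and r = 0.8 are illustrative assumptions: the real part of the complex kernel (5.2) is compared with the Poisson kernel (5.1), and the estimator built from the complex kernel as in (5.10) is compared with the average of wrapped Cauchy kernels.

```python
import cmath
import math

def poisson_kernel(r, theta, phi):
    """Real Poisson kernel (5.1)."""
    return (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - phi))

def cauchy_kernel(z, omega):
    """Complex Poisson kernel (5.2)."""
    return (omega + z) / (omega - z)

def f_hat_r(theta, sample, r):
    """Estimator (5.10): real part of the sample average of complex kernels, over 2*pi."""
    z = r * cmath.exp(1j * theta)
    w_n = sum(cauchy_kernel(z, cmath.exp(1j * t)) for t in sample) / len(sample)
    return w_n.real / (2.0 * math.pi)

def wrapped_cauchy_average(theta, sample, r):
    """Average of wrapped Cauchy kernels with concentration r, as in (5.14)."""
    return sum(poisson_kernel(r, theta, t) for t in sample) / (2.0 * math.pi * len(sample))

# Hypothetical toy sample of angles in radians.
sample = [0.3, 0.5, 1.1, 2.8, 5.9]
r = 0.8

kernel_gap = abs(
    cauchy_kernel(r * cmath.exp(0.4j), cmath.exp(1.3j)).real - poisson_kernel(r, 0.4, 1.3)
)
estimator_gap = abs(f_hat_r(0.4, sample, r) - wrapped_cauchy_average(0.4, sample, r))
```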

Orthogonal Series on a Circle
We get the Fourier expansion of W(z) with respect to the basis
$\{1, z, z^2, ...\}$ as
$$W(z) = 1 + 2 \sum_{j=1}^{\infty} c_j z^j, \tag{5.15}$$
where
$$c_j = \int e^{-ij\theta} f(\theta)\,d\theta$$
is the jth trigonometric moment.
The series is truncated at some term N∗ so that the error is
negligible.
However, we show below that, estimating the trigonometric moments
$c_j$, j = 1, 2, ..., as
$$\hat c_j = \frac{1}{n} \sum_{k=1}^{n} e^{-ij\theta_k},$$
the estimator of W(z) is the same as that given in the previous section.

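This can be shown by writing
$$\begin{aligned} \hat W(z) &= 1 + \frac{2}{n} \sum_{j=1}^{n} \left\{ \sum_{k=1}^{\infty} e^{-ik\theta_j} z^k \right\} \\ &= 1 + \frac{2}{n} \sum_{j=1}^{n} \left\{ \sum_{k=1}^{\infty} (\bar\omega_j z)^k \right\}; \quad \omega_j = e^{i\theta_j} \\ &= 1 + \frac{2}{n} \sum_{j=1}^{n} \frac{\bar\omega_j z}{1 - \bar\omega_j z} \\ &= \frac{2}{n} \sum_{j=1}^{n} \left[ \frac{1}{2} + \frac{\bar\omega_j z}{1 - \bar\omega_j z} \right] \\ &= \frac{1}{n} \sum_{j=1}^{n} \frac{1 + \bar\omega_j z}{1 - \bar\omega_j z} \\ &= \frac{1}{n} \sum_{j=1}^{n} C(z, \omega_j), \end{aligned}$$
which is the same as Wn(z) given in (5.11).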

Remark:
This ensures that the orthogonal series estimator of the density
coincides with the circular kernel estimator.
The determination of the smoothing constant may be handled by
the cross-validation method outlined in [46].
Note that the simplification used in the above formulae does not work
for r = 1. Even so, the limiting form of (5.15) is used to define
an orthogonal series estimator given by
$$\hat f_S(\theta) = \frac{1}{2\pi} + \frac{1}{\pi n} \sum_{j=1}^{n} \sum_{k=1}^{n^*} \cos k(\theta - \theta_j), \tag{5.16}$$
where n∗ is chosen according to some criterion, for example to
minimize the integrated squared error.
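The relation between the series and kernel forms can be seen numerically. Taking real parts in the geometric series with z = re^{iθ} gives a cosine series whose kth term is damped by r^k; the Python sketch below (function names, toy sample and r = 0.7 are illustrative assumptions) compares partial sums of that damped series with the closed-form wrapped Cauchy estimate. Setting r = 1 and stopping at n∗ terms gives the undamped series in (5.16).

```python
import math

def wrapped_cauchy_estimate(theta, sample, r):
    """Closed-form kernel estimate: average of wrapped Cauchy kernels with concentration r."""
    return sum(
        (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - t)) for t in sample
    ) / (2.0 * math.pi * len(sample))

def damped_series_estimate(theta, sample, r, n_terms):
    """1/(2 pi) + (1/(pi n)) * sum_j sum_{k<=n_terms} r^k cos k(theta - theta_j)."""
    total = 0.0
    for t in sample:
        total += sum(r ** k * math.cos(k * (theta - t)) for k in range(1, n_terms + 1))
    return 1.0 / (2.0 * math.pi) + total / (math.pi * len(sample))

# Hypothetical toy sample of angles in radians.
sample = [0.3, 0.5, 1.1, 2.8, -2.9]
r = 0.7
target = wrapped_cauchy_estimate(0.4, sample, r)

# Truncation error of the damped series after 5 and after 200 terms.
error_5 = abs(damped_series_estimate(0.4, sample, r, 5) - target)
error_200 = abs(damped_series_estimate(0.4, sample, r, 200) - target)
```

The damping factor r^k plays the role of a smooth taper on the series coefficients, in place of the hard cut-off at n∗.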

Thus the above discussion presents two contrasting situations: in one
we have to determine the number of terms in the series, and in the
other the number of terms in the series is allowed to be infinite; however,
we choose to evaluate $\mathrm{Re}\, W(re^{i\theta})$ for some r close to 1 as an
approximation to $\mathrm{Re}\, W(e^{i\theta})$.
Considering above discussions, we provide below some examples using
the von Mises and wrapped Cauchy kernels and contrast them with
using the inverse stereographic kernels using the normal and logistic
distributions.
It is seen that wrapped Cauchy may provide estimators that are not
as smooth as those provided by other methods. Further investigation
of these methods is in progress.

Turtle Data
The following Table gives the measurements of the directions taken by
76 turtles after treatment from Appendix B.3 in Fisher (1993). This
data has been analysed in Fisher (1989) and Prakasa Rao (1986).
8,38,50,64,83,98,204,257, 9,38,53,65,88,100,215,268,
13,40,56,65,88,103,223,285, 13,44,57,68,88,106,226,319,
14,45,58,70,90,113,237,343, 18,47,58,73,92,118,238,350,
22,48,61,78,92,138,243, 27,48,63,78,93,153,244,
30,48,64,78,95,153,250, 34,48,64,83,96,155,251
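As an illustration, the wrapped Cauchy estimator (1.12) can be applied to the turtle directions listed above. The Python sketch assumes the tabulated values are in degrees and converts them to radians; the function name and the choice ρ = 0.7 are illustrative assumptions, not a recommended bandwidth.

```python
import math

# Turtle directions as listed in the table above (assumed to be in degrees).
turtles_deg = [
    8, 38, 50, 64, 83, 98, 204, 257, 9, 38, 53, 65, 88, 100, 215, 268,
    13, 40, 56, 65, 88, 103, 223, 285, 13, 44, 57, 68, 88, 106, 226, 319,
    14, 45, 58, 70, 90, 113, 237, 343, 18, 47, 58, 73, 92, 118, 238, 350,
    22, 48, 61, 78, 92, 138, 243, 27, 48, 63, 78, 93, 153, 244,
    30, 48, 64, 78, 95, 153, 250, 34, 48, 64, 83, 96, 155, 251,
]
turtles = [math.radians(d) for d in turtles_deg]

def wc_density_estimate(theta, sample, rho):
    """Estimator (1.12) with the wrapped Cauchy kernel (1.11)."""
    return sum(
        (1.0 - rho ** 2) / (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - t))
        for t in sample
    ) / (2.0 * math.pi * len(sample))

rho = 0.7
# Density estimate on an equally spaced grid over [0, 2*pi).
m = 720
grid = [2.0 * math.pi * k / m for k in range(m)]
density = [wc_density_estimate(t, turtles, rho) for t in grid]
mass = sum(density) * (2.0 * math.pi / m)
```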

Turtle Data - WC Kernel:
[Plot: Histogram and Circular Kernel Density; Turtle Data, Stephens (1969); x-axis Theta, y-axis Density; curves for Rho = .6, .7, .8]
Figure: 1. Histogram and Circular Kernel Density with wrapped Cauchy;
ρ = .6, .7, .8.

Turtle Data - VM Kernel:
[Plot: Histogram and Circular Kernel VM Density; Turtle Data, Stephens (1969); x-axis Theta, y-axis Density; curves for Kappa = 3, 4, 5]
Figure: 2. Histogram and Circular Kernel Density with von Mises; κ = 3, 4, 5.

Turtle Data - ISC Normal Kernel:
[Plot: Histogram and Circular Kernel ISCNorm Density; Turtle Data; x-axis Theta, y-axis Density; curves for sigma = .25, .3, .35]
Figure: 3. Histogram and Circular Kernel Density with ISCNorm Kernel;
σ = .25, .3, .35.

Turtle Data - ISC Logistic Kernel:
[Plot: Histogram and Circular Kernel ISCLogistic Density; Turtle Data; x-axis Theta, y-axis Density; curves for sigma = .15, .25, .35]
Figure: 4. Histogram and Circular Kernel Density with ISCLogistic Kernel;
σ = .15, .25, .35.

Ants Data
The following Table gives the measurements of the directions chosen
by 100 ants in response to an evenly illuminated black target from
Appendix B.7 in Fisher (1993).
330,180,160,200,180,190,10,160,110,140,
290,160,200,180,160,210,220,180,270,40,
60,280,190,120,210,220,180,120,180,300,
200,180,250,200,190,200,210,150,200,80,
200,170,180,210,180,60,170,300,180,210,
180,190,30,130,230,260,90,190,140,200,
280,180,200,30,50,110,160,220,360,170,
220,140,180,210,150,180,180,160,150,200,
190,150,200,200,210,220,170,70,160,210,
180,150,350,230,180,170,200,190,170,190

Ants Data - WC Kernel:
[Plot: histogram and circular kernel WC density, panel title "Ants Data"; x-axis: Theta, y-axis: Density; legend: Rho = .81, .82, .83]
Figure: 5. Histogram and Circular Kernel Density with wrapped Cauchy;
ρ = .81, .84, .86.

Ants Data - VM Kernel:
[Plot: histogram and circular kernel VM density, panel title "Ants Data"; x-axis: Theta, y-axis: Density; legend: Kappa = 5, 6, 7]
Figure: 6. Histogram and Circular Kernel Density with von Mises Kernel;
κ = 10, 20, 40.
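A hedged sketch of the von Mises kernel estimator for the ants data, assuming the usual form exp(κ cos u) / (2π I0(κ)) averaged over the observations. The Bessel function I0 is evaluated by its power series so that only the standard library is needed (a library routine such as `scipy.special.i0` could replace it); the function names and the κ values used are illustrative, not the author's code.

```python
import math

# Ant directions in degrees, transcribed from the table above (100 values).
ANTS_DEG = [
    330, 180, 160, 200, 180, 190, 10, 160, 110, 140,
    290, 160, 200, 180, 160, 210, 220, 180, 270, 40,
    60, 280, 190, 120, 210, 220, 180, 120, 180, 300,
    200, 180, 250, 200, 190, 200, 210, 150, 200, 80,
    200, 170, 180, 210, 180, 60, 170, 300, 180, 210,
    180, 190, 30, 130, 230, 260, 90, 190, 140, 200,
    280, 180, 200, 30, 50, 110, 160, 220, 360, 170,
    220, 140, 180, 210, 150, 180, 180, 160, 150, 200,
    190, 150, 200, 200, 210, 220, 170, 70, 160, 210,
    180, 150, 350, 230, 180, 170, 200, 190, 170, 190,
]


def bessel_i0(x, terms=80):
    """Modified Bessel function I0(x) via the series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)  # ratio of consecutive series terms
        total += term
    return total


def vm_density_estimate(theta, sample, kappa):
    """Average of von Mises kernels centred at each observation (radians)."""
    norm = 2.0 * math.pi * bessel_i0(kappa) * len(sample)
    return sum(math.exp(kappa * math.cos(theta - t)) for t in sample) / norm


ants = [math.radians(d) for d in ANTS_DEG]
n_grid = 360
step = 2.0 * math.pi / n_grid
grid = [k * step for k in range(n_grid)]

# Illustrative concentration values; larger kappa means less smoothing.
vm_curves = {
    kappa: [vm_density_estimate(g, ants, kappa) for g in grid]
    for kappa in (5.0, 10.0, 20.0)
}

for kappa, dens in vm_curves.items():
    area = sum(dens) * step             # Riemann sum over [0, 2*pi)
    peak = grid[dens.index(max(dens))]  # grid point with the largest estimate
    print(f"kappa={kappa}: area={area:.4f}, highest point near {peak:.2f} rad")
```

The normalising constant is the only place the Bessel function enters, which is the extra numerical step that the wrapped Cauchy kernel avoids.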

Ants Data - ISC Normal Kernel:
[Plot: histogram and circular kernel ISCNorm density, panel title "Ants Data"; x-axis: Theta, y-axis: Density; curves for sigma = .2, .25, .3]
Figure: 7. Histogram and Circular Kernel Density with ISCNorm kernel;
σ = .2, .25, .3.

Ants Data - ISC Logistic Kernel:
[Plot: histogram and circular kernel ISCLogistic density, panel title "Ants Data"; x-axis: Theta, y-axis: Density; curves for sigma = .15, .25, .35]
Figure: 8. Histogram and Circular Kernel Density with ISCLogistic kernel;
σ = .15, .25, .35.

Conclusion:
The von Mises kernel seems to provide smoother plots than the wrapped
Cauchy (WC) kernel, but the two are qualitatively the same. The estimators
obtained by the IS transformation also produce results similar to those
given by the circular kernel estimators. The smoothing parameter may be
selected using the proposal described in Taylor (2008). An enhanced
strategy is to investigate a range of values around the value given by
cross-validation.
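The strategy of looking around a cross-validated value can be sketched as follows. This is not the rule of Taylor (2008); it assumes leave-one-out likelihood cross-validation, one common criterion, applied to the wrapped Cauchy kernel. The sample is an illustrative subset (the first 16 values listed in the turtle table), and the candidate grid and the ±0.05 window are arbitrary choices.

```python
import math

# Illustrative subset: the first 16 turtle directions (degrees) listed above.
SAMPLE_DEG = [8, 38, 50, 64, 83, 98, 204, 257, 9, 38, 53, 65, 88, 100, 215, 268]


def wc_kernel(u, rho):
    """Wrapped Cauchy density centred at 0, with rho in (0, 1)."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(u))
    )


def loo_log_likelihood(sample, rho):
    """Sum of log density estimates at each point, each computed without that point."""
    n = len(sample)
    total = 0.0
    for i, ti in enumerate(sample):
        others = sum(
            wc_kernel(ti - tj, rho) for j, tj in enumerate(sample) if j != i
        )
        total += math.log(others / (n - 1))
    return total


sample = [math.radians(d) for d in SAMPLE_DEG]
candidates = [k / 100 for k in range(5, 96)]  # rho = 0.05, 0.06, ..., 0.95
scores = {rho: loo_log_likelihood(sample, rho) for rho in candidates}

# Cross-validated choice, then a small window of values around it to inspect.
rho_cv = max(scores, key=scores.get)
neighbours = [r for r in candidates if abs(r - rho_cv) <= 0.05 + 1e-12]

print("cross-validated rho:", rho_cv)
print("values to inspect:", neighbours)
```

Plotting the estimate for each value in `neighbours` gives the kind of side-by-side comparison shown in the figures above.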

References
1 Abe, T. and Pewsey, A. (2011). Symmetric Circular Models Through
Duplication and Cosine Perturbation. Computational Statistics &
Data Analysis, 55(12), 3271–3282.
2 Bai, Z.D., Rao, C.R., Zhao, L.C. (1988). Kernel Estimators of
Density Function of Directional Data. Journal of Multivariate
Analysis, 27(1), 24–39.
3 Babu, G. J.; Canty, A.; Chaubey, Y. (2002). Application of Bernstein
polynomials for smooth estimation of a distribution and density
function. J. Statist. Plann. Inference, 105, 377-392.
4 Babu, G. Jogesh, and Chaubey, Yogendra P. (2006). Smooth
estimation of a distribution and density function on hypercube using
Bernstein polynomials for dependent random vectors. Statistics and
Probability Letters, 76, 959-969.
5 Buckland, S.T. (1992). Fitting density functions with polynomials. J.
R. Stat. Soc., A41, 63–76.

6 Carnicero, J.A., Wiper, M.P. and Ausín, M.C. (2010). Circular
Bernstein polynomial distributions. Working Paper 10-25, Statistics
and Econometrics Series 11, Departamento de Estadística, Universidad
Carlos III de Madrid.
e-archivo.uc3m.es/bitstream/handle/10016/8318/ws102511.pdf?sequence=1
7 Čencov, N.N. (1980). Evaluation of an unknown distribution density
from observations. Soviet Math. Dokl., 3, 1559–1562.
8 Čencov, N.N. (1980). Statistical Decision Rules and Optimum
Inference. New York: Springer-Verlag.
9 Chaubey, Yogendra P. (2016). Smooth Kernel Estimation of a
Circular Density Function: A Connection to Orthogonal Polynomials
on the Unit Circle. Preprint, https://arxiv.org/abs/1601.05053
10 Chaubey, Yogendra P.; Li, J.; Sen, A. and Sen, P.K. (2012). A new
smooth density estimator for non-negative random variables. Journal
of the Indian Statistical Association, 50, 83-104.

11 Devroye, L. and Györfi, L. (1985). Nonparametric Density Estimation:
The L1 View. New York: John Wiley & Sons.
12 Di Marzio, M., Panzera, A., Taylor, C.C. (2009). Local Polynomial
Regression for Circular Predictors. Statistics & Probability Letters,
79(19), 2066–2075.
13 Di Marzio M, Panzera A, Taylor C. C. (2011). Kernel density
estimation on the torus. Journal of Statistical Planning & Inference,
141, 2156-2173.
14 Efromovich, S. (1999). Nonparametric Curve Estimation: Methods,
Theory, and Applications. New York: Springer.
15 Efromovich, S. (2010). Orthogonal series density estimation. Wiley
Interdisciplinary Reviews, 2, 467-476.

16 Eğecioğlu, Ömer and Srinivasan, Ashok (2000). Efficient
nonparametric density estimation on the sphere with applications in
fluid mechanics. SIAM J. Sci. Comput., 22, 152-176.
17 Feller, W. (1965). An Introduction to Probability Theory and its
Applications, Vol. II. New York: Wiley.
18 Fernández-Durán, J.J. (2004). Circular distributions based on
nonnegative trigonometric sums. Biometrics, 60, 499–503.
19 Fisher, N.I. (1989). Smoothing a sample of circular data, J.
Structural Geology, 11, 775–778.
20 Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge
University Press, Cambridge.

21 Hall, P. (1980). Estimating a density on the positive half line by the
method of orthogonal series. Ann. Inst. Stat. Math., 32, 351–362.
22 Hall, P. (1981). On trigonometric series estimates of densities. Ann.
Stat., 9, 683–685.
23 Hall P, Watson GP, Cabrera J (1987). Kernel Density Estimation for
Spherical Data. Biometrika, 74(4), 751–762.
24 Hart JD. (1997). Nonparametric Smoothing and Lack-of-Fit Tests.
New York: Springer.
25 Jammalamadaka, S. R., SenGupta, A. (2001). Topics in Circular
Statistics. World Scientific, Singapore.

26 Jones, M.C. and Pewsey, A (2012). Inverse Batschelet Distributions
for Circular Data. Biometrics, 68(1), 183–193.
27 Kato, S. and Jones, M.C. (2010). A family of distributions on the
circle with links to, and applications arising from, Möbius
transformation. J. Amer. Statist. Assoc., 105, 249-262.
28 Kato, S. and Jones, M. C. (2015). A tractable and interpretable
four-parameter family of unimodal distributions on the circle.
Biometrika, 102(9), 181–190.
29 Klemelä, J. (2000). Estimation of Densities and Derivatives of
Densities with Directional Data. Journal of Multivariate Analysis,
73(1), 18–40.
30 Koornwinder, T.H., Wong, R.S.C., Koekoek, R., Swarttouw,
R.F. (2010). Orthogonal polynomials. In NIST Handbook of
Mathematical Functions, Eds: Olver, Frank W. J.; Lozier, Daniel M.;
Boisvert, Ronald F.; Clark, Charles W., London: Cambridge University
Press.

31 Mardia, K.V. (1972). Statistics of Directional Data. Academic Press,
New York.
32 Mardia, K.V. and Jupp, P. E. (2000). Directional Statistics. John
Wiley & Sons, New York, NY, USA.
33 Mhaskar, H.N. and Pai, D.V. (2000). Fundamentals of approximation
Theory. Narosa Publishing House, New Delhi, India.
34 Minh, D. and Farnum, N. (2003). Using bilinear transformations to
induce probability distributions. Commun. Stat.-Theory Meth., 32,
1–9.
35 Mooney, A., Helms, P.J. and Jolliffe, I.T. (2003). Fitting mixtures of
von Mises distributions: a case study involving sudden infant death
syndrome. Comput. Stat. Data Anal., 41, 505–513.

36 Parzen, E. (1962). On Estimation of a Probability Density Function
and Mode. Ann. Math. Statist, 33:3, 1065-1076.
37 Prakasa Rao, B.L.S. (1983). Non Parametric Functional Estimation.
Academic Press, Orlando, Florida.
38 Rudin, W. (1987). Real and Complex Analysis, Third Edition.
McGraw Hill: New York.
39 Rudzkis, R., Radavicius, M. (2005). Adaptive estimation of distribution
density in the basis of algebraic polynomials. Theory Probab. Appl.,
49, 93–109.
40 Rosenblatt, M. (1956). Remarks on some nonparametric estimates of
a density function. Ann. Math. Stat., 27:56, 832–837.

41 Seshadri, V. (1991). A family of distributions related to the
McCullagh family. Statistics and Probability Letters, 12, 373–378.
42 Simon, B. (2005). Orthogonal Polynomials on the Unit Circle, Part 1:
Classical Theory. American Mathematical Society, Providence, Rhode
Island.
43 Shimizu, K. and Iida, K. (2002). Pearson type VII distributions on
spheres. Commun. Stat.-Theory Meth., 31, 513–526.
44 Silverman B. W. (1986). Density Estimation for Statistics and Data
Analysis. London: Chapman & Hall.
45 Tarter, M.E. and Lock, M. D. (1993). Model-Free Curve Estimation.
New York: Chapman & Hall.

46 Taylor, C.C. (2008). Automatic Bandwidth Selection for Circular
Density Estimation. Computational Statistics & Data Analysis, 52(7),
3493–3500.
47 Trefethen, L. N. (2013). Approximation Theory and Approximation
Practice. SIAM: Philadelphia.
48 Vitale, R.A. (1973). A Bernstein polynomial approach to density
estimation. Commun Stat, 2, 493-506.
49 Walter, G. G. (1977). Properties of Hermite series estimation of
probability density. Ann. Stat., 5, 1258–1264.
50 Walter, G. G. (1994). Wavelets and other Orthogonal Systems with
Applications. London: CRC Press.

Talk slides will be available on SlideShare:
www.slideshare.net/ychaubey/talk-slides-msast2016-70014046
THANKS!!
