
Introduction to Curve Estimation

[Figure: density estimate of Wilcoxon scores]
Michael E. Tarter & Michael D. Lock:
Model-Free Curve Estimation.
Monographs on Statistics and Applied Probability 56.
Chapman & Hall, 1993.

Chapters 1–4.

Stefanie Scheid, August 11, 2003


Outline

1. Generalized representation

2. Short review on Fourier series

3. Fourier series density estimation

4. Kernel density estimation

5. Optimizing density estimates



Generalized representation

Estimation versus Specification


We are familiar with its theory and application. But how can we be sure about the underlying distribution?



Usual density representation:

– composed of elementary functions


– usually in closed form
– finite and rather small number of “personalized” parameters

Generalized representation:

– infinite number of parameters


– usually: representation as infinite sum of elementary functions
→ Fourier series density estimation
→ Kernel density estimation



Complex Fourier series


f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\}

– x ∈ [0, 1].
– {Bk } are called Fourier coefficients.
– Why can we represent any function in such a way?



Some useful features:

Ψ_k = exp{2πikx}: {Ψ_k} forms an orthonormal sequence, that is,

\int_0^1 \exp\{2\pi i (k - l) x\}\, dx = \begin{cases} 1 & k = l \\ 0 & k \neq l \end{cases}



{Ψ_k} is complete, that is,

\lim_{m \to \infty} \int_0^1 \left( f(x) - \sum_{k=-m}^{m} B_k \exp\{2\pi i k x\} \right)^2 dx = 0

Therefore, we can expand every function f(x), x ∈ [0, 1], in the space L^2 as a Fourier series. An L^2 function satisfies \|f\|^2 = \int |f(x)|^2\, dx < \infty, which holds for most of the curves we are interested in.



Fourier series density estimation

Given an iid sample {X_j}, j = 1, …, n, with support on [0, 1] (otherwise rescale).

Representation of true density:


f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\} \quad \text{with} \quad B_k = \int_0^1 f(x) \exp\{-2\pi i k x\}\, dx





Estimator:
\hat{f}(x) = \sum_{k=-\infty}^{\infty} b_k \hat{B}_k \exp\{2\pi i k x\} \quad \text{with} \quad \hat{B}_k = \frac{1}{n} \sum_{j=1}^{n} \exp\{-2\pi i k X_j\}

{bk } are called multipliers.

Easy computation:

Use exp{−2πik X_j} = cos(2πk X_j) − i sin(2πk X_j) and B̂_{−k} = B̂_k^* (complex conjugate); B̂_0 ≡ 1.
Therefore, computation is only needed for positive k.
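As a concrete illustration, here is a minimal NumPy sketch of the sample coefficients that exploits this symmetry. The function name sample_coeffs is ours, and the data are assumed to be already rescaled to [0, 1]:

import numpy as np

def sample_coeffs(x, m):
    """Sample Fourier coefficients B_hat_k for k = 0, ..., m.
    B_hat_0 = 1 and B_hat_{-k} = conj(B_hat_k), so positive k suffice."""
    k = np.arange(m + 1)[:, None]                     # frequencies as a column
    return np.exp(-2j * np.pi * k * x[None, :]).mean(axis=1)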



B̂_k is an unbiased estimator of B_k. However, f̂ is usually biased because the number of terms is either infinite or unknown.

Another advantage of the sample coefficients {B̂_k}: the same set leads to a variety of other estimates.

That’s where the multipliers come into play!



Fourier multipliers

“Raw” density estimator:



b_k = \begin{cases} 1 & |k| \le m \\ 0 & |k| > m \end{cases} \quad \Rightarrow \quad \hat{f}(x) = \sum_{k=-m}^{m} \hat{B}_k \exp\{2\pi i k x\}

Evaluate f̂(x) at equally spaced points x ∈ [0, 1].
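Continuing the sketch above, the raw estimate can be evaluated on a grid by folding the negative frequencies via B̂_{−k} = B̂_k^* (raw_estimate is our own helper, not code from the book):

import numpy as np

def raw_estimate(grid, B):
    """Raw estimator with b_k = 1 for |k| <= m:
    f_hat(x) = 1 + 2 Re sum_{k=1}^m B_hat_k exp(2 pi i k x)."""
    m = len(B) - 1
    k = np.arange(1, m + 1)[:, None]
    terms = B[1:, None] * np.exp(2j * np.pi * k * grid[None, :])
    return 1.0 + 2.0 * terms.real.sum(axis=0)

# e.g. fhat = raw_estimate(np.linspace(0.0, 1.0, 101), sample_coeffs(x, m=7))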



Estimating the expectation

\hat{\mu} = \int_0^1 x \hat{f}(x)\, dx = \cdots = \frac{1}{2} + \sum_{\substack{k = -m \\ k \neq 0}}^{m} \frac{1}{2\pi i k} \hat{B}_k

Equivalently: set

b_k = \begin{cases} (2\pi i k)^{-1} & |k| \le m,\ k \neq 0 \\ 0 & |k| > m \end{cases}

evaluate \hat{f} at x = 0, and add \frac{1}{2}.
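Assuming the coefficients from sample_coeffs above, the same conjugate-folding trick reduces this to a few lines (a sketch):

import numpy as np

def mean_estimate(B):
    """mu_hat = 1/2 + sum_{k != 0} B_hat_k / (2 pi i k); pairing the
    k and -k terms leaves twice the real part of the positive-k terms."""
    m = len(B) - 1
    k = np.arange(1, m + 1)
    return 0.5 + 2.0 * (B[1:] / (2j * np.pi * k)).real.sum()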



Advantages of multipliers

– Examination of various distributional features without recomputing the sample coefficients.

– Optimization of the estimation procedure.

– Smoothing of the estimated curve vs. higher contrast.

Some examples . . .



Raw Fourier series density estimator with m = 3
[Figure: estimated density on [0, 1]]



Raw Fourier series density estimator with m = 7
[Figure: estimated density on [0, 1]]



Raw Fourier series density estimator with m = 15
[Figure: estimated density on [0, 1]]





Kernel density estimation

Histograms are crude kernel density estimators where the kernel is a block (a rectangular shape) positioned somehow over a data point.

Kernel estimators:

– use various shapes as kernels
– place the center of a kernel right over the data point
– spread the influence of one point with varying kernel width
⇒ the contributions of all kernels are summed to form the overall estimate



Gaussian kernel density estimate
[Figure: Gaussian kernels placed over ten data points and the resulting density estimate]



Kernel estimator

\hat{f}(x) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left( \frac{x - X_j}{h} \right)

– h is called the bandwidth or smoothing parameter.

– K is the kernel function: nonnegative and symmetric such that \int K(x)\,dx = 1 and \int x K(x)\,dx = 0.
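A direct transcription of the formula, with a Gaussian kernel as one possible choice of K (a sketch; the bandwidth h is left to the caller):

import numpy as np

def gauss_kernel(u):
    """Standard normal density: nonnegative, symmetric, integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(grid, data, h, kernel=gauss_kernel):
    """f_hat(x) = (1 / (n h)) * sum_j K((x - X_j) / h)."""
    u = (grid[:, None] - data[None, :]) / h           # (points, observations)
    return kernel(u).sum(axis=1) / (len(data) * h)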



– Under mild conditions (h must decrease with increasing n) the
kernel estimate converges in probability to the true density.

– Choice of kernel function usually depends on computational criteria.

– Choice of bandwidth is more important (see the literature on “kernel smoothing”).



Some kernel functions

[Figures: Gauss and triangular kernels]

Epanechnikov:

K(y) = \frac{3(1 - y^2/5)}{4\sqrt{5}}, \quad |y| \le \sqrt{5}

[Figure: Epanechnikov kernel]



Duality of Fourier series and kernel methodology

\hat{f}(x) = \sum_k b_k \hat{B}_k \exp\{2\pi i k x\} = \frac{1}{n} \sum_{j=1}^{n} \sum_k b_k \exp\{2\pi i k (x - X_j)\}

With h = 1:

K(x) = \sum_k b_k \exp\{2\pi i k x\}



The Dirichlet kernel

The raw density estimator has kernel K_D:

K_D(x) = \sum_{k=-m}^{m} \exp\{2\pi i k x\} = \cdots = \frac{\sin(\pi (2m + 1) x)}{\sin(\pi x)}

where \lim_{x \to 0} K_D(x) = 2m + 1.
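The closed form is easy to verify numerically; the sketch below also handles the removable singularity at x = 0:

import numpy as np

def dirichlet_kernel(x, m):
    """K_D(x) = sin(pi (2m+1) x) / sin(pi x), with limit 2m + 1 as x -> 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full(x.shape, 2.0 * m + 1.0)             # limit where sin(pi x) = 0
    nz = np.sin(np.pi * x) != 0.0
    out[nz] = np.sin(np.pi * (2 * m + 1) * x[nz]) / np.sin(np.pi * x[nz])
    return out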



Dirichlet kernels

[Figures: Dirichlet kernels with m = 4, m = 8, and m = 12]



Differences between kernel and Fourier representation

– Fourier estimates are restricted to finite intervals while some kernels are not.

– As the Dirichlet kernel shows, kernel estimates can result in negative values if the kernel function takes on negative values.



– Kernel: K controls the shape, h controls the spread of the kernel.
Two-step strategy: Select a kernel function and choose a data-dependent smoothing parameter.

– Fourier: m controls both shape and spread.
Goodness-of-fit can be governed by the entire multiplier sequence.



Optimizing density estimates

Optimization with regard to the weighted mean integrated squared error (MISE):

J(\hat{f}, f, w) = E \int_0^1 \big( f(x) - \hat{f}(x) \big)^2\, w(x)\, dx

w(x) is a nonnegative weight function that emphasizes estimation over subregions. First consider optimization with w(x) ≡ 1.





The raw density estimator again

J(\hat{f}, f) = 2 \sum_{k=1}^{m} \frac{1}{n} \left( 1 - |B_k|^2 \right) + 2 \sum_{k=m+1}^{\infty} |B_k|^2

The first sum is the variance component, the second the bias component.



Single term stopping rule

– Estimate ∆J_s = J(f̂_s, f) − J(f̂_{s−1}, f), the gain from including the s-th Fourier coefficient. The MISE is decreased if ∆J_s is negative.

– Include terms only if their inclusion results in a negative difference. Multiple testing problem!

– Inclusion of higher-order terms results in a rough estimate.

– Suggestion: Stop after t successive nonnegative differences. The choice of t is data/curve dependent (a sketch follows below).
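A sketch of such a rule, reusing sample_coeffs from the Fourier section. As the test for “∆J_s < 0” we use the classical Kronmal–Tarter criterion |B̂_s|² > 2/(n + 1); treating that as the estimated sign of ∆J_s is our assumption here, and t and m_max are tuning choices:

import numpy as np

def single_term_stop(x, t=3, m_max=50):
    """Largest index m whose term still decreased the estimated MISE,
    stopping after t successive nonnegative differences Delta J_s."""
    n = len(x)
    B = sample_coeffs(x, m_max)
    m, misses = 0, 0
    for s in range(1, m_max + 1):
        if np.abs(B[s])**2 > 2.0 / (n + 1):           # estimated Delta J_s < 0
            m, misses = s, 0
        else:
            misses += 1
            if misses >= t:                           # t nonnegative differences in a row
                break
    return m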



Other stopping rules

– Different considerations about estimating the MISE lead to various optimization concepts.

– They are not generally superior to the single-term rule; performance depends on curve features.



Multiplier sequences

– So far: “raw” estimate with b_k = 1 or b_k = 0.

– Now allow {b_k} to be a sequence tending to zero with increasing k.

– The concepts again depend on considerations about the MISE.

– The question of an advisable stopping rule remains.



Two-step strategy with multiplier sequence

1. Estimate with the raw estimator and one of the former stopping rules.

2. Apply a multiplier sequence to the remaining terms; this will always improve the estimate.



Weighted MISE

J(\hat{f}, f, w) = E \int_0^1 \big( f(x) - \hat{f}(x) \big)^2\, w(x)\, dx

– Weight functions w(x) emphasize subregions of the support interval (e.g. the left or right tail).
– It turns out that the unweighted MISE leads to high accuracy in regions of high density.
⇒ Weighting will improve the estimate when other regions are of interest.





Data transformation

– Data needs rescaling to [0, 1]. Always possible: (X_j − min(X)) / (max(X) − min(X)).

– Next approach: Transform the data in a nonlinear manner to emphasize subregions.

– Let G : [a, b] → [0, 1] be a strictly increasing, one-to-one function with g(x) = dG(x)/dx (a sketch of such transformations follows below).

⇒ Ψ_k(G(x)) = exp{2πik G(x)} is orthonormal on [a, b] with respect to the weight g(x).
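A sketch of both steps; the Gaussian CDF fitted to the sample is only one illustrative choice of a strictly increasing G, and the helper names are ours:

import numpy as np
from scipy.stats import norm

def rescale(x):
    """Linear map of the sample onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def gauss_transform(x):
    """Illustrative nonlinear G: the Gaussian CDF fitted to the sample,
    a strictly increasing, one-to-one map into (0, 1)."""
    return norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))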



Transformation and optimization

– Data transformation with G(x) is equivalent to the weighted MISE with w(x) = 1/g(x).

– The only difference from the unweighted MISE: computation of the Fourier coefficients involves applying G(x).

⇒ Strategy: Transform the data, optimize with the unweighted procedures, retransform.
Most efficient: Transform the data to a unimodal, symmetric distribution.



Application to gene expression data

– Problem: Fit two distributions to one another by removing a minimal number of data points.

– Idea: Estimate the two densities in an optimal manner. Remove points until the goodness-of-fit is high with regard to a modified MISE.

