
Introduction to Curve Estimation

[Figure: density estimate of Wilcoxon scores]
Michael E. Tarter & Michael D. Lock:
Model-Free Curve Estimation.
Monographs on Statistics and Applied Probability 56.
Chapman & Hall, 1993.

Chapters 1–4.

Stefanie Scheid, August 11, 2003


Outline

1. Generalized representation

2. Short review on Fourier series

3. Fourier series density estimation

4. Kernel density estimation

5. Optimizing density estimates



Generalized representation

Estimation versus Specification


We are familiar with its theory and application. But how can we be sure about the underlying distribution?



Usual density representation:

– composed of elementary functions


– usually in closed form
– finite and rather small number of “personalized” parameters

Generalized representation:

– infinite number of parameters


– usually: representation as infinite sum of elementary functions
→ Fourier series density estimation
→ Kernel density estimation



Complex Fourier series


f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\}

– x ∈ [0, 1].
– {Bk } are called Fourier coefficients.
– Why can we represent any function in such a way?



Some useful features:

Ψ_k = exp{2πikx}: {Ψ_k} forms an orthonormal sequence, that is,

\int_0^1 \exp\{2\pi i (k - l) x\}\, dx = \begin{cases} 1 & k = l \\ 0 & k \neq l \end{cases}



{Ψ_k} is complete, that is,

\lim_{m \to \infty} \int_0^1 \left( f(x) - \sum_{k=-m}^{m} B_k \exp\{2\pi i k x\} \right)^2 dx = 0

Therefore, we can expand every function f(x), x ∈ [0, 1], in the space L^2 as a Fourier series. An L^2 function satisfies \|f\|^2 = \int |f(x)|^2\, dx < \infty, which holds for most of the curves we are interested in.



Fourier series density estimation

Given an iid sample {X_j}, j = 1, …, n, with support on [0, 1] (otherwise rescale).

Representation of true density:


f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\} \quad \text{with} \quad B_k = \int_0^1 f(x) \exp\{-2\pi i k x\}\, dx





Estimator:
\hat{f}(x) = \sum_{k=-\infty}^{\infty} b_k \hat{B}_k \exp\{2\pi i k x\} \quad \text{with} \quad \hat{B}_k = \frac{1}{n} \sum_{j=1}^{n} \exp\{-2\pi i k X_j\}

{bk } are called multipliers.

Easy computation:

Use exp{−2πik X_j} = cos(2πk X_j) − i sin(2πk X_j) and B̂_{−k} = B̂_k^* (complex conjugate); B̂_0 ≡ 1.
Therefore, computation is only needed for positive k.
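As a concrete illustration, here is a minimal NumPy sketch of the sample coefficients that exploits this symmetry. The function name sample_coeffs is ours, and the data are assumed to be already rescaled to [0, 1]:

import numpy as np

def sample_coeffs(x, m):
    """Sample Fourier coefficients B_hat_k for k = 0, ..., m.
    B_hat_0 = 1 and B_hat_{-k} = conj(B_hat_k), so positive k suffice."""
    k = np.arange(m + 1)[:, None]                     # frequencies as a column
    return np.exp(-2j * np.pi * k * x[None, :]).mean(axis=1)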



B̂_k is an unbiased estimator of B_k. However, f̂ is usually biased because the number of terms is either infinite or unknown.

Another advantage of the sample coefficients {B̂_k}: the same set leads to a variety of other estimates.

That’s where the multipliers come into play!



Fourier multipliers

“Raw” density estimator:



b_k = \begin{cases} 1 & |k| \le m \\ 0 & |k| > m \end{cases} \quad \Rightarrow \quad \hat{f}(x) = \sum_{k=-m}^{m} \hat{B}_k \exp\{2\pi i k x\}

Evaluate f̂(x) at equally spaced points x ∈ [0, 1].
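Continuing the sketch above, the raw estimate can be evaluated on a grid by folding the negative frequencies via B̂_{−k} = B̂_k^* (raw_estimate is our own helper, not code from the book):

import numpy as np

def raw_estimate(grid, B):
    """Raw estimator with b_k = 1 for |k| <= m:
    f_hat(x) = 1 + 2 Re sum_{k=1}^m B_hat_k exp(2 pi i k x)."""
    m = len(B) - 1
    k = np.arange(1, m + 1)[:, None]
    terms = B[1:, None] * np.exp(2j * np.pi * k * grid[None, :])
    return 1.0 + 2.0 * terms.real.sum(axis=0)

# e.g. fhat = raw_estimate(np.linspace(0.0, 1.0, 101), sample_coeffs(x, m=7))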



Estimating the expectation

\hat{\mu} = \int_0^1 x \hat{f}(x)\, dx = \cdots = \frac{1}{2} + \sum_{\substack{k = -m \\ k \neq 0}}^{m} \frac{1}{2\pi i k} \hat{B}_k

Equivalently: set

b_k = \begin{cases} (2\pi i k)^{-1} & |k| \le m,\ k \neq 0 \\ 0 & |k| > m \end{cases}

evaluate \hat{f} at x = 0, and add \frac{1}{2}.
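Assuming the coefficients from sample_coeffs above, the same conjugate-folding trick reduces this to a few lines (a sketch):

import numpy as np

def mean_estimate(B):
    """mu_hat = 1/2 + sum_{k != 0} B_hat_k / (2 pi i k); pairing the
    k and -k terms leaves twice the real part of the positive-k terms."""
    m = len(B) - 1
    k = np.arange(1, m + 1)
    return 0.5 + 2.0 * (B[1:] / (2j * np.pi * k)).real.sum()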



Advantages of multipliers

– Examination of various distributional features without recomputing the sample coefficients.

– Optimization of the estimation procedure.

– Smoothing of the estimated curve vs. higher contrast.

Some examples . . .



Raw Fourier series density estimator with m = 3
[Figure: estimated density on [0, 1]]



Raw Fourier series density estimator with m = 7
[Figure: estimated density on [0, 1]]



Raw Fourier series density estimator with m = 15
[Figure: estimated density on [0, 1]]





Kernel density estimation

Histograms are crude kernel density estimators where the kernel is a block (a rectangular shape) positioned somehow over a data point.

Kernel estimators:

– use various shapes as kernels
– place the center of a kernel right over the data point
– spread the influence of one point with varying kernel width
⇒ the contributions of all kernels are summed to form the overall estimate



Gaussian kernel density estimate
[Figure: Gaussian kernels placed over ten data points and the resulting density estimate]



Kernel estimator

\hat{f}(x) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left( \frac{x - X_j}{h} \right)

– h is called the bandwidth or smoothing parameter.

– K is the kernel function: nonnegative and symmetric such that \int K(x)\,dx = 1 and \int x K(x)\,dx = 0.
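A direct transcription of the formula, with a Gaussian kernel as one possible choice of K (a sketch; the bandwidth h is left to the caller):

import numpy as np

def gauss_kernel(u):
    """Standard normal density: nonnegative, symmetric, integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(grid, data, h, kernel=gauss_kernel):
    """f_hat(x) = (1 / (n h)) * sum_j K((x - X_j) / h)."""
    u = (grid[:, None] - data[None, :]) / h           # (points, observations)
    return kernel(u).sum(axis=1) / (len(data) * h)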



– Under mild conditions (h must decrease with increasing n) the
kernel estimate converges in probability to the true density.

– Choice of kernel function usually depends on computational criteria.

– Choice of bandwidth is more important (see the literature on “kernel smoothing”).



Some kernel functions

[Figures: Gauss and triangular kernels]

Epanechnikov:

K(y) = \frac{3(1 - y^2/5)}{4\sqrt{5}}, \quad |y| \le \sqrt{5}

[Figure: Epanechnikov kernel]



Duality of Fourier series and kernel methodology

\hat{f}(x) = \sum_k b_k \hat{B}_k \exp\{2\pi i k x\} = \frac{1}{n} \sum_{j=1}^{n} \sum_k b_k \exp\{2\pi i k (x - X_j)\}

With h = 1:

K(x) = \sum_k b_k \exp\{2\pi i k x\}



The Dirichlet kernel

The raw density estimator has kernel K_D:

K_D(x) = \sum_{k=-m}^{m} \exp\{2\pi i k x\} = \cdots = \frac{\sin(\pi (2m + 1) x)}{\sin(\pi x)}

where \lim_{x \to 0} K_D(x) = 2m + 1.
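The closed form is easy to verify numerically; the sketch below also handles the removable singularity at x = 0:

import numpy as np

def dirichlet_kernel(x, m):
    """K_D(x) = sin(pi (2m+1) x) / sin(pi x), with limit 2m + 1 as x -> 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full(x.shape, 2.0 * m + 1.0)             # limit where sin(pi x) = 0
    nz = np.sin(np.pi * x) != 0.0
    out[nz] = np.sin(np.pi * (2 * m + 1) * x[nz]) / np.sin(np.pi * x[nz])
    return out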



Dirichlet kernels

[Figures: Dirichlet kernels with m = 4, m = 8, and m = 12]



Differences between kernel and Fourier representation

– Fourier estimates are restricted to finite intervals while some kernels are not.

– As the Dirichlet kernel shows, kernel estimates can result in negative values if the kernel function takes on negative values.



– Kernel: K controls the shape, h controls the spread of the kernel.
Two-step strategy: Select a kernel function and choose a data-dependent smoothing parameter.

– Fourier: m controls both shape and spread.
Goodness-of-fit can be governed by the entire multiplier sequence.



Optimizing density estimates

Optimization with regard to the weighted mean integrated squared error (MISE):

J(\hat{f}, f, w) = E \int_0^1 \big( f(x) - \hat{f}(x) \big)^2\, w(x)\, dx

w(x) is a nonnegative weight function that emphasizes estimation over subregions. First consider optimization with w(x) ≡ 1.





The raw density estimator again

J(\hat{f}, f) = 2 \sum_{k=1}^{m} \frac{1}{n} \left( 1 - |B_k|^2 \right) + 2 \sum_{k=m+1}^{\infty} |B_k|^2

The first sum is the variance component, the second the bias component.



Single term stopping rule

– Estimate ∆J_s = J(f̂_s, f) − J(f̂_{s−1}, f), the gain from including the s-th Fourier coefficient. The MISE is decreased if ∆J_s is negative.

– Include terms only if their inclusion results in a negative difference. Multiple testing problem!

– Inclusion of higher-order terms results in a rough estimate.

– Suggestion: Stop after t successive nonnegative differences. The choice of t is data/curve dependent (a sketch follows below).
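A sketch of such a rule, reusing sample_coeffs from the Fourier section. As the test for “∆J_s < 0” we use the classical Kronmal–Tarter criterion |B̂_s|² > 2/(n + 1); treating that as the estimated sign of ∆J_s is our assumption here, and t and m_max are tuning choices:

import numpy as np

def single_term_stop(x, t=3, m_max=50):
    """Largest index m whose term still decreased the estimated MISE,
    stopping after t successive nonnegative differences Delta J_s."""
    n = len(x)
    B = sample_coeffs(x, m_max)
    m, misses = 0, 0
    for s in range(1, m_max + 1):
        if np.abs(B[s])**2 > 2.0 / (n + 1):           # estimated Delta J_s < 0
            m, misses = s, 0
        else:
            misses += 1
            if misses >= t:                           # t nonnegative differences in a row
                break
    return m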



Other stopping rules

– Different considerations about estimating the MISE lead to various optimization concepts.

– They are not generally superior to the single-term rule; performance depends on curve features.



Multiplier sequences

– So far: “raw” estimate with b_k = 1 or b_k = 0.

– Now allow {b_k} to be a sequence tending to zero with increasing k.

– The concepts again depend on considerations about the MISE.

– The question of an advisable stopping rule remains.



Two-step strategy with multiplier sequence

1. Estimate with the raw estimator and one of the former stopping rules.

2. Apply a multiplier sequence to the remaining terms; this will always improve the estimate.



Weighted MISE

J(\hat{f}, f, w) = E \int_0^1 \big( f(x) - \hat{f}(x) \big)^2\, w(x)\, dx

– Weight functions w(x) emphasize subregions of the support interval (e.g. the left or right tail).
– It turns out that the unweighted MISE leads to high accuracy in regions of high density.
⇒ Weighting will improve the estimate when other regions are of interest.





Data transformation

– Data needs rescaling to [0, 1]. Always possible: (X_j − min(X)) / (max(X) − min(X)).

– Next approach: Transform the data in a nonlinear manner to emphasize subregions.

– Let G : [a, b] → [0, 1] be a strictly increasing, one-to-one function with g(x) = dG(x)/dx (a sketch of such transformations follows below).

⇒ Ψ_k(G(x)) = exp{2πik G(x)} is orthonormal on [a, b] with respect to the weight g(x).
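A sketch of both steps; the Gaussian CDF fitted to the sample is only one illustrative choice of a strictly increasing G, and the helper names are ours:

import numpy as np
from scipy.stats import norm

def rescale(x):
    """Linear map of the sample onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def gauss_transform(x):
    """Illustrative nonlinear G: the Gaussian CDF fitted to the sample,
    a strictly increasing, one-to-one map into (0, 1)."""
    return norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))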



Transformation and optimization

– Data transformation with G(x) is equivalent to the weighted MISE with w(x) = 1/g(x).

– The only difference from the unweighted MISE: computation of the Fourier coefficients involves applying G(x).

⇒ Strategy: Transform the data, optimize with the unweighted procedures, retransform.
Most efficient: Transform the data to a unimodal, symmetric distribution.



Application to gene expression data

– Problem: Fit two distributions to one another by removing a minimal number of data points.

– Idea: Estimate the two densities in an optimal manner. Remove points until the goodness-of-fit is high with regard to a modified MISE.

