arXiv:0810.5204v1 [math.ST] 29 Oct 2008
Near optimal thresholding estimation of a Poisson
intensity on the real line
Patricia Reynaud-Bouret^1 and Vincent Rivoirard^2
Abstract. The purpose of this paper is to estimate the intensity of a Poisson process $N$ by using thresholding rules. The intensity, defined as the derivative of the mean measure of $N$ with respect to $n\,dx$ where $n$ is a fixed parameter, is assumed to be non-compactly supported. The estimator $\tilde f_{n,\gamma}$, based on random thresholds, is proved to achieve the same performance as the oracle estimator up to a possible logarithmic term. Then, minimax properties of $\tilde f_{n,\gamma}$ on Besov spaces $B^\alpha_{p,q}$ are established. Under mild assumptions, we prove that
$$\sup_{f\in B^\alpha_{p,q}\cap L_\infty} E\big(\|\tilde f_{n,\gamma}-f\|_2^2\big) \le C\left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha+\frac12+\left(\frac12-\frac1p\right)_+}}$$
and the lower bound of the minimax risk for $B^\alpha_{p,q}\cap L_\infty$ coincides with the previous upper bound up to the logarithmic term. This new result has two consequences. First, it establishes that the minimax rate of Besov spaces $B^\alpha_{p,q}$ with $p\le 2$, when non-compactly supported functions are considered, is the same as for compactly supported functions up to a logarithmic term. When $p>2$, the rate exponent, which depends on $p$, deteriorates as $p$ increases, which means that the support plays a harmful role in this case. Furthermore, $\tilde f_{n,\gamma}$ is adaptive minimax up to a logarithmic term.

Keywords: Adaptive estimation, model selection, oracle inequalities, Poisson process, thresholding rule

Mathematics Subject Classification (2000): 62G05, 62G20
1 Introduction
The goal of the present paper is to derive a data-driven thresholding method to estimate the intensity of a Poisson process on the real line.
Poisson processes have been used for years to model a wide variety of situations, and in particular data whose maximal size is a priori unknown. For instance, in finance, Merton [29] introduces Poisson processes to model stock-price changes of extraordinary magnitude. In geology, Uhler and Bradley [32] use Poisson processes to model the occurrences of petroleum reservoirs whose size is highly inhomogeneous. Actually, if we only focus on the size of the jumps in Merton's model or on the sizes of individual oil reservoirs, these models consist of an inhomogeneous Poisson process with heavy-tailed intensity (see [19] for a precise formalism for the financial example). So, our goal is to provide data-driven estimation of a Poisson intensity with as few support assumptions as possible.
^1 CNRS and Département de Mathématiques et Applications, ENS-Paris, 45 Rue d'Ulm, 75230 Paris Cedex 05, France. Email: reynaud@dma.ens.fr
^2 Équipe Probabilités, Modélisation et Statistique, Laboratoire de Mathématiques, CNRS UMR 8628, Université Paris Sud, 91405 Orsay Cedex, France. Département de Mathématiques et Applications, ENS-Paris, 45 Rue d'Ulm, 75230 Paris Cedex 05, France. Email: Vincent.Rivoirard@math.u-psud.fr
Of course, many adaptive methods have been proposed to deal with Poisson intensity estimation. For instance, Rudemo [31] studied data-driven histogram and kernel estimates based on the cross-validation method. Donoho [15] fitted the universal thresholding procedure proposed by Donoho and Johnstone [17] by using Anscombe's transform. Kolaczyk [28] refined this idea by investigating the tails of the distribution of the noisy wavelet coefficients of the intensity. For a particular inverse problem, Cavalier and Koo [10] first derived optimal estimates in the minimax setting. More precisely, for their tomographic problem, Cavalier and Koo [10] pointed out minimax thresholding rules on Besov balls. By using model selection, other optimal estimators have been proposed by Reynaud-Bouret [30] or Willett and Nowak [33].
To derive sharp theoretical results, these methods need to assume that the intensity has a known bounded support and belongs to $L_\infty$. Model selection makes it possible to remove the assumption on the support. See the oracle results established by [19], who nevertheless assumes that the intensity belongs to $L_\infty$. We have to mention that the model selection methodology proposed by Baraud and Birgé [7], [4] is assumption-free as well. However, as explained by Birgé [7], it is too computationally intensive to be implemented. Besides, in [7], [4] and [19], minimax performances on classical functional spaces are derived only for compactly supported signals.
In the present paper, to estimate the intensity of a Poisson process, we propose an easily implementable thresholding rule specified in the next section. This procedure is near optimal from both the oracle and minimax points of view. We do not assume that the support of the intensity is known or even finite and, most of the time, the signal to estimate may be unbounded.
1.1 The thresholding procedure and main result
In the sequel, we consider a Poisson process on the real line, denoted $N$, whose mean measure $\mu$ is finite and absolutely continuous with respect to the Lebesgue measure (see Section 2.1 where we recall classical facts on Poisson processes). Given a positive integer $n$, we introduce $f\in L_1(\mathbb{R})$, the intensity of $N$, as
$$f(x) = \frac{d\mu_x}{n\,dx}.$$
Since $f$ belongs to $L_1(\mathbb{R})$, the total number of points of the process $N$, denoted $N_{\mathbb{R}}$, satisfies $E(N_{\mathbb{R}}) = n\|f\|_1$ and $N_{\mathbb{R}} < \infty$ almost surely. In the sequel, $f$ will be held fixed and $n$ will go to $+\infty$. The introduction of $n$ could seem artificial, but it allows us to present our asymptotic theoretical results in a meaningful way. In addition, our framework is equivalent to the observation of an $n$-sample of a Poisson process with common intensity $f$ with respect to the Lebesgue measure. Since $N$ is a random countable set of points, we denote by $dN$ the discrete random measure $\sum_{T\in N}\delta_T$. Hence we have, for any compactly supported function $g$, $\int g(x)\,dN_x = \sum_{T\in N} g(T)$. Now, our goal is to estimate $f$ by using the realizations of $N$.
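As a concrete illustration of this sampling scheme (our own sketch, not part of the paper; the function names and the example intensity are ours), a realization of $N$ can be simulated by drawing $N_{\mathbb{R}}\sim\mathrm{Poisson}(n\|f\|_1)$ and then placing that many i.i.d. points with density $f/\|f\|_1$:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_poisson_process(n, sample_density, f_l1_norm):
    """Draw one realization of a Poisson process with mean measure n*f(x)dx.

    Two-step construction: the total number of points N_R is
    Poisson(n*||f||_1) and, given N_R, the points are i.i.d. with
    density f/||f||_1.  `sample_density(m)` draws m such points.
    """
    n_points = rng.poisson(n * f_l1_norm)
    return sample_density(n_points)

# Example: f(x) = 2x on (0,1], so ||f||_1 = 1 and f/||f||_1 has cdf x^2,
# hence inverse-cdf sampling via sqrt of a uniform draw.
points = simulate_poisson_process(
    n=1000,
    sample_density=lambda m: np.sqrt(rng.uniform(size=m)),
    f_l1_norm=1.0,
)
```

With $n = 1000$ the realization contains about $1000$ points, all in $(0,1]$.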
For this purpose, we assume that $f$ belongs to $L_2(\mathbb{R})$ and we use the decomposition of $f$ on one of the biorthogonal wavelet bases described in Section 2.2. We recall that, like classical orthonormal wavelet bases, biorthogonal wavelet bases are generated by dilations and translations of father and mother wavelets. But considering biorthogonal wavelets allows us to distinguish, if necessary, wavelets for analysis (which are piecewise constant functions in this paper) from wavelets for reconstruction, which have a prescribed number of continuous derivatives. Then, the decomposition of $f$ on a biorthogonal wavelet basis takes the following form:
$$f = \sum_{k\in\mathbb{Z}} \alpha_k \tilde\phi_k + \sum_{j\ge 0}\sum_{k\in\mathbb{Z}} \beta_{j,k}\tilde\psi_{j,k}, \qquad (1.1)$$
where for any $j\ge 0$ and any $k\in\mathbb{Z}$,
$$\alpha_k = \int_{\mathbb{R}} f(x)\phi_k(x)\,dx, \qquad \beta_{j,k} = \int_{\mathbb{R}} f(x)\psi_{j,k}(x)\,dx.$$
See Section 2.2 for further details. To shorten mathematical expressions, we set
$$\Gamma = \{\lambda = (j,k) : j\ge -1,\ k\in\mathbb{Z}\}$$
and for any $\lambda\in\Gamma$, $\beta_\lambda = \alpha_k$ (respectively $\phi_\lambda = \phi_k$) if $\lambda = (-1,k)$ and $\beta_\lambda = \beta_{j,k}$ (respectively $\phi_\lambda = \psi_{j,k}$) if $\lambda = (j,k)$ with $j\ge 0$. Similarly, $\tilde\phi_\lambda = \tilde\phi_k$ if $\lambda = (-1,k)$ and $\tilde\phi_\lambda = \tilde\psi_{j,k}$ if $\lambda = (j,k)$ with $j\ge 0$. Now, (1.1) can be rewritten as
$$f = \sum_{\lambda\in\Gamma}\beta_\lambda\tilde\phi_\lambda \quad\text{with}\quad \beta_\lambda = \int \phi_\lambda(x) f(x)\,dx. \qquad (1.2)$$
In particular, (1.2) holds for the Haar basis, where in this case $\tilde\phi_\lambda = \phi_\lambda$. Now, let us define the thresholding estimate of $f$ by using the properties of Poisson processes. First, we introduce for any $\lambda\in\Gamma$ the natural estimator of $\beta_\lambda$ defined by
$$\hat\beta_\lambda = \frac1n \int \phi_\lambda(x)\,dN_x \qquad (1.3)$$
which satisfies $E(\hat\beta_\lambda) = \beta_\lambda$.
Then, given some parameter $\gamma > 0$, we define the threshold
$$\eta_{\lambda,\gamma} = \sqrt{2\gamma\,\tilde V_{\lambda,n}\log n} + \frac{\gamma\log n}{3n}\,\|\phi_\lambda\|_\infty, \qquad (1.4)$$
with
$$\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma\log n\,\hat V_{\lambda,n}\,\frac{\|\phi_\lambda\|_\infty^2}{n^2}} + 3\gamma\log n\,\frac{\|\phi_\lambda\|_\infty^2}{n^2}, \quad\text{where}\quad \hat V_{\lambda,n} = \frac{1}{n^2}\int \phi_\lambda^2(x)\,dN_x.$$
Note that $\hat V_{\lambda,n}$ satisfies $E(\hat V_{\lambda,n}) = V_{\lambda,n}$, where
$$V_{\lambda,n} = \mathrm{Var}(\hat\beta_\lambda) = \frac1n\int \phi_\lambda^2(x) f(x)\,dx.$$
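For the Haar basis, the estimators $\hat\beta_\lambda$ and $\hat V_{\lambda,n}$ are plain sums over the observed points. A minimal sketch (our own code, not the authors'; `haar_psi` and the toy data are illustrative):

```python
import numpy as np

def haar_psi(j, k, x):
    """Haar mother wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    y = 2.0**j * np.asarray(x) - k
    return 2.0**(j / 2) * (((0 <= y) & (y < 0.5)).astype(float)
                           - ((0.5 <= y) & (y < 1)).astype(float))

def beta_hat(points, n, j, k):
    """Coefficient estimator (1.3): (1/n) * int phi dN = (1/n) sum_T phi(T)."""
    return haar_psi(j, k, points).sum() / n

def v_hat(points, n, j, k):
    """Variance estimator: V_hat_{lambda,n} = (1/n^2) * int phi^2 dN."""
    return (haar_psi(j, k, points) ** 2).sum() / n**2
```

For instance, with observed points $\{0.1, 0.3, 0.6\}$ and $n = 10$, the coefficient $\hat\beta_{(0,0)}$ equals $(1+1-1)/10 = 0.1$.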
Finally, given some subset $\Gamma_n$ of $\Gamma$ of the form
$$\Gamma_n = \{\lambda = (j,k)\in\Gamma : j\le j_0\},$$
where $j_0 = j_0(n)$ is an integer, we set for any $\lambda\in\Gamma$,
$$\tilde\beta_\lambda = \hat\beta_\lambda\,\mathbb{1}_{\{|\hat\beta_\lambda|\ge\eta_{\lambda,\gamma}\}}\,\mathbb{1}_{\{\lambda\in\Gamma_n\}}$$
and we set $\tilde\beta = (\tilde\beta_\lambda)_{\lambda\in\Gamma}$. Finally, the estimator of $f$ is
$$\tilde f_{n,\gamma} = \sum_{\lambda\in\Gamma}\tilde\beta_\lambda\tilde\phi_\lambda \qquad (1.5)$$
and only depends on the choice of $\gamma$ and $j_0$,
fixed later. When the Haar basis is used, the estimate is denoted $\tilde f^H_{n,\gamma}$ and its wavelet coefficients are denoted $\tilde\beta^H = (\tilde\beta^H_\lambda)_{\lambda\in\Gamma}$. Thresholding procedures were introduced by Donoho and Johnstone [17]. The main idea of [17] is that it suffices to keep a small number of the coefficients to obtain a good estimate of the function $f$. The threshold $\eta_{\lambda,\gamma}$ may seem to be defined in a rather complicated manner, but it is in fact inspired by the universal threshold proposed by [17] in the Gaussian regression framework. The universal threshold of [17] is defined by $\eta^U_{\lambda,n} = \sqrt{2\sigma^2\log n}$, where $\sigma^2$ (assumed to be known) is the variance of each noisy wavelet coefficient. In our set-up, $V_{\lambda,n} = \mathrm{Var}(\hat\beta_\lambda)$ depends on $f$, so it is estimated by $\hat V_{\lambda,n}$. Remark that for fixed $\lambda$, when there exists a constant $c_0 > 0$ such that $f(x)\ge c_0$ for $x$ in the support of $\phi_\lambda$, and if $\|\phi_\lambda\|_\infty^2 = o_n\big(n(\log n)^{-1}\big)$, the deterministic term of (1.4) is negligible with respect to the random one and we have asymptotically
$$\eta_{\lambda,\gamma} \approx \sqrt{2\gamma\,\hat V_{\lambda,n}\log n},$$
which looks like the universal threshold expression if $\gamma$ is close to 1. Actually, the deterministic term of (1.4) allows us to take $\gamma$ close to 1 and to control large-deviation terms at high resolution levels. In the same spirit, $V_{\lambda,n}$ is slightly overestimated, and we consider $\tilde V_{\lambda,n}$ instead of $\hat V_{\lambda,n}$ to define the threshold.
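Putting (1.4) into code (our own sketch, under our reading of the garbled display, with $\gamma$ multiplying each logarithmic term; not the authors' implementation):

```python
import numpy as np

def threshold(v_hat, n, gamma, phi_sup):
    """Data-driven threshold eta_{lambda,gamma} of (1.4).

    v_hat   : estimated variance V_hat_{lambda,n} = (1/n^2) * int phi^2 dN
    phi_sup : sup-norm ||phi_lambda||_inf of the analysis wavelet
    The intermediate V_tilde slightly overestimates V_hat so that gamma
    can be taken close to 1 while still controlling large deviations.
    """
    log_n = np.log(n)
    v_tilde = (v_hat
               + np.sqrt(2 * gamma * log_n * v_hat * phi_sup**2 / n**2)
               + 3 * gamma * log_n * phi_sup**2 / n**2)
    return np.sqrt(2 * gamma * v_tilde * log_n) + gamma * log_n * phi_sup / (3 * n)
```

Note that the threshold stays strictly positive even when `v_hat` is zero, thanks to the deterministic terms, and it grows with $\gamma$.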
The performance of universal thresholding from the oracle point of view is studied in [17]. In the context of wavelet function estimation by thresholding, the oracle does not tell us the true function, but tells us the coefficients that have to be kept. This estimator obtained with the aid of an oracle is not a true estimator, of course, since it depends on $f$. But it represents an ideal for the particular estimation method. The goal of the oracle approach is to derive true estimators which can essentially mimic the performance of the oracle estimator. For Gaussian regression, [17] proved that universal thresholding leads to an estimator that satisfies an oracle inequality: more precisely, the risk of the universal thresholding rule is not larger than the oracle risk up to some logarithmic term, which is the price to pay for not having extra information on the locations of the coefficients to keep. So the main question is: does $\tilde f_{n,\gamma}$ satisfy a similar oracle inequality? In our framework, it is easy to see that the oracle estimate is
$$\bar f = \sum_{\lambda\in\Gamma_n}\bar\beta_\lambda\tilde\phi_\lambda, \quad\text{where for any } \lambda\in\Gamma_n,\quad \bar\beta_\lambda = \hat\beta_\lambda\,\mathbb{1}_{\{\beta_\lambda^2 > V_{\lambda,n}\}},$$
and we have
$$E\big((\bar\beta_\lambda - \beta_\lambda)^2\big) = \min(\beta_\lambda^2, V_{\lambda,n}).$$
By keeping the coefficients $\hat\beta_\lambda$ larger than the thresholds defined in (1.4), our estimator has a risk that is not larger than the oracle risk, up to a logarithmic term, as stated by the following key result.
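The oracle benchmark above is easy to compute once the true coefficients and their variances are known; the oracle keeps a coefficient exactly when $\beta_\lambda^2$ exceeds $V_{\lambda,n}$. A small numeric sketch (ours, not the paper's):

```python
import numpy as np

def oracle_risk(beta, v):
    """Oracle risk sum_lambda min(beta_lambda^2, V_{lambda,n}):
    keeping a coefficient costs V, killing it costs beta^2."""
    beta, v = np.asarray(beta), np.asarray(v)
    return np.minimum(beta**2, v).sum()

# Three coefficients: the first two are worth keeping, the third is not.
risk = oracle_risk([0.5, 0.2, 0.01], [0.01, 0.01, 0.01])  # 0.01 + 0.01 + 0.0001
```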
Theorem 1. Let us consider a biorthogonal wavelet basis satisfying the properties described in Section 2.2. Let us fix two constants $c\ge 1$ and $c'\in\mathbb{R}$, and let us define for any $n$ the integer $j_0 = j_0(n)$ such that $2^{j_0}\le n^c(\log n)^{c'} < 2^{j_0+1}$. If $\gamma > c$, then $\tilde f_{n,\gamma}$ satisfies the following oracle inequality: for $n$ large enough,
$$E\big(\|\tilde f_{n,\gamma}-f\|_2^2\big) \le C_1\Big(\sum_{\lambda\in\Gamma_n}\min(\beta_\lambda^2,\,V_{\lambda,n}\log n) + \sum_{\lambda\notin\Gamma_n}\beta_\lambda^2\Big) + \frac{C_2}{n} \qquad (1.6)$$
where $C_1$ is a positive constant depending only on $\gamma$, $c$ and the functions that generate the biorthogonal wavelet basis. $C_2$ is also a positive constant, depending on $\gamma$, $c$, $c'$, $\|f\|_1$ and the functions that generate the basis.
Note that Theorem 1 holds with $c = 1$ and $\gamma > 1$. Following the oracle point of view of Donoho and Johnstone, Theorem 1 shows that our procedure is near optimal. The lack of optimality is due to the logarithmic factor. But this term is in some sense unavoidable, as shown later in Theorem 6. Now, let us discuss the near optimality of our procedure from some other perspectives.
1.2 Discussion on the assumptions
Previously, we explained why it is crucial to provide theoretical results under very mild assumptions on $f$. Observe that Theorem 1 is established by only assuming that $f$ belongs to $L_1(\mathbb{R})$ (to ensure that $N_{\mathbb{R}} < \infty$ almost surely) and that $f$ belongs to $L_2(\mathbb{R})$ (to obtain the wavelet decomposition and to study the performance of $\tilde f_{n,\gamma}$ under the $L_2$-loss). In particular, $f$ can be unbounded and nothing is assumed about its support, which can be unknown or even infinite. The goal of this section is to discuss this last point since, most of the time, estimation is performed by assuming that the intensity has a compact support known by the statistician, usually $[0,1]$. Of course, most Poisson data are not generated by an intensity supported by $[0,1]$, and statisticians know this fact, but they have in mind a simple preprocessing that can be described as follows. Let us assume that we know a constant $M$ such that the support of $f$ is contained in $[0,M]$. Then, observations are rescaled by dividing each of them by $M$, and the new observations (which all depend on $M$) belong to $[0,1]$. An estimator adapted to signals supported by $[0,1]$ can then be computed, which leads to a final estimator of $f$ supported by $[0,M]$ by applying the inverse rescaling. Note that such an estimator highly depends on $M$.
Let us go further by describing the situations that may be encountered. If the observations are physical measures given by an instrument that has a limited capacity, then the practitioner usually knows $M$. In this case, if the observations are not concentrated close to 0 but are spread over the whole interval $[0,M]$ in a homogeneous way, then the previous rescaling method performs well. But if one does not have access to $M$, then one is forced in the previous method to estimate it, usually by the largest observation. One then faces the problem that two different experiments will not lead to estimators with the same support or defined at the same scale, and hence it will be hard to compare them. Note also that, to the best of our knowledge, sharp asymptotic properties of such rescaling estimators depending on the largest observation have not been studied. In particular, this method does not seem to be robust if the observations are not compactly supported and if their distribution is heavy-tailed. This situation happens for instance in the financial and geological examples mentioned previously (see [29, 32, 22]) but also in a wide variety of situations (see [12]). In these cases, if observations are rescaled by the largest one, then the methods described at the beginning of the paper provide a very rough estimate of $f$ on small intervals close to 0. However, most of the observations may be concentrated close to 0 (for instance for geological data, see [22]) and sharp local estimation at 0 may be of interest. To overcome this problem, statisticians, with the help of experts, can truncate the data and estimate the intensity on a smaller interval $[0, M_{\mathrm{cut}}]$ corresponding to the interval of interest. Then, they face the problem that $M_{\mathrm{cut}}$ may be random or subjective, may change from one data set to another, and may omit values with a potential interest in the future.
So, even if partial solutions exist to overcome the issues raised by the support of $f$, they need a special preprocessing and are not completely justified from a theoretical point of view. We propose a procedure that dispenses with this preprocessing and is adapted to non-compactly supported Poisson intensities. Our procedure is simple (simpler than the preprocessing described previously) and we prove in the sequel that our method is adaptive minimax with respect to the support, which can be bounded or not.
1.3 Optimality of $\tilde f_{n,\gamma}$ under the minimax approach
To the best of our knowledge, minimax rates for Poisson intensity estimation have not been investigated when the intensity is not compactly supported. But let us mention results established in the following close set-up: the problem of estimating a non-compactly supported density based on the observation of an $n$-sample, which has been partly solved from the minimax point of view. First, let us cite [9], where minimax results for a class of functions depending on a gauge are established, or [20] for Sobolev classes. In these papers, the loss function depends on the parameters of the functional class. Similarly, Donoho et al. [18] proved the optimality of wavelet linear estimators on Besov spaces $B^\alpha_{p,q}$ when the $L_p$-risk is considered. The first general results where the loss is independent of the functional class were pointed out by Juditsky and Lambert-Lacroix [25], who investigated minimax rates on the particular class of the Besov spaces $B^\alpha_{p,\infty}$ for the $L_p$-risk. When $p > 2 + 1/\alpha$, the minimax risk is of the same order, up to a logarithmic term, as in the equivalent estimation problem on $[0,1]$. However, the behavior of the minimax risk changes dramatically when $p\le 2 + 1/\alpha$, and in this case it depends on $p$. Note that minimax rates for the whole class of Besov spaces $B^\alpha_{p,q}$ ($\alpha > 0$, $1\le p,q\le\infty$) are not derived in [25]. This is the goal of Section 3, under the $L_2$-risk in the Poisson set-up.
Under mild assumptions on $\gamma$, $\alpha$, $p$, $c$ and $c'$, we prove that the maximal risk of our procedure over balls of $B^\alpha_{p,q}\cap L_\infty$ is smaller than
$$\left(\frac{\log n}{n}\right)^{s} \quad\text{with}\quad s = \begin{cases} \dfrac{2\alpha}{1+2\alpha} & \text{if } 1\le p\le 2,\\[4pt] \dfrac{\alpha}{\alpha+1-\frac1p} & \text{if } 2\le p\le +\infty. \end{cases}$$
We mention that, actually, for $p > 2$ it is not necessary to assume that the functions belong to $L_\infty$ to derive the rate. In addition, we derive the lower bound of the minimax risk for $B^\alpha_{p,q}\cap L_\infty$, which coincides with the previous upper bound up to the logarithmic term. Let us discuss these results. We note an elbow phenomenon for the rate exponent $s$. When $p\le 2$, $s$ corresponds to the minimax rate exponent for estimating a compactly supported intensity of a Poisson process. Roughly speaking, it means that it is not harder to estimate non-compactly supported functions than compactly supported functions from the minimax point of view. When $p > 2$, the rate exponent, which depends on $p$, deteriorates as $p$ increases, which means that the support plays a harmful role in this case. An interpretation of this fact and a longer discussion of the minimax results are proposed in Section 3.2. Let us just mention that these results are established by using the maxiset approach presented in Section 3.1. We conclude this section by emphasizing that $\tilde f_{n,\gamma}$ is rate-optimal, up to the logarithmic term, without knowing the regularity or the support of the underlying signal to be estimated.
1.4 Overview of the paper
Section 2 recalls properties of the Poisson process and introduces the biorthogonal wavelet bases used in this paper. Section 3 discusses the properties of our procedure under the minimax and maxiset approaches. Section 4 provides a very general oracle-type inequality based on the model selection approach, from which Theorem 1 is derived, and contains the proofs of the other results.
2 Main Tools
2.1 Some probabilistic properties of the Poisson process
Let us first recall some basic facts about Poisson processes.
Definition 1. Let $(\mathbb{X}, \mathcal{X})$ be a measurable space and let $N$ be a random countable subset of $\mathbb{X}$. $N$ is said to be a Poisson process on $(\mathbb{X}, \mathcal{X})$ if
1. for any $A\in\mathcal{X}$, the number of points of $N$ lying in $A$ is a random variable, denoted $N_A$, which obeys a Poisson distribution with parameter $\mu(A)$, where $\mu$ is a measure on $\mathcal{X}$;
2. for any finite family of disjoint sets $A_1,\dots,A_n$ of $\mathcal{X}$, the random variables $N_{A_1},\dots,N_{A_n}$ are independent.
The measure $\mu$, called the mean measure of $N$, has no atom (see [27]). In this paper, we assume that $\mathbb{X} = \mathbb{R}$, $\mu(\mathbb{R}) < \infty$ and $\mu$ is absolutely continuous with respect to the Lebesgue measure. As explained in the Introduction, without loss of generality, we introduce a parameter $n$ and we define the intensity of the process as $f = \frac{d\mu}{n\,dx}$. We can also mention that a Poisson process $N$ is infinitely divisible, which means that it can be written as follows: for any positive integer $k$,
$$dN = \sum_{i=1}^{k} dN_i \qquad (2.1)$$
where the $N_i$'s are mutually independent Poisson processes on $\mathbb{R}$ with mean measure $\mu/k$. The following proposition (sometimes attributed to Campbell, see [27]) is fundamental and will be used throughout this paper.
Proposition 1. For any measurable function $g$ and any $z\in\mathbb{R}$ such that $\int e^{z g(x)}\,d\mu_x < \infty$, one has
$$E\left[\exp\left(z\int_{\mathbb{R}} g(x)\,dN_x\right)\right] = \exp\left(\int_{\mathbb{R}}\big(e^{z g(x)}-1\big)\,d\mu_x\right).$$
So,
$$E\left(\int_{\mathbb{R}} g(x)\,dN_x\right) = \int_{\mathbb{R}} g(x)\,d\mu_x, \qquad \mathrm{Var}\left(\int_{\mathbb{R}} g(x)\,dN_x\right) = \int_{\mathbb{R}} g^2(x)\,d\mu_x.$$
If $g$ is bounded, this implies the following exponential inequality: for any $u > 0$,
$$P\left(\int_{\mathbb{R}} g(x)\,(dN_x - d\mu_x) \ge \sqrt{2u\int_{\mathbb{R}} g^2(x)\,d\mu_x} + \frac13\|g\|_\infty u\right) \le \exp(-u). \qquad (2.2)$$
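The Bernstein-type bound (2.2) is easy to check numerically. In the sketch below (our own code, for illustration only), we take $g = \mathbb{1}_{[0,1]}$ and $\mu = m\,dx$ on $[0,1]$, so that $\int g\,dN = N_{[0,1]}\sim\mathrm{Poisson}(m)$, $\int g^2\,d\mu = m$ and $\|g\|_\infty = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check (2.2) for g = 1_[0,1] and mean measure m*dx on [0,1]:
# int g dN = N_[0,1] ~ Poisson(m), int g^2 dmu = m, ||g||_inf = 1.
m, u, trials = 50.0, 2.0, 200_000
counts = rng.poisson(m, size=trials)
bound = np.sqrt(2 * u * m) + u / 3.0
tail_freq = np.mean(counts - m >= bound)
# (2.2) predicts tail_freq <= exp(-u), up to Monte Carlo error.
```

With these parameters the empirical tail frequency is well below $e^{-u}\approx 0.135$, as the inequality predicts.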
2.2 Biorthogonal wavelet bases and Besov spaces
In this paper, the intensity $f$ to be estimated is assumed to belong to $L_1\cap L_2$. In this case, $f$ can be decomposed on the Haar wavelet basis, and this property is used throughout this paper. However, the Haar basis suffers from a lack of regularity. To remedy this problem, in particular for deriving minimax properties of $\tilde f_{n,\gamma}$ on Besov spaces, we consider a particular class of biorthogonal wavelet bases, described now. For this purpose, let us set
$$\phi = \mathbb{1}_{[0,1]}.$$
For any $r > 0$, there exist three functions $\psi$, $\tilde\phi$ and $\tilde\psi$ with the following properties:
1. $\tilde\phi$ and $\tilde\psi$ are compactly supported;
2. $\tilde\phi$ and $\tilde\psi$ belong to $C^{r+1}$, where $C^{r+1}$ denotes the Hölder space of order $r+1$;
3. $\psi$ is compactly supported and is a piecewise constant function;
4. $\psi$ is orthogonal to polynomials of degree no larger than $r$;
5. $\{(\phi_k, \psi_{j,k})_{j\ge 0, k\in\mathbb{Z}},\ (\tilde\phi_k, \tilde\psi_{j,k})_{j\ge 0, k\in\mathbb{Z}}\}$ is a biorthogonal family: for any $j, j'\ge 0$ and any $k, k'\in\mathbb{Z}$,
$$\int_{\mathbb{R}} \psi_{j,k}(x)\tilde\phi_{k'}(x)\,dx = \int_{\mathbb{R}} \phi_k(x)\tilde\psi_{j',k'}(x)\,dx = 0,$$
$$\int_{\mathbb{R}} \phi_k(x)\tilde\phi_{k'}(x)\,dx = \mathbb{1}_{k=k'}, \qquad \int_{\mathbb{R}} \psi_{j,k}(x)\tilde\psi_{j',k'}(x)\,dx = \mathbb{1}_{j=j',\,k=k'},$$
where for any $x\in\mathbb{R}$ and any $(j,k)\in\mathbb{Z}^2$,
$$\phi_k(x) = \phi(x-k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$$
and
$$\tilde\phi_k(x) = \tilde\phi(x-k), \qquad \tilde\psi_{j,k}(x) = 2^{j/2}\tilde\psi(2^j x - k).$$
This implies the wavelet decomposition (1.1) of $f$. Such biorthogonal wavelet bases have been built by Cohen et al. [11] as a special case of spline systems (see also the elegant equivalent construction of Donoho [16] from boxcar functions). The Haar basis can be viewed as a particular biorthogonal wavelet basis by setting $\tilde\phi = \phi$ and $\tilde\psi = \psi = \mathbb{1}_{[0,\frac12]} - \mathbb{1}_{(\frac12,1]}$, with $r = 0$, even if Property 2 is not satisfied with such a choice. The Haar basis is an orthonormal basis, which is not true for general biorthogonal wavelet bases. However, we have the frame property: if we denote $\Phi = \{\phi, \psi, \tilde\phi, \tilde\psi\}$, there exist two constants $c_1(\Phi)$ and $c_2(\Phi)$, only depending on $\Phi$, such that
$$c_1(\Phi)\Big(\sum_{k\in\mathbb{Z}}\alpha_k^2 + \sum_{j\ge 0}\sum_{k\in\mathbb{Z}}\beta_{j,k}^2\Big) \le \|f\|_2^2 \le c_2(\Phi)\Big(\sum_{k\in\mathbb{Z}}\alpha_k^2 + \sum_{j\ge 0}\sum_{k\in\mathbb{Z}}\beta_{j,k}^2\Big).$$
For instance, when the Haar basis is considered, $c_1(\Phi) = c_2(\Phi) = 1$. In particular, we have
$$c_1(\Phi)\,\|\tilde\beta-\beta\|_{\ell_2}^2 \le \|\tilde f_{n,\gamma}-f\|_2^2 \le c_2(\Phi)\,\|\tilde\beta-\beta\|_{\ell_2}^2. \qquad (2.3)$$
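For the Haar system, the orthonormality relations behind $c_1(\Phi) = c_2(\Phi) = 1$ can be checked by direct quadrature. A small sketch (our own code; the grid-based integrals are exact here because the functions are piecewise constant on a dyadic grid):

```python
import numpy as np

# Dense dyadic grid on [0, 1): integrals of functions that are piecewise
# constant between multiples of 2^{-10} are exact Riemann sums.
x = (np.arange(1024) + 0.5) / 1024
dx = 1.0 / 1024

phi = np.ones_like(x)                       # phi = 1_[0,1]
psi = np.where(x < 0.5, 1.0, -1.0)          # psi = 1_[0,1/2] - 1_(1/2,1]

def psi_jk(j, k):
    """Haar psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) sampled on the grid."""
    y = 2.0**j * x - k
    return 2.0**(j / 2) * np.where((y >= 0) & (y < 0.5), 1.0,
                                   np.where((y >= 0.5) & (y < 1), -1.0, 0.0))

norm_psi = (psi_jk(2, 1) ** 2).sum() * dx               # should be 1
cross_levels = (psi_jk(0, 0) * psi_jk(1, 0)).sum() * dx  # should be 0
father_vs_mother = (phi * psi).sum() * dx                # should be 0
```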
An important feature of such bases is the following: there exists a constant $m_\Phi > 0$ such that
$$\inf_{x\in[0,1]}|\phi(x)| \ge 1, \qquad \inf_{x\in\mathrm{supp}(\psi)}|\psi(x)| \ge m_\Phi, \qquad (2.4)$$
where $\mathrm{supp}(\psi) = \{x\in\mathbb{R} : \psi(x)\ne 0\}$. This property is used throughout the paper.
Now, let us give some properties of Besov spaces that are extensively used in the next section. We recall that Besov spaces, denoted $B^\alpha_{p,q}$ in the sequel, are defined by using moduli of continuity (see [14] and [21]). We just recall the sequential characterization of Besov spaces by using the biorthogonal wavelet basis (for further details, see [13]).
Let $1\le p, q\le\infty$ and $0 < \alpha < r+1$. The $B^\alpha_{p,q}$-norm of $f$ is equivalent to the norm
$$\|f\|_{\alpha,p,q} = \begin{cases} \|(\alpha_k)_k\|_{\ell_p} + \Big(\sum_{j\ge 0} 2^{jq(\alpha+\frac12-\frac1p)}\|(\beta_{j,k})_k\|_{\ell_p}^{q}\Big)^{1/q} & \text{if } q < \infty,\\[4pt] \|(\alpha_k)_k\|_{\ell_p} + \sup_{j\ge 0} 2^{j(\alpha+\frac12-\frac1p)}\|(\beta_{j,k})_k\|_{\ell_p} & \text{if } q = \infty. \end{cases}$$
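The sequence norm above is straightforward to evaluate on a finite set of coefficients. A sketch (our own helper, not part of the paper; the coefficient arrays are truncated to finitely many levels and translates):

```python
import numpy as np

def besov_norm(alpha_coeffs, beta_by_level, alpha, p, q):
    """Sequential Besov norm ||f||_{alpha,p,q} of Section 2.2 for finitely
    many nonzero coefficients; beta_by_level[j] holds (beta_{j,k})_k."""
    lp = lambda v: np.sum(np.abs(v) ** p) ** (1.0 / p)
    weights = [2.0 ** (j * (alpha + 0.5 - 1.0 / p)) * lp(b)
               for j, b in enumerate(beta_by_level)]
    if np.isinf(q):
        detail = max(weights) if weights else 0.0
    else:
        detail = np.sum(np.array(weights) ** q) ** (1.0 / q)
    return lp(alpha_coeffs) + detail
```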
We use this norm to define the radius of Besov balls. For any $R > 0$, if $0 < \alpha'\le\alpha < r+1$, $1\le p\le p'\le\infty$ and $1\le q\le q'\le\infty$, we obviously have
$$B^\alpha_{p,q}(R)\subset B^{\alpha'}_{p,q}(R), \qquad B^\alpha_{p,q}(R)\subset B^\alpha_{p,q'}(R).$$
Moreover,
$$B^\alpha_{p,q}(R)\subset B^{\alpha-\frac1p+\frac1{p'}}_{p',q}(R) \quad\text{if}\quad \frac1{p'}\le\frac1p. \qquad (2.5)$$
The class of Besov spaces $B^\alpha_{p,\infty}$ provides a useful tool to classify wavelet-decomposed signals with respect to their regularity and sparsity properties (see [24]). Roughly speaking, regularity increases when $\alpha$ increases, whereas sparsity increases when $p$ decreases (see Section 3.2).

3 Minimax results via the maxiset study

We present in this section the minimax results stated in the Introduction. These minimax results are deduced from maxiset results, which are presented first. Subsection 3.1 can be omitted on first reading.
3.1 The maxiset approach
First, let us describe the maxiset approach, which is classical in approximation theory and was initiated in statistics by Kerkyacharian and Picard [26]. For this purpose, let us assume that we are given an estimation procedure $f^\star$. The maxiset study of $f^\star$ consists in assessing the accuracy of $f^\star$ by fixing a prescribed rate $\rho$ and in pointing out all the functions $f$ that can be estimated by the procedure $f^\star$ at the target rate $\rho$. The maxiset of the procedure $f^\star$ for this rate $\rho$ is the set of all these functions. More precisely, we restrict our study to the signals belonging to $L_1\cap L_2$ and we set:
Definition 2. Let $\rho = (\rho_n)_n$ be a decreasing sequence of positive real numbers and let $f^\star = (f^\star_n)_n$ be an estimation procedure. The maxiset of $f^\star$ associated with the rate $\rho$ and the $L_2$-loss is
$$MS(f^\star, \rho) = \Big\{f\in L_1\cap L_2 : \sup_n\big[(\rho_n)^{-2}\,E\|f^\star_n - f\|_2^2\big] < +\infty\Big\};$$
the ball of radius $R > 0$ of the maxiset is defined by
$$MS(f^\star, \rho)(R) = \Big\{f\in L_1\cap L_2 : \sup_n\big[(\rho_n)^{-2}\,E\|f^\star_n - f\|_2^2\big] \le R^2\Big\}.$$
So, the outcome of the maxiset approach is a functional space, which can be viewed as an inversion of the minimax theory, where an a priori functional assumption is needed. Obviously, the larger the maxiset, the better the procedure. Maxiset results have been established and extensively discussed in different settings for many classes of estimators and for various rates of convergence. Let us cite for instance [26], [3] and [5] for thresholding rules, Bayes procedures and kernel estimators respectively. More interestingly for our framework, [2] derived maxisets for thresholding rules with data-driven thresholds for density estimation.
The goal of this section is to investigate maxisets for $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ and we only focus on rates of the form $\rho_s = (\rho_{n,s})_n$, where $0 < s < \frac12$ and, for any $n$,
$$\rho_{n,s} = \left(\frac{\log n}{n}\right)^{s}.$$
So, in the sequel, we investigate for any radius $R > 0$:
$$MS(\tilde f_\gamma, \rho_s)(R) = \Big\{f\in L_1\cap L_2 : \sup_n\Big[\Big(\frac{\log n}{n}\Big)^{-2s} E\|\tilde f_{n,\gamma}-f\|_2^2\Big] \le R^2\Big\}$$
and, to avoid tedious technical aspects related to the radii of balls, we use the following notation. If $F_s$ is a given space,
$$MS(\tilde f_\gamma, \rho_s) :=: F_s$$
means in the sequel that for any $R > 0$ there exists $R' > 0$ such that
$$MS(\tilde f_\gamma, \rho_s)(R)\cap L_1(R)\cap L_2(R) \subset F_s(R')\cap L_1(R)\cap L_2(R)$$
and for any $R' > 0$ there exists $R > 0$ such that
$$F_s(R')\cap L_1(R')\cap L_2(R') \subset MS(\tilde f_\gamma, \rho_s)(R)\cap L_1(R')\cap L_2(R').$$
To characterize the maxisets of $\tilde f_\gamma$, we set for any $\lambda\in\Gamma$,
$$\sigma_\lambda^2 = \int \phi_\lambda^2(x) f(x)\,dx$$
and we introduce the following spaces.
Definition 3. We define, for all $R > 0$ and all $0 < s < \frac12$,
$$W_s = \Big\{f = \sum_{\lambda\in\Gamma}\beta_\lambda\tilde\phi_\lambda : \sup_{t>0}\,t^{4s-2}\sum_{\lambda\in\Gamma}\beta_\lambda^2\,\mathbb{1}_{\{|\beta_\lambda|\le\sigma_\lambda t\}} < \infty\Big\};$$
the ball of radius $R$ associated with $W_s$ is
$$W_s(R) = \Big\{f = \sum_{\lambda\in\Gamma}\beta_\lambda\tilde\phi_\lambda : \sup_{t>0}\,t^{4s-2}\sum_{\lambda\in\Gamma}\beta_\lambda^2\,\mathbb{1}_{\{|\beta_\lambda|\le\sigma_\lambda t\}} \le R^{2-4s}\Big\};$$
and, for any sequence of spaces $\Gamma^\bullet = (\Gamma_n)_n$ included in $\Gamma$,
$$B^s_{2,\infty}(\Gamma^\bullet) = \Big\{f = \sum_{\lambda\in\Gamma}\beta_\lambda\tilde\phi_\lambda : \sup_n\Big[\Big(\frac{\log n}{n}\Big)^{-2s}\sum_{\lambda\notin\Gamma_n}\beta_\lambda^2\Big] < \infty\Big\}$$
and
$$B^s_{2,\infty}(\Gamma^\bullet)(R) = \Big\{f = \sum_{\lambda\in\Gamma}\beta_\lambda\tilde\phi_\lambda : \sup_n\Big[\Big(\frac{\log n}{n}\Big)^{-2s}\sum_{\lambda\notin\Gamma_n}\beta_\lambda^2\Big] \le R^2\Big\}.$$
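Under our reading of Definition 3 (the garbled display leaves the exact placement of the indicator uncertain), the weak-Besov functional can be approximated on a finite coefficient family by taking the supremum over a grid of $t$ values. A sketch, ours, not the authors':

```python
import numpy as np

def weak_besov_functional(beta, sigma, s, t_grid):
    """Approximate sup_t t^{4s-2} * sum_lambda beta^2 * 1{|beta| <= sigma*t}
    on a finite grid of t values (finite coefficient families)."""
    beta, sigma = np.asarray(beta), np.asarray(sigma)
    vals = [t ** (4 * s - 2) * np.sum(beta**2 * (np.abs(beta) <= sigma * t))
            for t in t_grid]
    return max(vals)
```

For a single coefficient with $\beta = \sigma = 1$ and $s = 1/4$, the map $t\mapsto t^{-1}\mathbb{1}_{\{t\ge 1\}}$ peaks at $t = 1$ with value 1.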
These spaces depend only on the coefficients of the biorthogonal wavelet expansion. In [14], a justification of the form of the radius of $W_s$ and further details are provided. These spaces can be viewed as weak versions of classical Besov spaces; hence they are called weak Besov spaces in the sequel. Note that if, for all $n$,
$$\Gamma_n = \{\lambda = (j,k)\in\Gamma : j\le j_0\} \quad\text{with}\quad 2^{j_0}\le\Big(\frac{n}{\log n}\Big)^{c} < 2^{j_0+1},\quad c > 0,$$
then $B^s_{2,\infty}(\Gamma^\bullet)$ is the classical Besov space $B^{sc^{-1}}_{2,\infty}$ if the reconstruction wavelets are regular enough. We have the following result.
Theorem 2. Let us fix two constants $c\ge 1$ and $c'\in\mathbb{R}$, and let us define for any $n$ the integer $j_0 = j_0(n)$ such that $2^{j_0}\le n^c(\log n)^{c'} < 2^{j_0+1}$. Let $\gamma > c$. Then the procedure defined in (1.5) with the sequence $\Gamma^\bullet = (\Gamma_n)_n$ such that
$$\Gamma_n = \{\lambda = (j,k)\in\Gamma : j\le j_0\}$$
achieves the following maxiset performance: for all $0 < s < \frac12$,
$$MS(\tilde f_\gamma, \rho_s) :=: B^s_{2,\infty}(\Gamma^\bullet)\cap W_s.$$
In particular, if $c' = -c$ and $0 < sc^{-1} < r+1$, where $r$ is the parameter of the biorthogonal basis introduced in Section 2.2,
$$MS(\tilde f_\gamma, \rho_s) :=: B^{sc^{-1}}_{2,\infty}\cap W_s.$$
The maxiset of $\tilde f_\gamma$ is characterized by two spaces: a weak Besov space, which is directly connected to the thresholding nature of $\tilde f_\gamma$, and the space $B^s_{2,\infty}(\Gamma^\bullet)$, which handles the coefficients that are not estimated, corresponding to the indices $j > j_0$. This maxiset result is similar to the result obtained by Autin [2] in the density estimation setting, but our assumptions are less restrictive (see Theorem 5.1 of [2]).
Now, let us point out a family of examples of functions that illustrates the previous result. For this purpose, we only consider the Haar basis, which yields simple formulas for the wavelet coefficients. Let us consider, for any $0 < a < \frac12$, the function $f_a$ such that, for any $x\in\mathbb{R}$,
$$f_a(x) = x^{-a}\,\mathbb{1}_{x\in(0,1]}.$$
The following result points out that if $s$ is small enough, $f_a$ belongs to $MS(\tilde f_\gamma, \rho_s)$ (so $f_a$ can be estimated at the rate $\rho_s$), and in addition $f_a\notin L_\infty$. This result illustrates the fact that the classical assumption $\|f\|_\infty < \infty$ is not necessary to estimate $f$ by our procedure.
Proposition 2. We consider the Haar basis and we set $c' = -c$. For $0 < s < 1/6$, under the assumptions of Theorem 2, if
$$0 < a < \frac12(1-6s),$$
then for $c$ large enough,
$$f_a\in MS(\tilde f^H_\gamma, \rho_s).$$
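As a quick sanity check (our own code, not part of the paper), $f_a(x) = x^{-a}\mathbb{1}_{(0,1]}$ with $0 < a < 1/2$ is unbounded near 0 yet square-integrable, with $\|f_a\|_2^2 = \int_0^1 x^{-2a}\,dx = 1/(1-2a)$:

```python
import numpy as np

a = 0.3  # any exponent in (0, 1/2)

# ||f_a||_2^2 = int_0^1 x^{-2a} dx = 1/(1-2a): compare the closed form
# with a trapezoid sum on [eps, 1] plus the exact integral on [0, eps].
eps = 1e-6
x = np.linspace(eps, 1.0, 2_000_001)
y = x ** (-2 * a)
numeric = ((y[1:] + y[:-1]) / 2 * np.diff(x)).sum() + eps ** (1 - 2 * a) / (1 - 2 * a)
closed_form = 1.0 / (1 - 2 * a)

unbounded_value = (1e-12) ** (-a)  # f_a blows up as x -> 0
```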
Let us end this section by explaining the links between the maxiset and minimax theories. For this purpose, let $F$ be a functional space and $F(R)$ the ball of radius $R$ associated with $F$; $F(R)$ is assumed to be included in a ball of $L_1\cap L_2$. The procedure $\tilde f_\gamma$ is said to achieve the rate $\rho_s$ on $F(R)$ if
$$\sup_n\Big[(\rho_{n,s})^{-2}\sup_{f\in F(R)} E\|\tilde f_{n,\gamma}-f\|_2^2\Big] < \infty.$$
So, obviously, $\tilde f_\gamma$ achieves the rate $\rho_s$ on $F(R)$ if and only if there exists $R' > 0$ such that
$$F(R)\subset MS(\tilde f_\gamma, \rho_s)(R')\cap L_1(R')\cap L_2(R').$$
Using the previous results, if $c' = -c$ and if the regularity and vanishing-moment properties are satisfied by the wavelet basis, this holds if and only if there exists $R' > 0$ such that
$$F(R)\subset B^{sc^{-1}}_{2,\infty}(R')\cap W_s(R')\cap L_1(R')\cap L_2(R'). \qquad (3.1)$$
This simple observation will be used to prove some minimax statements of the next section.
3.2 Minimax results

To the best of our knowledge, the minimax rate is unknown for $B^\alpha_{p,q}$ when $p < \infty$. Let us investigate this problem by pointing out the minimax properties of $\tilde f_\gamma$ on $B^\alpha_{p,q}$. For this purpose, we consider the procedure $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ defined with
$$\Gamma_n = \{\lambda = (j,k)\in\Gamma : j\le j_0\}$$
where $j_0 = j_0(n)$ is the integer such that
$$2^{j_0}\le n^c(\log n)^{-c} < 2^{j_0+1}.$$
The real number $c$ is chosen later. We also set, for any $R > 0$,
$$L_{1,2,\infty}(R) = \{f : \|f\|_1\le R,\ \|f\|_2\le R,\ \|f\|_\infty\le R\}.$$
In the sequel, the minimax results depend on the parameter $r$ of the biorthogonal basis introduced in Section 2.2, which measures the regularity of the reconstruction wavelets $(\tilde\phi, \tilde\psi)$. We first consider the case $p\le 2$.
Theorem 3. Let $R, R' > 0$, $1\le p\le 2$, $1\le q\le\infty$ and $\alpha\in\mathbb{R}$ such that $\max\big(0, \frac1p-\frac12\big) < \alpha < r+1$. Let $c\ge 1$ be large enough so that
$$\alpha\Big(1 - \frac{1}{c(1+2\alpha)}\Big) \ge \frac1p - \frac12.$$
If $\gamma > c$, then for any $n$,
$$\sup_{f\in B^\alpha_{p,q}(R)\cap L_{1,2,\infty}(R')} E\big(\|\tilde f_{n,\gamma}-f\|_2^2\big) \le C(\gamma, c, R, R', \alpha, p, \Phi)\left(\frac{\log n}{n}\right)^{\frac{2\alpha}{2\alpha+1}} \qquad (3.2)$$
where $C(\gamma, c, R, R', \alpha, p, \Phi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on $\Phi$.
When $p\le 2$, the rate of the risk of $\tilde f_{n,\gamma}$ corresponds to the minimax rate (up to the logarithmic term) for the estimation of a compactly supported intensity of a Poisson process (see [30]), or for the estimation of a compactly supported density (see [18]). Roughly speaking, it means that it is not harder to estimate non-compactly supported functions than compactly supported functions from the minimax point of view. In addition, the procedure $\tilde f_\gamma$ achieves this classical rate up to a logarithmic term. When $p > 2$ these conclusions no longer hold, and we have the following result.
Theorem 4. Let $R, R' > 0$, $2 < p\le\infty$, $1\le q\le\infty$ and $\alpha\in\mathbb{R}$ such that $0 < \alpha < r+1$. Let $c\ge 1$. If $\gamma > c$, then for any $n$,
$$\sup_{f\in B^\alpha_{p,q}(R)\cap L_1(R')\cap L_2(R')} E\big(\|\tilde f_{n,\gamma}-f\|_2^2\big) \le C(\gamma, c, R, R', \alpha, p, \Phi)\left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha+1-\frac1p}} \qquad (3.3)$$
where $C(\gamma, c, R, R', \alpha, p, \Phi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on $\Phi$.
For $p > 2$, we can note that it is not necessary to assume that the signals to be estimated belong to $L_\infty$ to derive rates of convergence for the risk. Note that when $p = \infty$, the risk is bounded by $\big(\frac{\log n}{n}\big)^{\frac{\alpha}{1+\alpha}}$ up to a constant. In the density estimation setting, this rate was also derived by [25] for their thresholding procedure, whose risk was studied on $B^\alpha_{\infty,\infty}(R)$. Now, combining the upper bounds (3.2) and (3.3), for any $R, R' > 0$, $1\le p\le\infty$, $1\le q\le\infty$ and $\alpha\in\mathbb{R}$ such that $\max\big(0, \frac1p-\frac12\big) < \alpha < r+1$, we have:
$$\sup_{f\in B^\alpha_{p,q}(R)\cap L_{1,2,\infty}(R')} E\big(\|\tilde f_{n,\gamma}-f\|_2^2\big) \le C(\gamma, c, R, R', \alpha, p, \Phi)\left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha+\frac12+\left(\frac12-\frac1p\right)_+}}$$
under the assumptions of Theorem 3. The following result derives lower bounds for the minimax risk and states that $\tilde f_{n,\gamma}$ is rate-optimal up to a logarithmic term.
Theorem 5. Let R, R

> 0, 1 p , 1 q and R such that max


_
0,
1
p

1
2
_
< <
r + 1. Then,
lim inf
n+
n

+
1
2
+
(
1
2

1
p
)
+
inf

f
sup
fB

p,q
(R)L
1,2,
(R

)
E(||

f
n
f||
2
2
)

C(, c, R, R

, , p, )
where

C(, c, R, R

, , p, ) depends on R

, , c, on the parameters of the Besov ball and on .


Furthermore, let p

1 and

> 0 such that

_
1
1
c(1 + 2

)
_

1
p


1
2
. (3.4)
Then,

f

is adaptive minimax up to a logarithmic term on


_
B

p,q
(R) L
1,2,
(R

) :

< r + 1, p

p +, 1 q
_
.
Table 1 gathers minimax rates (up to a logarithmic term) obtained for each situation.
1 p 2 2 p
compact support n

2
2+1
n

2
2+1
non compact support n

2
2+1
n


+1
1
p
Table 1: Minimax rates on B

p,q
L
1,2,
(up to a logarithmic term) with 1 p, q , >
max
_
0,
1
p

1
2
_
under the
2
2
-loss.
Our results show the inuence of the support on minimax rates. Note that when restricting
on compactly supported signals, when p > 2, B

p,
(R) B

2,
(

R) for

R large enough and in this
case, the rate does not depend on p. It is not the case when non-compactly supported signals are
considered. Actually, we note an elbow phenomenon at p = 2 and the rate deteriorates when p
increases. Let us give an interpretation of this observation. Johnstone (1994) showed that when
p < 2, Besov spaces B

p,q
model sparse signals where at each level, a very few number of the wavelet
14 P. Reynaud-Bouret and V. Rivoirard
coecients are non-negligible. But these coecients can be very large. When p > 2, B

p,q
-spaces
typically model dense signals where the wavelet coecients are not large but most of them can be
non-negligible. This explains why the size of the support plays a role for minimax rates as soon
as p > 2: when the support is larger, the number of wavelet coecients to be estimated increases
dramatically.
Finally, we note that our procedure achieves the minimax rate, up to a logarithmic term. This
logarithmic term is the price we pay for considering thresholding rules. In addition,

f

is near rate-
optimal without knowing the regularity and the support of the underlying signal to be estimated.
We end this section by proving that our procedure is adaptive minimax (with the exact exponent
of the logarithmic factor) over weak Besov spaces introduced in Section 3.1. For this purpose, we
consider signals decomposed on the Haar basis, and we establish the following lower bound with
respect to W
s
. We recall that for any 0 < s <
1
2
,

n,s
=
_
log n
n
_
s
.
Theorem 6. We consider the Haar basis (the spaces W
s
and B
s
2,
introduced in Section 3.1 are
viewed as sequence spaces). Let

n
= { = (j, k) : j j
0
}
with j
0
= j
0
(n) the integer such that
2
j
0
n(log n)
1
< 2
j
0
+1
.
For 0 < s <
1
2
and R, R

, R

> 0 such that R

1 and R

R
12s
1, we have
liminf
n

2
n,s
inf

f
sup
fWs(R)B
s
2,
(R

)L
1,2,
(R

)
E(||

f
n
f||
2
2
)

C(s)R
24s
,
where

C(s) depends only on s and .
Using Theorem 2 that provides an upper bound for the risk of our procedure, we immediately
deduce the following result.
Corollary 1. The procedure

f
H

dened with

n
= { = (j, k) : j j
0
}
with j
0
= j
0
(n) the integer such that 2
j
0
n(log n)
1
< 2
j
0
+1
and with > 1 is minimax on
W
s
(R) B
s
2,
(R

) L
1,2,
(R

) and is adaptive minimax on


_
W
s
(R) B
s
2,
(R

) L
1,2,
(R

) : 0 < s <
1
2
, 1 R

, 1 R R

_
.
4 Proofs via the model selection approach
In this section, we use the model selection approach to provide a very general result with respect
to the estimation of a countable family of coecients. This result is stated in Theorem 7 and is
valid for various settings. Applied to the Poisson setting, it allows to establish Theorem 1.
Adaptive thresholding estimation of a Poisson intensity 15
4.1 Connections between thresholding and model selection
To describe the model selection approach, let us introduce the following empirical contrast: for any
family = {

, }, we set
C
n
() = 2

, (4.1)
which is an unbiased estimator of C() = || ||
2

2
||||
2

2
. Note that the minimum of C is achieved
for = . Model selection proceeds in two steps: rst we consider some family of models m
and we nd

(m) the mimimum of C
n
on each model m. Then, we use the data to select a value m
of m and we take

( m) as the nal estimator. The rst step is immediate in our setting: for any
m ,

(m) = (

1
{m}
)

and C
n
(

(m)) =

. Now, the question is : how to choose m? One could be tempted to


choose m as large as possible but this choice would lead to estimates with innite variance. For
this reason, Birge and Massart [8] proposed to introduce a penalty term associated to each model
m, denoted pen(m), and to choose m by minimizing
Crit(m) =

+ pen(m)
over a large class of possible models m. For instance, we can x a deterministic subset of
and consider all the subsets of . The role of the function m pen(m) is to govern the classical
bias-variance tradeo. Now, if we consider a family of thresholds (

and if we set for any


m
pen(m) =

,
then the model selection procedure is equivalent to the thresholding rule associated with the family
(

:
m = { : |

}
and

( m) = (

1
{|

}
1
{}
)

=

.
Let us note that our method has to be performed for signals with innite support. So, may be
innite, which is not usual in the literature. The following theorem is self-contained; we do not use
the Poisson setting and we do not make any assumption on the distribution of

or on the form
of the threshold

. So, Theorem 7 can be used for other settings and this is the main reason for
the following very abstract formulation.
Theorem 7. To estimate a countable family = (

, such that

2
< , we assume that
a family of coecient estimators (

, where is a known deterministic subset of , and a


family of possibly random thresholds (

are available and we consider the thresholding rule

= (

1
|

. Let > 0 be xed. Assume that there exist a deterministic family


(F

and three constants [0, 1[, [0, 1] and > 0 (that may depend on but not on )
with the following properties.
(A1) For all in ,
P(|

| >

) .
16 P. Reynaud-Bouret and V. Rivoirard
(A2) There exist 1 < p, q < with
1
p
+
1
q
= 1 and a constant R > 0 such that for all in ,
_
E(|

|
2p
)
_1
p
Rmax(F

, F
1
p


1
q
).
(A3) There exists a constant such that for all in such that F

<
P(|

| >

, |

| >

) F

.
Then the estimator

satises
1
2
1 +
2
E

2
E inf
m
_
_
_
1 +
2
1
2

+
1
2

m
(

)
2
+

_
_
_
+LD

with
LD =
R

2
__
1 +
1/q
_

1/q
+ (1 +
1/q
)
1/q

1/q
_
.
Observe that this result makes sense only when

< and in this case, if LD (which


stands for large deviation inequalities) is small enough, the main term of the right hand side is
given by the rst term.
Now, let us briey comment the assumptions of this theorem. The concentration inequality
of Assumption (A1) controls the deviation of |

| with respect to 0. The family (F

is introduced for Assumptions (A2) and (A3). Assumption (A2) provides upper bounds for the
moments of

and looks like a Rosenthal inequality if F

can be related to the variance of


.
Actually, compactly supported signals can be well estimated by thresholding if sharp concentration
and Rosenthal inequalities are satised (see Theorem 3 of [18] and Theorem 3.1. of [26]). In our set-
up where the support of f can be innite, these basic tools are not sucient and Assumption (A3)
is introduced to ensure that with high probability, when F

is small, then either

is estimated
by 0, or |

| is small. Remark 1 in Section 4.2 provides additional technical reasons for the
introduction of Assumption (A3) when the support of the signal is innite. Finally, the condition

< shows that the variations of (

around (

, as pointed out by Assumptions


(A2) and (A3), have to be controlled in a global way.
This theorem applied in the Poisson set-up with =
n
and

=
,
implies Theorem 1. In
particular the family (F

is given by F

=
_
supp(

)
f(x)dx, which is related to the variance of

(see (4.5)).
Using (2.3), without loss of generality, Theorems 1, 2, 3, 4 and 5 are established by using the

2
-norm of coecients instead of the functional L
2
-loss. In the following proofs, the values of the
constants C
1
, C
2
, K
1
, K
2
, , ... may change from one proof to another one. Finally, recall that we
have set for any ,

=
_

2

(x)f(x)dx.
4.2 Proof of Theorem 7
We use the model selection approach. By denition of m one has for any m ,
C
n
(

) + pen( m) C
n
(

(m)) + pen(m).
Adaptive thresholding estimation of a Poisson intensity 17
For any family = (

, we set
() =

).
Then, using (4.1),
C
n
() = || ||
2

2
||||
2

2
2().
So,
||

||
2

2
||

(m) ||
2

2
+ 2(



(m)) + pen(m) pen( m)
||

(m) ||
2

2
+ 2(

(m)) 2(

(m) (m)) + pen(m) pen( m),


where (m) = E(

(m)) is the projection of on the space of the vectors = (

such that

= 0 when / m for the


2
-norm. But,
||

(m) ||
2

2
= ||

(m) (m)||
2

2
+||(m) ||
2

2
= (

(m) (m)) +||(m) ||


2

2
and
2(

(m)) 2||

(m)||

2
(m m)
2||

||

2
(m m) + 2|| (m)||

2
(m m)

2
2
1 +
2
||

||
2

2
+
2
2
1
2
||(m) ||
2

2
+
1

2
(m m),
where we have set for any m ,
(m) = ||

(m) (m)||

2
=

m
(

)
2
=
_
(

(m) (m))
and we have used twice the inequality 2ab a
2
+
1
b
2
with = 2
2
(1 +
2
)
1
and =
2
2
(1
2
)
1
. Finally,
1
2
1 +
2
||

||

2
||

(m) (m)||
2

2
+
1 +
2
1
2
||(m) ||
2

2
+
1

2
(m m) + pen(m) pen( m)

1 +
2
1
2
||(m) ||
2

2
+
_
1

2
1
_
||

(m) (m)||
2

2
+ pen(m) +A,
where
A =
1

2
( m) pen( m) =

_
1

2
(

)
2

_
1
|

|>

.
Now, we introduce
A
1
=

E
_
1

2
(

)
2
1
|

|>

_
1
F

and
A
2
=

E
_
1

2
(

)
2
1
|

|>

1
|

|>

_
1
F

<
.
Therefore,
E[A] A
1
+A
2
.
18 P. Reynaud-Bouret and V. Rivoirard
By using the Holder inequality,
A
1

1

_
E(|

|
2p
)
_1
p
_
P(|

| >

)
_1
q
1
F

R
1
q

max(F

, F
1
p


1
q
)1
F

R
1
q

2
_

+
1
q

F
1
p

_
F

_1
q
_

R
1
q

2
_
1 +

1
q
_

and
A
2

1

_
E(|

|
2p
)
_1
p
_
P(|

| >

, |

| >

)
_1
q
1
F

<

max(F

, F
1
p


1
q
)F
1
q


1
q
1
F

<

2
_

F
1+
1
q


1
q
_

_1
q
+
1
q

1
q

1
q
(1 +
1
q
)
1
q

.
So,
E(A) LD

,
which proves Theorem 7.
Remark 1. When compactly supported signals are considered, it is natural to take satisfying
card() < and in this case, the upper bound of E(A) takes the simpler form:
E(A)
1

_
E(|

|
2p
_1
p
_
P(|

| >

)
_1
q

2
card() max

_
E(|

|
2p
_1
p
w
1
q
.
Even under a rough control of max

E(|

|
2p
), the term E(A) is negligible with respect to the
main term as soon as w is small enough, which occurs if the threshold is large enough. In particular,
when restricting our attention to compactly supported signals, Assumption (A3) is useless.
4.3 Proof of Theorem 1
To prove Theorem 1, we use Theorem 7 with

dened in (1.3),

=
,
dened in (1.4) and
=
n
= { = (j, k) : 1 j j
0
} with 2
j
0
n
c
(log n)
c

< 2
j
0
+1
.
Adaptive thresholding estimation of a Poisson intensity 19
We set
F

=
_
supp(

)
f(x)dx,
so we have:

1jj
0

k
_
xsupp(
j,k
)
f(x)dx
_
f(x)dx

1jj
0

k
1
xsupp(
j,k
)
(j
0
+ 2)m

||f||
1
,
(4.2)
where m

is a nite constant depending only on the compactly supported functions and .


Finally,

is bounded by log(n) up to a constant that only depends on ||f||


1
, c, c

and the
functions and . Now, we give a fundamental lemma to derive Assumption (A1) of Theorem 7.
Lemma 1. For any u > 0
P
_
|

|
_
2uV
,n
+
||

||

u
3n
_
2e
u
. (4.3)
Moreover, for any u > 0
P
_
V
,n


V
,n
(u)
_
e
u
,
where

V
,n
(u) =

V
,n
+
_
2

V
,n
||

||
2

n
2
u + 3
||

||
2

n
2
u.
Proof. Equation (4.3) comes easily from (2.2) applied with g =

/n. The same inequality applied


with g =
2

/n
2
gives:
P
_
_
V
,n


V
,n
+

2u
_
R

(x)
n
4
nf(x)dx +
||

||
2

3n
2
u
_
_
e
u
.
We observe that
_
R

(x)
n
4
nf(x)dx
||

||
2

n
2
V
,n
.
So, if we set a = u
||

||
2

n
2
, then
P(V
,n

_
2V
,n
a a/3

V
,n
) e
u
.
We obtain
P(
_
V
,n
P
1
(

V
,n
)) e
u
where P
1
(

V
,n
) is the positive solution of
(P
1
(

V
,n
))
2

2aP
1
(

V
,n
) (a/3 +

V
,n
) = 0.
To conclude, it remains to observe that

V
,n
(u) (P
1
(

V
,n
))
2
=
_
_

V
,n
+ 5a/6 +
_
a/2
_
2
.

20 P. Reynaud-Bouret and V. Rivoirard


Let < 1. Combining these inequalities with

V
,n
=

V
,n
(log n) yields
P(|

| >
,
) P
_
|

|
_
2
2
log n

V
,n
+
log n||

||

3n
_
P
_
|

|
_
2
2
log n

V
,n
+
log n||

||

3n
, V
,n


V
,n
_
+P
_
|

|
_
2
2
log n

V
,n
+
log n||

||

3n
, V
,n
<

V
,n
_
P(V
,n


V
,n
) +P
_
|

|
_
2
2
log nV
,n
+
log n||

||

3n
_
n

+ 2n

3n

.
So, for any value of [0, 1[, Assumption (A1) is true with

=
,
if we take = 3n

. To
verify the Rosenthal type inequality (A2) of Theorem 7, we prove the following lemma.
Lemma 2. For any p 2, there exists an absolute constant C such that
E(|

|
2p
) C
p
p
2p
_
V
p
,n
+
_
||

||

n
_
2p2
V
,n
_
.
Proof. We apply (2.1). Hence,

=
k

i=1
_

(x)
n
_
dN
i
x
nk
1
f(x)dx
_
=
k

i=1
Y
i
where for any i,
Y
i
=
_

(x)
n
_
dN
i
x
nk
1
f(x)dx
_
.
So the Y
i
s are i.i.d. centered variables, each of them has a moment of order 2p. For any i, we
apply the Rosenthal inequality (see Theorem 2.5 of [23]) to the positive and negative parts of Y
i
.
This easily implies that
E
_
_

i=1
Y
i

2p
_
_

_
16p
log (2p)
_
2p
max
__
E
k

i=1
Y
2
i
_
p
,
_
E
k

i=1
|Y
i
|
2p
__
.
It remains to bound the upper limit of E(

k
i=1
|Y
i
|

) for all {2p, 2} 2 when k . Let us


introduce

k
= { i {1, . . . , k}, N
i
R
1}.
Then, it is easy to see that P(
c
k
) k
1
(n||f||
1
)
2
(see e.g., (4.6) below).
On
k
, |Y
i
|

= O
k
(k

) if
_

(x)
n
dN
i
x
= 0 and |Y
i
|

=
_
|

(T)|
n
_

+ O
k
_
k
1
_
|

(T)|
n
_
1
_
if
Adaptive thresholding estimation of a Poisson intensity 21
_

(x)
n
dN
i
x
=

(T)
n
where T is the point of the process N
i
. Consequently,
E
k

i=1
|Y
i
|

E
_
1

k
_

TN
_
_
|

(T)|
n
_

+O
k
_
k
1
_
|

(T)|
n
_
1
__
+kO
k
(k

)
__
+
_
P(
c
k
)

_E
_
_
_
k

i=1
|Y
i
|

_
2
_
_
. (4.4)
But we have
k

i=1
|Y
i
|

2
1
_
k

i=1
_
_
||

||

n
_

(N
i
R
)

+
_
k
1
_
|

(x)|f(x)dx
_

__
2
1
_
_
||

||

n
_

R
+k
_
k
1
_
|

(x)|f(x)dx
_

_
.
So, when k +, the last term in (4.4) converges to 0 since a Poisson variable has moments of
every order and
lim sup
k
E
k

i=1
|Y
i
|

E
_
_ _
|

(x)|
n
_

dN
x
_

_
||

||

n
_
2
V
,n
,
which concludes the proof.
Now,
V
,n
=
1
n
_

2

(x)f(x)dx
||

||
2

n
(4.5)
and Assumption (A2) is satised with =
1
n
and
R =
2Cp
2
2
j
0
max(||||
2

; ||||
2

)
n
since ||

||
2

2
j
0
max(||||
2

; ||||
2

) and
_
E(|

|
2p
)
_1
p
Cp
2
_
||

||
2

n
+||

||
2

F
1
p

n
1
p
2
_

Cp
2
||

||
2

n
_
F

+F
1
p

1
q
_
.
Finally, Assumption (A3) comes from the following lemma.
Lemma 3. We set
N

=
_
supp(

)
dN and C

= (

6 + 1/3)

6 + 1/3.
There exists an absolute constant 0 <

< 1 such that if nF

log n and (1

)(

6 +
1/3)log n 2 then,
P(N

nF

(1

)C

log n) F

.
Remark 2. We can take

= 0.01 and in this case, the result is true as soon as n 3


22 P. Reynaud-Bouret and V. Rivoirard
Proof. One takes

[0, 1] (for instance

= 0.01) such that


3(1

)
2
2(2

+ 1)
(

6 + 1/3) 4.
We use Equation (5.2) of [30] to obtain
P(N

nF

(1

)C

log n) exp
_

((1

)C

log n)
2
2(nF

+ (1

)C

log n/3)
_
n

3(1

)
2
2(2

+1)
C

.
If nF

n
1
, since
3(1

)
2
2(2

+1)
C

2 + 2, the result is true. If nF

n
1
,
P(N

nF

(1

)C

log n) P(N

> (1

)C

log n) P(N

2)

k2
(nF

)
k
k!
e
nF

(nF

)
2
(4.6)
and the result is true.
Now, observe that if |

| >
,
then
N

log n.
Indeed, |

| >
,
implies
C

log n
n
||

||

|
||

||

n
.
So if n satises (1

)(

6 + 1/3)log n 2, we set =

log (n) and = n

. In this case,
Assumption (A3) is fullled since if nF

log n
P(|

| >

, |

| >

) P(N

nF

(1

)C

log n) F

.
Finally, if n satises (1

)(

6 + 1/3)log n 2, Theorem 7 applies:


1
2
1 +
2
E||

||
2

2
inf
m
_
_
_
1 +
2
1
2

+
1
2

m
E(

)
2
+

m
E(
2
,
)
_
_
_
+LD

.
(4.7)
In addition, there exists a constant K
1
depending on p, , c, c

, ||f||
1
and on such that
LD

K
1
(log(n))
c

+1
n
c

q
1
. (4.8)
Since > c, one takes < 1 and q > 1 such that c <

2

q
and as required by Theorem 1, the last
term satises
LD


K
2
n
,
where K
2
is a constant. Before evaluating the rst term, let us state the following lemma.
Lemma 4. We set
S

= max{ sup
xsupp()
|(x)|, sup
xsupp()
|(x)|}
and
I

= min{ inf
xsupp()
|(x)|, inf
xsupp()
|(x)|}.
Using (2.4), we dene

=
S
2

I
2

. For all , we have the following result.


Adaptive thresholding estimation of a Poisson intensity 23
- If F

log (n)
n
, then
2

log (n)
n
.
- If F

>

log (n)
n
, then ||

||

log (n)
n

_
log (n)
n
.
Proof. We note = (j, k) and assume that j 0 (arguments are similar for j = 1).
If F

log (n)
n
, we have
|

| S

2
j
2
F

2
j
2
_
F

_
log (n)
n
S

I
1

_
log (n)
n

_
log (n)
n
,
since
2

I
2

2
j
F

. For the second point, observe that

_
log (n)
n
2
j
2
I

log (n)
n
and ||

||

log (n)
n
2
j
2
S

log (n)
n
.

Now, for any > 0,


E(
2
,
) (1 +)2log nE(

V
,n
) + (1 +
1
)
_
log n
3n
_
2
||

||
2

.
Moreover,
E(

V
,n
) (1 +)V
,n
+ (1 +
1
)3log n
||

||
2

n
2
.
So,
E(
2
,
) (1 +)
2
2log nV
,n
+ ()
_
log n
n
_
2
||

||
2

, (4.9)
with () a constant depending only on . Now, we apply (4.7) with
m =
_

n
:
2

>
2

n
log n
_
,
so using Lemma 4, we can claim that for any m, F

>

log (n)
n
. Finally, since

1,
E||

||
2

2
K
3
_
_

1
{
2

n
log n}
+

/ n

_
_
+K
3

n
_
log n
n

2

+
_
log n
n
_
2
||

||
2

_
1

>
2

n
log n,F

>
log (n)
n

+
K
4
n
K
3
_
_

n
_

1
{
2

V
,n
log n}
+ 2log nV
,n
1
{
2

>
2

V
,n
log n}
_
+

/ n

_
_
+
K
4
n
2K
3
_
_

n
min(
2

,
2

V
,n
log n) +

/ n

_
_
+
K
4
n
,
where the constant K
3
depends on and c and K
4
depends on , c, c

, ||f||
1
and on . Theorem 1
is proved by using properties of the biorthogonal wavelet basis.
24 P. Reynaud-Bouret and V. Rivoirard
4.4 Proof of Theorem 2
Let us assume that f belongs to B
s
2,
(R
12s
)W
s
(R)L
1
(R)L
2
(R). Inequality (1.6) of Theorem 1
implies that, for all n,
E(||

f
n,
f||
2
2
) C
1
_
_

n
_

1
|

q
log n
n
+V
,n
log n1
|

|>

q
log n
n
_
+

_
_
+
C
2
n
where C
1
and C
2
are two constants. But we have

n
V
,n
log n1
|

|>

q
log n
n
=

log n
n
+

k=0
1
2
k1

log n
n
<2
k

k=0
2
k

1
|

|2
k+1
2

q
log n
n

k=0
2
k
R
24s
_
2
k+1
2
_
log n
n
_
4s
R
24s

2
n,s
+

k=0
2
k+2s(k+1)
and

R
24s

2
n,s
.
So,
E(||

f
n,
f||
2
2
) C(, c, , s)R
24s

2
n,s
+
C
2
n
,
where C(, c, , s) depends on , c, and s. Hence,
E(||

f
n,
f||
2
2
) C(, c, , s)R
24s

2
n,s
(1 +o
n
(1))
and f belongs to MS(

f

,
s
)(R

) for R

large enough.
Conversely, let us suppose that f belongs to MS(

f

,
s
)(R

) L
1
(R

) L
2
(R

). Then, for any n,


E(||

f
n,
f||
2
2
) R

2
_
log n
n
_
2s
.
Consequently, there exists R depending on R

and such that for any n,

R
2
_
log n
n
_
2s
.
This implies that f belongs to B
s
2,
(R).
Now, we want to prove that f W
s
(R) if R is large enough. We have

1
|

q
log n
2n

1
|

q
log n
2n
.
Adaptive thresholding estimation of a Poisson intensity 25
But

1
|

|
,
, so,
|

|1
|

,
2
|

|.
So, for any n,

1
|

q
log n
2n

+E
_
_
_

1
|

q
log n
2n
[1
|

,
2
+ 1
|

|>

,
2
]
_
_
_

n
E[(

)
2
] +

1
|

q
log n
2n
E(1
|

|>

,
2
)

n
E[(

)
2
] +

P
_

_
log n
2n
>

,
2
_
E(||

||
2

2
) +

P
_

_
log n
2n
>

,
2
_
.
Using Lemma 1,
P
_

_
2log n
n
>
,
_
P(

V
,n
V
,n
) n

and

1
|

q
log n
2n
c
1
()
1
(R

)
2
_
_
log n
n
_
4s
+
2

2
n

.
Since this is true for every n, we have for any t 1,

1
|

t
R
24s
__
2

t
_
4s
,
where R is a constant large enough depending on R

and . Note that


sup
t1
t
4s

1
|

t

2

2
.
We conclude that
f B
s
2,
(R) W
s
(R)
for R large enough.
4.5 Proof of Proposition 2
Since <
1
2
, f

L
1
L
2
. If the Haar basis is considered, the wavelet coecients
j,k
of f

can
be calculated and we obtain for any j 0, for any k
_
0, . . . , 2
j
1
_
,
j,k
= 0 and for any j 0,
for any k
_
0, . . . , 2
j
1
_
,

j,k
= (1 )
1
2
j(
1
2
)
_
2
_
k +
1
2
_
1
k
1
(k + 1)
1
_
26 P. Reynaud-Bouret and V. Rivoirard
and there exists a constant 0 < c
1,
< only depending on such that
lim
k
2
j(
1
2
)
k
1+

j,k
= c
1,
.
Moreover the
j,k
s are strictly positive. Consequently they can be upper and lower bounded, up
to a constant, by 2
j(
1
2
)
k
(1+)
. Similarly, for any j 0, for any k
_
0, . . . , 2
j
1
_
,

2
j,k
= (1 )
1
2
j
_
(k + 1)
1
k
1
_
and there exists a constant 0 < c
2,
< only depending on such that
lim
k
2
j
k

2
j,k
= c
2,
.
There exist two constants () and

() only depending on such that for any 0 < t < 1, if

j,k
= 0
|
j,k
| t
j,k
k ()t

2
+2
2
j

1
+2

and
()t

2
+2
2
j

1
+2

2
j
2
j

()t

2
3
.
So, if 2
j

()t

2
3
, since
jk
= 0 for k 2
j
,

kZ

2
j,k
1

j,k
t
j,k
= 0.
We obtain

1
|

|t

C()
+

j=1
2
j(12)
1
2
j
>

()t

2
3
2
j
1

k=1
k
22
C

()t
24
3
,
where C() and C

() denote two constants only depending on . So, for any 0 < s <
1
6
, if we take

1
2
(1 6s), then, for any 0 < t < 1, t
24
3
t
4s
. Finally, there exists c 1, such that for any n,

R
2

2
n,
,
where R > 0. And in this case,
f

, f

B
s
2,
W
s
:= MS(

f
H

,
s
).
4.6 Proof of Theorem 3
Using the maxiset results of Section 3.1, since
MS(

f

,

1+2
) :=: B

c(1+2)
2,
W

1+2
,
it is enough to show that
B

p,q
(R) L
1,2,
(R

) B

c(1+2)
2,
(R

) W

1+2
(R

)
Adaptive thresholding estimation of a Poisson intensity 27
for R

> 0 (see (3.1)). Let f B

p,q
(R) L
1,2,
(R

). We rst prove that f W



1+2
(R

) for R

large enough. Since for any = (j, k),

min
_
max(2
j
; 1)||||
2

F
j,k
; ||f||

||||
2
2
_
,
where {, } according to the value of j, we have for any t > 0 and any

J

1
|

j<

2
j,k
t
2
+

2
j,k
_

j,k
t
|
j,k
|
_
2p
max(||||
2

; ||||
2

)t
2

j<

J
max(2
j
; 1)

k
F
j,k
+

2
j,k
_
t
_
||f||

||||
2
2
|
j,k
|
_
2p
C(, R

)
_
_
2

J
t
2
+t
2p

k
|
j,k
|
p
_
_
,
where C(, R

) is a constant only depending on and on R

. Indeed, we have used that

k
F
j,k
m

||f||
1
, (4.10)
by similar arguments to (4.2)). Now, since f belongs to B

p,
(R) (that contains B

p,q
(R), see Section
2.2), with +
1
2

1
p
> 0,

1
|

t
C
1
(, , p, R

)
_
2

J
t
2
+t
2p
R
p
2

Jp(+
1
2

1
p
)
_
,
where C
1
(, , p, R

) depends on , , p and R

. With

J such that
2

J
R
2
1+2
t

2
1+2
< 2

J+1
,

1
|

t
C
2
(, , p, R

)R
2
1+2
t
4
1+2
where C
2
(, , p, R

) depends on , , p and R

. So, f belongs to W

1+2
(R

) for R

large enough.
Furthermore, using (2.5), if p 2 and

_
1
1
c(1 + 2)
_

1
p

1
2
B

p,
(R) B

c(1+2)
2,
(R).
Finally, for R

large enough,
B

p,q
(R) L
1,2,
(R

) B

p,
(R) L
1,2,
(R

) B

c(1+2)
2,
(R

) W

1+2
(R

).
28 P. Reynaud-Bouret and V. Rivoirard
4.7 Proof of Theorem 4
In this subsection since > 0 and p > 2, we set
s =

2 + 2
2
p
.
Using the maxiset results of Section 3.1, since
MS(

f

,
s
) :=: B
c
1
s
2,
W
s
,
it is enough to show that
B

p,q
(R) L
1
(R

) L
2
(R

) B
c
1
s
2,
(R

) W
s
(R

)
for R

> 0 (see (3.1)). By using (2.5), since c 1, we have


B

p,q
(R) B

p,
(R) B
c
1
s
2,
(R).
Let f B

p,q
(R) L
1
(R

) L
2
(R

). We prove that f W
s
(R

) for R

large enough. Using


computations of Section 4.6, we have for any t > 0 and any

J 0

1
|

t
C(, R

)
_
_
2

J
t
2
+

2
j,k
_
_
,
where C(, R

) is a constant only depending on and on R

. Now, let us bound for all j



J

2
j,k
=

k
|
j,k
|
p
p1
|
j,k
|
2
p
p1
.
Let us apply the Holder inequality. Since p > 2, we have 2
p
p1
> 0 and

2
j,k

_

k
|
j,k
|
p
_ 1
p1
_

k
|
j,k
|
_
2
p
p1
.
Since f B

p,
(R),
_

k
|
j,k
|
p
_ 1
p1
R
p
p1
2

jp
p1
(+
1
2

1
p
)
.
Since f L
1
(R

),

k
|
j,k
| =

2
j
2
_
f(x)(2
j
x k)dx

2
j
2
||||

k
F
jk
2
j
2
||||

||f||
1
by using (4.10). Hence

2
j,k
R
p
p1
(||||

)
2
p
p1
2
j
p
p1
.
Adaptive thresholding estimation of a Poisson intensity 29
Finally,

1
|

t
C
1
(, R

)
_
2

J
t
2
+R
p
p1
2

J
p
p1
_
where C
1
(, R

) is a constant only depending on and on R

. With

J such that
2

J
R
p
p+p1
t

2(p1)
p+p1
< 2

J+1
,

1
|

t
C
2
(, , p, R

)R
1
+1
1
p
t
2
+1
1
p
where C
2
(, , p, R

) depends on , , p and R

. So, f belongs to W
s
(R

) for R

large enough.
Finally, for R

large enough,
B

p,q
(R) L
1
(R

) L
2
(R

) B
c
1
s
2,
(R

) W
s
(R

).
4.8 Proof of Theorem 5
To establish the lower bound stated in Theorem 5, we rst consider p 2 and 0 < < r + 1. As
usual, the lower bound of the risk
R
n
(, p) = inf

f
sup
fB

p,
(R)L
1
(R
1
)L
2
(R
2
)L(R)
E
_
||f

f||
2
2
_
,
where R, R
1
, R
2
and R

are positive real numbers, can be obtained by using an adequate version


of Fanos lemma based on the Kullback-Leibler divergence. We rst give classical lemmas that
introduce constants useful in the sequel. The rst result recalls the Kullback-Leibler divergence for
Poisson processes (see [10]).
Lemma 5. Let N and N

be two Poisson processes on R whose intensities with respect to the


Lebesgue measure are respectively s and s

. We denote P (respectively Q) the probability measures


associated with s (respectively with s

). Then, the Kullback-Leibler divergence between P and Q is


K(P, Q) =
_
X
s(x)
_
log
_
s

(x)
s(x)
__
dx
where (u) = exp(u) u 1.
Now, let us give the following version of Fanos lemma, derived from [6].
Lemma 6. Let (P
i
)
i{0,...,n}
be a nite family of probability measures dened on the same measur-
able space . One sets

K
n
=
1
n
n

i=1
K(P
i
, P
0
).
Then, there exists an absolute constant B (B = 0.71 works) such that if

is a random variable on
with values in {0, ..., n}, one has
inf
0in
P
i
(

= i) max
_
B,

K
n
log(n + 1)
_
.
Finally, we recall a combinatorial lemma due to Gallager (see Lemma 8 in [30]).
30 P. Reynaud-Bouret and V. Rivoirard
Lemma 7. Let be a nite set with cardinal Q. Let D Q. There exist absolute constants
and such that there exists M
D
P(), satisfying log |M
D
| D if D = Q and log |M
D
|
Dlog(Q/D) if D < Q and such that for all distinct sets m and m

belonging to M
D
we have
|mm

| D.
Now, we are ready to provide a lower bound for R
n
(, p). For this purpose, for a given n large
enough, we set j the largest integer such that
2
j

_
R
2Bc
2
()
1
c

_ 1
+1
1
p
_
R
1
2Bc
2
()
1
c
2

_ 1
p+p1
n
1
1
p
+1
1
p
.
The constant c
2
() was dened in Section 2.2 and c

is a constant depending only on



such that
||

kZ

0,k
||

.
We set for any ,
g

(x) =
_
x+1
0
exp
_

1
u(1u)
_
du
_
1
0
exp
_

1
u(1u)
_
du
1
[1,]
(x) + 1
],+1]
(x).
Note that = ||g

||
1
does not depend on . We also introduce the integer D such that D2
j
is
the largest integer satisfying
D2
j

R
1
n2
j
2Bc
2
()
1
c
2

2. (4.11)
In particular, D2
j
goes to when n goes to . Using Lemma 7 with = {0, 1, . . . , D 1} and
Q = D, we extract M
D
for which both properties stated in Lemma 7 are satised and we set
C
j,D
=
_
f
m
=

f
j,D
+a
j

km

j,k
: m M
D
_
,
with
a
j
=
Bc
2
()
1
c

2
j
2
n
.
The function

f
j,D
is dened by

f
j,D
(x) = 1
[0,D2
j
]
(x) +g
1
(x) +g
D2
j
1
(x)
where
=
R
1
2
j
D
1
1 + 22
j
D
1
.
Let f
m
C
j,D
. Observe that the support of

km

j,k
is included in [1, D2
j
+ 1] for n large
enough. In this case, since 2a
j
2
j
2
c

(see (4.11)), we have for x in the support of

km

j,k
f
m
(x)

2
. (4.12)
Adaptive thresholding estimation of a Poisson intensity 31
In addition for any x, f
m
(x) 0. Now, we verify that f
m
belongs to B

p,
(R) L
1
(R
1
) L
2
(R
2
)
L

(R

). We have:
||f
m
||
,p,
||

f
j,D
||
,p,
+||a
j

km

j,k
||
,p,
||

f
j,D
||
,p,
+D
1
p
a
j
2
j(+
1
2

1
p
)
||

f
j,D
||
,p,
+
_
R
1
n2
j
2Bc
2
()
1
c
2

_1
p
Bc
2
()
1
c

2
j
2
n
2
j(+
1
2
)
= ||

f
j,D
||
,p,
+ 2
j(+1
1
p
)
_
R
1
2Bc
2
()
1
c
2

_1
p
Bc
2
()
1
c

n
1
p
1
||

f
j,D
||
,p,
+
R
2
.
Finally,

f
j,D
has an innite number of continuous derivatives bounded (up to constants) by and
||

f
j,D
||
,p,
is bounded (up to a constant) by (D2
j
)
1/p
that goes to 0 when n goes to . So, for
n large enough,
||f
m
||
,p,
R.
Now, it remains to verify that f
m
L
1
(R
1
) L
2
(R
2
) L

(R

). We have
||f
m
||

+c

2
j
2
a
j
R
1
2
j
D
1
+
Bc
2
()
1
c
2

2
j
n
R

for n large enough. Using again (4.11),


||f
m
||
2
2
2||

f
j,D
||
2
2
+ 2||a
j

km

j,k
||
2
2
2
2
(D2
j
+ 2) + 2c
2
()Da
2
j
2R
1
+
R
1
B2
j
n
R
2
2
for n large enough. Since f
m
0,
||f
m
||
1
=
_
+

f
j,D
(x) +a
j

km

j,k
(x)
_
dx = D2
j
+ 2 = R
1
.
Finally, we have:
R
n
(, p) inf

f
sup
fC
j,D
E
_
||f

f||
2
2
_
.
If

f is an estimator, we can dene

f

= arg min
tC
j,D
||t

f||
2
. Then, for f C
j,D
,
||

f||
2
||


f||
2
+||

f f||
2
2||

f f||
2
and
R
n
(, p)
1
4
inf

fC
j,D
sup
fC
j,D
E
_
||f

f||
2
2
_
.
Moreover if m and m

belong to M
D
with m = m

,
||f
m
f
m
||
2
2
c
1
()a
2
j
|mm

| c
1
()Da
2
j
32 P. Reynaud-Bouret and V. Rivoirard
where c
1
() has been dened in Section 2.2. Hence
R
n
(, p)
c
1
()
4
Da
2
j
inf

fC
D
sup
fC
D
P
f
(

f = f).
To apply Lemma 6, we need to compute

K
n
. For any distinct sets m and m

belonging to M
D
,
since for any x > 1, log(1 +x) x/(1 +x) and by using (4.12), we have
K(P
f
m

, P
fm
) =
_
f
m
(log
f
m
f
m

)ndx
=
_
[f
m
f
m
f
m
log(1 +
f
m
f
m

f
m

)]ndx

_
(f
m
f
m
)
2
f
m
(x)ndx

n||f
m
f
m
||
2
2
(4.13)

2na
2
j
Dc
2
()

and

K
n

2na
2
j
Dc
2
()

. By applying Lemma 6, since


2c
2
()nDa
2
j
D
B,
we have
R
n
(, p)
c
1
()
4
(1 B)Da
2
j

c
1
()
4
(1 B)
R
1
n
2Bc
2
()
1
c
2

(Bc
2
()
1
c

)
2
2
j
n
2
(1 +o
n
(1))
CR
1
+1
1
p
n


+1
1
p
(1 +o
n
(1)),
where C is a constant that depends on , p, c
2
(), c

, , B, and R
1
, which is the stated result.
For the case p 2, by using computations similar to those of Theorem 2 of [18], it is easy to
prove that the minimax risk associated to the set of functions supported by [0, 1] and belonging to
B

p,q
(R) for 0 < < r + 1 is larger than n

2
1+2
up to a constant.
Finally, the adaptive properties of

f

are proved by combining Theorems 3 and 4 and the pre-


vious lower bound.
4.9 Proof of Theorem 6
Let us consider the Haar basis. For j 0 and D {0, 1, . . . , 2
j
}, we set
C
j,D
= {f
m
= 1
[0,1]
+a
j,D

km

j,k
: |m| = D, m N
j
},
Adaptive thresholding estimation of a Poisson intensity 33
where
N
j
= {k :
j,k
has support in [0, 1]}.
The parameters j, D, , a
j,D
is chosen later to fulll some requirements. Note that
N
j
= card(N
j
) = 2
j
.
We know that there exists a subset of C
j,D
, denoted M
j,D
, and some universal constants, denoted
and , such that for all m, m

M
j,D
,
card(mm

) D, log (card(M
j,D
)) Dlog
_
2
j
D
_
(see Lemma 7). Now, let us describe all the requirements necessary to obtain the lower bound of
the risk.
To ensure f
m
0 and the equivalence between the Kullback distance and the L
2
-norm (see
below), the f
m
s have to be larger than /2. Since the
j,k
s have disjoint support, this means
that
2
1+j/2
|a
j,D
|. (4.14)
We need the f
m
s to be in L
1
(R

) L

(R

). Since ||f||
1
= and ||f||

= + 2
j/2
|a
j,D
|, we
need
+ 2
j/2
|a
j,D
| R

. (4.15)
The f
m
s have to belong to B
s
2,
(R

) i.e.
+ 2
js

D|a
j,D
| R

. (4.16)
The f
m
s have to belong to W
s
(R). We have
2

= . Hence for any t > 0

2
1

t
+Da
2
j,D
1
|a
j,D
|

t
R
24s
t
4s
.
If |a
j,D
| , then it is enough to have

2
+Da
2
j,D
R
24s

2s
(4.17)
and
Da
2
j,D
R
24s
_
a
2
j,D

_
2s
. (4.18)
If the parameters satisfy these equations, then
R(W
s
(R) B
s
2,
(R

) L
1,2,
(R

)) R(M
j,D
),
where R(W
s
(R)B
s
2,
(R

)L
1,2,
(R

)) and R(M
j,D
) are respectively the minimax risks associated
with W
s
(R) B
s
2,
(R

) L
1,2,
(R

) and M
j,D
. By similar arguments to those of the proof of
Theorem 5, one obtains
R(M
j,D
)
1
4
Da
2
j,D
inf

fM
j,D
(1 inf
fM
j,D
P(

f = f)).
34 P. Reynaud-Bouret and V. Rivoirard
We now use Lemma 6. Recall that (see (4.13))
K(P
f

m
, P
fm
)
2

nDa
2
j,D
.
Hence
R(M
j,D
)
(1 B)
4
Da
2
j,D
as soon as the mean Kullback Leibler distance is small enough, which is implied by
2

nDa
2
j,D
BDlog (2
j
/D). (4.19)
Let us take j such that 2
j
n/log n 2
j+1
and with D 2
j
,
a
2
j,D
=

2
4n
log (2
j
/D).
First note that (4.19) is automatically fulfilled as soon as $\varepsilon \leq 2 B \rho$, which is true if $\varepsilon$ is an absolute constant small enough. Then
$$\varepsilon + 2^{j/2}\,|a_{j,D}| \leq \varepsilon + 2^{j/2}\sqrt{\frac{\varepsilon^2 \log n}{4n}} \leq 1.5\,\varepsilon.$$
So, if $\varepsilon$ is an absolute constant small enough, (4.15) is satisfied. Moreover
$$2^{1+j/2}\,|a_{j,D}| \leq 2^{1+j/2}\sqrt{\frac{\varepsilon^2 \log n}{4n}} \leq \varepsilon.$$
This gives (4.14). Now, take an integer $D = D_n$ such that
$$D_n \sim_n R^{2-4s}\Big(\frac{n}{\log n}\Big)^{1-2s}.$$
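The asymptotics used next follow from this choice of $D_n$ together with $2^j \sim n/\log n$ (a reconstruction of the computation; the $R$-dependent term is absorbed into the $O(1)$):

```latex
\log\frac{2^j}{D_n}
  \;=\; \log 2^j - \log D_n
  \;=\; \log\frac{n}{\log n} - (1-2s)\log\frac{n}{\log n} + O(1)
  \;=\; 2s \log\frac{n}{\log n} + O(1)
  \;\sim\; 2s \log n.
```

Plugged into the definition of $a_{j,D}^2$, this gives $a_{j,D_n}^2 \sim \frac{s}{2}\,\varepsilon^2\,\frac{\log n}{n}$, which identifies the constant $C_s$ below as $s/2$ under these normalizations (an inference of this reconstruction).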
For $n$ large enough, $D_n \leq 2^j$ and $D_n$ is feasible. We have, for $R$ fixed,
$$a_{j,D_n}^2 \sim_n C_s\,\varepsilon^2\,\frac{\log n}{n},$$
where $C_s$ is a constant only depending on $s$. Therefore,
$$\varepsilon + 2^{js}\sqrt{D_n}\,|a_{j,D_n}| = \varepsilon + \varepsilon\sqrt{C_s}\,R^{1-2s} + o_n(1).$$
Since $R^{1-2s} \leq R'$, it is sufficient to take $\varepsilon$ small enough, but constant depending only on $s$, to obtain (4.16). Moreover,
$$D_n a_{j,D_n}^2 \sim_n C_s\,\varepsilon^2\, R^{2-4s}\Big(\frac{\log n}{n}\Big)^{2s}.$$
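The exponent bookkeeping behind this asymptotic equivalence is one line:

```latex
D_n a_{j,D_n}^2
  \;\sim\; R^{2-4s}\Big(\frac{n}{\log n}\Big)^{1-2s} \cdot C_s\,\varepsilon^2\,\frac{\log n}{n}
  \;=\; C_s\,\varepsilon^2\,R^{2-4s}\Big(\frac{\log n}{n}\Big)^{2s},
```

since $(n/\log n)^{1-2s}(\log n/n) = (\log n/n)^{1-(1-2s)} = (\log n/n)^{2s}$, which is the rate appearing in the lower bound.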
Hence (4.17) is equivalent to $\varepsilon^2 < R^{2-4s}\,(\varepsilon^2)^{2s}$. Since $R \geq 1$, this is true as soon as $\varepsilon < 1$. Finally, (4.18) is equivalent, when $n$ tends to $+\infty$, to
$$C_s\,\varepsilon^2 \leq (C_s\,\varepsilon^2)^{2s}.$$
Once again this is true for $\varepsilon$ small enough, depending on $s$. As we can choose $\varepsilon$ not depending on $R$, $R'$, $R''$, this concludes the proof. □
Corollary 1 is completely straightforward once we notice that if $R' \geq R^2$ then, for every $s$, $R' \geq R^{2-4s}$.
Acknowledgment. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant ATLAS (JCJC06 137446) "From Applications to Theory in Learning and Adaptive Statistics". We would like to warmly thank Lucien Birgé for his advice and his encouragement.
References
[1] Antoniadis, A., Sardy, S., Tseng, P. Automatic smoothing with wavelets for a wide class of distributions, Journal of Computational and Graphical Statistics, 13(2), 399-421, (2004).
[2] Autin, F. Maxiset for density estimation on R, Math. Methods Statist. 15(2), 123-145, (2006).
[3] Autin, F., Picard, D., Rivoirard, V. Large variance Gaussian priors in Bayesian nonparametric estimation: a maxiset approach, Mathematical Methods of Statistics, 15(4), 349-373, (2006).
[4] Baraud, Y., Birgé, L. Estimating the intensity of a random measure by histogram type estimators, 2006, manuscript.
[5] Bertin, K., Rivoirard, V. Maxiset in sup-norm for kernel estimators, to appear in Test, (2008).
[6] Birgé, L. A new look at an old result: Fano's lemma, 2001, manuscript.
[7] Birgé, L. Model selection for Poisson processes, 2006, manuscript.
[8] Birgé, L., Massart, P. Minimal penalties for Gaussian model selection, Probab. Theory Related Fields, 138(1-2), 33-73, (2007).
[9] Bretagnolle, J., Huber, C. Estimation des densités : risque minimax, Z. Wahrsch. Verw. Gebiete 47(2), 119-137, (1979).
[10] Cavalier, L., Koo, J.Y. Poisson intensity estimation for tomographic data using a wavelet shrinkage approach, IEEE Trans. Inform. Theory 48(10), 2794-2802, (2002).
[11] Cohen, A., Daubechies, I., Feauveau, J.C. Biorthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math. 45(5), 485-560, (1992).
[12] Coronel-Brizio, H.F., Hernández-Montoya, A.R. On fitting the Pareto-Lévy distribution to stock market index data: Selecting a suitable cutoff value, Physica A: Statistical Mechanics and its Applications, 354, 437-449, (2005).
[13] Delyon, B., Juditsky, A. On the computation of wavelet coefficients, J. Approx. Theory 88(1), 47-79, (1997).
[14] DeVore, R.A., Lorentz, G.G. Constructive approximation, Springer-Verlag, Berlin, 1993.
[15] Donoho, D.L. Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data, Different perspectives on wavelets (San Antonio, TX, 1993), 173-205, Proc. Sympos. Appl. Math., 47, Amer. Math. Soc., Providence, RI, (1993).
[16] Donoho, D.L. Smooth wavelet decompositions with blocky coefficient kernels, Recent advances in wavelet analysis, Wavelet Anal. Appl., 3, Academic Press, Boston, MA, 259-308, (1994).
[17] Donoho, D.L., Johnstone, I.M. Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81(3), 425-455, (1994).
[18] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D. Density estimation by wavelet thresholding, Annals of Statistics, 24(2), 508-539, (1996).
[19] Figueroa-López, J.E., Houdré, C. Risk bounds for the non-parametric estimation of Lévy processes, IMS Lecture Notes-Monograph Series, High Dimensional Probability, 51, 96-116, (2006).
[20] Golubev, G.K. Nonparametric estimation of smooth densities of a distribution in $L_2$, Problems Inform. Transmission 28(1), 44-54, (1992).
[21] Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A. Wavelets, approximation and statistical applications, Lecture Notes in Statistics, 129, Springer-Verlag, New York, 1998.
[22] Houghton, J.C. Use of the truncated shifted Pareto distribution in assessing size distribution of oil and gas fields, Mathematical Geology 20(8), 907-937, (1988).
[23] Johnson, W.B. Best Constants in Moment Inequalities for Linear Combinations of Independent and Exchangeable Random Variables, Annals of Probability 13(1), 234-253, (1985).
[24] Johnstone, I.M. Minimax Bayes, asymptotic minimax and sparse wavelet priors, Statistical decision theory and related topics, V (West Lafayette, IN, 1992), 303-326, Springer, New York, 1994.
[25] Juditsky, A., Lambert-Lacroix, S. On minimax density estimation on R, Bernoulli 10(2), 187-220, (2004).
[26] Kerkyacharian, G., Picard, D. Thresholding algorithms, maxisets and well-concentrated bases, Test 9, 283-344, (2000).
[27] Kingman, J.F.C. Poisson processes, Oxford Studies in Probability, 1993.
[28] Kolaczyk, E.D. Wavelet shrinkage estimation of certain Poisson intensity signals using corrected thresholds, Statist. Sinica 9(1), 119-135, (1999).
[29] Merton, R.C. Option pricing when underlying stock returns are discontinuous, Working paper (Sloan School of Management), 787-75, (1975).
[30] Reynaud-Bouret, P. Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities, Probability Theory and Related Fields 126(1), 103-153, (2003).
[31] Rudemo, M. Empirical choice of histograms and density estimators, Scand. J. Statist. 9(2), 65-78, (1982).
[32] Uhler, R.S., Bradley, P.G. A Stochastic Model for Determining the Economic Prospects of Petroleum Exploration Over Large Regions, Journal of the American Statistical Association, 65(330), 623-630, (1970).
[33] Willett, R.M., Nowak, R.D. Multiscale Poisson Intensity and Density Estimation, IEEE Transactions on Information Theory, 53(9), 3171-3187, (2007).