Specialty: Statistics
Ministerial decree: 7 August 2006
Defended on ( / /2012)
JURY
Jacques Demongeot, President
Pascal Sarda, Referee
Elias Ould-Saïd, Referee
Mustapha Rachdi, Thesis supervisor
Ali Laksaci, Examiner
Idir Ouassou, Examiner
Sophie Lambert-Lacroix, Examiner
Thesis prepared in the AGe Imagerie et Modélisation (AGIM) laboratory, within the doctoral school Mathématiques, Sciences et Technologies de l'Information, Informatique.
Contents

1 Introduction to functional data and to the estimation of the conditional density
   1.1 Functional data
   1.2 Functional data vs. semi-metrics
       1.2.1 Small ball probabilities
       1.2.2 Fields of application of functional data
   1.3 Some results on nonparametric estimation for functional models
       1.3.1 Notation and hypotheses
       1.3.2 Estimation of the conditional distribution
       1.3.3 Kernel estimator of the conditional density
       1.3.4 Estimation of the conditional mode

2
   2.1 Introduction
   2.2 Global and local bandwidth selection rules
   2.3 Main results
       2.3.1 Assumptions
       2.3.2 Some interpretations and examples on our hypotheses
       2.3.3 Two theorems on global and local criteria
   2.4 Discussion and applications
       2.4.1 On the applicability of the method
       2.4.2 On the finite-sample performance of the method
       2.4.3 A real data application
   2.5 Proofs
   2.6 Appendix: proofs of technical lemmas
   Bibliography

3
   Introduction
   Model
   Pointwise almost complete convergence
   Uniform almost complete convergence
   Application: conditional mode estimation
   Appendix
   Bibliography

4
   Introduction
   Main results
   Concluding remarks
   Appendix
   Bibliography

5 On the quadratic error of the functional local linear estimate of the conditional density
   5.1 Introduction
   5.2 The model
   5.3 Main results
   5.4 Some comments and discussion
   5.5 Proofs
   Bibliography

7 Conclusion and perspectives
   7.1 Conclusion
   7.2 Perspectives

8 General bibliography
Abstract

In this thesis, we are interested in the nonparametric estimation of the conditional density of a real response variable given a functional explanatory variable, valued in a space of possibly infinite dimension.

First, we consider the estimation of this model by the double-kernel method. We propose a selection method for the choice of the (global or local) smoothing parameters, and we prove its asymptotic optimality in the case where the observations are independent and identically distributed. The adopted criterion is derived from the cross-validation principle. In this part we also compare the two types of choice (local and global).

In the second part, we estimate the conditional density by the local polynomial method. Under some conditions, we establish asymptotic properties of this estimator, such as the almost complete convergence and the mean-square convergence, in the case where the observations are independent and identically distributed. We also treat the case of α-mixing observations, for which we prove the almost complete convergence (with rate) of the proposed estimator. The results obtained are also illustrated by examples on simulated data, showing the quick and easy applicability of this estimation method in the functional framework.
Summary

In this thesis, we consider the problem of the nonparametric estimation of the conditional density when the response variable is real and the regressor is valued in a functional space.

In the first part, we use the double-kernel method as an estimation method, and we focus on the choice of the smoothing parameters. We construct a data-driven method to select the bandwidth parameters optimally. As main results, we study the asymptotic optimality of this selection method in the case where the observations are independent and identically distributed. Our selection rule is based on the classical cross-validation procedure, and it covers both the global and the local choice. The finite-sample performance of our approach is illustrated by some simulation results, where we give a comparison between the two types of choice (local and global).

In the second part, we estimate the conditional density by the local linear method. Under some general conditions, we establish the almost complete convergence (with rate) of the proposed estimator in both cases (the i.i.d. case and the α-mixing case). As an application, we use the conditional density estimator to estimate the conditional mode, and we derive the same asymptotic properties.

Further, we study the quadratic error of this estimator by giving the asymptotic expansion of the exact expressions involved in the leading bias and variance terms.
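The double-kernel estimator and its cross-validated bandwidth choice described above can be sketched as follows. This is a minimal Python illustration, not the thesis's exact construction: the toy curves, the L2 semi-metric, the Gaussian kernels and the simple leave-one-out likelihood-type score are all assumptions standing in for the precise cross-validation criterion studied in the thesis.

```python
import numpy as np

def cond_density(x0, y_grid, X, Y, d, h_K, h_H):
    """Double-kernel estimate of f(y | x0): one kernel acts on the
    semi-metric d(x0, X_i), the other on the responses (y - Y_i)."""
    w = np.exp(-0.5 * (np.array([d(x0, Xi) for Xi in X]) / h_K) ** 2)
    u = (y_grid[None, :] - Y[:, None]) / h_H
    H = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return (w @ H) / (h_H * w.sum())

def cv_score(X, Y, d, h_K, h_H):
    """Leave-one-out score (larger is better): mean of the leave-one-out
    density estimate evaluated at the observed responses."""
    n = len(Y)
    idx = np.arange(n)
    s = 0.0
    for i in range(n):
        m = idx != i
        s += cond_density(X[i], np.array([Y[i]]), X[m], Y[m], d, h_K, h_H)[0]
    return s / n

# Toy functional sample: curves X_i(t) = a_i sin(t), response tied to a_i
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 50)
a = rng.uniform(1.0, 3.0, size=100)
X = a[:, None] * np.sin(t)[None, :]
Y = a + 0.1 * rng.standard_normal(100)
dt = t[1] - t[0]
d = lambda u, v: np.sqrt(np.sum((u - v) ** 2) * dt)   # L2 semi-metric

# Global choice: pick the bandwidth pair maximizing the CV score
grid = [0.2, 0.5, 1.0]
best = max(((hK, hH) for hK in grid for hH in grid),
           key=lambda h: cv_score(X, Y, d, h[0], h[1]))
print("selected (h_K, h_H):", best)
```

A local choice would run the same search separately at each evaluation point x0, weighting the leave-one-out terms by their proximity to x0.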
- Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics, Physica-Verlag/Springer, 2011, 85-90. DOI: 10.1007/978-3-7908-2736-1_13
- 4. A. Laksaci, F. Madani and M. Rachdi. Kernel conditional density estimation when the regressor is valued in a semi-metric space. Accepted for publication in: Communications in Statistics - Theory and Methods, 2012.
- 2. Local bandwidth selection for kernel conditional density estimation when the regressor
- 3. Local bandwidth selection for kernel conditional density estimation when the regressor
- Some asymptotics for conditional parameters when the data are curves. International Conference on Statistics,
General introduction

0.1
Nonparametric statistics has seen great growth among many authors and in various fields. Indeed, it has a very wide field of application, allowing the explanation of certain phenomena that were until now poorly modelled, such as time series, and the prediction of future realizations.

It should also be mentioned that progress in data-collection processes has given statisticians ever more frequent access to observations of so-called functional variables, that is, of curves. These data are modelled as realizations of a random variable taking its values in an abstract space of possibly infinite dimension. In this thesis, we are interested in the nonparametric estimation of the conditional density, and of the parameters derived from it, such as the conditional mode, for functional random variables.

In order to present the work carried out during this thesis, the manuscript is organized as follows.

The next chapter is an introductory chapter, which presents a bibliographical study of the problems related to the statistical analysis of functional variables as well as to the nonparametric estimation of conditional parameters, whether in the finite- or infinite-dimensional framework. Then, in Chapter 1, we review the state of the art of functional variables and their fields of application. Moreover, in order to make this thesis easy to read, we present the results obtained in the literature concerning the estimation of the conditional density and of the conditional mode, while stating and discussing the hypotheses under which these results were obtained.

In Chapter 2, we begin by constructing and studying the asymptotic properties of the kernel estimator of the conditional density when the explanatory variable is valued in a normed space. Then, we propose two criteria (the first global and the second local) for the automatic choice of the smoothing parameter, in order to make our estimation efficient. Finally, we establish the theoretical as well as practical results on the asymptotic optimality of the selected parameter.
A logical continuation of this chapter is to improve the results obtained. This is why Chapter 3 is devoted to the study of a nonparametric method for estimating the conditional density of a scalar variable Y given a functional variable X, i.e., a variable valued in a semi-metric space. This method is based on local polynomial estimation. Once the construction of our estimator, in the image of what is done in finite dimension, is completed, we set out to establish, under some conditions, the pointwise and uniform almost complete convergence as well as the convergence rates of this estimator. We then used the results obtained to determine the asymptotic properties of the local linear estimator of the conditional mode.

Chapter 4, for its part, is devoted to the study, under some weak dependence conditions (strong mixing), of the strong convergence of the estimator of the previous chapter, as well as to the prediction of a time series via the estimation of the conditional mode.

While in Chapter 5 we establish the convergence rates for the mean-square error of the estimator studied in the two previous chapters, Chapter 6 is devoted to the application of these results to simulated data and then to real data.

Finally, in Chapter 7 we present research perspectives allowing the extension, and sometimes the generalization, of the results of this thesis.
0.2 Bibliographical context
The statistical analysis of functional variables has grown considerably in recent years. This research area in statistics is currently enjoying great success within the community of statisticians. Evidence of this interest is the large number of scientific publications on the subject, as well as the many practical applications to which such data lend themselves. This is the case, in particular, for estimation techniques for functional data (cf. Kneip and Gasser (1992), Ramsay and Li (1996), Rice and Silverman (1991)). There are, in fact, two main reasons for the enthusiasm aroused by the statistical treatment of functional variables: (1) it allows the use and development of powerful theoretical tools, and (2) it offers enormous potential in terms of applications, notably in imaging, in the food industry, in pattern recognition, in geophysics, in econometrics, in environmental sciences, etc. Moreover, this research topic covers all the areas of interest of the statistical community, from the most applied to the most theoretical, with no predominance of one over the other.

First, let us point out the considerable efforts that have been made to generalize the results known and established in finite dimension, thanks to the book by Ferraty and Vieu (2006), which has become a reference in nonparametric statistics for functional data. Note that the analysis of statistical data always involves the dimension factor in the asymptotic behaviour of the established estimators, all the more so since it is well known that convergence rates deteriorate as the dimension increases. Recall here that methods based on the discretization of functional data were adopted to adapt the results of nonparametric statistics to the multivariate case.

Given the advances in computing and in the way data are collected, other alternatives have become necessary in order to overcome this difficulty and to study the data in their own dimension.

Moreover, the treatment of data as curves goes back to the sixties, when several studies in various disciplines were confronted with observations in the form of trajectories (cf., among others, Holmstrom (1961) in climatology, Deville (1974) in demography, Molenaar and Boomsma (1987) and then Kirkpatrick (1989) in genetics, ...).

It is well known that, in statistics, the (parametric or nonparametric) regression model in finite dimension constitutes a very important field of research and application; we refer here to the works of Collomb (1981, 1985), which already reported numerous and varied developments on this theme at the beginning of the eighties. It is also worth referring to the books of Härdle (1990), Bosq and Lecoutre (1987) and Schimek (2000), which draw up an almost exhaustive review of the various techniques in this area. These fields of statistical research still hold great promise, both for theoretical developments and because of the multiple possibilities of application.

Furthermore, applications related to the regression model play a very important role in the prediction of time series arising from various disciplines, such as communication, control systems, climatology and econometrics. These are thus prediction fields in which the first substantial results were established by Collomb (1981) and Robinson (1983). This area of statistics is undergoing continuous development, as witnessed by numerous contributions (cf. Györfi et al. (1989), Yoshihara (1994), Härdle et al. (1997) and Bosq (1991), ...).

Let us begin by pointing out that the estimation of the probability law or of the distribution function plays an important role in the estimation of other functional parameters. The first works concerning the estimation of the probability law of functional variables were carried out by Geffroy (1974) and Gasser et al. (1998). Note also that Cadre (2001) was interested in the study of the median of a distribution for a functional variable valued in a Banach space.
We point out that conditional parameters, such as the conditional distribution, the conditional density, the conditional mode, the conditional quantile and the conditional hazard function, are widely studied in finite dimension. Through these parameters, prediction in nonparametric models offers a genuine alternative to nonparametric regression. It must be said that, in finite dimension, there is an abundant literature on these conditional parameters. Roussas (1968) was the first to establish asymptotic properties for the kernel estimator of the conditional distribution, for Markovian data, for which he proved convergence in probability. Youndjé (1993), for his part, was interested in the study of the conditional density for dependent or independent data. One can notably cite the work of Laksaci and Yousfate (2002), in which they established, for a stationary Markov process, the convergence in Lp norm of the kernel estimator of the conditional density.

Given the interest of the estimation of the mode and of the conditional mode in the field of prediction, several authors have studied it. We can cite, for example, Parzen (1962), who was one of the first to consider the problem of estimating the mode of a univariate probability density. He showed that, under some conditions, the mode estimator obtained by maximizing a kernel estimator is convergent and asymptotically normal when the data are independent and identically distributed (i.i.d.). The basic techniques he developed for this study were taken up by many authors in the case of the probability density or of the regression. We have mentioned here only the main contributions, essentially with asymptotic normality in view. Note also that Nadaraya (1965) and Van Ryzin (1969) proved the strong convergence of Parzen's mode estimator, while Samanta (1973) and Konakov (1974) studied multivariate versions of this estimator. The works of Eddy (1980 and 1982), for their part, made it possible to weaken the sufficient conditions for asymptotic normality that had been given initially. Moreover, thanks to local conditions, Romano (1980) weakened the previous hypotheses. Note also that Vieu (1996) compared two kernel estimators of the mode, the first defined from the maximum of an estimator of the probability density and the second from the zero of an estimator of its derivative. This work was taken up by Rachdi and Sabre (2000) in order to estimate the mode of the probability density when the data are contaminated by additive errors (deconvolution problems). There is also, among others, Louani (1998), who established the asymptotic normality for the density and its derivatives, with an application to the mode.

Concerning the conditional mode, the asymptotic convergence and normality properties were established by Samanta and Thavaneswaran (1990) in the framework of independent and identically distributed data, while convergence conditions were established for φ-mixing data by Collomb et al. (1987), for α-mixing data by Ould-Saïd (1993), and for ergodic data by Rosa (1993) and Ould-Saïd (1997). For their part, Quintela and Vieu (1997) estimated the conditional mode as the point cancelling the first-order derivative of the estimator of the conditional density, and established the almost complete convergence of this estimator under the α-mixing condition. Berlinet et al. (1998), for their part, presented results on the asymptotic normality of convergent estimators of the conditional mode, independently of the dependence structure of the data, with an application to the case of a stationary α-mixing process. Louani and Ould-Saïd (1999) established the asymptotic normality in the case of strongly mixing data and in the case of censored data. Ould-Saïd and Cai (2005), for their part, established the uniform convergence on a compact set.
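The conditional-mode idea recalled above, maximizing an estimated conditional density (or, equivalently, finding a zero of its first derivative), can be sketched as follows for a scalar regressor. The double-kernel estimator, the grid search and the toy model below are illustrative assumptions for the example, not the exact constructions of the cited authors.

```python
import numpy as np

def cond_density(x0, y_grid, X, Y, h_x, h_y):
    """Double-kernel estimate of f(y | x0) for a scalar regressor."""
    wx = np.exp(-0.5 * ((x0 - X) / h_x) ** 2)           # weights in x
    u = (y_grid[None, :] - Y[:, None]) / h_y
    Hy = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # kernel in y
    return (wx @ Hy) / (h_y * wx.sum())

def cond_mode(x0, X, Y, h_x, h_y, y_grid):
    """Conditional mode: argmax over a grid of the estimated conditional
    density (equivalently, a zero of its first derivative)."""
    f = cond_density(x0, y_grid, X, Y, h_x, h_y)
    return y_grid[np.argmax(f)]

# Toy model Y = sin(X) + noise: the conditional mode at x0 is near sin(x0)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 3.0, 500)
Y = np.sin(X) + 0.1 * rng.standard_normal(500)
y_grid = np.linspace(-2.0, 2.0, 401)
m = cond_mode(1.5, X, Y, h_x=0.2, h_y=0.1, y_grid=y_grid)
print("estimated conditional mode at x0 = 1.5:", m)
```

The same argmax device carries over verbatim to the functional setting once the Gaussian weights in x are replaced by kernel weights on a semi-metric between curves.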
Moreover, in the framework of data valued in a space of possibly infinite dimension, the works of Ramsay and Silverman (2002 and 2005) constitute an important collection of statistical methods, mainly from the practical point of view, but theoretical developments can be found in Bosq (2000) and Ferraty and Vieu (2006).

A contribution that proves important for the construction of the estimator of the parameters in the linear regression model is the one due to Cardot et al. (1999). It consists in constructing an estimator of the regression operator from the spectral properties of the empirical estimator of the covariance operator of the functional explanatory variable. They also established the convergence in probability and the almost sure convergence of the constructed estimator. This work was revisited in Cuevas et al. (2002), in which a study of the asymptotic properties of the estimator of the linear regression operator was carried out when the explanatory variable is functional and deterministic and the response is functional and random. Cardot et al. (2004a, 2004b and 2005) proposed and studied linear estimation methods for the regression operator via conditional quantiles. Another method for estimating conditional quantiles, based on the kernel estimation of the conditional distribution function, was also proposed and studied by Ferraty et al. (2005), Ferraty et al. (2006), Ferraty and Vieu (2006a) and Ezzahrioui (2007). Other methods have been proposed to estimate the regression via the conditional mode. These are based on the estimation of the conditional density by kernel estimators (cf. Ferraty et al. (2005), Ferraty and Vieu (2006a), Ferraty et al. (2006), Dabo-Niang and Laksaci (2006) and Ezzahrioui (2007)).

Thus, the estimation of the conditional density in possibly infinite dimension has attracted great interest in statistics. This functional parameter is involved in the estimation of quantiles, of the mode and of the hazard function.

Let us point out that, in infinite dimension, the conditional mode has very recently attracted growing interest, despite the few results available in the literature. In this context, the first works were carried out by Ferraty et al. (2006). They showed, under regularity conditions on the conditional density, the almost complete convergence of the kernel estimators of the conditional density and of the conditional mode, and established their convergence rates. Note also that an application of their results to data from the food industry was presented. In the same context, Dabo-Niang et al. (2004) studied a nonparametric estimator of the mode of the density of an explanatory variable
valued in a semi-normed vector space of possibly infinite dimension. They established the almost sure convergence, with an application of this result to the case where the probability measure of the explanatory variable satisfies a concentration condition. One also finds, in Dabo-Niang and Laksaci (2007), the study of a kernel estimator of the mode of the distribution of a real variable Y conditioned by an explanatory variable X valued in a semi-metric space. They established the convergence in Lp norm of the estimator, and they showed that the established asymptotic results are linked to the small ball probabilities of the law of the explanatory variable as well as to the regularity of the conditional density.

Note also that there are two other functional parameters of great importance, namely the quantile and the conditional quantile. These parameters offer a major alternative in prediction, thanks to their robustness (cf., for example, the works of Cardot et al. (2004a, 2004b, 2005 and 2006), Ferraty et al. (2005b) and (2006)).

To conclude this quick, non-exhaustive overview, let us state that, from a theoretical point of view, the use of functional random variables introduces an additional difficulty, since one can no longer manipulate the probability density function as easily as in the real or vector case. One is therefore led to adopt a probabilistic formulation, which leads to hypotheses acting directly on the distribution of the functional random variable rather than on the density, as in the finite-dimensional case.
Chapter 1

Introduction to functional data and to the estimation of the conditional density

In this chapter, we first present some notions on functional data analysis and its field of application, and then the results existing in the literature on the estimation of the conditional density.
1.1 Functional data
Definition 1.1.1. A random variable X is called a functional variable if it takes its values in an infinite-dimensional space (a functional space) F.

Definition 1.1.2. A functional dataset X1, ..., Xn is the observation of n functional variables X1, ..., Xn distributed as X.
Many works have been devoted to the study of models involving multivariate random variables, and this area of statistics still enjoys sustained research activity. However, recent innovations in measuring devices and acquisition methods, together with the use of sophisticated computing resources, often make it possible to collect data discretized on finer and finer grids, which makes them fundamentally functional: this is for example the case in meteorology, in medicine, in satellite imaging and in many other fields of study. This is one of the reasons why a new field of statistics, devoted to the study of functional data, arose as a great challenge at the beginning of the eighties, under the impetus of the works of Grenander (1981), Dauxois et al. (1982) and Ramsay (1982). In fact, this field was popularized by Ramsay and Silverman (1997), and then by the various books of Bosq (2000), Ramsay and Silverman (2002, 2005) and Ferraty and Vieu (2006). Note that it is one of the areas of statistics in full expansion, as witnessed by the works published and/or cited in first-rank journals.

Moreover, even if the data available to the statistician are not of a functional nature, he may be led to study functional variables constructed from his initial sample. A classical example is the one where several samples of independent real data are observed, and where one is then led to compare the densities of these different samples, or to consider models in which they are involved (cf. Ramsay and Silverman, 2002). In the particular context of time series analysis, the approach introduced by Bosq (1991) brings out a sequence of dependent functional data modelling the observed time series. This approach consists, first of all, in considering the process not through its discretized form but as a continuous-time process, and then in cutting it into a sample of successive curves.
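Bosq's cutting device can be sketched in a few lines: a long discretized series is reshaped into a sample of successive curves, one per period. The period length and the toy series below are hypothetical choices for the example.

```python
import numpy as np

def slice_into_curves(z, period):
    """Bosq-style construction: view a long (discretized) time series z as a
    sample of successive curves X_1, ..., X_n, each covering one period.
    Trailing observations that do not fill a whole period are dropped."""
    n = len(z) // period
    return z[:n * period].reshape(n, period)

# Hypothetical example: ten "years" of a daily series -> ten annual curves
z = np.sin(np.linspace(0.0, 20.0 * np.pi, 3650))
curves = slice_into_curves(z, 365)
print(curves.shape)  # (10, 365)
```

Each row is then treated as one functional observation, and the dependence between consecutive rows is what the mixing conditions of the later chapters are designed to handle.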
Note that the main source of difficulty, from a theoretical as well as a practical point of view, comes from the fact that the observations of this type of variable are assumed to belong to an infinite-dimensional space.

The very first works in which we find the idea of considering functional data are relatively old. Rao (1958) and Tucker (1958) considered principal component analysis and factor analysis for functional data, explicitly treating functional data as a particular type of data. Subsequently, Ramsay (1982) brought out the notion of functional data and raised the question of adapting the methods used in the statistical analysis of multivariate data (in finite dimension) to the functional framework.

From then on, works on the statistics of functional data began to multiply, finally leading, today, to books that have become references in the field. For example, the monographs of Ramsay and Silverman (2002 and 2005) and Ferraty and Vieu (2006) present an important collection of statistical methods specific to functional variables, in the linear and nonlinear frameworks. Likewise, Bosq (1991) contributed to the development of statistical methods allowing the analysis of dependent functional random variables (Hilbertian autoregressive processes). Let us also cite the works of Cuevas et al. (2002), who were interested in the problem of the linear regression of a functional variable on a set of deterministic functional data ("fixed functional design"). On the other hand, Benhenni et al. (2010) considered the problem of estimating the regression operator when the functional data are deterministic and the errors are correlated. Cardot et al. (2005), for their part, proposed a nonparametric estimator of the regression operator when the predictor is real and the response variable is a curve.
Moreover, the study of the nonlinear regression model is much more recent than that of the linear case. Ferraty and Vieu (2000) established the first results on the nonparametric estimation of the nonlinear regression operator. These results were then extended by Ferraty et al. (2002), who treated the case of dependent data and established strong convergences of the kernel estimator of the regression. In turn, Niang and Rhomari (2003) studied the convergence in Lp norm of the estimator of the regression operator and applied their results to the discrimination and classification of curves. Rachdi et al. (2008) treated the problem of the nonparametric estimation of the regression operator when the errors satisfy long-memory properties; they also established the pointwise and then the uniform convergence in probability of the operatorial kernel estimator. Another contribution, based on the construction of an automatic and optimal selection criterion for the smoothing parameter of the regression estimator when the regressor is of functional type, was carried out by Rachdi and Vieu (2005, 2007). El Methni and Rachdi (2011), meanwhile, established the local estimation of a weighted mean of the regression operator for deterministic functional data; Ouassou and Rachdi (2010) then improved this estimation by means of the Stein estimator.
Recall that the curse of dimensionality makes convergence rates very slow. One way to try to remedy this is to look for a topology that faithfully captures the proximities between the data. This can be done, for example, using a projection semi-metric based on functional principal components, or on decompositions in a Fourier, wavelet or spline basis, etc. When the explanatory variable is valued in a separable Hilbert space, Ferraty and Vieu (2006a, Lemma 13-6) showed that one can define, in a general way, a projection semi-metric which leads back to small ball probabilities of fractal type (i.e., there exist constants C_x > 0 and τ > 0 such that F_x(h) ~ C_x h^τ as h → 0). The data are thus condensed by reducing their dimension, and the curse of dimensionality is thereby circumvented: indeed, one recovers convergence rates that are powers of n. In other situations, one may be confronted with very smooth data (such as the mass spectrometry curves given in Figure 1.2). In this case, it may be more interesting to use semi-metrics based on derivatives (cf. Ferraty and Vieu, 2006a). These semi-metrics can also be useful when the data exhibit an artificial vertical shift (i.e., one that is uninformative with respect to the responses); they then have the effect of eliminating these vertical shifts, which harm the quality of the prediction. Finally, other types of phenomena can be considered, such as, for example, horizontal shifts (cf. Dabo-Niang et al., 2006).
Faced with the great diversity of semi-metrics that can be constructed, one may ask how to choose the semi-metric best suited to the data. This motivates the study of the problem of constructing a semi-norm on F.
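As an illustration of such projection-based semi-metrics, here is a minimal sketch of a semi-metric built from the first q empirical functional principal components of a sample of discretized curves. The SVD-based construction and the toy curves are assumptions for the example, not the specific construction of Ferraty and Vieu's Lemma 13-6.

```python
import numpy as np

def fpca_semimetric(X, q):
    """Return d(u, v) = || P_q(u - v) ||, where P_q projects onto the first
    q empirical principal directions of the (n, p) curve sample X.  This is
    only a semi-metric: two curves differing outside the span of the q
    directions are at distance 0."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors = eigenvectors of the empirical covariance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:q].T                      # (p, q) principal directions
    def d(u, v):
        return float(np.linalg.norm((u - v) @ V))
    return d

# Toy curves: two dominant modes (sin, cos) plus rough noise
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 100)
scores = rng.standard_normal((50, 2))
modes = np.vstack([np.sin(2.0 * np.pi * t), np.cos(2.0 * np.pi * t)])
X = scores @ modes + 0.05 * rng.standard_normal((50, 100))
d = fpca_semimetric(X, q=2)
print(d(X[0], X[0]), d(X[0], X[1]))
```

Swapping this d into a kernel estimator changes the topology, and hence the small ball probabilities, without touching the rest of the estimation machinery; a derivative-based semi-metric would be used in exactly the same way.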
1.2 Functional data vs. semi-metrics
In a general way, the analysis of any type of data requires defining a notion of distance between the data. It is well known that in a finite-dimensional vector space all metrics are equivalent; this is no longer the case when the observation space is infinite-dimensional. This is why the choice of the metric (and hence of the associated topology) is a crucial element for the study of functional random variables.

Many authors define or study functional variables as square-integrable random variables, that is, valued in L2(0, 1) (cf. notably Crambes et al., 2007), or more generally in a Hilbert space (cf., for example, Preda, 2007), a Banach space (cf. Cuevas and Fraiman, 2004) or a metric space (cf. Dabo-Niang and Rhomari, 2003). Note, moreover, that Bosq (2000), for his part, considered samples of dependent functional variables valued in a Hilbert or Banach space; these functional observations were obtained by cutting up a single continuous-time process. In addition, among the semi-metrics available in the literature, it is often more interesting to consider semi-metrics allowing a wider range of possible topologies, which can then be chosen according to the nature of the data and the problem to be treated.
Let us point out that the advantage of using a semi-metric rather than a metric is that it can offer a way around the problems caused by the high dimensionality of the data. Indeed, one may consider a semi-metric defined through a projection of the functional data onto a lower-dimensional space: either (1) by carrying out a functional principal components analysis of the data (cf. Dauxois et al. (1982), Besse and Ramsay (1986), Hall and Hosseini-Nasab (2006) and Yao and Lee (2006)), or (2) by projecting them onto a basis of finite cardinality (wavelets, splines, ...). This reduces the dimension of the data, and hence increases the convergence rate of the methods used, while preserving the functional nature of the data. Moreover, the projection basis can be chosen according to what is known about the nature of the functional variable; for example, a Fourier basis would be a natural choice if the observed functional variable is assumed periodic. We refer to Ramsay and Silverman (1997 and 2005) or Rossi et al. (2005) for a fuller discussion of the various projection-based approximation methods for functional data. A more thorough discussion of the interest of using different types of semi-metrics is also given in the book by Ferraty and Vieu (2006) (Sections 3 and 4) and in the work of Benhenni et al. (2007).
For these reasons, we present here some ways of constructing a semi-metric (cf. Ferraty and Vieu, 2006). In what follows we present only two families of semi-metrics, although many others can of course be built: the first is well suited to so-called noisy curves and to irregular curves, while the second is rather intended for curves that are quite smooth (or regular).
To this end, we start from a sample of $n$ curves $X_1, \ldots, X_n$, independent and identically distributed as the functional random variable $X = \{X(t),\ t \in [0,1]\}$.
Recall that classical principal components analysis (PCA) is regarded as a very useful tool for describing and visualizing data in a lower-dimensional space. This technique has been extended to functional data and has recently been employed for various statistical purposes. We will see that FPCA (Functional Principal Components Analysis) has become a good tool for computing proximities between curves in a reduced-dimension space. Thus, starting from the classical $L^2$ semi-metric, we can build a parametric class of semi-norms, which we denote SMPCA (Semi-Metric based on PCA), as follows:
$$\|x\|_q^{ACP} = \sqrt{\sum_{k=1}^{q} \left( \int x(t)\, v_k(t)\, dt \right)^2} \quad \text{for all } x \in \mathcal{F},$$
where $v_1, \ldots, v_q$ are the eigenfunctions of the covariance operator
$$\Gamma_X(s, t) = \mathbb{E}\big( X(t) X(s) \big),$$
associated with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_q$.
Note also that the integer $q$ is not a smoothing parameter, but rather a tuning parameter indicating the resolution level at which the problem is considered.
We deduce a family of semi-metrics as follows:
$$d_q^{ACP}(X_i, x) = \sqrt{\sum_{k=1}^{q} \left( \int \big( X_i(t) - x(t) \big)\, v_k(t)\, dt \right)^2}. \qquad (1)$$
The integral in formula (1) can be approximated as follows (cf. Castro et al., 1986):
$$\int x(t)\, v_k(t)\, dt \approx \sum_{j=1}^{J} w_j\, x(t_j)\, v_k(t_j),$$
where the weights are $w_j = t_j - t_{j-1}$ and the grid $(t_1, \ldots, t_J)$ consists of $J$ equidistant points in $[0,1]$. If we discretize two curves $x_i$ and $x_{i'}$, the quantity $d_q^{ACP}(x_i, x_{i'})$ is then approximated by its empirical version:
$$d_q^{ACP}(x_i, x_{i'}) = \sqrt{\sum_{k=1}^{q} \left( \sum_{j=1}^{J} w_j\, \big( x_i(t_j) - x_{i'}(t_j) \big)\, v_k(t_j) \right)^2}.$$
Note that this family of semi-metrics can only be used if the data are balanced (all curves observed at the same points). This might appear to be a drawback of this kind of semi-metric, but its main advantage is that it can be used even when the curves are irregular. For the example of predicting the maximum daily ozone concentration at the North Pole over four successive years (from 2000 to 2004), given the concentration curve of the previous day (cf. Figure 1.4), we chose the norm $L^2_{1,24}$, computed using this kind of semi-metric.
Another way of building a family of semi-metrics is based on derivatives; we denote it SMD (Semi-Metric based on Derivatives). It is defined as follows:
$$d_q^{SMD}(x_i, x_{i'}) = \sqrt{\int_0^1 \left( x_i^{(q)}(t) - x_{i'}^{(q)}(t) \right)^2 dt}, \qquad (2)$$
where $x^{(q)}$ denotes the $q$th derivative of the curve $x$.
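As an illustration of formula (2), here is a minimal numerical sketch in which the derivatives are approximated by finite differences (a stand-in for the spline-smoothing approach of Ferraty and Vieu, 2006a); it also shows how a vertical shift between curves becomes invisible as soon as $q \geq 1$:

```python
import numpy as np

def smd_semimetric(x1, x2, q, grid):
    """Approximate sqrt( int_0^1 (x1^(q)(t) - x2^(q)(t))^2 dt ) on a common grid."""
    d1, d2 = np.asarray(x1, float), np.asarray(x2, float)
    for _ in range(q):                        # q successive numerical derivatives
        d1, d2 = np.gradient(d1, grid), np.gradient(d2, grid)
    dt = grid[1] - grid[0]
    return float(np.sqrt(((d1 - d2) ** 2).sum() * dt))

grid = np.linspace(0.0, 1.0, 200)
a, b = np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)
shift = smd_semimetric(a, a + 1.0, q=1, grid=grid)  # vertical shift: eliminated for q >= 1
dist = smd_semimetric(a, b, q=1, grid=grid)         # genuinely different curves
```

The first call returns essentially 0, illustrating the point made above: derivative-based semi-metrics wipe out non-informative vertical offsets.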
$$F_x(h_n) := \mathbb{P}\big( d(X, x) \leq h_n \big), \quad \text{where } h_n \to 0.$$
Throughout the various convergence results for the estimators studied in this dissertation (of Nadaraya-Watson and/or local linear type), one observes that the convergence rate depends on how these small ball probabilities decay. The literature contains a fairly large number of probabilistic results studying how these small ball probabilities tend to $0$ when $d$ is a norm (cf. for example Li and Shao (2001), Lifshits et al. (2006) and Gao and Li (2007)). One may also refer to the work of Dereich (2003, Chapter 7), devoted to the behavior of small ball probabilities with random centers. These works show, for instance, that for non-smooth processes such as Brownian motion or the Ornstein-Uhlenbeck process, the small ball probabilities are of exponential form (with respect to $h_n$), so that the convergence rate of our estimators is a power of $\ln(n)$ (cf. Ferraty et al. (2006), Section 5, and Ferraty and Vieu (2006a), Section 13.3.2, for a deeper discussion of this topic).
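The quantity $F_x(h)$ can be approximated empirically by the proportion of sample curves falling into the ball $B(x, h)$. A small sketch, using a hypothetical discretized $L^2$-type distance and Brownian-like simulated paths:

```python
import numpy as np

def small_ball_probability(X, x, h):
    """Proportion of sample curves within the ball B(x, h), for a discretized L2 distance."""
    dists = np.sqrt(((X - x) ** 2).mean(axis=1))
    return float(np.mean(dists <= h))

rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal((500, 100)) / 10.0, axis=1)  # Brownian-like sample paths
x = X[0]
probs = [small_ball_probability(X, x, h) for h in (0.5, 1.0, 2.0)]
```

For such rough processes these probabilities decay very quickly as $h \to 0$, in line with the exponential small-ball behavior discussed above.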
In what follows, we give an overview of the usefulness of functional data analysis in applications.
[...] taking into account the functional nature of the data (cf. Figure 1.1). From such data, one may be interested in predicting the evolution of the phenomenon from the data collected during previous years.
In the food industry: Ferraty and Vieu (2002, 2003) studied mass spectrometric data arising from a quality-control problem in the food industry. They studied the fat content of pieces of meat given the absorbance curves of those pieces (cf. Figure 1.2). These real data were used in the case where the variables are independent.
Electricity consumption in the USA: in the dependent-data setting, one can consider the example of a time series concerning the annual consumption [...]
[Figure: electricity consumption curves in the USA.]
In short, many other fields of application exist, and indeed keep appearing, in which one may be faced with data of a functional nature. Given the sheer number of possible examples, we cannot present an exhaustive list of these applications in this thesis; instead, in the remainder of this section, we content ourselves with a quick overview of these fields of application.
In biology: for the study of variations in growth curves (cf. Rao, 1958, and Figure 1.5) and, more recently, for the study of variations in the knee angle during walking.
[Figures: electricity consumption curves; daily ozone pollution curves as a function of the hour, 2000-2004.]
In econometrics: one is often confronted with phenomena that can be modeled by functional variables, among them the volatility of financial markets (cf. Müller et al., 2007), the performance of a firm (cf. Kawassaki and Ando, 2004), electronic commerce (cf. Jank and Shmueli, 2006) or the intensity of financial transactions (cf. Laukaitis and Rackauskas, 2002). One may refer to Kneip and Utikal (2001), Benko (2006) and Benko et al. (2006) for further references. We may also mention the example of observing the fluctuations of a stock index over time: this is typically a time series that one cuts into sub-intervals of the time axis (cf. Bosq, 2002).
Figure 1.7: A curve of the daily number of eggs laid by a fly.
Measurements, and notably images, collected by satellite are also data whose study can be carried out with the methodology of functional statistics. One may cite, for example, the work of Vidakovic (2001) in meteorology, or that of Dabo-Niang et al. (2004b, 2007) in geophysics. The latter is concerned with the classification of curves collected by satellite at various locations in Amazonia, so as to identify the nature of the soil. Finally, let us cite Cardot et al. (2003) and Cardot and Sarda (2006), who studied the evolution of vegetation from satellite data.
1.3 Some results on nonparametric estimation for functional models
and, for some $j \geq 0$, concerning the conditional density $f^x$, we assume that it is of class $C^j$ and such that:
(H3) $\forall (y_1, y_2) \in S \times S$, $\forall (x_1, x_2) \in V_x \times V_x$,
$$\left| f^{x_1\,(j)}(y_1) - f^{x_2\,(j)}(y_2) \right| \leq C_x \left( d(x_1, x_2)^{b_1} + |y_1 - y_2|^{b_2} \right).$$
The concentration condition (H1) plays an important role. This kind of condition is linked to the semi-metric $d$: it quantifies and controls the small ball probabilities.
(H4) $\displaystyle\int_{\mathbb{R}} |t|^{b_2}\, H^{(1)}(t)\, dt < +\infty$,
(H5) the kernel $K$ has support $(0, 1)$ and satisfies $0 < C_1 < K(t) < C_2$, where $C_1$ and $C_2$ are two strictly positive constants,
(H6) $\displaystyle\lim_{n \to \infty} h_K = 0$ and $\displaystyle\lim_{n \to \infty} \frac{\log n}{n\, \phi_x(h_K)} = 0$.
The kernel estimator of the conditional distribution function is then defined by:
$$\widehat{F}^x(y) = \frac{\displaystyle\sum_{i=1}^{n} K\!\left( \frac{d(x, X_i)}{h_K} \right) H\!\left( \frac{y - Y_i}{h_H} \right)}{\displaystyle\sum_{i=1}^{n} K\!\left( \frac{d(x, X_i)}{h_K} \right)}, \quad \forall y \in \mathbb{R}.$$
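A direct numerical transcription of this estimator might look as follows. This is an illustrative sketch only: the box kernel for $K$ and the Gaussian cdf for $H$ are assumptions made here for concreteness, not choices prescribed by the thesis.

```python
import numpy as np
from math import erf

def F_hat(x, y, X, Y, d, hK, hH):
    """Functional kernel estimator of F^x(y) = P(Y <= y | X = x)."""
    dist = np.array([d(x, Xi) for Xi in X])
    Kw = (dist / hK < 1.0).astype(float)          # box kernel K = indicator of (0, 1)
    if Kw.sum() == 0.0:
        return float("nan")                       # no curve falls in the ball B(x, hK)
    Hv = np.array([0.5 * (1.0 + erf(u / np.sqrt(2.0))) for u in (y - Y) / hH])
    return float((Kw * Hv).sum() / Kw.sum())

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 50))
Y = X.mean(axis=1) + 0.1 * rng.standard_normal(200)
d = lambda u, v: float(np.sqrt(((u - v) ** 2).mean()))   # discretized L2-type semi-metric
Flo = F_hat(X[0], -1.0, X, Y, d, hK=2.0, hH=0.2)
Fhi = F_hat(X[0], 1.0, X, Y, d, hK=2.0, hH=0.2)
```

Since $H$ is a cdf-type kernel, the estimate is automatically nondecreasing in $y$ and stays in $[0, 1]$.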
1. A sequence $(z_n)_{n \in \mathbb{N}}$ of random variables is said to converge almost completely to $0$ if, $\forall \epsilon > 0$, $\sum_{n} \mathbb{P}(|z_n| > \epsilon) < \infty$. This mode of convergence implies both almost sure convergence and convergence in probability (cf. [13] for more details).
Theorem 1.3.1. Under the above hypotheses, we have:
$$\sup_{y \in S} \left| \widehat{F}^x(y) - F^x(y) \right| = O\!\left( h_K^{b_1} \right) + O\!\left( h_H^{b_2} \right) + O\!\left( \sqrt{\frac{\log n}{n\, \phi_x(h_K)}} \right), \quad \text{a.co.}$$
The kernel estimator of the $j$th derivative of the conditional density is defined by:
$$\widehat{f}^{x\,(j)}(y) = \frac{h_H^{-(j+1)} \displaystyle\sum_{i=1}^{n} K\!\left( \frac{d(x, X_i)}{h_K} \right) H^{(j+1)}\!\left( \frac{y - Y_i}{h_H} \right)}{\displaystyle\sum_{i=1}^{n} K\!\left( \frac{d(x, X_i)}{h_K} \right)}, \quad \forall y \in \mathbb{R}.$$
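For $j = 0$ the formula above is the usual kernel conditional density estimator. An illustrative sketch (the box kernel for $K$ and the Gaussian density for $H^{(1)}$ are hypothetical choices made here for simplicity):

```python
import numpy as np

def f_hat(x, y, X, Y, d, hK, hH):
    """Kernel conditional density estimator (case j = 0)."""
    dist = np.array([d(x, Xi) for Xi in X])
    Kw = (dist / hK < 1.0).astype(float)                     # box kernel K
    if Kw.sum() == 0.0:
        return float("nan")
    u = (y - Y) / hH
    Hp = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)        # H^{(1)}: Gaussian density
    return float((Kw * Hp).sum() / (hH * Kw.sum()))

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))
Y = X.mean(axis=1) + 0.1 * rng.standard_normal(200)
d = lambda u, v: float(np.sqrt(((u - v) ** 2).mean()))
ys = np.linspace(-3.0, 3.0, 601)
dens = np.array([f_hat(X[0], y, X, Y, d, hK=2.0, hH=0.2) for y in ys])
```

As expected for a conditional density estimate, the values are nonnegative and integrate to about 1 over the range of $y$.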
Note that this estimator is analogous to the one introduced by Rosenblatt (1969) in the case where $X$ is a real random variable, and it has been extensively studied since then (cf. Youndjé, 1996). In order to establish some convergence results, the following hypotheses will be needed:
(H8) $H^{(j+1)}$ is bounded,
(H9) $\displaystyle\lim_{n \to \infty} h_K = 0$ with $\displaystyle\lim_{n \to \infty} \frac{\log n}{n\, h_H^{2j+1}\, \phi_x(h_K)} = 0$.
Theorem 1.3.2. Under the above hypotheses, we have:
$$\sup_{y \in S} \left| \widehat{f}^{x\,(j)}(y) - f^{x\,(j)}(y) \right| = O\!\left( h_K^{b_1} \right) + O\!\left( h_H^{b_2} \right) + O\!\left( \sqrt{\frac{\log n}{n\, h_H^{2j+1}\, \phi_x(h_K)}} \right), \quad \text{a.co.}$$
Note that the conditional mode estimator $\widehat{\theta}_n$ is not necessarily unique; to ensure this uniqueness and the convergence of $\widehat{\theta}_n$, we assume:
(H10) $\exists \xi > 0$ such that $f^x$ is strictly increasing on $[\theta - \xi, \theta]$ and strictly decreasing on $[\theta, \theta + \xi]$,
(H11) $f^x$ is $j$-times continuously differentiable with respect to $y$ on $[\theta - \xi, \theta + \xi]$,
(H12) $f^{x\,(l)}(\theta) = 0$ if $1 \leq l < j$.
Let us point out that these conditions have a strong influence on the convergence rate of the estimator (cf. the theorem below). Moreover, the convergence of this estimator can be obtained under hypothesis (H10) (cf. Laksaci (2005), Lemma 2.4.1).
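In practice, the conditional mode estimator can be approximated by maximizing the estimated conditional density over a grid of $y$ values. A toy sketch (all kernel and data choices below are illustrative assumptions, not part of the thesis):

```python
import numpy as np

def conditional_mode(x, X, Y, d, hK, hH, grid):
    """arg sup_y of the kernel conditional density estimate, over a grid of y values."""
    dist = np.array([d(x, Xi) for Xi in X])
    Kw = (dist / hK < 1.0).astype(float)                     # box kernel K
    u = (grid[:, None] - Y[None, :]) / hH
    dens = (np.exp(-0.5 * u ** 2) @ Kw) / (hH * np.sqrt(2.0 * np.pi) * max(Kw.sum(), 1.0))
    return float(grid[int(np.argmax(dens))])

rng = np.random.default_rng(5)
c = rng.uniform(-1.0, 1.0, 300)
X = np.outer(c, np.ones(20))                  # flat toy curves at random levels c_i
Y = c ** 2 + 2.0 + 0.05 * rng.standard_normal(300)
d = lambda u, v: float(np.abs(u - v).max())   # sup distance between curves
grid = np.linspace(1.0, 3.5, 501)
mode = conditional_mode(np.zeros(20), X, Y, d, hK=0.2, hH=0.1, grid=grid)
```

Here the curves near $x = 0$ carry responses concentrated around 2, so the estimated conditional mode lands near that value.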
Theorem 1.3.3. If the above hypotheses are satisfied, then:
$$\left| \widehat{\theta}_n - \theta \right| = O\!\left( h_K^{b_1/j} \right) + O\!\left( h_H^{b_2/j} \right) + O\!\left( \left( \frac{\log n}{n\, h_H\, \phi_x(h_K)} \right)^{\frac{1}{2j}} \right), \quad \text{a.co.}$$
Two application examples are studied. The first corresponds to the i.i.d. case and concerns the agri-food industry (mass spectrometric curves). The other example corresponds to the dependent case and concerns a pollution problem (curves of the ozone concentration at the North Pole) (cf. Laksaci, 2005, for more details).
The following hypotheses are needed for the statement of Theorem 1.3.4:
(H13) $0 < \sup_{i \neq j} \mathbb{P}\big( (X_i, X_j) \in B(x, r) \times B(x, r) \big) =: \psi_x(r)$, where $\psi_x(h_K)$ denotes the maximum of the concentration between the marginal law and the joint laws of each pair of functional observations in the ball of center $x$ and radius $h_K$,
(H14) the $\alpha$-mixing coefficients of the sequence $(X_i, Y_i)_i$ are arithmetically decreasing with order $a > (5 + \sqrt{17})/2$,
(H15) $\displaystyle\lim_{n \to \infty} h_H = 0$ and $\exists \eta_1 > \dfrac{3a - 4}{(a + 1)(a - 2)}$ such that $\displaystyle\lim_{n \to \infty} n^{\eta_1} h_H = \infty$,
(H16) $\displaystyle\lim_{n \to \infty} h_K = 0$ and $\displaystyle\lim_{n \to \infty} \frac{\log n}{n\, h_H\, \psi_x(h_K)} = 0$.
Theorem 1.3.4. If the above hypotheses are satisfied, then:
$$\left| \widehat{\theta}_n - \theta \right| = O\!\left( h_K^{b_1/j} \right) + O\!\left( h_H^{b_2/j} \right) + O\!\left( \left( \frac{\log n}{n\, h_H\, \psi_x(h_K)} \right)^{\frac{1}{2j}} \right), \quad \text{a.co.},$$
where $b_1$ and $b_2$ are the constants appearing in hypothesis (H3).
Chapitre 2
Kernel conditional density estimation
when the regressor is valued in a
semi-metric space
This paper deals with conditional density estimation when the explanatory variable is functional. Indeed, a nonparametric kernel-type estimator of the conditional density has recently been introduced for regressors valued in a semi-metric space. This estimator depends on a smoothing parameter which controls its behavior, and our aim is to construct, and study the asymptotic properties of, a data-driven criterion for choosing this smoothing parameter automatically and optimally. The criterion can be formulated in terms of a functional version of cross-validation ideas. Under mild assumptions on the unknown conditional density, this rule is proved to be asymptotically optimal. A simulation study and an application to real data are carried out to illustrate the finite-sample behavior of our method. Finally, let us mention that our results can also be considered as novel in the finite-dimensional setting, and that several other open questions are raised in this article.
Keywords. Cross-validation, functional data, kernel estimator, nonparametric model, bandwidth selection, small balls probability.
1. Universit Djillali Liabs, BP. 89, Sidi Bel-Abbs 22000, Algeria. E-mail : alilak@yahoo.fr
2. Laboratoire AGIM FRE
3405
UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. E-mails : Mustapha.Rachdi@upmf-grenoble.fr and
Fethi.Madani@imag.fr
3. Corresponding author
2.1
Introduction
Conditional density estimation is a statistical technique that allows for a better understanding of the relationship between a response variable and a set of covariates, in comparison
with usual regression methods. Therefore, this technique is of great importance in many
scientic elds where knowledge about conditional means, obtained by regression methods,
is not enough to draw valuable conclusions about the problem at hand. Moreover, conditional density functions arise in a variety of areas. One of the more useful applications involves
density forecasting, where the probability density of the forecast of a time series, such as the
rate of ination, can be used to make probability statements regarding the future course of
that series. However, the probability density, and its resulting interpretation, is conditional
on the hypothesis that the model used to produce the forecasts is correctly specied.
Recall that, if g(x, y) denotes the joint density of (X, Y) and h(x) denotes the marginal density of X, then the conditional density of Y given X = x is given by f(x, y) = g(x, y)/h(x). The standard nonparametric regression does not allow the analysis of changes
in modality, and standard density estimation does not allow conditioning on an explanatory
variable. Notice also that conditional density estimation is, in some ways, a generalization
of both nonparametric regression and standard univariate density estimation. The kernel
conditional density estimation was rst considered by Rosenblatt (1969) who studied the
problem of estimating the density of Y given X = x, where X is a univariate random
variable.
On the other hand, estimators of the conditional mode, the conditional distribution and
the conditional median can be derived directly from estimators of f (x, y). For instance in
Collomb et al. (1987) it is shown how one can get an estimator of the conditional mode
and how such an estimator can be used for forecasting problems (cf., to cite a few, Härdle (1990), Gannoun (1990), Youndjé (1993 and 1996) and the references therein). Moreover, it
is important to mention that estimators of conditional modes are of particular interest for
prediction (cf. Collomb et al. (1987) and Ferraty et al. (2005)).
Furthermore, the problem of conditional density estimation appears to have remained largely unexamined until it was revisited and some improved estimators were proposed (cf. Hyndman et al. (1996), and references therein for some developments). Indeed, the following modified form of Rosenblatt's estimator was considered:
The kernel function K(u) is assumed to satisfy some specific conditions; popular choices of K(u) are defined in terms of univariate, unimodal probability density functions. Moreover, Youndjé (1993 and 1996), Hyndman et al. (1996) and others give the bias, variance, mean squared error (MSE) and convergence properties of the estimator (1), and also proposed an alternative kernel estimator with smaller MSE than the standard estimator in some commonly occurring situations. On the other hand, we cannot continue our introduction without mentioning the work by Fan et al. (1996), who proposed an alternative conditional density estimator generalizing Rosenblatt's estimator through local polynomial techniques. Then, Hyndman and Yao (1998) introduced two further local parametric estimators which improve on those given by Fan et al. (1996). Stone (1994), meanwhile, followed a different path by using tensor products of polynomial splines to obtain conditional log-density estimators. For other studies on the nonparametric estimation of the conditional density we also refer to Gannoun (1990), Youndjé (1993 and 1996), Hall et al. (1999), Härdle et al. (1991), Bashtannyk and Hyndman (2001), Gannoun et al. (2003), El Ghouch and Genton (2009) and the references therein.
In this paper, we are interested in the efficient estimation of the conditional probability density when the explanatory variables are of functional type. It should be noticed that these questions are particularly interesting in the infinite-dimensional framework, both for the fundamental problems they raise and for the many applications they may allow (cf. Bosq (2000), Ramsay and Silverman (2005), Ferraty and Vieu (2006) and references therein). In this conditional context, the first results were obtained by Ferraty and Vieu (2005) and Ferraty et al. (2006), who established the almost-complete consistency, for both i.i.d. and strongly mixing data, of the kernel estimators of the conditional distribution function and of the conditional probability density, and presented some applications of their results to the conditional mode and to conditional quantiles. Among the many papers concerned with nonparametric modeling of the conditional distribution of a real variable given a random variable taking values in an infinite-dimensional space, we refer only to Dabo-Niang and Laksaci (2007) for conditional mode estimation, and to Laksaci (2007) for the asymptotic expression of the leading terms in the quadratic error of conditional density kernel estimators.
On the other hand, it is well known that kernel estimators have nice asymptotic properties when the curse of dimensionality is controlled by means of suitable assumptions on the small ball probabilities of the functional variable (cf. Ferraty and Vieu (2006) and references therein). However, it is also well known that, as in the standard finite-dimensional framework, the smoothing parameter has to be selected suitably to ensure good practical performance (cf. Laksaci, 2007). Some papers (cf. for instance Youndjé et al., 1993) have treated the problem of smoothing parameter selection in the nonparametric estimation of the conditional density, using techniques quite different from ours, but only in the finite-dimensional setup. The selection of the smoothing parameter in the infinite-dimensional setting is much more complicated. In particular, the so-called scatterplot, a graphical tool for exploring the relationship between the explanatory variables and the scalar response, is not available, so it becomes very hard to obtain information about the shape of the relationship between the functional variable and the scalar response. Therefore, various areas with different (low/high) concentrations can appear in such a relationship even though they do not appear in the functional data sample (cf. for instance the simulated curves in Section 2.4.2). It is also clear, in the infinite-dimensional setup, that the concentration of the distribution of the functional explanatory variable influences the value of an appropriate bandwidth: the variance of the estimator increases when the concentration of the distribution of the functional covariates decreases, which is the case when the bandwidth value decreases (cf. conditions (17) and (14)). Moreover, in areas where the functional covariates have low concentration, the bandwidth has to be taken sufficiently large to include enough data curves, while a smaller bandwidth can be used in areas where the functional covariates have high concentration. It should thus be noted that Rachdi and Vieu (2007) (respectively, Benhenni et al., 2007) proposed a global (respectively, a local adaptive) cross-validation procedure for regression operator estimation with functional data, which has inspired this work.
The main aim of this paper is then the construction of both global and local functional cross-validation procedures. We remark that a local bandwidth choice can improve the precision of the prediction in the functional setting significantly more than a global one. In Section 2, the data-driven methods are defined. The main hypotheses and results are stated in Section 3. In Section 4, we propose a simulation study showing how an optimal local bandwidth choice improves on the usual global selection rule for some irregular functional covariates. Finally, the asymptotic theoretical support is given in Section 5, and the proofs of the auxiliary results are relegated to the Appendix.
2.2 Global and local bandwidth selection rules
Let us introduce a sample of independent pairs $(X_i, Y_i)_{1 \leq i \leq n}$, identically distributed as $(X, Y)$ and valued in $\mathcal{F} \times \mathbb{R}$, where $(\mathcal{F}, d)$ is a space equipped with a semi-metric $d$. Assume that there exists a regular version of the conditional probability of $Y$ given $X$, absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$, and let $f(x, \cdot)$ denote the conditional probability density of $Y$ given $X = x \in \mathcal{F}$, which we have to estimate. For this aim, we define the kernel estimator $\widehat{f}_{(a,b)}$ of $f$ as in (1), but considering two different kernel functions, as follows:
$$\widehat{f}_{(a,b)}(x, y) = \frac{b^{-1} \sum_{i=1}^{n} K\big( a^{-1} d(x, X_i) \big)\, H\big( b^{-1}(y - Y_i) \big)}{\sum_{i=1}^{n} K\big( a^{-1} d(x, X_i) \big)}. \qquad (2)$$
As in the majority of earlier works on bandwidth selection, our rule is based on the minimization of the integrated squared error, weighted by the probability measure $dP_X(x)$ of the functional variable $X$ and some nonnegative weight functions $W_1$ and $W_2$:
$$d_1(\widehat{f}_{(a,b)}, f) = \int\!\!\int \left( \widehat{f}_{(a,b)}(x, y) - f(x, y) \right)^2 W_1(x)\, W_2(y)\, dP_X(x)\, dy. \qquad (3)$$
A discrete approximation of (3) is the averaged squared error given by:
$$d_2(\widehat{f}_{(a,b)}, f) = \frac{1}{n} \sum_{i=1}^{n} \left( \widehat{f}_{(a,b)}(X_i, Y_i) - f(X_i, Y_i) \right)^2 \frac{W_1(X_i)\, W_2(Y_i)}{f(X_i, Y_i)}. \qquad (4)$$
$$d_3(\widehat{f}_{(a,b)}, f) = \mathbb{E}\left[ d_2(\widehat{f}_{(a,b)}, f) \mid X_1, \ldots, X_n \right]. \qquad (5)$$
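In a simulation, where the true conditional density $f$ is known, the averaged squared error (4) can be computed directly; a small sketch (the toy values below are purely illustrative, not drawn from the paper's simulation study):

```python
import numpy as np

def d2_error(fhat_vals, f_vals, W1, W2):
    """Averaged squared error (4), evaluated at the sample points (X_i, Y_i)."""
    return float(np.mean((fhat_vals - f_vals) ** 2 * W1 * W2 / f_vals))

true_f = np.full(20, 0.5)      # hypothetical values f(X_i, Y_i) of a known conditional density
est_f = true_f + 0.1           # a toy estimator that is off by 0.1 everywhere
err = d2_error(est_f, true_f, np.ones(20), np.ones(20))
```

Such a direct computation is only possible in simulations; for real data the criterion below replaces it.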
However, these loss functions depend on the conditional density f, so the smoothing parameter that minimizes these errors is not computable in practice. Thus, we must find another loss function which is asymptotically equivalent to the quadratic distances (3), (4) and (5). Following the same ideas as in Youndjé (1996) for the real case, we can write:
$$d_1(\widehat{f}_{(a,b)}, f) = A + B - 2C,$$
where
$$A = \int\!\!\int \widehat{f}_{(a,b)}^{\,2}(x, y)\, W_1(x)\, W_2(y)\, dP_X(x)\, dy,$$
$$B = \int\!\!\int f^2(x, y)\, W_1(x)\, W_2(y)\, dP_X(x)\, dy,$$
$$C = \int\!\!\int \widehat{f}_{(a,b)}(x, y)\, f(x, y)\, W_1(x)\, W_2(y)\, dP_X(x)\, dy.$$
Since the second term B is independent of (a, b), minimizing $d_1$ is equivalent to minimizing $A - 2C$. A straightforward way to construct a computational procedure selecting the optimal bandwidths (a, b) with respect to the error measure $d_1$ is to estimate both quantities A and C. For this aim, as mentioned above, we adopt the standard leave-one-curve-out technique, as in Rudemo (1982) for probability density estimation and Rachdi and Vieu (2007) for regression operator estimation, by considering the following criteria:
$$GCV(a, b) = \frac{1}{n} \sum_{i=1}^{n} W_1(X_i) \int \left( \widehat{f}^{\,-i}_{(a,b)}(X_i, y) \right)^2 W_2(y)\, dy \;-\; \frac{2}{n} \sum_{i=1}^{n} \widehat{f}^{\,-i}_{(a,b)}(X_i, Y_i)\, W_1(X_i)\, W_2(Y_i), \qquad (6)$$
$$LCV_{x,y}(a, b) = \frac{1}{n} \sum_{i=1}^{n} W_{1,x}(X_i) \int \left( \widehat{f}^{\,-i}_{(a,b)}(X_i, z) \right)^2 W_{2,y}(z)\, dz \;-\; \frac{2}{n} \sum_{i=1}^{n} \widehat{f}^{\,-i}_{(a,b)}(X_i, Y_i)\, W_{1,x}(X_i)\, W_{2,y}(Y_i), \qquad (7)$$
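The global rule (6) can be sketched as follows. This is a minimal illustration assuming a box kernel $K$, a Gaussian kernel $H$, unit weight functions $W_1 \equiv W_2 \equiv 1$, and a Riemann approximation of the $y$-integral on a finite grid; none of these specific choices is prescribed by the paper.

```python
import numpy as np

def gcv(a, b, X, Y, ygrid):
    """Leave-one-curve-out criterion (6) with W1 = W2 = 1 and a Riemann y-integral."""
    n = len(Y)
    dy = ygrid[1] - ygrid[0]
    sq, cross = 0.0, 0.0
    for i in range(n):
        mask = np.arange(n) != i
        dist = np.sqrt(((X[mask] - X[i]) ** 2).mean(axis=1))
        Kw = (dist / a < 1.0).astype(float)                   # box kernel K
        if Kw.sum() == 0.0:
            continue                                          # empty ball: curve i contributes 0
        Hp = np.exp(-0.5 * ((ygrid[:, None] - Y[mask][None, :]) / b) ** 2) / np.sqrt(2.0 * np.pi)
        f_loo = (Hp @ Kw) / (b * Kw.sum())                    # leave-one-out density on the grid
        sq += (f_loo ** 2).sum() * dy / n
        cross += float(np.interp(Y[i], ygrid, f_loo)) / n     # f^{-i}_{(a,b)}(X_i, Y_i)
    return sq - 2.0 * cross

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 30))
Y = X.mean(axis=1) + 0.2 * rng.standard_normal(60)
ygrid = np.linspace(-2.0, 2.0, 41)
Hn = [(a, b) for a in (0.8, 1.2, 2.0) for b in (0.1, 0.3, 0.6)]
scores = {ab: gcv(ab[0], ab[1], X, Y, ygrid) for ab in Hn}
a1, b1 = min(scores, key=scores.get)
```

The selected pair (a1, b1) minimizes the criterion over a finite set Hn of bandwidths, exactly the setting of condition (17) below.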
40
where W2,x (respectively W2,y ) is some positive local weight function around x (respectively
y ), and for any i = 1, . . . , n :
i
fb(a,b)
(x, y) =
b1
1
1
j=i K(a d(x, Xj ))H(b (y
n
1
j=i K(a d(x, Xj ))
Yj ))
(8)
and
(
A = IEX
2
fb(a,b)
(X, y)W1 (X)W2 (y)dy
where IEZ denotes the expectation with respect to the distribution of the random variable
Z.
Finally, our global (respectively, local) cross-validation procedure consists in choosing the bandwidths (a, b) which minimize GCV(a, b) (respectively, $LCV_{x,y}(a, b)$) over a given set $H_n \subset \mathbb{R}^{+2}$ (respectively, $H_n(x, y) \subset \mathbb{R}^{+2}$).
2.3
Main Results
2.3.1 Assumptions
In order to deduce the asymptotic optimality of the bandwidth selected by the rule GCV
(respectively, LCVx,y ), we will assume that the weight function W1 (respectively W2 ) is
bounded with support in some subset SX of F (respectively on a compact subset SY of IR)
and the conditional density f (, ) is bounded on SX SY . In the sequel of this paper, when
no confusion is possible, we will denote by C and C' some strictly positive generic constants,
and we will make the following assumptions:
The weight functions are taken, for each curve x, such that, for some positive real w: (9)
where B(x, h) denotes the closed ball with center x and radius h > 0, and (10)
There exist some strictly positive constants $b_1$, $b_2$ and $\theta$ such that, $\forall (x_0, y_0) \in S_X \times S_Y$, $\forall (x_1, x_2) \in S_X \times S_X$ and $\forall (y_1, y_2) \in S_Y \times S_Y$, we have:
$$f(x_0, y_0) > \theta \quad \text{and} \quad \left| f(x_1, y_1) - f(x_2, y_2) \right| \leq C \left( d^{b_1}(x_1, x_2) + |y_1 - y_2|^{b_2} \right). \qquad (11)$$
The kernel K is a bounded, Lipschitz kernel with support (0, 1), and there exist some positive constants C and C' such that:
$$C\, 1\!\!1_{(0,1)}(t) \leq K(t) \leq C'\, 1\!\!1_{(0,1)}(t), \qquad (12)$$
and if K(1) = 0, the kernel K has to fulfill the additional condition $-\infty < C < K'(t) < C' < 0$, where K' is the first derivative of K.
The kernel H is a bounded, Lipschitz-continuous function such that: [...]
and, if K(1) = 0, the function $\phi(\cdot)$ has to fulfill the additional condition: (14)
For n large enough, the Kolmogorov $\epsilon$-entropy of $S_X$, denoted $\psi_{S_X}$ (cf. for instance Kolmogorov and Tikhomirov (1959) and Theodoros and Yannis (1997)), satisfies, for some $\eta \in (0, 1)$:
$$\sum_{n=1}^{\infty} n^{(3\eta+1)/2} \exp\left\{ (1 - \beta)\, \psi_{S_X}\!\left( \frac{\log n}{n} \right) \right\} < \infty \quad \text{for some } \beta > 1, \qquad (15)$$
and
$$\lim_{n \to +\infty} n\, b = \infty. \qquad (16)$$
Moreover, the function g can be specified for several well-known continuous-time processes by using the Onsager-Machlup function (cf. Ferraty et al., 2010). For instance, it is shown in Corollary 4.7.8 of Bogachev (1999, page 186) that the Onsager-Machlup function of the couple (x, z), for Gaussian measures on a semi-normed space $(\mathcal{F}, \|\cdot\|)$, is given by:
$$F(x, z) = \log\left( \lim_{h \to 0} \frac{\mathbb{P}(X \in B(x, h))}{\mathbb{P}(X \in B(z, h))} \right) = \frac{1}{2} \|\pi(z)\|_H^2 - \frac{1}{2} \|\pi(x)\|_H^2,$$
where $\|\cdot\|_H$ is the Hilbert norm on the Cameron-Martin space of $\mathcal{F}$ associated with a Gaussian measure, denoted H, and $\pi(\cdot)$ is the orthogonal projection onto the orthogonal complement of the set $\{a \in H \text{ such that } \|a\| = 0\}$. So, in this case, $g(x) = \exp\left( -\frac{1}{2} \|\pi(x)\|_H^2 \right)$; therefore, condition (10) is verified for subsets such as:
1. Subsets $S_X$ for which:
$$\psi_{S_X}\!\left( \frac{\log n}{n} \right) = O\big( (\log n)^2 \big).$$
2. The unit ball of the Cameron-Martin space associated with the standard stationary Ornstein-Uhlenbeck process, viewed as a map in the Sobolev space $W_2^1(0,1)$ with its covariance operator, for which:
$$\psi_{S_X}\!\left( \frac{\log n}{n} \right) = O(\log n).$$
3. The closed ball B(0, r) in the Sobolev space defined by the class of functions x(t) on $T = [0, 2\pi)$ such that:
$$\frac{1}{2\pi} \int_0^{2\pi} x^2(t)\, dt + \frac{1}{2\pi} \int_0^{2\pi} \left( x^{(m)}(t) \right)^2 dt \leq r^2,$$
where $x^{(m)}(\cdot)$ denotes the mth derivative of x. In this case:
$$\psi_{S_X}\!\left( \frac{\log n}{n} \right) = O\big( n^{1/m} \big).$$
4. The compact subsets of finite-dimensional spaces, or of Hilbert spaces equipped with a projection semi-metric, for which:
$$\psi_{S_X}\!\left( \frac{\log n}{n} \right) = O(\log n).$$
Notice that inequality (H5b) in Ferraty et al. (2010) is not necessary here, because that assumption is used to make the rate of the uniform consistency precise, which we do not need: the uniform consistency of the kernel estimator of the conditional density, without any rate, suffices to establish our results.
Conditions (9) and (16) are equivalent to those used by Rachdi and Vieu (2007) and Benhenni et al. (2007) for the global and local cross-validation procedures in operatorial regression estimation. In fact, these hypotheses are the functional versions of those used by Härdle and Marron (1985) and Youndjé (1996) in the usual real case. The condition (9) on the weight function is similar to that in Vieu (1991), and allows giving more importance to observations around the curve x.
Theorem 2.3.1. If the set $H_n$ of bandwidths (a, b) is finite with:
$$\#(H_n) = O(n^{\delta}) \quad \text{for some } \delta > 0, \qquad (17)$$
then we have, for k = 1, 2, 3:
$$\frac{d_k(\widehat{f}_{(a_1, b_1)}, f)}{\displaystyle\inf_{(a,b) \in H_n} d_k(\widehat{f}_{(a,b)}, f)} \longrightarrow 1 \quad \text{a.s., as } n \to +\infty, \qquad (18)$$
where
$$(a_1, b_1) = \arg\min_{(a,b) \in H_n} GCV(a, b).$$
In the local framework, we suppose that (15) is verified for $S_X = B(x, w)$, and we deduce the same optimality results for the local criterion.
Theorem 2.3.2. If the set $H_n(x, y)$ of bandwidths (a, b) is finite with:
$$\#(H_n(x, y)) = O(n^{\delta(x,y)}) \quad \text{for some } \delta(x, y) > 0, \qquad (19)$$
then we have, for k = 1, 2, 3:
$$\frac{d_k(\widehat{f}_{(a_1, b_1)}, f)}{\displaystyle\inf_{(a,b) \in H_n(x,y)} d_k(\widehat{f}_{(a,b)}, f)} \longrightarrow 1 \quad \text{a.s., as } n \to +\infty, \qquad (20)$$
where
$$(a_1, b_1) = \arg\min_{(a,b) \in H_n(x,y)} LCV_{x,y}(a, b).$$
2.4 Discussion and applications
inf
(a,b)Hn (x,y)
dk (fb(a,b) , f )
inf
(a,b)Hn (x,y)
LCVx,y (a, b)
$$\widehat{\theta}_{(a,b)}(x) = \arg\sup_{y \in \mathbb{R}} \widehat{f}_{(a,b)}(x, y), \qquad (21)$$
where $\widehat{f}_{(a,b)}(x, y)$ is given in (2). Clearly, the behaviour of the conditional mode estimator
depends on the choice of the two smoothing parameters a and b. In this prediction context, a naive $L^2$-criterion is given by:
$$(a_{opt}, b_{opt}) = \arg\min_{a, b} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \widehat{\theta}^{\,-i}(a, b, X_i) \right)^2 \right\},$$
where
$$\widehat{\theta}^{\,-i}(a, b, X_i) = \arg\sup_{y \in \mathbb{R}} \widehat{f}^{\,-i}_{(a,b)}(X_i, y),$$
with $\widehat{f}^{\,-i}_{(a,b)}(X_i, y)$ the leave-one-out-curve estimator defined by (8). This selection method has been used by De Gooijer and Gannoun (2000) in the multivariate case and by Ferraty and Vieu (2006) in the functional setting. Although this selection method is very adequate in several practical situations, to the best of our knowledge its asymptotic optimality has not been addressed so far. Moreover, this selection procedure runs into serious problems if the estimator $\widehat{\theta}$ is not unique. A reasonable way to overcome this problem is to use our bandwidth selection procedure, computing the conditional mode estimator as follows:
$$\widehat{\theta}(x) = \arg\sup_{y \in \mathbb{R}} \widehat{f}_{(a_1, b_1)}(x, y),$$
where $(a_1, b_1)$ are defined in Theorem 2.3.1. Similarly to the previous criterion, the present procedure behaves very well in practice (cf. Section 4.3), but its asymptotic optimality remains an open question. Furthermore, the choice of the smoothing parameter in conditional mode estimation is one of the natural prospects of the present work.
The maximum conditional density predictive region: in many situations the user of a prediction method may also be interested in the construction of a predictive interval (or region), since the latter is often more informative than a pointwise prediction. Notice that there are several ways to determine these regions (cf. for instance De Gooijer and Gannoun, 2000). In this paragraph we focus on the maximum conditional density predictive region (MCDR) or
highest conditional density region (HCDR), introduced by Hyndman (1995). This region is defined, for any given $\alpha \in (0, 1)$, by:
$$R_\alpha = \left\{ y : f(x, y) \geq l_\alpha(x) \right\}, \quad \text{where } l_\alpha(x) \text{ is the largest constant such that} \int_{R_\alpha} f(x, y)\, dy \geq 1 - \alpha.$$
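On a grid, a highest-density region of this type can be computed by scanning thresholds from the largest density value downwards. An illustrative sketch, in which the standard normal density merely stands in for an estimated conditional density $\widehat{f}(x, \cdot)$:

```python
import numpy as np

def hcdr_mask(ygrid, dens, alpha):
    """Boolean mask of the (1 - alpha) highest-density region on a regular grid."""
    dy = ygrid[1] - ygrid[0]
    order = np.argsort(dens)[::-1]                 # candidate thresholds, highest density first
    cum = np.cumsum(dens[order]) * dy
    k = int(np.searchsorted(cum, 1.0 - alpha))     # smallest set reaching coverage 1 - alpha
    mask = np.zeros(dens.shape, dtype=bool)
    mask[order[:k + 1]] = True
    return mask

ygrid = np.linspace(-5.0, 5.0, 2001)
dens = np.exp(-0.5 * ygrid ** 2) / np.sqrt(2.0 * np.pi)   # stand-in for f_hat(x, .)
mask = hcdr_mask(ygrid, dens, alpha=0.05)
```

For this stand-in density and alpha = 0.05, the recovered region is close to the interval [-1.96, 1.96], the textbook 95% highest-density region of a standard normal.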
Recall that the MCDR is of the smallest Lebesgue measure among all the predictive regions
with the same coverage probability (cf. De Gooijer and Gannoun, 2000). In the unconditional case, the estimation of the maximum density predictive region has been widely studied
(cf. Samworth and Wand (2010) and the references therein). In our functional conditional
context, we use the kernel estimator \(\widehat{f}_{(a,b)}\) of the conditional density f to give a plug-in estimator of R_α. However, as for all estimations by the kernel method, the performance of this
estimation depends heavily on the choice of the bandwidth parameters (a, b). As has
been mentioned before, many data-driven bandwidth selection methods have been proposed in the
multivariate case. For instance, De Gooijer and Gannoun (2000) compare four selection methods
based on the classical leave-one-out cross-validation procedure associated with the cumulative
conditional distribution, the conditional mode, the conditional mean and the conditional
median. At this stage, it seems more reasonable to use a cross-validation criterion for the
conditional density instead of one for the predictors (the conditional mode, the conditional
mean or the conditional median). In other words, the best approximation of the MCDR can
be obtained by computing:
\[
\widehat{R}_\alpha = \left\{\, y :\ |y| < \lambda,\ \ \widehat{f}_{(a_1,b_1)}(x,y) \ \ge\ \widehat{l}_\alpha(x) \,\right\}
\]
where
\[
\widehat{l}_\alpha(x) = \max\Big\{\, l > 0 :\ \int \mathbf{1}_{\{\widehat{f}_{(a_1,b_1)}(x,y)\ \ge\ l\}}\ \widehat{f}_{(a_1,b_1)}(x,y)\,dy \ \ge\ 1-\alpha \,\Big\}.
\]
Similarly to the conditional mode estimation, the asymptotic optimality of this selection
procedure is also an important prospect of this work. Finally, let us note that, in the real
unconditional framework, Samworth and Wand (2010) show the asymptotic optimality of
a selection method based on the minimization of the probability of the symmetric difference between R_α and its estimator \(\widehat{R}_\alpha\). The adaptation of these ideas to the functional
conditional case is another important prospect of this work.
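On a grid of y-values, the threshold and the corresponding highest-density region can be computed by scanning the level sets of the estimated density from the top down. The following is a minimal sketch; the density values passed in are assumed to come from any conditional density estimator evaluated on the grid.

```python
import numpy as np

def hdr_threshold(f_vals, dy, alpha):
    """Largest level l whose super-level set {f >= l} carries density
    mass at least 1 - alpha (a grid version of the threshold l_alpha(x))."""
    order = np.argsort(f_vals)[::-1]          # scan level sets from the top down
    mass = np.cumsum(f_vals[order]) * dy
    k = np.searchsorted(mass, 1.0 - alpha)    # first level set with enough mass
    return f_vals[order[min(k, len(f_vals) - 1)]]

def hdr_region(y_grid, f_vals, alpha):
    """Grid points belonging to the (1 - alpha) highest-density region."""
    l = hdr_threshold(f_vals, y_grid[1] - y_grid[0], alpha)
    return y_grid[f_vals >= l]
```

For a standard normal density and alpha = 0.05, the returned region is close to [-1.96, 1.96], the usual 95% interval.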
The conditional expected shortfall estimation : The expected shortfall (ES) has recently become one of the most common risk measures in finance. This model was introduced by
Acerbi (2002), for a given level α in (0,1), by:
\[
ES_\alpha = \alpha^{-1}\int_{VaR_\alpha}^{+\infty} t\, f_Y(t)\,dt
\]
where VaR_α = F_Y^{-1}(1-α), with f_Y (respectively, F_Y) denoting the density (respectively, the
cumulative distribution) of the random variable Y representing returns on a given portfolio,
stock, bond or market index. In many situations, we have to analyze the financial risk
conditionally on an exogenous variable which is continuously observed. To do that, we use
the conditional expected shortfall, for which the expectation is taken with respect to the
conditional distribution of Y given this exogenous variable X and the VaR is the conditional
quantile of order α of Y given X. Accurate estimation of this conditional model depends
crucially on the estimation method of the conditional density. Thus, if the kernel method is
used to estimate the conditional density function, the best approximation of the conditional
expected shortfall (CES) is then given by:
\[
\widehat{CES}_\alpha = \alpha^{-1}\int_{\widehat{VaR}_\alpha(x)}^{+\infty} y\, \widehat{f}_{(a_1,b_1)}(x,y)\,dy.
\]
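A plug-in computation of the VaR and the CES from a density tabulated on a fine, wide grid can be sketched as follows. The grid and the Gaussian test density used in the check are assumptions for illustration only; in practice the tabulated values would come from the conditional density estimator.

```python
import numpy as np

def plug_in_ces(y_grid, f_vals, alpha):
    """Plug-in VaR and CES at level alpha from a (conditional) density
    evaluated on a fine, wide grid y_grid."""
    dy = y_grid[1] - y_grid[0]
    cdf = np.cumsum(f_vals) * dy
    i = min(np.searchsorted(cdf, 1.0 - alpha), len(y_grid) - 1)
    var_alpha = y_grid[i]                       # VaR_alpha = F^{-1}(1 - alpha)
    tail = y_grid >= var_alpha
    ces = (y_grid[tail] * f_vals[tail]).sum() * dy / alpha
    return var_alpha, ces
```

For the standard normal density at alpha = 0.05 this returns VaR close to 1.645 and CES close to 2.06, the textbook values.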
\[
Y_i = r(X_i) + \epsilon_i, \qquad \text{for } i = 1,\dots,n \tag{22}
\]
where the ε_i's are generated independently according to a N(0,1) distribution. The sampled
functional explanatory variables X_i, for i = 1,…,n, which are assumed to be independent of
the ε_i's, are generated according to the following expressions:
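The simulation scheme (22) can be sketched as follows. Since the chapter's exact curve-generating expressions are not reproduced in this excerpt, the curve family and the regression operator r below are hypothetical stand-ins chosen only to make the sketch runnable.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 100, 100
t = np.linspace(0.0, np.pi, T)
dt = t[1] - t[0]

# Hypothetical curve family (stand-in for the chapter's expressions):
# random-amplitude cosine curves observed on a grid of T time points.
A = rng.uniform(0.0, 5.0, n)
X = A[:, None] * np.cos(2.0 * t[None, :])

# A hypothetical regression operator r, and the model (22).
def r(curve):
    return (curve ** 2).sum() * dt              # toy integral functional

eps = rng.normal(size=n)                        # N(0, 1) errors
Y = np.array([r(X[i]) for i in range(n)]) + eps
```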
[Figure: a sample of the simulated curves X_i(t), i = 1, …, n, plotted against time.]

The weight functions are taken as:
\[
W_1(t) = \begin{cases} 1 & \text{if } \dots \\ 0 & \text{otherwise} \end{cases}
\qquad\text{and}\qquad
W_2(t) = \begin{cases} 1 & \text{if } \dots \\ 0 & \text{otherwise.} \end{cases}
\]
The quadratic errors d_2 of the two global bandwidth choices, as functions of the sample size n, are reported below:

n                                        50        100       150       200       250
d2(f_(aGCV, bGCV), f)                    0.1183    0.0850    0.0774    0.0395    0.0262
d2(f_(aGd2, bGd2), f)                    0.0593    0.04284   0.0392    0.03333   0.02361
ratio d2(GCV) / d2(Gd2)                  1.9949    1.9859    1.9744    1.1851    1.1096
\[
W_{1,x}(t) = \begin{cases} 1 & \text{if } d(t,x) < a(x) \\ 0 & \text{otherwise} \end{cases}
\qquad\text{and}\qquad
W_{2,y}(z) = \begin{cases} 1 & \text{if } |z-y| < b(y) \\ 0 & \text{otherwise.} \end{cases}
\]
We choose the global bandwidths over the same set H_n defined above, and we use the same samples for both methods. The two bandwidth choices are compared through the following quadratic errors:
\[
RSS(\text{local}) = \frac{1}{500}\sum_{i=1}^{50}\sum_{j=1}^{100}\left[ f(x_i,y_j) - \widehat{f}_{(a_{LCV},\,b_{LCV})}(x_i,y_j) \right]^2
\]
and
\[
RSS(\text{global}) = \frac{1}{500}\sum_{i=1}^{50}\sum_{j=1}^{100}\left[ f(x_i,y_j) - \widehat{f}_{(a_{GCV},\,b_{GCV})}(x_i,y_j) \right]^2.
\]
We have carried out several tests (exactly 25) by exchanging observations between the
learning and the test samples. In Figure 2 we plot the box-plots of the obtained RSS errors in
both cases. It appears clearly that the local bandwidth choice outperforms the
global selection method.

[Figure 2: box-plots of the RSS(local) and RSS(global) errors.]
While for the Ferraty and Vieu's method (F-V Method, say) we use the R routine named
funopare.mode.lcv^5. Recall that the parameters (a,b) in this R routine are locally chosen
over the same type of set H_n(x,y) as follows:
\[
(a_{FV_i},\, b_{FV_i}) = \arg\min_{(a,b)} \left| Y_i - \widehat{\theta}(a,b,x_i^{*}) \right|
\]
where \(x_i^{*} = \arg\min_{x_j \in \text{learning sample}} d(x_i,x_j)\).

4. At the ftp address: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly
5. www.lsp.ups-tlse.fr/staph/npfda
[Figure: the real data curves plotted against time.]
[Figure panels: FV-Method (MSE = 0.47) and LCDE-Method (MSE = 0.35); estimates plotted against observations.]
Figure 2.4 Comparison of the prediction results between the F-V-Method and the LCDE-Method
The performance of both selection rules, in terms of prediction, is evaluated by computing
the mean squared prediction errors (MSE), defined by the following quantities:
\[
MSE(\text{LCDE}) = \frac{1}{18}\sum_{i\in I_1}\left( Y_i - \widehat{\theta}(a_{LCV},\, b_{LCV},\, X_i) \right)^2
\]
and
\[
MSE(\text{FV}) = \frac{1}{18}\sum_{i\in I_1}\left( Y_i - \widehat{\theta}(a_{FV},\, b_{FV},\, X_i) \right)^2.
\]
The superiority of our local procedure over the one used in Ferraty and Vieu (2006)
can be justified by the fact that, for each curve x_i in the test sample, our local procedure
minimizes over all k-nearest-neighbor smoothing parameters, while Ferraty and Vieu
(2006) consider only the k-nearest neighbors of the curve closest to x_i in the learning
sample.
2.5
Proofs
Recall that C denotes a generic constant. As specified above, the technical parts which
are similar to the finite-dimensional case or to regression operator estimation for functional data
are omitted. Thus, we encourage readers who are interested in these proofs to
keep at hand the standard finite-dimensional literature (that is, the paper by Härdle and
Marron, 1985) or the papers relative to the infinite-dimensional framework (that is,
Rachdi and Vieu (2007), and Benhenni et al. (2007)).
Proof of Theorem 2.3.1. Observe that:
\[
\left(\widehat{f}_{(a,b)}(x,y) - f(x,y)\right)^2
= \left(\widehat{f}_N(x,y) - f(x,y)\,\widehat{f}_D(x)\right)^2
+ 2\left(1-\widehat{f}_D(x)\right)\widehat{f}_D(x)\left(\widehat{f}_{(a,b)}(x,y) - f(x,y)\right)^2
+ \left(1-\widehat{f}_D(x)\right)^2\left(\widehat{f}_{(a,b)}(x,y) - f(x,y)\right)^2
\]
where
\[
\widehat{f}_D(x) = \frac{1}{n\,IE\!\left[K(a^{-1}d(x,X))\right]}\sum_{i=1}^{n} K\!\left(a^{-1}d(x,X_i)\right) \tag{23}
\]
and
\[
\widehat{f}_N(x,y) = \frac{1}{nb\,IE\!\left[K(a^{-1}d(x,X))\right]}\sum_{i=1}^{n} K\!\left(a^{-1}d(x,X_i)\right) H\!\left(b^{-1}(y-Y_i)\right). \tag{24}
\]
It follows from the uniform consistency^6 of \(\widehat{f}_D(x)\) to \(IE[\widehat{f}_D(x)] = 1\) (cf. Ferraty et al., 2010) that:
\[
\forall k=1,2,3,\quad d_k(\widehat{f}_{(a,b)}(x,y),\, f(x,y)) = d_k(\widehat{f}_N(x,y),\, f(x,y)\widehat{f}_D(x)) + o_{a.s.}\!\left(d_k(\widehat{f}_{(a,b)}(x,y),\, f(x,y))\right).
\]
6. In Ferraty et al. (2010), the main aim is to state the rate of the uniform almost-complete convergence
of the functional component. Such a result can easily be extended here (without precision of the convergence
rate) to \(\sup_{a\in H_n}|\widehat{f}_D(x) - 1| \to 0\).
Moreover, writing
\[
K^{*}(x,X_i) = \frac{1}{b\,IE\!\left[K(a^{-1}d(x,X))\right]}\left[ K\!\left(a^{-1}d(x,X_i)\right)H\!\left(b^{-1}(y-Y_i)\right) - b\,f(x,y)\,K\!\left(a^{-1}d(x,X_i)\right) \right],
\]
we obtain that:
\[
\sup_{(a,b)\in H_n} \frac{\left| d_k(\widehat{f}_{(a,b)}, f) - d_l(\widehat{f}_{(a,b)}, f) \right|}{d_3(\widehat{f}_{(a,b)}, f)} \longrightarrow 0,\quad \text{a.s., for all } k \ne l.
\]
Recall also that:
\[
\widehat{f}^{-i}_{(a,b)}(X_i,Y_i) = \frac{\widehat{f}^{-i}_N(X_i,Y_i)}{\widehat{f}^{-i}_D(X_i)}
\]
with
\[
\widehat{f}^{-i}_N(x,y) = \frac{1}{(n-1)\,b\,IE\!\left[K(a^{-1}d(x,X))\right]}\sum_{j\ne i} K\!\left(a^{-1}d(x,X_j)\right)H\!\left(b^{-1}(y-Y_j)\right)
\]
and
\[
\widehat{f}^{-i}_D(x) = \frac{1}{(n-1)\,IE\!\left[K(a^{-1}d(x,X))\right]}\sum_{j\ne i} K\!\left(a^{-1}d(x,X_j)\right).
\]
Remark that the cross-validation criterion can be written as \(CV(a,b) = d_5(\widehat{f}_{(a,b)}, f) + CT(a,b) + T\), where
\[
CT(a,b) = \frac{1}{n}\sum_{i=1}^{n} W_1(X_i)\int \widehat{f}^{-i\,2}_{(a,b)}(X_i,y)\,W_2(y)\,dy \;-\; \frac{2}{n}\sum_{i=1}^{n} \widehat{f}^{-i}_{(a,b)}(X_i,Y_i)\,\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}
\]
and
\[
T = \frac{1}{n}\sum_{i=1}^{n} f(X_i,Y_i)\,W_1(X_i)\,W_2(Y_i)
\]
does not depend on (a,b).
Thus, the proof of this theorem is complete if we can prove that d_5 is asymptotically equivalent to d_3 and that:
\[
\sup_{(a,b)\in H_n} \frac{|CT(a,b)|}{d_3(\widehat{f}_{(a,b)}, f)} \longrightarrow 0,\quad \text{a.s., as } n\to+\infty.
\]
That is why the proof of Theorem 2.3.1 is achieved by the following Lemmas 2.5.1,
2.5.2 and 2.5.3, for which the proofs are given in the Appendix (cf. Section 2.6).
Lemma 2.5.1. Under hypotheses (10), (11), (12), (14) and (17), we have that:
\[
d_3(\widehat{f}_{(a,b)}, f) \ \ge\ \frac{C}{nb\,\phi(a)}.
\]
Lemma 2.5.2. Under hypotheses (10), (11), (12), (14) and (17), we obtain that:
\[
\sup_{(a,b)\in H_n} \frac{|CT(a,b)|}{d_3(\widehat{f}_{(a,b)}, f)} \longrightarrow 0,\quad \text{a.s., as } n\to+\infty.
\]
Lemma 2.5.3. Under hypotheses (10), (11), (12), (14) and (17), we have that:
\[
\sup_{(a,b)\in H_n} \frac{\left| d_5(\widehat{f}_{(a,b)}, f) - d_3(\widehat{f}_{(a,b)}, f) \right|}{d_3(\widehat{f}_{(a,b)}, f)} \longrightarrow 0,\quad \text{a.s., as } n\to+\infty.
\]
Proof of Theorem 2.3.2. The main ideas of the proof are essentially contained in Rachdi and
Vieu (2007) and, in the same way, in the proof of Theorem 2.3.1 above, but the computations
here are more complicated since we have the additional problems of dealing with non-constant
weight functions and functional data. The reader should have the above-mentioned papers
at hand in order to get all the details of this proof. To make things easier, note that, from hypothesis
(9), the weight function is bounded and has a compact support with nonempty interior (the
closure of B(x,w)). Thus, hypothesis (5) in Rachdi and Vieu (2007) is satisfied and the proof
of this theorem follows the same steps as that of Theorem 2.3.1. It is therefore
omitted here. We just mention that the first step of the proof consists in showing the result
over a finite subset of H_n, and in the second step the result is extended to the continuous
set of bandwidths H_n by using the Hölder continuity property of the functions K and f (cf.
Vieu (1991) and Youndjé et al. (1993)).
2.6 Appendix

Proof of Lemma 2.5.1. In what follows we set K_i(x) = K(a^{-1}d(x,X_i)) and
\[
H_i(y) = H\!\left(b^{-1}(y-Y_i)\right).
\]
It is clear that:
\[
d_3(\widehat{f}_{(a,b)}, f) \ \ge\ \int\!\!\int \mathrm{Var}\!\left[\widehat{f}_{(a,b)}(x,y)\right] W_1(x)\,W_2(y)\,dP_X(x)\,dy.
\]
Therefore, it is sufficient to evaluate the variance term \(\mathrm{Var}[\widehat{f}_{(a,b)}(x,y)]\). To do that, by
using similar computational techniques as in Laksaci (2007), and by taking into account
the fact that \(IE[\widehat{f}_D(x)] = 1\), we get that:
\[
\mathrm{Var}\!\left[\widehat{f}_{(a,b)}(x,y)\right] = \mathrm{Var}\!\left[\widehat{f}_N(x,y)\right] - 2\,IE\!\left[\widehat{f}_N(x,y)\right]\mathrm{Cov}\!\left(\widehat{f}_N(x,y),\, \widehat{f}_D(x)\right)
+ \left(IE\!\left[\widehat{f}_N(x,y)\right]\right)^2 \mathrm{Var}\!\left(\widehat{f}_D(x)\right) + o\!\left(\frac{1}{nb\,\phi(a)}\right). \tag{25}
\]
According to the definitions (24) and (23) of the estimators \(\widehat{f}_N\) and \(\widehat{f}_D\), we obtain that:
\[
\mathrm{Var}\!\left(\widehat{f}_N(x,y)\right) = \frac{1}{n\left(b\,IE\!\left[K(a^{-1}d(x,X))\right]\right)^2}\,\mathrm{Var}\!\left(K_1(x)H_1(y)\right)
\]
and
\[
\mathrm{Var}\!\left(\widehat{f}_D(x)\right) = \frac{1}{n\left(IE\!\left[K(a^{-1}d(x,X))\right]\right)^2}\,\mathrm{Var}\!\left(K_1(x)\right).
\]
Moreover, under (12), and after some simple calculations, we can show, for all i, j = 1, 2,
that:
\[
IE\!\left[K_1^{i}(x)H_1^{j}(y)\right] = b\,IE\!\left[K_1^{i}(x)f(X_1,y)\right]\int H^{j}(t)\,dt + o\!\left(b\,IE[K_1^{i}(x)]\right) \tag{26}
\]
and
\[
0 < C\,\phi(a) \ \le\ IE\!\left[K_1^{i}(x)\right] \ \le\ C'\,\phi(a). \tag{27}
\]
By comparing asymptotically the three quantities in (25), we can see that the first term is
the leading one. Hence, from (26) and (27) we obtain that:
\[
\mathrm{Var}\!\left[\widehat{f}_{(a,b)}(x,y)\right] = \frac{1}{nb\left(IE\!\left[K(a^{-1}d(x,X))\right]\right)^2}\,IE\!\left[K_1^{2}(x)f(X_1,y)\right]\int H^{2}(t)\,dt + o\!\left(\frac{1}{nb\,\phi(a)}\right).
\]
Furthermore, by using the fact that the conditional density does not vanish on a neighborhood
of \(S_X \times S_Y\), we obtain that:
\[
d_3(\widehat{f}_{(a,b)}, f) \ \ge\ \frac{C}{nb\,\phi(a)},
\]
which completes the proof of this lemma.
Proof of Lemma 2.5.2. From the definition of CT(a,b), we have, for all (a,b) in H_n:
\[
|CT(a,b)| = \left| \frac{1}{n}\sum_{i=1}^{n}\int \widehat{f}^{-i\,2}_{(a,b)}(X_i,y)\,W_2(y)\,W_1(X_i)\,dy \;-\; \frac{2}{n}\sum_{i=1}^{n} \widehat{f}^{-i}_{(a,b)}(X_i,Y_i)\,\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)} \right|
\]
\[
= \left| \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\widehat{f}^{-i\,2}_D(X_i)}\left[ \int \widehat{f}^{-i\,2}_N(X_i,y)\,W_2(y)\,W_1(X_i)\,dy \;-\; 2\,\widehat{f}^{-i}_N(X_i,Y_i)\,\widehat{f}^{-i}_D(X_i)\,\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)} \right] \right|.
\]
This, combined with the uniform consistency of \(\widehat{f}^{-i}_D(x)\) to 1 (cf. Ferraty et al., 2008), shows that it
is enough to prove that, as n → +∞:
\[
\sup_{(a,b)\in H_n}\ b\,\phi(a)\left| \sum_{i=1}^{n}\left[ \int \widehat{f}^{-i\,2}_N(X_i,y)\,W_2(y)\,W_1(X_i)\,dy \;-\; 2\,\widehat{f}^{-i}_N(X_i,Y_i)\,\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)} \right] \right| \longrightarrow 0,\quad \text{a.s.}
\]
To do this, one can use similar arguments as in the real case (cf. Youndjé, 1996). Indeed, set,
for all 1 ≤ i, j, k ≤ n:
\[
b_{ij} = \int H_j^2(y)\,W_2(y)\,dy - H_j^2(Y_i)\,\frac{W_2(Y_i)}{f(X_i,Y_i)}, \qquad U_{ij} = a_{ij}\,b_{ij},
\]
\[
d_{ijk} = \int H_j(y)H_k(y)\,W_2(y)\,dy - H_j(Y_i)H_k(Y_i)\,\frac{W_2(Y_i)}{f(X_i,Y_i)}, \qquad V_{ijk} = c_{ijk}\,d_{ijk},
\]
where a_{ij} and c_{ijk} denote the corresponding kernel weights, which are measurable with respect to the X_i's. Now, we have to examine the following limits:
\[
\sup_{(a,b)\in H_n} \frac{b\,\phi(a)}{(n-1)^2}\left| \sum_{i\ne j} U_{ij} \right| \longrightarrow 0,\quad \text{a.s.}
\]
and
\[
\sup_{(a,b)\in H_n} \frac{b\,\phi(a)}{(n-1)^2}\left| \sum_{i\ne j\ne k\ne i} V_{ijk} \right| \longrightarrow 0,\quad \text{a.s.}
\]
The proof of the above two limits follows by adopting the same steps as in the proof of Lemma
3 in Rachdi and Vieu (2007). Notice that, by the Borel-Cantelli Lemma, it is enough to show
that there are η, η' > 0 such that, for all p in IN, there are constants C, C' so that:
\[
IE\!\left[\left( n^{-2}\, b\,\phi(a) \sum_{i\ne j} U_{ij} \right)^{2p}\right] \ \le\ C\,n^{-p\eta} \tag{28}
\]
and
\[
IE\!\left[\left( n^{-2}\, b\,\phi(a) \sum_{i\ne j\ne k\ne i} V_{ijk} \right)^{2p}\right] \ \le\ C'\,n^{-p\eta'}. \tag{29}
\]
We have:
\[
IE\!\left[\left( n^{-2}\, b\,\phi(a) \sum_{i\ne j} U_{ij} \right)^{2p}\right] = \left( n^{-2}\, b\,\phi(a) \right)^{2p} \sum_{i_1\ne j_1}\cdots\sum_{i_{2p}\ne j_{2p}} IE\!\left[ U_{i_1j_1}\cdots U_{i_{2p}j_{2p}} \right].
\]
Observe that:
\[
IE[b_{ij}\mid X_1,\dots,X_n] = \int\!\!\int H^2\!\left(b^{-1}(y-z)\right) W_2(y)\,f(X_j,z)\,dy\,dz - \int\!\!\int H^2\!\left(b^{-1}(y-z)\right) \frac{W_2(y)}{f(X_i,y)}\,f(X_i,y)\,f(X_j,z)\,dy\,dz = 0,
\]
and thus we may write \(IE[U_{ij}\mid X_1,\dots,X_n] = a_{ij}\,IE[b_{ij}\mid X_1,\dots,X_n] = 0\). Denoting by m the cardinality of the set \(\{i_1, j_1, \dots, i_{2p}, j_{2p}\}\), this yields:
\[
IE\!\left[ U_{i_1j_1}\cdots U_{i_{2p}j_{2p}} \right] = 0 \quad \text{whenever } m > 2p. \tag{30}
\]
Indeed, if m > 2p, there exists an \(a \in \{1,\dots,2p\}\) such that i_a (or j_a) appears only once in
\(\{i_1, j_1, \dots, i_{2p}, j_{2p}\}\); therefore, it suffices to compute the expectation by conditioning with
respect to \(X_{i_a}\) (or \(X_{j_a}\)) to show (30).
On the other hand, for 2 ≤ m ≤ 2p, we deduce from (10) and (12) that:
\[
\left| IE\!\left( U_{i_1j_1}\cdots U_{i_{2p}j_{2p}} \right) \right| \ \le\ \frac{C}{b^{4p}\,(\phi(a))^{4p}}\; IE\!\left[ \prod_{i,j=1}^{m} K^{ij}\!\left(a^{-1}d(X_i,X_j)\right) \prod_{i=1}^{m} W_1^{i}(X_i) \right],
\]
so that, for all m = 2, …, 2p:
\[
\left| IE\!\left( U_{i_1j_1}\cdots U_{i_{2p}j_{2p}} \right) \right| \ \le\ \frac{C\,(\phi(a))^{m/2}}{b^{4p}\,(\phi(a))^{4p}}.
\]
Consequently:
\[
IE\!\left[\left( n^{-2}\, b\,\phi(a) \sum_{i\ne j} U_{ij} \right)^{2p}\right]
\ \le\ C\, n^{-p} \sum_{m=2}^{2p} \left( \frac{1}{\phi(a)} \right)^{m-2p} \frac{1}{(nb\,\phi(a))^{2p}}
\ \le\ C\,\frac{1}{(nb\,\phi(a))^{2p}\,(\phi(a))^{p}}
\ \le\ C\,\frac{1}{(n^{1-\zeta})^{2p}\,(\phi(a))^{p}}. \tag{31}
\]
It suffices now to combine (31) together with assumption (16) to get (28).
Concerning (29), we use analogous arguments as for showing (28). Denoting by m' the
cardinality of the set \(\{i_1, j_1, k_1, \dots, i_{2p}, j_{2p}, k_{2p}\}\), and because \(IE[d_{ijk}\mid X_1,\dots,X_n] = 0\), we
deduce that, if m' > 3p:
\[
IE\!\left( V_{i_1j_1k_1}\cdots V_{i_{2p}j_{2p}k_{2p}} \right) = 0.
\]
Moreover, when 3 ≤ m' ≤ 3p, a straightforward modification of the proof of Lemma 5.4.2
in Youndjé (1996) gives:
\[
\left| IE\!\left[ V_{i_1j_1k_1}\cdots V_{i_{2p}j_{2p}k_{2p}} \right] \right| \ \le\ \frac{C}{b^{4p}\,(\phi(a))^{4p}}\,\left( b\,\phi(a) \right)^{m'/2}.
\]
Hence:
\[
IE\!\left[\left( n^{-2}\, b\,\phi(a) \sum_{i\ne j\ne k\ne i} V_{ijk} \right)^{2p}\right]
\ \le\ C \sum_{m'=3}^{3p} \frac{\left( n\,(b\,\phi(a))^{1/2} \right)^{m'-3p}}{\left( n\,(b\,\phi(a))^{1/2} \right)^{p}}
\ \le\ \frac{C}{n^{p/2}\,(\phi(a))^{p/2}},
\]
which gives (29).
Proof of Lemma 2.5.3. Since
\[
d_5(\widehat{f}_{(a,b)}, f) = \frac{1}{n}\sum_{i=1}^{n}\left( \widehat{f}^{-i}_N(X_i,Y_i) - f(X_i,Y_i)\,\widehat{f}^{-i}_D(X_i) \right)^2 \frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)},
\]
and because of the asymptotic equivalence between d_3 and d_2, the proof of this lemma
will be completed as soon as we show that:
\[
\sup_{(a,b)\in H_n} \frac{\left| d_5(\widehat{f}_{(a,b)}, f) - d_2(\widehat{f}_{(a,b)}, f) \right|}{d_3(\widehat{f}_{(a,b)}, f)} \longrightarrow 0,\quad \text{a.s.}
\]
To do that, we consider the following decomposition:
\[
\left( \widehat{f}^{-i}_N - f\,\widehat{f}^{-i}_D \right)^2
= \left( \widehat{f}_N - f\,\widehat{f}_D \right)^2
+ 2\left( \widehat{f}^{-i}_N - \widehat{f}_N + f\,(\widehat{f}_D - \widehat{f}^{-i}_D) \right)\left( \widehat{f}_N - f\,\widehat{f}_D \right)
+ \left( \widehat{f}^{-i}_N - \widehat{f}_N + f\,(\widehat{f}_D - \widehat{f}^{-i}_D) \right)^2
\]
and
\[
\widehat{f}^{-i}_N(x,y) - \widehat{f}_N(x,y) = \frac{1}{n-1}\,\widehat{f}_N(x,y) - \frac{1}{(n-1)\,b\,IE\!\left[K(a^{-1}d(x,X))\right]}\,K\!\left(\frac{d(x,X_i)}{a}\right) H\!\left(\frac{y-Y_i}{b}\right),
\]
\[
\widehat{f}^{-i}_D(x) - \widehat{f}_D(x) = \frac{1}{n-1}\,\widehat{f}_D(x) - \frac{1}{(n-1)\,IE\!\left[K(a^{-1}d(x,X))\right]}\,K\!\left(\frac{d(x,X_i)}{a}\right).
\]
Since K is a bounded function, we deduce from the uniform consistency of \(\widehat{f}_D\) and \(\widehat{f}_N\) that:
\[
\left| \widehat{f}^{-i}_N(x,y) - \widehat{f}_N(x,y) \right| \ \le\ \frac{C}{(n-1)\,b\,IE\!\left[K(a^{-1}d(x,X))\right]} \ \le\ \frac{C'}{(n-1)\,b\,\phi(a)},\quad \text{a.s.}
\]
and
\[
\left| \widehat{f}^{-i}_D(x) - \widehat{f}_D(x) \right| \ \le\ \frac{C}{(n-1)\,IE\!\left[K(a^{-1}d(x,X))\right]} \ \le\ \frac{C'}{(n-1)\,\phi(a)},\quad \text{a.s.}
\]
Hence, we get:
\[
\left| d_5(\widehat{f}_{(a,b)}, f) - d_2(\widehat{f}_{(a,b)}, f) \right| \ \le\ \frac{C}{(n-1)\,b\,\phi(a)} \sup_{(x,y)\in S_X\times S_Y}\left| \widehat{f}_N(x,y) - f(x,y)\,\widehat{f}_D(x) \right| + \frac{C}{(n-1)^2\,b^2\,(\phi(a))^2}.
\]
By combining this last result with Lemma 2.5.1 we obtain the claimed result.
Bibliography
[15] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Statist. Inf. for Stoch. Proc., 9, Pages 47-76.
[16] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, New York.
[18] Gannoun, A., Saracco, J. and Yu, K. (2003). Nonparametric prediction by conditional median and quantiles. J. of Statist. Plan. and Inf., 117, No. 2, Pages 207-223.
[19] Hall, P., Wolff, R.C. and Yao, Q. (1999). Methods for estimating a conditional distribution function. J. Amer. Statist. Assoc., 94, Pages 154-163.
[20] Härdle, W. (1991). Smoothing Techniques: With Implementation in S. Springer, New York.
[21] Härdle, W., Janssen, P. and Serfling, R. (1988). Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist., 16, No. 4, Pages 1428-1449.
[22] Härdle, W. and Marron, J.S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13, No. 4, Pages 1465-1481.
[23] Hyndman, R.J. (1995). Highest-density forecast regions for non-linear and non-normal time series models. J. Forecast., 14, Pages 431-441.
[24] Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996). Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5, Pages 315-336.
[25] Hyndman, R.J. and Yao, Q. (1998). Nonparametric estimation and symmetry tests for conditional density functions. Working paper 17/98, Department of Econometrics and Business Statistics, Monash University.
[26] Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity of sets in functional spaces. Uspekhi Mat. Nauk., 14, Pages 3-86. (Eng. Transl.: Amer. Math. Soc. Transl. Ser. 2, Pages 277-364, (1964)).
[27] Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 69-80.
[28] Ouassou, I. and Rachdi, M. (2009). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1, No. 2, Pages 233-250.
[29] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. J. of Statist. Plan. and Inf., 137, Pages 2784-2801.
[30] Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Second Edition, Springer-Verlag, New York.
[37] Theodoros, N. and Yannis, G.Y. (1997). Rates of convergence of estimates, Kolmogorov entropy and the dimensionality reduction principle in regression. Ann. Statist., 25, No. 6, Pages 2493-2511.
[38] Vieu, P. (1991). Nonparametric regression: optimal local bandwidth choice. J. R. Stat. Soc., Ser. B, 53, No. 2, Pages 453-464.
Chapter 3

Functional data : Local linear estimation of the conditional density and its application

C. R., Math., Acad. Sci. Paris, 348, Issues 15-16, Pages 931-934, (2010).
Statistics, DOI : 10.1080/02331888.2011.568117 (to appear in 2012)

1. Laboratoire AGIM FRE 3405, UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. E-mails : Mustapha.Rachdi@upmf-grenoble.fr and Fethi.Madani@imag.fr
4. Corresponding author

Abstract.

3.1 Introduction

This paper deals with local polynomial modeling of the conditional density function when
the explanatory variable is of functional type. It is well known that local polynomial smoothing has various advantages over the kernel method; namely, this method has superior bias
properties compared to the kernel one (cf. [4] and [6] for an extensive discussion on the comparison
between both these methods). Notice that these questions in infinite-dimensional spaces are
particularly interesting, both for the fundamental problems they raise and for
the many applications they may allow (cf. [5, 11, 29, 30]). Moreover, the kernel method is known
to be a particular case of the local polynomial method.
Besides the fact that the conditional density plays an important role in nonparametric prediction, there are several tools in nonparametric statistics, such as the conditional mode, the
conditional median or the conditional quantiles, which are based on the preliminary estimator of the functional parameter proposed in this paper. In nonparametric functional
statistics, the first results about the almost-complete consistency were obtained in [10], for
conditional density/distribution function estimation when the data are independent and
identically distributed. The strong mixing case has been studied by [16]. On the other hand,
the convergence in Lp-norm of the kernel estimator of the conditional mode was stated
in [8], and some asymptotics of the conditional quantile and mode estimators in [10]. While
in [12] the asymptotic expansion of the exact expression involved in the leading terms of the
quadratic error of the kernel estimator of the conditional density is established, in [9] the
uniform almost-complete convergence of some nonparametric conditional models is shown.
For some more recent advances in nonparametric statistics for functional data we refer
to [3, 5, 25] and the references therein.
In this work, we introduce a new nonparametric estimator of the conditional density for
functional data. Our estimator is based on the local linear approach. Notice that the local
linear estimator of the conditional density has been widely studied when the explanatory
variable lies in a finite-dimensional space, and there are many references on this topic (cf.
for instance [12, 19]). For a general treatment/study of local polynomial estimation, we refer
to [13, 24]. Thus, in this paper, we are concerned with proving, under some general conditions,
the almost-complete convergence, with rates, of the constructed estimator. More precisely, in
Section 3, we show the pointwise consistency. The uniform version of this asymptotic result
is given in Section 4. The interest of the uniform consistency comes mainly from the fact
that the pointwise performance of an estimator is not sufficient to quantify its efficiency;
some stability is needed, in the sense that this performance should be uniform over
a neighborhood. Notice that, in functional statistics, the uniform convergence is not a
direct extension of the previous pointwise results; it requires some additional tools and
conditions. In Section 5, we will emphasize the consequence of the previous results for the
estimation of the conditional mode.
3.2
Model
Let us introduce n pairs of random variables (X_i, Y_i), for i = 1,…,n, that we assume drawn
from the pair (X, Y), which is valued in F × IR, where F is a semi-metric space equipped
with a semi-metric d.
Furthermore, we assume that there exists a regular version of the conditional probability of
Y given X, which is absolutely continuous with respect to the Lebesgue measure on IR and has a
bounded density, denoted by f^x. Local polynomial smoothing is based on the assumption
that the functional parameter is smooth enough to be locally well approximated by a polynomial.
In functional statistics, there are several ways of extending the local linear ideas (cf. [1, 2, 5]).
Here we adopt the fast functional local modeling; that is, we estimate the conditional
density f^x by \(\widehat{a}\), which is obtained by minimizing the following quantity:
\[
\min_{(a,b)\in\mathbb{R}^2} \sum_{i=1}^{n}\left( h_H^{-1} H\!\left(h_H^{-1}(y-Y_i)\right) - a - b\,\beta(X_i,x) \right)^2 K\!\left(h_K^{-1}\delta(x,X_i)\right), \tag{1}
\]
whose solution in a leads, by simple algebra, to the explicit estimator:
\[
\widehat{f}^{x}(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H\!\left(h_H^{-1}(y-Y_j)\right)}{h_H \sum_{i,j=1}^{n} W_{ij}(x)} \tag{2}
\]
where
\[
W_{ij}(x) = \beta(X_i,x)\left( \beta(X_i,x) - \beta(X_j,x) \right) K\!\left(h_K^{-1}\delta(x,X_i)\right) K\!\left(h_K^{-1}\delta(x,X_j)\right).
\]
Remark 3.2.1. Obviously, if b = 0, then the above estimator reduces to the kernel-type estimator of the conditional density studied, in the functional case, in [9, 10, 28] and the references therein.
The minimization of (1) may be achieved by a wiggly \(\widehat{b}\) that forces \(\widehat{f}^x\) to adapt
to all the data points in a neighborhood of x (see Chapter 15 for the corresponding
reasoning in the context of linear regression). In [6] the same idea is expressed by stating
that optimizing in b is an infinite-dimensional problem.

3.3
In what follows, x denotes a fixed point in F, N_x denotes a fixed neighborhood of x, S_{IR} will
be a fixed compact subset of IR, and \(\phi_x(r_1, r_2) = IP(r_2 \le \delta(X,x) \le r_1)\).
Notice that our nonparametric model will be quite general in the sense that we will just
need the following assumptions:
(H1) For any r > 0, \(\phi_x(r) := \phi_x(r, -r) > 0\).
(H2) The conditional density f^x is such that there exist b_1 > 0, b_2 > 0 with, for all \((y_1,y_2)\in S_{IR}^2\)
and \((x_1,x_2)\in N_x\times N_x\):
\[
|f^{x_1}(y_1) - f^{x_2}(y_2)| \ \le\ C_x\left( d^{b_1}(x_1,x_2) + |y_1-y_2|^{b_2} \right).
\]
(H5) The kernel H satisfies:
\[
\int |t|^{b_2}\, H(t)\,dt < \infty \qquad\text{and}\qquad \int H^2(t)\,dt < \infty.
\]
There exists n_0 such that, for all n > n_0:
\[
\frac{1}{\phi_x(h_K)}\int_{-1}^{1}\phi_x(z h_K, h_K)\,\frac{d}{dz}\!\left( z^2 K(z) \right) dz \ >\ C_3 \ >\ 0
\]
and
\[
h_K \int_{B(x,h_K)} \beta(u,x)\,dP(u) = o\!\left( \int_{B(x,h_K)} \beta^{2}(u,x)\,dP(u) \right),
\]
together with the bandwidth condition:
\[
\lim_{n\to\infty} \frac{\ln n}{n\, h_H\, \phi_x(h_K)} = 0.
\]
Observe that these conditions are very standard in this context. Conditions (H1), (H3)
and (H6) are the same as those used in [1]. Assumption (H2) is a regularity condition which
characterizes the functional space of our model and is needed to evaluate the bias term in
the asymptotic results of this paper. The hypotheses (H5) and (H7) are technical conditions
and are also similar to those considered in [10].
The following theorem gives the almost-complete convergence^5 (a.co.) of \(\widehat{f}^x\).

Theorem 3.3.1. Under assumptions (H1), (H2), (H3), (H4), (H5), (H6) and (H7), we
have that:
\[
\sup_{y\in S_{IR}} \left| \widehat{f}^x(y) - f^x(y) \right| = O\!\left( h_K^{b_1} + h_H^{b_2} \right) + O\!\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right),\quad \text{a.co.}
\]
Remark that the proof of Theorem 3.3.1 is a direct consequence of the decomposition:
\[
\widehat{f}^x(y) - f^x(y) = \frac{1}{\widehat{f}_D^x}\left\{ \left( \widehat{f}_N^x(y) - IE[\widehat{f}_N^x(y)] \right) - \left( f^x(y) - IE[\widehat{f}_N^x(y)] \right) \right\} + \frac{f^x(y)}{\widehat{f}_D^x}\left( 1 - \widehat{f}_D^x \right), \tag{3}
\]
where
\[
\widehat{f}_N^x(y) = \frac{1}{n(n-1)\,h_H\,IE[W_{12}(x)]} \sum_{i\ne j} W_{ij}(x)\, H\!\left(h_H^{-1}(y-Y_j)\right)
\]
and
\[
\widehat{f}_D^x = \frac{1}{n(n-1)\,IE[W_{12}(x)]} \sum_{i\ne j} W_{ij}(x),
\]
and of Lemmas 3.3.1, 3.3.2 and 3.3.3 below, for which the proofs are given in the Appendix.
5. Let \((z_n)_{n\in\mathbb{N}}\) be a sequence of real random variables; we say that z_n converges almost completely (a.co.) to zero if, for all ε > 0, \(\sum_{n=1}^{\infty} IP(|z_n| > \epsilon) < \infty\). Moreover, let \((u_n)_{n\in\mathbb{N}}\) be a sequence of positive real numbers; we say that \(z_n = O_{a.co.}(u_n)\) if there exists ε > 0 such that \(\sum_{n=1}^{\infty} IP(|z_n| > \epsilon\, u_n) < \infty\). This kind of convergence implies both almost sure convergence and convergence in probability (cf. [13] for details).
Lemma 3.3.1. (cf. [1])
\[
1 - \widehat{f}_D^x = O\!\left( \sqrt{\frac{\ln n}{n\, \phi_x(h_K)}} \right),\quad \text{a.co.}
\]
and there exists δ > 0 such that:
\[
\sum_{n=1}^{\infty} IP\!\left( \widehat{f}_D^x < \delta \right) < \infty.
\]
Lemma 3.3.2.
\[
\sup_{y\in S_{IR}} \left| f^x(y) - IE[\widehat{f}_N^x(y)] \right| = O\!\left( h_K^{b_1} + h_H^{b_2} \right).
\]
Lemma 3.3.3.
\[
\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - IE[\widehat{f}_N^x(y)] \right| = O\!\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right),\quad \text{a.co.}
\]

3.4
This section is devoted to the uniform version of Theorem 3.3.1. More precisely, our purpose
is to establish the uniform almost-complete convergence of \(\widehat{f}^x\) on some subset S_F of F such
that:
\[
S_F \subset \bigcup_{k=1}^{d_n} B(x_k, r_n).
\]
(U2) The conditional density f^x satisfies the uniform version of (H2): for all \((y_1,y_2)\in S_{IR}^2\) and \((x_1,x_2)\in S_F\times S_F\):
\[
|f^{x_1}(y_1) - f^{x_2}(y_2)| \ \le\ C\left( d^{b_1}(x_1,x_2) + |y_1-y_2|^{b_2} \right).
\]
(U3) The function β(·,·) satisfies (H3) and, for some strictly positive constant C, the
following Lipschitz condition:
\[
|\beta(x_1,x') - \beta(x_2,x')| \ \le\ C\, d(x_1,x_2).
\]
(U4) The sequence d_n satisfies:
\[
\sum_{n=1}^{\infty} n^{(3\gamma+1)/2}\, d_n^{1-\gamma} < \infty, \quad \text{for some } \gamma > 1.
\]
Notice that conditions (U1) and (U2) are, respectively, the uniform versions of (H1) and
(H2). Moreover, conditions (U1) and (U5) are linked with the topological structure of the
functional variable. Therefore, as in the pointwise case, the choice of the topological structure, controlled here by means of the function β(·,·), plays a crucial role. So, a right choice
of this function improves the convergence rate of the estimator. More precisely, we will see
thereafter that a good semi-metric is one that increases the concentration of the probability
measure of the functional variable X as well as minimizes d_n. It should be noticed that
both conditions (U1) and (U2) are verified for several continuous-time processes (cf. for
instance [9] for some examples).
Theorem 3.4.1. Under assumptions (U1), (U2), (U3), (U4), (U5), (H5) and (H6), we
have that:
\[
\sup_{x\in S_F} \sup_{y\in S_{IR}} \left| \widehat{f}^x(y) - f^x(y) \right| = O\!\left( h_K^{b_1} \right) + O\!\left( h_H^{b_2} \right) + O_{a.co.}\!\left( \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right). \tag{4}
\]
It is clear that, as for Theorem 3.3.1, the proof of Theorem 3.4.1 can be deduced directly
from the decomposition (3) and from the following intermediate results, which correspond to
the uniform versions of Lemmas 3.3.1, 3.3.2 and 3.3.3, and for which the proofs are also given
in the Appendix.
Lemma 3.4.1. Under assumptions (U1), (U3), (U4), (U5) and (H6), we obtain that:
\[
\sup_{x\in S_F} \left| \widehat{f}_D^x - 1 \right| = O_{a.co.}\!\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right).
\]
Corollary 3.4.1.
\[
\sum_{n=1}^{\infty} IP\!\left( \inf_{x\in S_F} \widehat{f}_D^x < \frac{1}{2} \right) < \infty.
\]
Lemma 3.4.2.
\[
\sup_{x\in S_F} \sup_{y\in S_{IR}} \left| f^x(y) - IE\!\left[\widehat{f}_N^x(y)\right] \right| = O\!\left( h_K^{b_1} \right) + O\!\left( h_H^{b_2} \right).
\]
Lemma 3.4.3.
\[
\sup_{x\in S_F} \sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - IE\!\left[\widehat{f}_N^x(y)\right] \right| = O_{a.co.}\!\left( \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right).
\]

3.5
Let us now study the almost-complete convergence of the kernel estimator of the conditional
mode of Y given X = x, denoted by θ(x), uniformly on a fixed compact subset S_F of
F. For this aim, we assume that θ(x) satisfies, on S_F, the following uniform uniqueness
property (cf. [27, 34] for the univariate case and [26] for the multivariate case):
(U6) For all ε_0 > 0, there exists η > 0 such that, for any function r : S_F → S_{IR}:
\[
\sup_{x\in S_F} |\theta(x) - r(x)| \ \ge\ \epsilon_0 \ \Longrightarrow\ \sup_{x\in S_F} \left| f^x(\theta(x)) - f^x(r(x)) \right| \ \ge\ \eta.
\]
Moreover, we suppose also that there exists some integer j > 1 such that, for all x in S_F, the
function f^x is j-times continuously differentiable on the interior of S_{IR} with respect to y, and
that:
(U7) \(f^{x(l)}(\theta(x)) = 0\) if 1 ≤ l < j, and \(f^{x(j)}(\cdot)\) is uniformly continuous on S_{IR}.
We estimate the conditional mode θ(x) by the random variable \(\widehat{\theta}(x)\) defined by:
\[
\widehat{\theta}(x) = \arg\sup_{y\in S_{IR}} \widehat{f}^x(y).
\]
Corollary 3.5.1. We have that:
\[
\sup_{x\in S_F} \left| \widehat{\theta}(x) - \theta(x) \right|^{j} = O\!\left( h_K^{b_1} \right) + O\!\left( h_H^{b_2} \right) + O_{a.co.}\!\left( \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right).
\]

3.6 Appendix
In what follows, when no confusion is possible, we will denote by C and C' some strictly
positive generic constants. Moreover, we put, for any x in F and for all i = 1,…,n:
\(K_i(x) = K(h_K^{-1}\delta(x,X_i))\), \(H_i(y) = H(h_H^{-1}(y-Y_i))\) and \(\beta_i(x) = \beta(X_i,x)\).

Proof of Lemma 3.3.2. We have:
\[
h_H^{-1}\, IE\!\left[ H_1(y) \mid X \right] = \int_{\mathbb{R}} H(t)\, f^X(y - h_H t)\,dt,
\]
therefore:
\[
\left| h_H^{-1}\, IE\!\left[ H_1(y) \mid X \right] - f^x(y) \right| \ \le\ \int_{\mathbb{R}} H(t)\, \left| f^X(y - h_H t) - f^x(y) \right| dt.
\]
Since H is a probability density function, the claimed result in this lemma is a direct
consequence of (H5).
Proof of Lemma 3.3.3. The compactness of S_{IR} allows us to write that there exists a
sequence of real numbers \((t_k)_{k=1,\dots,s_n}\) such that \(S_{IR} \subset \bigcup_{k=1}^{s_n} (t_k - l_n,\, t_k + l_n)\),
with \(l_n = n^{-(3\zeta+1)/2}\) and \(s_n = O(l_n^{-1})\). Let
\[
t_y = \arg\min_{t\in\{t_1,\dots,t_{s_n}\}} |y - t|
\]
and consider the following decomposition:
\[
\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - IE[\widehat{f}_N^x(y)] \right|
\ \le\ \underbrace{\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - \widehat{f}_N^x(t_y) \right|}_{A_1}
+ \underbrace{\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] \right|}_{A_2}
+ \underbrace{\sup_{y\in S_{IR}} \left| IE[\widehat{f}_N^x(t_y)] - IE[\widehat{f}_N^x(y)] \right|}_{A_3}. \tag{5}
\]
Firstly, for the terms A_1 and A_3, we use the Lipschitz condition on the kernel H to show
that:
\[
\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - \widehat{f}_N^x(t_y) \right|
\ \le\ \sup_{y\in S_{IR}} \frac{1}{n(n-1)\,h_H\,IE[W_{12}(x)]} \sum_{i\ne j} \left| H_j(y) - H_j(t_y) \right| W_{ij}(x)
\ \le\ C\,\frac{l_n}{h_H^2}\,\widehat{f}_D^x. \tag{6}
\]
Since \(l_n = n^{-(3\zeta+1)/2}\), then:
\[
\sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - \widehat{f}_N^x(t_y) \right| = o\!\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right). \tag{7}
\]
Indeed, the term A_3 may be treated as a direct consequence of the following known
inequality:
\[
\sup_{y\in S_{IR}} \left| IE[\widehat{f}_N^x(y)] - IE[\widehat{f}_N^x(t_y)] \right| \ \le\ IE\!\left[ \sup_{y\in S_{IR}} \left| \widehat{f}_N^x(y) - \widehat{f}_N^x(t_y) \right| \right]. \tag{8}
\]
Concerning the term A_2, we have, for all η > 0:
\[
IP\!\left( \sup_{y\in S_{IR}} \left| \widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] \right| > \eta\, \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right)
= IP\!\left( \max_{t_y\in\{t_1,\dots,t_{s_n}\}} \left| \widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] \right| > \eta\, \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right)
\]
\[
\le\ s_n\, \max_{t_y\in\{t_1,\dots,t_{s_n}\}} IP\!\left( \left| \widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] \right| > \eta\, \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right).
\]
Thus, all that remains to be bounded is the following quantity:
\[
IP\!\left( \left| \widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] \right| > \eta\, \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right), \quad \text{for all } t_y \in \{t_1,\dots,t_{s_n}\}.
\]
This latter quantity is handled by a straightforward adaptation of the proof of Lemma
2 in [1]. To do that, we consider the following decomposition:
\[
\widehat{f}_N^x(t_y) = \underbrace{\frac{n^2\, h_K^2\, \phi_x^2(h_K)}{n(n-1)\, IE[W_{12}(x)]}}_{T_1}
\left( \underbrace{\frac{1}{n\, h_H\, \phi_x(h_K)} \sum_{j=1}^{n} K_j(x)\, H_j(t_y)}_{T_2}\;
\underbrace{\frac{1}{n\, h_K^2\, \phi_x(h_K)} \sum_{i=1}^{n} K_i(x)\,\beta_i^2(x)}_{T_3}
- \underbrace{\frac{1}{n\, h_K\, h_H\, \phi_x(h_K)} \sum_{j=1}^{n} K_j(x)\,\beta_j(x)\, H_j(t_y)}_{T_4}\;
\underbrace{\frac{1}{n\, h_K\, \phi_x(h_K)} \sum_{i=1}^{n} K_i(x)\,\beta_i(x)}_{T_5} \right).
\]
It follows that:
\[
\widehat{f}_N^x(t_y) - IE[\widehat{f}_N^x(t_y)] = T_1\Big( \left( T_2 T_3 - IE[T_2 T_3] \right) - \left( T_4 T_5 - IE[T_4 T_5] \right) \Big).
\]
So, the claimed result will be obtained as soon as the following assertions have been checked:
\[
\sum_{n}\, s_n\, IP\!\left( |T_i - IE[T_i]| > \eta\, \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right) < \infty, \quad \text{for } i = 2,3,4,5, \tag{9}
\]
\[
T_1 = O(1) \qquad\text{and}\qquad IE[T_i] = O(1) \quad \text{for } i = 2,3,4,5, \tag{10}
\]
\[
\left| IE[T_2]\,IE[T_3] - IE[T_2T_3] - IE[T_4]\,IE[T_5] + IE[T_4T_5] \right| = o\!\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right). \tag{11}
\]
Proof of (9): For this aim, we use Bernstein's exponential inequality, for which the
main point is to evaluate asymptotically the m-th order moment of:
\[
Z_i^{l,k} = \frac{1}{h_K^l\, h_H^k\, \phi_x(h_K)} \left( K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) - IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right] \right),
\quad \text{for } l = 0,1,2 \text{ and } k = 0,1.
\]
Notice that, by Newton's binomial expansion, we obtain:
\[
IE\!\left[ \left( K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) - IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right] \right)^m \right]
= \sum_{d=0}^{m} C_m^d\, (-1)^{m-d}\, IE\!\left[ \left( K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right)^d \right] \left( IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right] \right)^{m-d}
\]
\[
\le\ \sum_{d=0}^{m} C_m^d\, IE\!\left[ K_1^d(x)\, \beta_1^{dl}(x)\, IE\!\left[ H_1^{dk}(t_y) \mid X_1 \right] \right] \left( IE\!\left[ K_1(x)\, \beta_1^l(x)\, IE\!\left[ H_1^k(t_y) \mid X_1 \right] \right] \right)^{m-d},
\]
where, conditioning on X_1:
\[
IE\!\left[ H_1(t_y) \mid X_1 \right] = h_H \int_{\mathbb{R}} H(t)\, f^{X_1}(t_y - h_H t)\,dt.
\]
This leads to the moment bound:
\[
IE\!\left| Z_1^{l,k} \right|^m \ \le\ C\,\left( \phi_x(h_K) \right)^{-m+1}.
\]
Thus, to achieve this proof, it suffices to use the classical Bernstein inequality (see Corollary
A.8 in [11], page 234), first with \(a_n = (h_H\,\phi_x(h_K))^{-1/2}\) to treat the terms T_2 and T_4, and
second with \(a_n = (\phi_x(h_K))^{-1/2}\) for the terms T_3 and T_5. In conclusion, we obtain, for all
η > 0:
\[
IP\!\left( |T_i - IE[T_i]| > \eta \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right) \ \le\ C\, n^{-C\eta^2}, \quad \text{for } i = 2,4,
\]
and
\[
IP\!\left( |T_i - IE[T_i]| > \eta \sqrt{\frac{\ln n}{n\, \phi_x(h_K)}} \right) \ \le\ C\, n^{-C\eta^2}, \quad \text{for } i = 3,5.
\]
Therefore:
\[
s_n\, IP\!\left( |T_i - IE[T_i]| > \eta \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right) \ \le\ C\, n^{(3\zeta+1)/2 - C\eta^2}, \quad \text{for } i = 2,3,4,5,
\]
which is the general term of a convergent series as soon as η is chosen large enough. This yields (9).
Proofs of (10) and (11). Notice that the first part of (10) has been treated in [1]. We now
proceed to prove the second part of (10) and then (11). For this aim, since the pairs (X_i, Y_i),
i = 1,…,n, are identically distributed, we obtain that:
\[
IE[T_2] = \frac{IE[K_1(x)\, H_1(t_y)]}{h_H\, \phi_x(h_K)}, \qquad IE[T_3] = \frac{IE[K_1(x)\, \beta_1^2(x)]}{h_K^2\, \phi_x(h_K)},
\]
\[
IE[T_4] = \frac{IE[K_1(x)\, \beta_1(x)\, H_1(t_y)]}{h_K\, h_H\, \phi_x(h_K)}, \qquad IE[T_5] = \frac{IE[K_1(x)\, \beta_1(x)]}{h_K\, \phi_x(h_K)},
\]
and
\[
IE[T_2 T_3] = \frac{n(n-1)}{n^2}\, \frac{1}{h_K^2\, h_H\, \phi_x^2(h_K)}\, IE\!\left[ K_1(x)\, \beta_1^2(x) \right] IE\!\left[ K_1(x)\, H_1(t_y) \right] + O\!\left( \frac{1}{n\, \phi_x(h_K)} \right).
\]
Thus, for both assertions (10) and (11), we have to evaluate:
\[
IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right], \quad \text{for } l = 0,1,2 \text{ and } k = 0,1.
\]
As previously, we condition on X_1 to show that, for all l = 0,1,2 and k = 0,1:
\[
IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right] = O\!\left( h_H^k\, IE\!\left[ K_i(x)\, \beta_i^l(x) \right] \right),
\]
and, by Lemma 3 in [1], we obtain that:
\[
IE\!\left[ K_i(x)\, H_i^k(t_y)\, \beta_i^l(x) \right] = O\!\left( h_H^k\, h_K^l\, \phi_x(h_K) \right). \tag{12}
\]
It follows that:
\[
IE[T_2]\,IE[T_3] - IE[T_2T_3] - IE[T_4]\,IE[T_5] + IE[T_4T_5] = o\!\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right).
\]
Finally, Lemma 3.3.3 is a direct consequence of assertions (7), (8), (9), (10) and (11).
Proof of Lemma 3.4.1. The proof of this lemma is based on the same decomposition's
kind as used to prove Lemma 5.3.3. Indeed,
f̂_D^x = T_1 ( S_2(x) S_4(x) − S_3(x) S_3(x) ),

where

S_2(x) = (1/n) Σ_{j=1}^n K_j(x) / φ_x(h_K),   S_3(x) = (1/n) Σ_{j=1}^n K_j(x) β_j(x) / (h_K φ_x(h_K)),

S_4(x) = (1/n) Σ_{i=1}^n K_i(x) β_i²(x) / (h_K² φ_x(h_K)),
and, in the same fashion, all that remains to show are the following uniform convergences:

sup_{x∈S_F} |S_k(x) − IE[S_k(x)]| = O_{a.co.}( √(ln d_n / (n φ(h_K))) ), for k = 2, 3, 4,   (13)

and

sup_{x∈S_F} | IE[S_2(x)] IE[S_4(x)] − IE[S_2(x) S_4(x)] − var[S_3(x)] | = o( ln d_n / (n φ(h_K)) ).
For k = 2, 3, 4, we use the decomposition:

sup_{x∈S_F} |S_k(x) − IE[S_k(x)]| ≤ sup_{x∈S_F} |S_k(x) − S_k(x_{j(x)})| + sup_{x∈S_F} |S_k(x_{j(x)}) − IE[S_k(x_{j(x)})]| + sup_{x∈S_F} |IE[S_k(x_{j(x)})] − IE[S_k(x)]|
 =: F_1^k + F_2^k + F_3^k.

We have, then, to evaluate each term F_j^k, j = 1, 2, 3. Since F_1^k and F_3^k admit almost the same treatment, we will consider the following two items:
Treatment of the terms F_1^k and F_3^k. Firstly, let us analyze the term F_1^k for k = 2, 3, 4. Since K is supported in [−1, 1], we can write, for all k = 2, 3, 4, that:

F_1^k ≤ sup_{x∈S_F} (1/(n h_K^{k−2} φ(h_K))) Σ_{i=1}^n | K_i(x) β_i^{k−2}(x) 1_{B(x,h_K)}(X_i) − K_i(x_{j(x)}) β_i^{k−2}(x_{j(x)}) 1_{B(x_{j(x)},h_K)}(X_i) |

 ≤ (C(k−2)/(n h_K^{k−2} φ(h_K))) sup_{x∈S_F} Σ_{i=1}^n K_i(x) 1_{B(x,h_K)}(X_i) | β_i^{k−2}(x) − β_i^{k−2}(x_{j(x)}) 1_{B(x_{j(x)},h_K)}(X_i) |

  + (1/(n h_K^{k−2} φ(h_K))) sup_{x∈S_F} Σ_{i=1}^n β_i^{k−2}(x_{j(x)}) 1_{B(x_{j(x)},h_K)}(X_i) | K_i(x) 1_{B(x,h_K)}(X_i) − K_i(x_{j(x)}) 1_{B(x_{j(x)},h_K)}(X_i) |.

Thus,

F_1^k ≤ C sup_{x∈S_F} ( F_11^k + F_12 + F_13^k + F_14 ),

where

F_11^k = (C(k−2)/(n φ(h_K))) Σ_{i=1}^n 1_{B(x,h_K)∩B(x_{j(x)},h_K)}(X_i),

F_12 = (C/(n φ(h_K))) Σ_{i=1}^n 1_{B(x,h_K)∪B(x_{j(x)},h_K)}(X_i),

F_13^k = (C(k−2)/(n h_K φ(h_K))) Σ_{i=1}^n 1_{B(x,h_K)∩B(x_{j(x)},h_K)}(X_i),

F_14 = (C/(n φ(h_K))) Σ_{i=1}^n 1_{B(x_{j(x)},h_K)}(X_i).
Next, for i = 1, …, n, set:

Z_i = (1/φ(h_K)) sup_{x∈S_F} 1_{B(x,h_K)∩B(x_{j(x)},h_K)}(X_i)   for F_11^k,

Z_i = (1/(h_K φ(h_K))) sup_{x∈S_F} 1_{B(x,h_K)∩B(x_{j(x)},h_K)}(X_i)   for F_12 and F_13^k,

Z_i = (1/φ(h_K)) sup_{x∈S_F} 1_{B(x_{j(x)},h_K)}(X_i)   for F_14.

Clearly, under the second part of (U1), we have for the first and the last cases:

Z_1 = O(1/φ(h_K)),   IE[Z_1] = O(1)   and   var(Z_1) = O(1/φ(h_K)).

So that, we get:

F_11^k = O(1) + O_{a.co.}( √(ln n / (n φ(h_K))) ).
In the same way, assumption (U5) allows us to get, for the F_12 or F_13^k case:

Z_1 = O(1/(h_K φ(h_K))),   IE[Z_1] = O(1/h_K)   and   var(Z_1) = O(1/(h_K² φ(h_K))),

hence

F_12 = O_{a.co.}( √(ln d_n / (n φ(h_K))) ).
To achieve the study of the term F_1^k, it suffices to put together all the intermediate results and to use (U5) to obtain:

F_1^k = O_{a.co.}( √(ln d_n / (n φ(h_K))) ).   (14)
Furthermore, since

F_3^k ≤ IE[ sup_{x∈S_F} |S_k(x) − S_k(x_{j(x)})| ],

we also have:

F_3^k = O( √(ln d_n / (n φ(h_K))) ).
Concerning the term F_2^k, we have, for all η > 0:

IP( F_2^k > η √(ln d_n / (n φ(h_K))) ) = IP( max_{j∈{1,…,d_n}} |S_k(x_j) − IE[S_k(x_j)]| > η √(ln d_n / (n φ(h_K))) )

 ≤ d_n max_{j∈{1,…,d_n}} IP( |S_k(x_j) − IE[S_k(x_j)]| > η √(ln d_n / (n φ(h_K))) ).
Set, for k = 2, 3, 4:

Δ_ki = (1/(h_K^{k−2} φ(h_K))) ( K_i(x_j) β_i^{k−2}(x_j) − IE[ K_i(x_j) β_i^{k−2}(x_j) ] ).

By using a similar proof as for showing Lemma 2 of [1], we get, for all j = 1, …, d_n and i = 1, …, n, that:

IE|Δ_ki|^m = O( φ(h_K)^{−m+1} ), for k = 2, 3, 4.
So, one can apply a Bernstein-type inequality (cf. Corollary A.8 in [11]) to obtain
directly :
IP( |S_k(x_j) − IE[S_k(x_j)]| > η √(ln d_n / (n φ(h_K))) ) = IP( (1/n) |Σ_{i=1}^n Δ_ki| > η √(ln d_n / (n φ(h_K))) ) ≤ 2 exp{ −C η² ln d_n }.

Thus, by choosing η such that C η² = β, we get:

d_n max_{j∈{1,…,d_n}} IP( |S_k(x_j) − IE[S_k(x_j)]| > η √(ln d_n / (n φ(h_K))) ) ≤ C′ d_n^{1−β}.   (15)

Since Σ_{n=1}^∞ d_n^{1−β} < ∞, we obtain that:

F_2^k = O_{a.co.}( √(ln d_n / (n φ(h_K))) ).
Consequently,

IP( inf_{x∈S_F} f̂_D(x) ≤ 1/2 ) ≤ IP( sup_{x∈S_F} |1 − f̂_D(x)| > 1/2 ),

which implies that:

Σ_{n=1}^∞ IP( inf_{x∈S_F} |f̂_D(x)| < 1/2 ) < ∞.
Proof of Lemma 3.4.2. It suffices to combine the proofs of the previous lemmas, assuming that the Lipschitz condition holds uniformly in (x, y) on S_F × S_ℝ.
82
Proof of Lemma 3.4.3. The proof of this lemma follows the same steps as the proof of Lemma 3.4.1, where S_2(x) and S_4(x) are replaced by:

T_2^x(y) = (1/n) Σ_{j=1}^n K_j(x) H_j(y) / (h_H φ_x(h_K))   and   T_4^x(y) = (1/n) Σ_{j=1}^n K_j(x) β_j²(x) H_j(y) / (h_K² h_H φ_x(h_K)).
To do that, we keep the notations used previously, namely the definitions of j(x), t_y and l_n. The proof is based on the following decomposition, which will be used for the three terms:

|T_i^x(y) − IE[T_i^x(y)]| ≤ sup_{x∈S_F} sup_{y∈S_ℝ} |T_i^x(y) − T_i^{x_{j(x)}}(y)|   (E_1)

 + sup_{x∈S_F} sup_{y∈S_ℝ} |T_i^{x_{j(x)}}(y) − T_i^{x_{j(x)}}(t_y)|   (E_2)

 + sup_{x∈S_F} sup_{y∈S_ℝ} |T_i^{x_{j(x)}}(t_y) − IE[T_i^{x_{j(x)}}(t_y)]|   (E_3)

 + sup_{x∈S_F} sup_{y∈S_ℝ} |IE[T_i^{x_{j(x)}}(t_y)] − IE[T_i^{x_{j(x)}}(y)]|   (E_4)

 + sup_{x∈S_F} sup_{y∈S_ℝ} |IE[T_i^{x_{j(x)}}(y)] − IE[T_i^x(y)]|   (E_5).   (16)
Concerning the term E_2, by using the Lipschitz condition on the kernel H, one can write:

|T_i^{x_{j(x)}}(y) − T_i^{x_{j(x)}}(t_y)| ≤ (C l_n / h_H²) S_i(x_{j(x)}),

where the S_i(·), for i = 2, 3, 4, are defined and treated in Lemma 3.4.1's proof. Thus, by using the facts that lim_{n→∞} n^γ h_H = ∞ and l_n = n^{−3γ/2−1/2}, we obtain:

E_2 = O_{a.co.}( √(ln d_n / (n^{1−γ} φ(h_K))) )   and   E_4 = O( √(ln d_n / (n^{1−γ} φ(h_K))) ).   (17)
Finally, for the term E_3, we use the same arguments as for proving Lemma 3.4.1 to show that, for all η > 0:

IP( E_3 > η √(ln d_n / (n h_H φ(h_K))) )

 = IP( max_{j∈{1,2,…,s_n}} max_{k∈{1,…,d_n}} |T_i^{x_k}(t_j) − IE[T_i^{x_k}(t_j)]| > η √(ln d_n / (n h_H φ(h_K))) )

 ≤ s_n d_n max_{j∈{1,2,…,s_n}} max_{k∈{1,…,d_n}} IP( |T_i^{x_k}(t_j) − IE[T_i^{x_k}(t_j)]| > η √(ln d_n / (n h_H φ(h_K))) ).
This last probability can be treated by using the classical Bernstein inequality, with a_n = (h_H φ_x(h_K))^{1/2}. Recall that the choice of a_n is motivated by the moment of order m of Z_i^{l,k} computed in Lemma 5.3.3's proof. That allows, finally, to write:

∀ j ≤ s_n,   IP( |T_i^{x_k}(t_j) − IE[T_i^{x_k}(t_j)]| > η √(ln d_n / (n h_H φ(h_K))) ) ≤ 2 exp{ −C η² ln d_n }.

Therefore, since s_n = O(l_n^{−1}) = O(n^{3γ/2 + 1/2}), and by choosing η such that C η² = β, one has:

s_n d_n max_{j∈{1,2,…,s_n}} max_{k∈{1,…,d_n}} IP( |T_i^{x_k}(t_j) − IE[T_i^{x_k}(t_j)]| > η √(ln d_n / (n h_H φ(h_K))) ) ≤ C s_n d_n^{1−β}.

By using the fact that lim_{n→∞} n^γ h_H = ∞ and the second part of condition (U5), one obtains:

E_3 = O_{a.co.}( √(ln d_n / (n^{1−γ} φ(h_K))) ).   (18)
Thus, Lemma 3.4.3's result can be easily deduced from (16), (17) and (18).
Finally, concerning the conditional mode, by a Taylor expansion of the conditional density f^x around θ(x), we can write:

f^x(θ̂(x)) = f^x(θ(x)) + (1/j!) f^{x(j)}(θ*(x)) (θ̂(x) − θ(x))^j,   (19)

for some θ*(x) between θ(x) and θ̂(x). It is clear that, from conditions (U6), (29) and Theorem 3.4.1, we obtain that:

sup_{x∈S_F} |θ̂(x) − θ(x)| → 0, a.co.,

Σ_{n=1}^∞ IP( inf_{x∈S_F} |f^{x(j)}(θ*(x))| < δ ) < ∞,

and we have:

|f^x(θ̂(x)) − f^x(θ(x))| ≤ 2 sup_{x∈S_F} sup_{y∈S_ℝ} |f̂^x(y) − f^x(y)|.

So, the claimed result is a direct consequence of this last inequality together with Theorem 3.4.1's result.
Acknowledgment :
Bibliographie

[1] Barrientos-Marin, J. (2007). Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application. PhD thesis from the Alicante University (Spain).

[2] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally Modelled Regression and Functional Data. J. of Nonparametric Statistics, 22, No. 5, Pages 617–632.

[3] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Computational Statistics, 22, No. 3, Pages 353–369.

[4] Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer.

[5] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, Pages 102–111.

[6] Cai, T.-T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics, 34, Pages 2159–2179.

[7] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator. With comments and a rejoinder by the authors. Statist. Sci., 6, Pages 404–436.

[8] Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, Pages 3–18.

[9] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, Pages 998–1004.

[10] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. London, Chapman & Hall.

[11] Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, Pages 819–834.

[12] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, Pages 335–352.

[13] Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode. C. R., Math., Acad. Sci. Paris, 340, Pages 389–392.

[14] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, Pages 47–76.

[15] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. New York.

[16] Ferraty, F., Van Keilegom, I. and Vieu, P. (2008). On the validity of the bootstrap in nonparametric functional regression. Scandinavian J. of Statist., 37, No. 2, Pages 286–306.

[17] Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, Pages 259–278.

[18] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Stat., 33, No. 2, Pages 774–805.

[19] Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametr. Stat., 17, Pages 797–806.

[20] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. Journal of Statistical Planning and Inference, 137, Pages 2784–2801.

[21] Ramsay, J.-O. and Silverman, B.-W. (1997). Functional Data Analysis. Springer Series in Statistics. New York.
Chapitre 4
A fast functional locally modeled estimate of the conditional density and mode in functional time series
… and Mustapha Rachdi
Keywords and Phrases. Functional data, Local linear estimator, Conditional density, Conditional mode, Nonparametric model, Small ball probabilities, Mixing data.
AMS Subject Classification. Primary: 62G05; Secondary: 62G07, 62G08, 62G35, 62G20.
4.1 Introduction
Let (X_i, Y_i), for i = 1, …, n, be n pairs of random variables that we assume to be drawn from the pair (X, Y), which is valued in F × ℝ, where F is a semi-metric space equipped with
1. Laboratoire TIMC-IMAG, UMR CNRS 5525.
a semi-metric d. Assume that there exists a regular version of the conditional probability of Y given X = x, for a fixed x ∈ F, which is absolutely continuous with respect to the Lebesgue measure on ℝ and has a bounded density, denoted by f^x. In this paper, we consider the problem of conditional density estimation by using a local modeling approach when the explanatory variable X is functional and when the observations (Y_i, X_i)_{i∈ℕ} are strongly mixing. In functional statistics, there are several ways of extending the local linear ideas. Here, we adopt the fast functional local modeling introduced by [2] for regression analysis; that is, we estimate the conditional density by â, which is obtained by minimizing the following quantity:

min_{(a,b)∈ℝ²} Σ_{i=1}^n ( h_H^{−1} H(h_H^{−1}(y − Y_i)) − a − b β(X_i, x) )² K(h_K^{−1} δ(x, X_i)).
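Once the abstract quantities δ(x, X_i) and β(X_i, x) are computed, the minimization above is an ordinary weighted least-squares problem in (a, b). The following sketch shows one way to compute f̂^x(y) = â; the helper names, the choice of kernels, and the particular semi-metric and locating function on discretized curves are our own illustrative assumptions, not the chapter's.

```python
import numpy as np

def local_linear_cond_density(x, y, X, Y, dist, beta, h_K, h_H):
    """Local linear conditional density estimate f^x(y) = a-hat, minimizing
    sum_i (h_H^{-1} H(h_H^{-1}(y - Y_i)) - a - b*beta(X_i, x))^2 K(h_K^{-1} dist(x, X_i))."""
    K = lambda u: np.maximum(1.0 - u**2, 0.0)              # quadratic kernel, support [-1, 1]
    H = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel for the response
    w = np.array([K(dist(x, Xi) / h_K) for Xi in X])       # functional kernel weights
    b = np.array([beta(Xi, x) for Xi in X])                # locating function values
    r = H((y - np.asarray(Y)) / h_H) / h_H                 # smoothed responses
    A = np.column_stack([np.ones_like(b), b])              # design matrix for (a, b)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], r * sw, rcond=None)
    return coef[0]                                         # a-hat estimates f^x(y)

# Illustration on simulated curves X_i(t) = a_i * t observed on a grid.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
a = rng.uniform(0, 1, 500)
X = [ai * t for ai in a]
Y = a + 0.1 * rng.standard_normal(500)                     # Y depends on the curve's slope
x0 = 0.5 * t                                               # target curve
dist = lambda u, v: np.sqrt(np.mean((u - v) ** 2))         # L2-type semi-metric
beta = lambda u, v: np.mean(u - v)                         # a simple locating function
f_peak = local_linear_cond_density(x0, 0.5, X, Y, dist, beta, h_K=0.15, h_H=0.1)
f_tail = local_linear_cond_density(x0, 3.0, X, Y, dist, beta, h_K=0.15, h_H=0.1)
```

With these choices the estimate is large near the conditionally most likely value y = 0.5 and essentially zero in the far tail, as one expects of a density estimate.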
In this work, we introduce the local linear nonparametric estimation of the conditional density and its mode for strongly mixing functional data. To the best of our knowledge, the local linear nonparametric estimation in functional time series has not been addressed so far. The current work is the first contribution on this topic. As an asymptotic result, we establish,
under some general conditions, the almost complete convergence rate of our estimate. The
interest of this work comes mainly from the fact that the main elds of application of
functional statistical methods relate to the analysis of continuous-time stochastic processes.
Our study, for instance, can be applied to predict future values of some process by cutting
the whole past of this process into continuous paths.
The organization of the remainder of the paper is as follows: the next section is dedicated to fixing notations and hypotheses and to the presentation of the main results. Section 3 is devoted to some conclusions, discussions and applications of our study. The proofs of the auxiliary results are relegated to the Appendix.
4.2 Main results
We begin by recalling the definition of the strong mixing property. For this, we introduce the following notations. Let F_i^k(Z) denote the σ-algebra generated by {Z_j, i ≤ j ≤ k}.

α(n) = sup{ |IP(A ∩ B) − IP(A) IP(B)| : A ∈ F_1^k(Z) and B ∈ F_{k+n}^∞(Z), k ∈ ℕ }.

The sequence (Z_i)_{i∈ℕ} is said to be α-mixing (strongly mixing) if α(n) → 0 as n → ∞.
There exist many processes fulfilling the strong mixing property. We quote here the usual ARMA processes (with innovations satisfying some moment conditions), which are geometrically strongly mixing, i.e., there exist ρ ∈ (0, 1) and C > 0 such that, for any n ≥ 1, α(n) ≤ C ρ^n (see, e.g., Jones (1978)). The threshold models, the EXPAR models (see Ozaki
(1979)), the simple ARCH models (see Engle (1982)), their GARCH extension (see Bollerslev
(1986)) and the bilinear Markovian models are geometrically strongly mixing under some
general ergodicity conditions. For more details we refer the reader to the monographs of
Bradley (2007) or Dedecker et al. (2007).
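The geometric decay just described can be illustrated with a short simulation of an AR(1) process (a special ARMA case with Gaussian innovations). Note that the empirical autocorrelation used below is only a rough proxy for, not a proof of, the decay of the mixing coefficients α(n); the sketch simply shows the ρ^k behavior one expects from such a process.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000
eps = rng.standard_normal(n)
z = np.empty(n)
z[0] = eps[0]
for t in range(1, n):
    z[t] = rho * z[t - 1] + eps[t]        # AR(1): geometrically strongly mixing

def acf(z, k):
    """Empirical lag-k autocorrelation."""
    return np.corrcoef(z[:-k], z[k:])[0, 1]

lags = [acf(z, k) for k in (1, 2, 3)]     # theoretical values: rho, rho^2, rho^3
```

For ρ = 0.5 the three empirical values are close to 0.5, 0.25 and 0.125, decaying geometrically with the lag.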
Throughout the paper, x denotes a fixed point in F, N_x denotes a fixed neighborhood of x, f^{x(j)} denotes the j-th order derivative of the conditional density f^x, S will be a fixed compact subset of ℝ, B(x, r) = {x′ ∈ F : |δ(x′, x)| ≤ r} and φ_x(r_1, r_2) = IP(r_1 ≤ δ(X, x) ≤ r_2). Notice that our nonparametric model will be quite general, in the sense that we will just need the following assumptions:

(H1) For any r > 0, φ_x(r) := φ_x(−r, r) > 0.

(H2) The conditional density f^x is such that there exist b_1 > 0 and b_2 > 0 with, for all (y_1, y_2) ∈ S² and (x_1, x_2) ∈ N_x × N_x:

|f^{x_1}(y_1) − f^{x_2}(y_2)| ≤ C_x ( |δ(x_1, x_2)|^{b_1} + |y_1 − y_2|^{b_2} ),

where C_x is a positive constant depending on x.
(H5) The conditional density of (Yi , Yj ) given (Xi , Xj ) exists and is bounded.
(H6) K is a positive, differentiable function with support [−1, 1].
(H7) H is a positive, bounded, Lipschitz continuous function, such that:

∫ |t|^{b_2} H(t) dt < ∞   and   ∫ H²(t) dt < ∞.
(H8) The bandwidth h_K satisfies: there exists an integer n_0 such that, for all n > n_0,

(1/φ_x(h_K)) ∫_{−1}^{1} φ_x(z h_K, h_K) (d/dz)(z² K(z)) dz > C_3 > 0,

and

h_K ∫_{B(x,h_K)} β(u, x) dP(u) = o( ∫_{B(x,h_K)} β²(u, x) dP(u) ),

where dP(u) denotes the probability distribution of X.

(H9) lim_{n→∞} h_K = 0, lim_{n→∞} h_H = 0 and lim_{n→∞} (ψ_x(h_K) log n) / (n h_H² φ_x²(h_K)) = 0.

(H10) The mixing coefficient and the bandwidths are linked by: there exists η_0 > 3/(a+1) such that C n^{(3−a)/(a+1) + η_0} ≤ h_H φ_x^{1/2}(h_K).
Theorem 4.2.1. Under assumptions (H1)–(H10), we have:

sup_{y∈S} | f̂^x(y) − f^x(y) | = O( h_K^{b_1} + h_H^{b_2} ) + O_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ),   (1)

where

f̂_N^x(y) = (1 / (n(n−1) h_H IE[W_12(x)])) Σ_{i≠j} W_ij(x) H(h_H^{−1}(y − Y_i))
5. Let (z_n)_{n∈ℕ} be a sequence of real random variables; we say that z_n converges almost completely (a.co.) to zero if, for all ε > 0, Σ_{n=1}^∞ IP(|z_n| > ε) < ∞. Moreover, let (u_n)_{n∈ℕ} be a sequence of positive real numbers; we say that z_n = O_{a.co.}(u_n) if there exists ε_0 > 0 such that Σ_{n=1}^∞ IP(|z_n| > ε_0 u_n) < ∞. This kind of convergence implies both almost sure convergence and convergence in probability (cf. Sarda and Vieu (2000) for details).
and

f̂_D^x = (1 / (n(n−1) IE[W_12(x)])) Σ_{i≠j} W_ij(x).

The proof of the theorem is a direct consequence of this decomposition and of Lemmas 4.2.1, 4.2.2 and 4.2.3 below, for which the proofs are given in the Appendix.
Lemma 4.2.1. Under assumptions (H1), (H3), (H4), (H6), (H8) and (H10), we have that:

1 − f̂_D^x = O_{a.co.}( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ),

and there exists δ > 0 such that:

Σ_{n=1}^∞ IP( f̂_D^x < δ ) < ∞.
Lemma 4.2.2. We have:

sup_{y∈S} | f^x(y) − IE[f̂_N^x(y)] | = O( h_K^{b_1} + h_H^{b_2} ).
Lemma 4.2.3. We have:

sup_{y∈S} | f̂_N^x(y) − IE[f̂_N^x(y)] | = O_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).

4.3 Concluding remarks
On the assumptions. The hypotheses used in this work are not unduly restrictive, and they are rather classical in the setting of nonparametric functional statistics. Indeed, the conditions (H1), (H3), (H6) and (H8) are the same as those used by [?]. Specifically, (H1) is needed to deal with the functional nonparametric character of our model by controlling the concentration properties of the probability measure of the variable X. The latter is quantified here with respect to the bi-functional operator δ, which can be related to the topological structure on the functional space F by taking d = |δ|. While (H3) is a mild regularity condition which permits to control the shape of the locating function β. Such a condition is verified, for instance, if we take β = δ. However, as pointed out in [?], this choice of β = δ is not very adequate in practice, because these bi-functional operators do not play similar roles. We refer to [?] for more discussion of these conditions and some examples of β and δ. As usual in nonparametric problems, the infinite dimension of the model is controlled by means of a smoothness condition (H2). This condition is needed to evaluate the bias component of the rates of convergence. The first part of (H4) is a standard choice for the mixing coefficient in time series, while the second part of this condition measures the local dependence of the observations. Let us note that this last has been exploited in the expression of the convergence rate. Assumptions (H7), (H9) and (H10) are standard technical conditions in nonparametric estimation. They are imposed for the sake of simplicity and brevity of the proofs.
Corollary 4.3.1. Under the hypotheses of Theorem 4.2.1, we have:

sup_{y∈S} | f̂_NW^x(y) − f^x(y) | = O( h_K^{b_1} + h_H^{b_2} ) + O_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ),

where f̂_NW^x(y) denotes the kernel-type (Nadaraya–Watson) estimator of the conditional density.
The multivariate case. In the vectorial case, when F = ℝ^p, p ≥ 1, and if the probability density of the random variable X (resp. the joint density of (X_i, X_j)), denoted by f (resp. f_{i,j}), is of class C¹, then φ_x(h) = O(h^p) and ψ_x(h) = O(h^{2p}). Then our Theorem leads straightforwardly to the next corollary.

Corollary 4.3.2. Under the hypotheses of Theorem 4.2.1, we have:

sup_{y∈S} | f̂^x(y) − f^x(y) | = O( h_K^{b_1} + h_H^{b_2} ) + O_{a.co.}( √( log n / (n h_H h_K^p) ) ).
We point out that, in the special case when F = ℝ, our estimate coincides with the estimator of [12] by taking β(x, X) = X − x and δ(x, X) = x − X.

The independent case. In this situation, the conditions (H4), (H5) and the last part of (H10) are automatically verified, and ψ_x(h) = φ_x²(h). So, we obtain the following result.

Corollary 4.3.3. Under the hypotheses of Theorem 4.2.1, we have:

sup_{y∈S} | f̂^x(y) − f^x(y) | = O( h_K^{b_1} + h_H^{b_2} ) + O_{a.co.}( √( log n / (n h_H φ_x(h_K)) ) ).
In practice, we proceed as follows: let (Z_t)_{t∈[0,b[} be a continuous-time real-valued random process. From Z_t we may construct N functional random variables (X_i)_{i=1,…,N} defined by:

∀ t ∈ [0, b[,   X_i(t) = Z_{N^{−1}((i−1)b + t)},

and a real characteristic Y_i = G(X_{i+1}). So, we can predict the characteristic Y_N by the conditional mode estimate Ŷ = θ̂(X_N), computed by using the N − 1 pairs of r.v. (X_i, Y_i)_{i=1,…,N−1}. Such a prediction is motivated by the consistency result given below.
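A minimal sketch of this path-cutting construction follows; the function name and the particular characteristic G (here, the maximum of the next curve) are our own illustrative choices, not the chapter's.

```python
import numpy as np

def cut_into_paths(z, n_paths):
    """Split one long sampled trajectory (Z_t) into n_paths consecutive curves."""
    m = len(z) // n_paths                          # points per curve
    return z[: m * n_paths].reshape(n_paths, m)    # row i is the curve X_{i+1}

rng = np.random.default_rng(1)
z = np.cumsum(rng.standard_normal(1200))           # a discretized continuous-time path
X = cut_into_paths(z, 12)                          # N = 12 curves of 100 points each
Y = X[1:].max(axis=1)                              # Y_i = G(X_{i+1}): next curve's maximum
pairs = list(zip(X[:-1], Y))                       # (X_i, Y_i), i = 1, ..., N - 1
```

The N − 1 pairs can then be fed to any conditional density estimator, and the prediction Ŷ taken as the argmax, over a grid of y values, of the estimated density given the last observed curve X_N.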
Corollary 4.3.4. If f^x is j-times continuously differentiable on S with respect to y, with f^{x(l)}(θ(x)) = 0 if 1 ≤ l < j, and if f^{x(j)}(·) is uniformly continuous on S with |f^{x(j)}(θ(x))| > C > 0, then we get:

| θ̂(x) − θ(x) |^j = O( h_K^{b_1} + h_H^{b_2} ) + O_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).   (2)
4.4 Appendix
In what follows, when no confusion is possible, we will denote by C and C′ some strictly positive generic constants. Moreover, we put, for any x ∈ F and for all i = 1, …, n: K_i(x) = K(h_K^{−1} δ(x, X_i)), β_i(x) = β(X_i, x) and, for y ∈ ℝ, H_i(y) = H(h_H^{−1}(y − Y_i)).
Proof of Lemma 4.2.1. The proof is based on the same decomposition as the one used by Barrientos et al. (2007):

f̂_D^x = T_1 ( T_2 T_4 − T_3² ),

where

T_1 = n² h_K² φ_x²(h_K) / (n(n−1) IE[W_12]),   T_2 = (1/n) Σ_{j=1}^n K_j(x) / φ_x(h_K),

T_3 = (1/n) Σ_{j=1}^n K_j(x) β_j(x) / (h_K φ_x(h_K)),   T_4 = (1/n) Σ_{i=1}^n K_i(x) β_i²(x) / (h_K² φ_x(h_K)).

It follows that:

f̂_D^x − IE[f̂_D^x] = T_1 ( (T_2 T_4 − IE[T_2 T_4]) − (T_3² − IE[T_3²]) ).
Moreover, we have:

T_1 = O(1),   (3)

IE[T_l] = O(1), for l = 2, 3, 4,   (4)

Σ_{n=1}^∞ IP( |T_l − IE[T_l]| > η ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ) < ∞, for l = 2, 3, 4,   (5)

and

Cov(T_2, T_4) = o( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} )   and   Var[T_3] = o( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ).   (6)

It is shown in Barrientos et al. (2007) that (3) and (4) hold. Next, set:
Δ_ki = (1/h_K^k) ( K_i(x) β_i^k(x) − IE[ K_i(x) β_i^k(x) ] ), for k = 0, 1, 2,

so that

T_{k+2} − IE[T_{k+2}] = (1/(n φ_x(h_K))) Σ_{i=1}^n Δ_ki, for k = 0, 1, 2.
Since

(1/h_K^k) K_i(x) |β(X_i, x)|^k = (1/h_K^k) K(h_K^{−1} δ(x, X_i)) |β(X_i, x)|^k 1_{]−1,1[}(h_K^{−1} δ(x, X_i)) ≤ C,   (7)
we can apply the Fuk–Nagaev exponential inequality to get, for all r > 0 and ε > 0:

IP{ |T_{k+2} − IE[T_{k+2}]| > ε } = IP{ | (1/(n φ_x(h_K))) Σ_{i=1}^n Δ_ki | > ε }

 = IP{ | Σ_{i=1}^n Δ_ki | > ε n φ_x(h_K) } ≤ C (A_1 + A_2),   (8)

where

A_1 = ( 1 + ε² n² (φ_x(h_K))² / (r S_n²) )^{−r/2}   and   A_2 = n r^{−1} ( r / (ε n φ_x(h_K)) )^{a+1},

with

S_n² = Σ_{i=1}^n Σ_{j=1}^n | Cov(Δ_ki, Δ_kj) |.
Next, we evaluate the asymptotic behavior of S_n². For this, we use the technique of Masry (1986). We define the sets

E_1 = { (i, j) : |i − j| ≤ m_n }   and   E_2 = { (i, j) : |i − j| > m_n },

where m_n → ∞ as n → ∞. Let J_{1,n} and J_{2,n} be the sums of covariances over E_1 and E_2, respectively. Because of (H1), (H4) and (7), and by comparing the sum with the integral

Σ_{u ≥ x+1} u^{−a} ≤ ∫_x^∞ u^{−a} du = [ (a−1) x^{a−1} ]^{−1},

we get, under the first part of (H4):

|J_{2,n}| = Σ_{(i,j)∈E_2} | Cov(Δ_ki, Δ_kj) | ≤ C n m_n^{−a+1} / (a − 1).   (9)

An appropriate choice of m_n then yields:

S_n² = O( n φ_x(h_K) ).   (10)

Therefore, taking ε = ε_0 ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} and r = C (log n)², we obtain:

A_2 ≤ C n^{−1}.
By means of (10), we show that:

A_1 ≤ C ( 1 + ε_0² log n / r )^{−r/2} = C exp( −(r/2) log( 1 + ε_0² log n / r ) ).   (11)
Since log(1 + u) ∼ u as u → 0, it follows that:

A_1 ≤ C exp( −ε_0² (log n) / 2 ) = C n^{−ε_0²/2}.

Thus, for ε_0 large enough:

A_1 ≤ C n^{−ε_0²/2} ≤ C n^{−1}.   (12)

Hence

T_l − IE[T_l] = O_{a.co.}( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ), for l = 2, 3, 4.
Then, the proof of our lemma is now complete.

Proof of Lemma 4.2.2. We start by writing:

IE[f̂_N^x(y)] = (1 / IE[W_12]) IE[ W_12(x) IE[ h_H^{−1} H_1(y) | X ] ].
IE[W12 ]
h1
IE
[H
(y)/X]
=
H(t)f X (y hH t)dt,
1
H
R
therefore
H(t)|f X (y hH t) f x (y)|dt.
Hence,
x
y S, |IE[fbN
(y)]f x (y)|
]
[
]
[
1
b1
b2
x
IE W12 (x) IE h1
H H1 (y)/X f (y) C(hK +hH ).
IE[W12 ]
k=1 Sk
sup_{y∈S} | f̂_N^x(y) − IE[f̂_N^x(y)] | ≤ sup_{y∈S} | f̂_N^x(y) − f̂_N^x(t_y) |   (A_1)

 + sup_{y∈S} | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] |   (A_2)

 + sup_{y∈S} | IE[f̂_N^x(t_y)] − IE[f̂_N^x(y)] |   (A_3).   (13)
Concerning (A_1), by the Lipschitz continuity of H, we have:

sup_{y∈S} | f̂_N^x(y) − f̂_N^x(t_y) | ≤ sup_{y∈S} ( C |y − t_y| / (n h_H² IE[W_12(x)]) ) Σ_{i≠j} |W_ij(x)| ≤ (C l_n / h_H²) f̂_D^x ≤ C l_n / h_H².   (14)

Since l_n / h_H² = o( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ), we deduce that:

sup_{y∈S} | f̂_N^x(y) − f̂_N^x(t_y) | = o_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).   (16)
Concerning (A_2), for all η > 0 we have:

IP( sup_{y∈S} | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] | > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} )

 = IP( max_{t_y∈{t_1,…,t_{z_n}}} | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] | > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} )

 ≤ z_n max_{t_y∈{t_1,…,t_{z_n}}} IP( | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] | > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).

So, it suffices to bound

IP( | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] | > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ), for all t_y ∈ {t_1, …, t_{z_n}}.

The latter is obtained by a straightforward adaptation of the proof of Lemma 4.2.1. To do that, we consider the following decomposition:
f̂_N^x(t_y) = S_1 ( S_2 S_3 − S_4 S_5 ),

where

S_1 = n² h_K² φ_x²(h_K) / (n(n−1) IE[W_12]),   S_2 = (1/n) Σ_{j=1}^n K_j(x) H_j(t_y) / (h_H φ_x(h_K)),

S_3 = (1/n) Σ_{i=1}^n K_i(x) β_i²(x) / (h_K² φ_x(h_K)),   S_4 = (1/n) Σ_{j=1}^n K_j(x) β_j(x) H_j(t_y) / (h_H h_K φ_x(h_K)),

S_5 = (1/n) Σ_{i=1}^n K_i(x) β_i(x) / (h_K φ_x(h_K)).
Clearly,

Σ_n z_n IP( |S_k − IE[S_k]| > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ) < ∞, for k = 2, 3, 4, 5,   (17)

S_1 = O(1) and IE[S_k] = O(1), for k = 2, 3, 4, 5,   (18)

Cov(S_2, S_3) = o( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ),   (19)

and

Cov(S_4, S_5) = o( ( ψ_x(h_K) log n / (n φ_x²(h_K)) )^{1/2} ).   (20)
Observe that the cases k = 3, 5 have already been obtained in Lemma 4.2.1. Thus, we focus only on the cases k = 2, 4.
Firstly, for (17), we use the same arguments as those invoked in the proof of Lemma 4.2.1, and we compute asymptotically

s_n² = Σ_{i≠j} | Cov(Δ_ki, Δ_kj) |,

where

Δ_ki = (1/h_K^k) ( K_i(x) H_i(t_y) β_i^k(x) − IE[ K_i(x) H_i(t_y) β_i^k(x) ] ), for k = 0, 1.

As previously, we split the sum over the sets E_1 = {(i,j) : |i−j| ≤ m_n} and E_2 = {(i,j) : |i−j| > m_n}, where m_n → ∞. Then

s_n² ≤ Σ_{E_1} | Cov(Δ_ki, Δ_kj) | + Σ_{E_2} | Cov(Δ_ki, Δ_kj) |,

and we choose

m_n = ( 1 / (h_H² φ_x(h_K)) )^{1/a}.   (21)
The computation of the variance term can be done by following the same ideas as for the bias term given in Lemma 4.2.2, and is based on the fact that:

s_n² = Σ_{i,j=1}^n | Cov(Δ_ki, Δ_kj) | = O( n h_H φ_x(h_K) ).   (22)
Once again, similar arguments as those invoked for proving Lemma 4.2.1 can be used, and we obtain successively, for all r > 1 and k = 2, 4:

z_n IP{ |S_k − IE[S_k]| > ε } ≤ C l_n^{−1} ( A_1 + A_2 ),

where

A_1 = 4 ( 1 + ε² / (r s_n²) )^{−r/2}   and   A_2 = 4 c n r^{−1} ( r / (ε n h_H φ_x(h_K)) )^{a+1}.

Taking ε = ε_0 √(s_n² log n) / (n h_H φ_x(h_K)) and r = c (log n)², we deduce that:

Σ_n z_n IP( |S_k − IE[S_k]| > η ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ) < ∞.   (23)
Let us now prove the results (19) and (20). The proof of these last follows exactly along the same lines as the proof of (22). We obtain:

Cov(S_2, S_3) = O( ( ψ_x(h_K) )^{1/2} / (n φ_x²(h_K)) ) = o( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ),   (24)

and

Cov(S_4, S_5) = O( ( ψ_x(h_K) )^{1/2} / (n φ_x²(h_K)) ) = o( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).   (25)
Finally, we arrive at:

sup_{y∈S} | f̂_N^x(t_y) − IE[f̂_N^x(t_y)] | = O_{a.co.}( ( ψ_x(h_K) log n / (n h_H φ_x²(h_K)) )^{1/2} ).   (27)

Concerning (A_3), for n large enough we have:

IP( sup_{y∈S} | IE[f̂_N^x(y)] − IE[f̂_N^x(t_y)] | > (η/3) √( log n / (n h_H φ_x(h_K)) ) ) = 0.   (28)
Now, our lemma can be easily deduced from (16), (27) and (28).
Proof of Corollary 4.3.4. By a Taylor expansion of the conditional density f^x around θ(x), we can write:

f^x(θ̂(x)) = f^x(θ(x)) + (1/j!) f^{x(j)}(θ*(x)) (θ̂(x) − θ(x))^j,   (29)

for some θ*(x) between θ(x) and θ̂(x), with

Σ_{n=1}^∞ IP( | f^{x(j)}(θ*(x)) | < δ ) < ∞,

and we have:

| f^x(θ̂(x)) − f^x(θ(x)) | ≤ 2 sup_{y∈S} | f̂^x(y) − f^x(y) |.

So, the claimed result is a direct consequence of this last inequality together with Theorem 4.2.1's result.
Bibliographie

[1] Barrientos-Marin, J. (2007). Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application. PhD thesis (in French) from the Paul Sabatier University (Toulouse).

[2] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally Modelled Regression and Functional Data. J. of Nonparametric Statistics, 22, No. 5, Pages 617–632.

[3] Benhenni, K., Griche-Hedli, S. and Rachdi, M. (2010). Estimation of the regression operator from functional fixed-design with correlated errors. Journal of Multivariate Analysis, 101, Pages 476–490.

[4] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Computational Statistics, 22, No. 3, Pages 353–369.

[5] Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer.

[6] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, Pages 102–111.

[7] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator. With comments and a rejoinder by the authors. Statist. Sci., 6, Pages 404–436.

[8] Dabo-Niang, S. and Laksaci, A. (2007). Estimation non paramétrique du mode conditionnel pour variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 27–42.

[9] El Methni, M. and Rachdi, M. (2010). Local weighted average estimation of the regression operator for functional data. Commun. Stat., Theory and Methods, to appear.

[10] Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, Pages 3–18.

[11] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, Pages 998–1004.

[12] Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, Pages 819–834.

[13] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. London, Chapman & Hall.

[14] Ferraty, F., Goia, A. and Vieu, P. (2002). Functional nonparametric model for time series: a fractal approach to dimension reduction. TEST, 11, No. 2, Pages 317–344.

[15] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, Pages 335–352.

[16] Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode. C. R., Math., Acad. Sci. Paris, 340, Pages 389–392.

[17] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, Pages 47–76.

[18] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. New York.

[19] Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, Pages 259–278.

[20] Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 69–80.

[21] Laksaci, A., Madani, F. and Rachdi, M. (2012). Kernel conditional density estimation when the regressor is valued in a semi-metric space. Communications in Statistics – Theory and Methods (to appear).

[22] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2011). Functional data: Local linear estimation of the conditional density and its application. Statistics, Volume 00, Pages 00-00, DOI: 10.1080/02331888.2011.568117.

[23] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2010). Local linear estimation of the conditional density for functional data. C. R., Math., Acad. Sci. Paris, 348, Pages 931–934.

[24] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Stat., 33, No. 2, Pages 774–805.

[25] Ouassou, I. and Rachdi, M. (2010). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1, No. 2, Pages 233–250.

[26] Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametr. Stat., 17, Pages 797–806.

[27] Rachdi, M. and Sabre, R. (2000). Consistent estimates of the mode of the probability density function in nonparametric deconvolution problems. Statist. Probab. Lett., 47, Pages 105–114.

[28] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. Journal of Statistical Planning and Inference, 137, Pages 2784–2801.

[29] Ramsay, J.-O. and Silverman, B.-W. (1997). Functional Data Analysis. Springer Series in Statistics. New York.
Chapitre 5
On the quadratic error of the
functional local linear estimate of the
conditional density
5.1 Introduction
The observation of functional variables has become usual due, for instance, to the development of measuring instruments that allow one to observe variables at finer and finer resolutions. Then, as technology progresses, we are able to handle larger and larger datasets. At the same time, monitoring devices such as electronic equipment and sensors (for registering images, temperature, etc.) have become more and more sophisticated. This high-tech revolution offers the opportunity to observe phenomena in an increasingly accurate way by producing statistical units sampled over a finer and finer grid, with the measurement points so close that the data can be considered as observations varying over a continuum. Such continuous (or functional) data may occur in biomechanics (e.g. human movements), chemometrics (e.g. spectrometric curves), econometrics (e.g. the stock market index), geophysics (e.g. spatio-temporal events such as El Niño or time series of satellite images), or medicine (electro-cardiograms/electro-encephalograms). It is well known that standard multivariate statistical analyses fail with functional data. However, the great potential for applications has encouraged new methodologies able to extract relevant information from functional datasets. The recent literature aims to present a state-of-the-art exploration of this high-tech field, by gathering together most of the major advances in this area. The main statistical topics (classification, inference, factor-based analysis, regression modelling, resampling methods, time series, random processes) are covered in the setting of functional data. The twin challenges of the subject are the practical issues of implementing new methodologies and the theoretical techniques needed to expand the mathematical foundations and toolboxes. This chapter and the following, therefore, mix practical, methodological and theoretical aspects of the subject, sometimes within the same chapter (cf. Chapter 2 before). As a consequence, these results should appeal to a wide audience of engineers, practitioners and graduate students, as well as academic researchers, not only in statistics and probability but also in numerous related application areas.
It seems, then, natural to assume that the data are actually observations from a random variable taking values in a functional space. In this chapter, we are interested in the local polynomial modeling of the conditional density function when the explanatory variable is of functional type. Such a study is motivated by the fact that local polynomial smoothing has various advantages over the kernel method; namely, this method has superior bias properties (see, for example, Chu and Marron (1991) and Fan (1992) for an extensive discussion of the comparison between these two methods). Moreover, as noticed by Fan and Yao (2003), the conditional density provides a very informative summary of the response variable that allows us to examine the overall shape of the conditional distribution. In nonparametric functional statistics, the first results about the conditional distribution were obtained in Ferraty et al. (2006), where the almost complete convergence of the kernel estimator of the conditional density and its derivatives is established. The quadratic error of this estimate has been studied by Laksaci (2007), who gave the asymptotic expansion of the exact expression involved in the leading terms of the quadratic error of the considered estimate. Recently, Ferraty et al. (2010) stated the uniform almost complete convergence of the kernel estimate of some nonparametric conditional models, in particular of the conditional density model.
Since the open question (how can the local polynomial ideas be adapted to infinite-dimensional settings?) stated by Ferraty and Vieu (2006), local linear smoothing in the functional data setting has been considered by many authors. We cite, for instance, Barrientos et al. (2010), Baíllo and Grané (2009) and El Methni and Rachdi (2011), which are concerned with local linear type estimation of the regression operator for independent and identically distributed functional data. The first contribution on the local polynomial modeling of the conditional density function when the explanatory variable is functional was made by Demongeot et al. (2010). The authors established the almost complete consistency (pointwise and uniform) of a fast functional local linear estimator of the conditional density when the explanatory variable is functional and the observations are i.i.d. Their study was extended to the dependent case by Demongeot et al. (2011).
In this chapter, we give the mean-square convergence rate of the fast functional local linear estimator considered by Demongeot et al. (2010). The expression of this convergence rate shows the superiority of this method with respect to the kernel method, namely in the bias terms. It should be noted that the accuracy of our asymptotic results leads to interesting perspectives from a practical point of view: in particular, minimizing mean squared errors can govern automatic bandwidth selection procedures.
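As a hedged illustration of this last point (not part of the original derivation), the leading terms of the mean squared error expansion of Theorem 5.3.1 can be minimized numerically over a grid of candidate bandwidths. In this sketch, the constants B_H, B_K, V_HK and the small-ball function φ_x are assumed known or pre-estimated, and all function names are ours:

```python
import numpy as np

def asymptotic_mse(h_H, h_K, n, B_H, B_K, V_HK, phi_x):
    # leading terms: B_H^2 h_H^4 + B_K^2 h_K^2 + V_HK / (n h_H phi_x(h_K))
    return B_H**2 * h_H**4 + B_K**2 * h_K**2 + V_HK / (n * h_H * phi_x(h_K))

def select_bandwidths(n, B_H, B_K, V_HK, phi_x, grid):
    # exhaustive search for the pair (h_K, h_H) minimizing the asymptotic MSE
    scores = [(asymptotic_mse(hH, hK, n, B_H, B_K, V_HK, phi_x), hK, hH)
              for hK in grid for hH in grid]
    _, hK, hH = min(scores)
    return hK, hH

# toy usage with phi_x(h) = h (finite-dimensional-type small-ball behaviour)
grid = np.linspace(0.05, 1.0, 40)
h_K_opt, h_H_opt = select_bandwidths(200, 1.0, 1.0, 1.0, lambda h: h, grid)
```

In practice the constants are unknown, so this grid search is only a stylized version of the plug-in idea; data-driven criteria such as cross-validation (Chapter 6) avoid estimating them directly.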
We present our model in Section 5.2. In Section 5.3 we give some notations, hypotheses
and the presentation of the main results. Section 5.4 is devoted to some discussions and
comments on the result. The proofs of the results are relegated to the last section of this
chapter.
5.2 The model

Let us introduce n pairs of random variables (X_i, Y_i), for i = 1, \dots, n, that we assume drawn from the pair (X, Y), which is valued in F \times \mathbb{R}, where F is a semi-metric space equipped with a semi-metric d.
Furthermore, we assume that there exists a regular version of the conditional probability of Y given X, which is absolutely continuous with respect to the Lebesgue measure on \mathbb{R} and has a twice continuously differentiable density, denoted by f^x. Local polynomial smoothing is based on the assumption that the functional parameter is smooth enough to be locally well approximated by a polynomial. In functional statistics, there are several ways of extending the local linear ideas (cf. Barrientos et al. (2010), Baíllo and Grané (2009)). Here we adopt the fast functional local modeling: that is, we estimate the conditional density f^x by \hat{a}, which is obtained by minimizing the following quantity:

\min_{(a,b) \in \mathbb{R}^2} \sum_{i=1}^{n} \left( h_H^{-1} H(h_H^{-1}(y - Y_i)) - a - b\,\beta(X_i, x) \right)^2 K(h_K^{-1}\delta(x, X_i)) \qquad (1)
where \beta(\cdot,\cdot) (resp. \delta(\cdot,\cdot)) is a known operator from F^2 into \mathbb{R} such that, for all \xi \in F, \beta(\xi,\xi) = 0 (resp. \delta(\xi,\xi) = 0), K and H are kernels, and h_K = h_{K,n} (resp. h_H = h_{H,n}) is chosen as a sequence of positive real numbers. Clearly, by simple algebra, we get explicitly the following definition of \hat{f}^x:

\hat{f}^x(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H(h_H^{-1}(y - Y_i))}{h_H \sum_{i,j=1}^{n} W_{ij}(x)} \qquad (2)

where

W_{ij}(x) = \beta(X_i, x)\left( \beta(X_i, x) - \beta(X_j, x) \right) K(h_K^{-1}\delta(x, X_i))\, K(h_K^{-1}\delta(x, X_j)).
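For concreteness, the estimator (2) can be computed directly by vectorizing the weights W_ij. The following is a hypothetical NumPy sketch, not the authors' implementation: the operators beta and delta, both kernels and both bandwidths are user-supplied, and the function name is ours:

```python
import numpy as np

def local_linear_cond_density(y, x, X, Y, beta, delta, K, H, hK, hH):
    """Fast functional local linear estimate (2) of the conditional density f^x(y).

    X is a list of discretized curves, Y an array of scalar responses;
    beta(u, v) and delta(u, v) are the bi-functional operators of the text.
    """
    b = np.array([beta(Xi, x) for Xi in X])            # beta(X_i, x)
    k = np.array([K(delta(x, Xi) / hK) for Xi in X])   # K(h_K^{-1} delta(x, X_i))
    # W_ij = beta_i (beta_i - beta_j) K_i K_j, computed by broadcasting
    W = b[:, None] * (b[:, None] - b[None, :]) * k[:, None] * k[None, :]
    Hi = H((y - np.asarray(Y)) / hH)                   # H(h_H^{-1}(y - Y_i))
    den = hH * W.sum()
    return float((W * Hi[:, None]).sum() / den) if den != 0 else 0.0
```

With, for instance, an asymmetric quadratic kernel for K and a Gaussian kernel for H, the call returns a pointwise estimate; the normalization by h_H \sum W_{ij} follows (2).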
5.3 Main results
We introduce the following hypotheses:

(H1) For all t \in [-1, 1],

\lim_{h \to 0} \frac{\phi_x(th, h)}{\phi_x(h)} = \chi_x(t).

(H2) For l \in \{0, 2\}, the quantities \Psi_l'(0) exist, where \Psi_l' denotes the first derivative of \Psi_l.
(H3) The operator \beta is such that

h_K \int_{B(x,h_K)} \beta(u,x)\,dP(u) = o\!\left( \int_{B(x,h_K)} \beta^2(u,x)\,dP(u) \right),

where B(x,r) = \{ z \in F : |\delta(x,z)| \le r \} and dP(x) is the probability distribution of X, and such that, for all u \in B(x,r),

C_1 |\delta(x,u)| \le |\beta(u,x)| \le C_2 |\delta(x,u)|.

(H4) K is a positive, differentiable function supported within [-1, 1]. Its derivative K' satisfies K'(t) < 0 for -1 \le t < 1, and K(1) > 0.

(H5) H is a kernel such that \int H(t)\,dt = 1 and \int t^2 H(t)\,dt < \infty.
The last part of (H3) was introduced and commented on by Barrientos et al. (2010), who give several examples of bi-functional operators satisfying this condition. Conditions (H4)–(H6) are standard and classically used in the context of quadratic errors in functional statistics.
Theorem 5.3.1. Under hypotheses (H1)–(H6), we have

E\left[ \hat{f}^x(y) - f^x(y) \right]^2 = B_H^2(x,y)\,h_H^4 + B_K^2(x,y)\,h_K^2 + \frac{V_{HK}(x,y)}{n\,h_H\,\phi_x(h_K)} + o(h_H^4) + o(h_K^2) + o\!\left( \frac{1}{n\,h_H\,\phi_x(h_K)} \right)

with

B_H(x,y) = \frac{1}{2}\, \frac{\partial^2 f_Y^X(x,y)}{\partial y^2} \int t^2 H(t)\,dt,

B_K(x,y) = \Psi_0'(0)\, \frac{K(1) - \int_{-1}^{1} (u^3 K(u))'\, \chi_x(u)\,du}{K(1) - \int_{-1}^{1} (u^2 K(u))'\, \chi_x(u)\,du} \qquad (3)

and

V_{HK}(x,y) = f_Y^X(x,y) \int H^2(t)\,dt \cdot \frac{K^2(1) - \int_{-1}^{1} (u^4 K^2(u))'\, \chi_x(u)\,du}{\left( K(1) - \int_{-1}^{1} (u^2 K(u))'\, \chi_x(u)\,du \right)^2}.
If we set:

\hat{f}_N(x,y) = \frac{1}{n(n-1)\,h_H\,E[W_{12}]} \sum_{i \ne j} W_{ij} H_i(y) \quad\text{and}\quad \hat{f}_D(x) = \frac{1}{n(n-1)\,E[W_{12}]} \sum_{i \ne j} W_{ij},

where K_i = K(h_K^{-1}\delta(x,X_i)) and H_i(y) = H(h_H^{-1}(y - Y_i)) for all i = 1, \dots, n, then we obtain the following lemmas, which will be useful for the proof of Theorem 5.3.1.

Lemma 5.3.1.
E[\hat{f}_N(x,y)] - f_Y^X(x,y) = B_H(x,y)\,h_H^2 + B_K(x,y)\,h_K + o(h_H^2) + o(h_K).

Lemma 5.3.2.
Var[\hat{f}_N(x,y)] = \frac{V_{HK}(x,y)}{n\,h_H\,\phi_x(h_K)} + o\!\left( \frac{1}{n\,h_H\,\phi_x(h_K)} \right).

Lemma 5.3.3.
Cov(\hat{f}_N(x,y), \hat{f}_D(x)) = O\!\left( \frac{1}{n\,\phi_x(h_K)} \right).

Lemma 5.3.4.
Var[\hat{f}_D(x)] = O\!\left( \frac{1}{n\,\phi_x(h_K)} \right).

5.4 Discussion and comments

1.
In the present work, the functional space of our model is characterized by the regularity condition (H2). Of course, this condition is closely related to the differentiability of the operators \partial^l f_Y^X(\cdot,y)/\partial y^l and f_Y^X(\cdot,y) (cf. Ferraty et al. (2007) for more discussion of the link between the existence of the derivatives of \varphi_l and \Psi_l). It should be noted that this condition is used in order to keep the usual form of the quadratic error (cf. Vieu, 1991). However, if we replace (H2) by a Lipschitz-type condition such as:

\forall (y_1, y_2) \in N_y \times N_y,\ \forall (x_1, x_2) \in N_x \times N_x,
\quad \left| f_Y^X(x_1, y_1) - f_Y^X(x_2, y_2) \right| \le C\left( |\delta(x_1, x_2)|^2 + |y_1 - y_2|^2 \right),

which is less restrictive than condition (H2), we obtain a result of the form:

E\left[ \hat{f}_Y^X(x,y) - f_Y^X(x,y) \right]^2 = O\left( h_H^4 + h_K^2 \right) + O\!\left( \frac{1}{n\,h_H\,\phi_x(h_K)} \right).

However, such an expression of the convergence rate is not exact and cannot be used to determine the smoothing parameters. In other words, the differentiability condition is a good compromise for obtaining an asymptotically exact expression of the convergence rate.
5.5 Proofs
Proof of Theorem 5.3.1. We start from the classical decomposition:

E\left[ \hat{f}_Y^X(x,y) - f_Y^X(x,y) \right]^2 = \left[ E\,\hat{f}_Y^X(x,y) - f_Y^X(x,y) \right]^2 + Var\left[ \hat{f}_Y^X(x,y) \right].
The proof of this theorem is thus based on calculating the bias and variance terms separately, so we distinguish two stages. Firstly, for the bias term, we recall that, for all z \ne 0 and p \in \mathbb{N}, we can write:

\frac{1}{z} = 1 - (z - 1) + \dots + (-1)^p (z-1)^p + (-1)^{p+1} \frac{(z-1)^{p+1}}{z}.
By using this decomposition for z = \hat{f}_D(x) and p = 1, and noting that E\,\hat{f}_D(x) = 1, we show that:

\hat{f}_Y^X(x,y) - f_Y^X(x,y) = \left( \hat{f}_N(x,y) - f_Y^X(x,y) \right) - \left( \hat{f}_N(x,y) - E\,\hat{f}_N(x,y) \right)\left( \hat{f}_D(x) - 1 \right) - \left( E\,\hat{f}_N(x,y) \right)\left( \hat{f}_D(x) - 1 \right) + \left( \hat{f}_D(x) - E\,\hat{f}_D(x) \right)^2 \hat{f}_Y^X(x,y). \qquad (4)
As the kernel H is bounded, we can find a constant C > 0 such that \hat{f}_Y^X(x,y) \le C\,h_H^{-1}. Hence,

E\left[ \hat{f}_Y^X(x,y) \right] - f_Y^X(x,y) = \left( E\,\hat{f}_N(x,y) - f_Y^X(x,y) \right) - Cov\left( \hat{f}_N(x,y), \hat{f}_D(x) \right) + Var\left( \hat{f}_D(x) \right) O(h_H^{-1}).
Secondly, concerning the variance term, we use ideas similar to those of Sarda and Vieu (2000) and Bosq and Lecoutre (1987) to deduce that:

Var\left[ \hat{f}_Y^X(x,y) \right] = Var\left[ \hat{f}_N(x,y) \right] + o\!\left( \frac{1}{n\,h_H\,\phi_x(h_K)} \right).
Finally, the proof of Theorem 5.3.1 becomes a direct consequence of Lemmas 5.3.1, 5.3.2,
5.3.3 and 5.3.4.
Proof of Lemma 5.3.1. We have:

E\left[ \hat{f}_N(x,y) \right] = E\left[ \frac{1}{n(n-1)\,h_H\,E[W_{12}]} \sum_{i \ne j} W_{ij} H_i \right] = \frac{E[W_{12} H_1]}{h_H\,E[W_{12}]} = \frac{1}{E[W_{12}]}\, E\left[ W_{12}\, h_H^{-1} E[H_1 \mid X_1] \right]. \qquad (5)
To evaluate the quantity E[H_1 \mid X_1], we use the usual change of variable t = h_H^{-1}(y - z). Thus,

h_H^{-1} E[H_1 \mid X_1] = h_H^{-1} \int H\!\left( \frac{y-z}{h_H} \right) f_Y^X(X_1, z)\,dz = \int H(t)\, f_Y^X(X_1, y - h_H t)\,dt.

As f_Y^X(X_1, \cdot) is of class C^2 in y, we can use a second-order Taylor expansion as follows:

h_H^{-1} E[H_1 \mid X_1] = f_Y^X(X_1, y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(X_1, y)}{\partial y^2} \int t^2 H(t)\,dt + o(h_H^2) = \Psi_0(X_1, y) + \frac{h_H^2}{2}\, \Psi_2(X_1, y) \int t^2 H(t)\,dt + o(h_H^2).

It follows, from (5), that:

E\left[ \hat{f}_N(x,y) \right] = \frac{1}{E[W_{12}]} \left( E\left[ W_{12}\, \Psi_0(X_1,y) \right] + \frac{h_H^2}{2} \left( \int t^2 H(t)\,dt \right) E\left[ W_{12}\, \Psi_2(X_1,y) \right] \right) + o(h_H^2).
Now, by the same arguments as those used in Barrientos et al. (2010) for the regression function, we show that, for l \in \{0, 2\}:

E\left[ W_{12}\, \Psi_l(X_1, y) \right] = \Psi_l(x,y)\, E[W_{12}] + \Psi_l'(0)\, E\left[ \beta(x,X_1) W_{12} \right] + o\left( E\left[ \beta(x,X_1) W_{12} \right] \right).

Hence,

E\left[ \hat{f}_N(x,y) \right] = f_Y^X(x,y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(x,y)}{\partial y^2} \int t^2 H(t)\,dt + o(h_H^2) + \Psi_0'(0)\, \frac{E\left[ \beta(x,X_1) W_{12} \right]}{E[W_{12}]} + o\!\left( \frac{E\left[ \beta(x,X_1) W_{12} \right]}{E[W_{12}]} \right).
It is clear that:

E\left[ \beta(x,X) W_{12} \right] = E[K_1 \beta_1^3]\, E[K_1] - E[K_1 \beta_1]\, E[K_1 \beta_1^2] \quad\text{and}\quad E[W_{12}] = E[K_1 \beta_1^2]\, E[K_1] - \left( E[K_1 \beta_1] \right)^2.

Moreover, under (H3),

E[K_1^a \beta_1] \le C \int_{B(x,h_K)} \beta(u,x)\,dP(u), \quad\text{so that}\quad h_K\, E[K_1^a \beta_1] = o\!\left( \int_{B(x,h_K)} \beta^2(u,x)\,dP(u) \right) = o\left( h_K^2\, \phi_x(h_K) \right). \qquad (6)
On the other hand, for any b \ge 1, we write:

E[K_1^a \beta_1^b] = E\left[ K_1^a\, \delta^b(x,X) \right] + E\left[ K_1^a \left( \beta^b(X,x) - \delta^b(x,X) \right) \right],

where the second term is negligible under (H3) (cf. Barrientos et al. (2010)). For the first term, denoting by dP^{h_K^{-1}\delta(x,X)} the distribution of h_K^{-1}\delta(x,X), an integration by parts gives:

E\left[ K_1^a\, \delta^b(x,X) \right] = h_K^b \int_{-1}^{1} v^b K^a(v)\, dP^{h_K^{-1}\delta(x,X)}(v)
= h_K^b \left[ K^a(1)\,\phi_x(h_K) - \int_{-1}^{1} \left( u^b K^a(u) \right)' \phi_x(u h_K, h_K)\,du \right]
= h_K^b\, \phi_x(h_K) \left[ K^a(1) - \int_{-1}^{1} \left( u^b K^a(u) \right)' \frac{\phi_x(u h_K, h_K)}{\phi_x(h_K)}\,du \right].

Finally, under (H1), we obtain:

E[K_1^a \beta_1^b] = h_K^b\, \phi_x(h_K) \left( K^a(1) - \int_{-1}^{1} \left( u^b K^a(u) \right)' \chi_x(u)\,du \right) + o\left( h_K^b\, \phi_x(h_K) \right). \qquad (7)
It follows that:

E[W_{12}] = h_K^2\, \phi_x^2(h_K) \left( K(1) - \int_{-1}^{1} (u^2 K(u))'\, \chi_x(u)\,du \right) \left( K(1) - \int_{-1}^{1} K'(u)\, \chi_x(u)\,du \right) + o\left( h_K^2\, \phi_x^2(h_K) \right)

and

E\left[ \beta(x,X) W_{12} \right] = h_K^3\, \phi_x^2(h_K) \left( K(1) - \int_{-1}^{1} (u^3 K(u))'\, \chi_x(u)\,du \right) \left( K(1) - \int_{-1}^{1} K'(u)\, \chi_x(u)\,du \right) + o\left( h_K^3\, \phi_x^2(h_K) \right).

Consequently,

E\left[ \hat{f}_N(x,y) \right] = f_Y^X(x,y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(x,y)}{\partial y^2} \int t^2 H(t)\,dt + o(h_H^2) + h_K\, \Psi_0'(0)\, \frac{K(1) - \int_{-1}^{1} (u^3 K(u))'\, \chi_x(u)\,du}{K(1) - \int_{-1}^{1} (u^2 K(u))'\, \chi_x(u)\,du} + o(h_K),

which completes the proof of Lemma 5.3.1.
Proof of Lemma 5.3.2. We have:

Var\left[ \hat{f}_N(x,y) \right] = \frac{1}{\left( n(n-1)\,h_H\,E[W_{12}] \right)^2}\, Var\left( \sum_{i \ne j} W_{ij} H_i \right)

= \frac{1}{\left( n(n-1)\,h_H\,E[W_{12}] \right)^2} \Big[ n(n-1)\, E[W_{12}^2 H_1^2] + n(n-1)\, E[W_{12} W_{21} H_1 H_2] + n(n-1)(n-2)\, E[W_{12} W_{13} H_1^2] + n(n-1)(n-2)\, E[W_{12} W_{23} H_1 H_2] + n(n-1)(n-2)\, E[W_{12} W_{31} H_1 H_3] + n(n-1)(n-2)\, E[W_{12} W_{32} H_1 H_3] - n(n-1)(4n-6)\, \left( E[W_{12} H_1] \right)^2 \Big].
Observe that the proof of the previous lemma gives E[W_{12} H_1]/(h_H\,E[W_{12}]) = O(1). Furthermore, by simple manipulations and by using (6) and (7), the quantities E[W_{12}^2 H_1^2], E[W_{12} W_{21} H_1 H_2], E[W_{12} W_{23} H_1 H_2], E[W_{12} W_{31} H_1 H_3] and E[W_{12} W_{32} H_1 H_3] can be evaluated in the same way. Therefore, the second quantity is the leading term in Var(\hat{f}_N(x,y)). This term can be evaluated by the same arguments as those used in the previous proof. Indeed:
E\left[ \beta_1^4 K_1^2 H_1^2 \right] = E\left[ \beta_1^4 K_1^2\, E(H_1^2 \mid X_1) \right] = E\left[ \beta_1^4 K_1^2 \int H^2\!\left( \frac{y-z}{h_H} \right) f_Y^X(X_1,z)\,dz \right] = h_H\, E\left[ \beta_1^4 K_1^2 \int H^2(t)\, f_Y^X(X_1, y - h_H t)\,dt \right].

From a first-order Taylor expansion, we have:

E\left[ \beta_1^4 K_1^2 H_1^2 \right] = h_H \int H^2(t)\,dt\; E\left[ \beta_1^4 K_1^2\, f_Y^X(X_1,y) \right] + o\left( h_H\, E[\beta_1^4 K_1^2] \right).

Once again, we follow the same steps as in the previous lemma to write:

E\left[ \beta_1^4 K_1^2\, f_Y^X(X_1,y) \right] = f_Y^X(x,y)\, E[\beta_1^4 K_1^2] + o\left( E[\beta_1^4 K_1^2] \right),

which implies that:

E\left[ \beta_1^4 K_1^2 H_1^2 \right] = h_H\, f_Y^X(x,y) \int H^2(t)\,dt\; E[\beta_1^4 K_1^2] + o\left( h_H\, E[\beta_1^4 K_1^2] \right).
Proof of Lemma 5.3.3. The proof of this lemma is very similar to that of Lemma 5.3.2. We have:

Cov\left( \hat{f}_N(x,y), \hat{f}_D(x) \right) = \frac{1}{h_H \left( n(n-1)\,E[W_{12}] \right)^2}\, Cov\left( \sum_{i \ne j} W_{ij} H_i,\; \sum_{i' \ne j'} W_{i'j'} \right),

which is a finite sum of covariance terms of the same type as in the proof of Lemma 5.3.2, the leading one being n(n-1)\,E[W_{12}^2 H_1], with

E[W_{12}^2 H_1] = O\left( h_K^4\, h_H\, \phi_x^2(h_K) \right),

from which the stated bound follows.
Proof of Lemma 5.3.4. We have:

Var\left[ \hat{f}_D(x) \right] = \frac{1}{\left( n(n-1)\,E[W_{12}] \right)^2}\, Var\left( \sum_{i \ne j} W_{ij} \right).

Expanding this variance as in the proof of Lemma 5.3.2 and using

E[W_{12}^2] = O\left( h_K^4\, \phi_x^2(h_K) \right) \quad\text{and}\quad E[W_{12} W_{32}] = O\left( h_K^4\, \phi_x^3(h_K) \right),

we have that:

Var\left[ \hat{f}_D(x) \right] = O\!\left( \frac{1}{n\,\phi_x(h_K)} \right).
Bibliography

[1] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally modelled regression and functional data. J. of Nonparametric Statistics, 22(5), 617–632.

[2] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, 102–111.

[3] Bosq, D. and Lecoutre, J.-P. (1987). Théorie de l'estimation fonctionnelle. Economica, Paris.

[4] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator (with comments and a rejoinder by the authors). Statist. Sci., 6, 404–436.

[5] El Methni, M. and Rachdi, M. (2010). Local weighted average estimation of the regression operator for functional data. Commun. Stat., Theory and Methods, to appear.

[6] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, 998–1004.

[7] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, 189–206.

[8] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.

[9] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, 335–352.

[10] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, 47–76.

[11] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics, Springer, New York.
Chapter 6
Local linear estimation of conditional parameters for functional data: application to simulated and real data

The conditional density is a fundamental tool for describing the relationship between two random variables. In this chapter, we characterize this link by the local linear nonparametric estimation method. The main objective is to show, using simulated and then real data, the applicability of this method in the functional framework. First, we illustrate the conditional mode as a prediction tool closely related to the estimation of the conditional density. Then, we propose an easy and fast implementation of the conditional density estimator proposed in Chapter 2. Finally, an application to real data demonstrates the superiority of the local polynomial estimation method over the kernel method. In this study, we also address many practical questions, such as the choice of the smoothing parameters and of the two operators β and δ (cf. the previous chapters). We therefore enumerate various methods for making an optimal selection of these elements.
6.1

[Figure: sample of simulated curves plotted against Time]
The conditional mode predictor is defined by:

\hat{\theta}(x) = \arg\max_{y} \hat{f}_{(h_K,h_H)}(x, y), \quad\text{with}\quad \hat{f}_{(h_K,h_H)}(x,y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H(h_H^{-1}(y - Y_i))}{h_H \sum_{i,j=1}^{n} W_{ij}(x)}

and

W_{ij}(x) = \beta(X_i, x)\left( \beta(X_i, x) - \beta(X_j, x) \right) K(h_K^{-1}\delta(x, X_i))\, K(h_K^{-1}\delta(x, X_j)).
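In practice, the argmax is computed over a finite grid of candidate y values. A minimal sketch (the name cond_density is a stand-in for any implementation of the conditional density estimator above) is:

```python
import numpy as np

def conditional_mode(cond_density, x, y_grid):
    # theta_hat(x) = argmax over the grid of the estimated conditional density
    values = np.array([cond_density(x, y) for y in y_grid])
    return float(y_grid[int(np.argmax(values))])

# toy check with a unimodal pseudo-density centred at 2.0
y_grid = np.linspace(-5.0, 5.0, 201)
mode = conditional_mode(lambda x, y: np.exp(-(y - 2.0) ** 2), None, y_grid)
```

The grid resolution bounds the accuracy of the mode estimate, so the grid step should be small relative to the smoothing bandwidth h_H.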
(1)
where i* denotes the index of the curve closest to X_i among all the curves in the learning sample. More precisely, these smoothing parameters are selected by minimizing criterion (1) over the set of nearest neighbors. Note that this selection method is well suited to our data sets. However, to the best of our knowledge, there has been no theoretical study proving the optimality of this method, even when the kernel method is used. On the other hand, the choice of the operators β and δ plays an essential role. Concretely, our theoretical results offer, under certain conditions, great flexibility in the choice of these two operators (cf. Chapters 2–4). In our case, we focused on the case where β and δ are derived from the metric of the functional index. More precisely, we consider the operator:

\beta(x, X_i) = \langle x - X_i,\, \rho_1 \rangle,

where \rho_1 denotes a functional eigenvector of the empirical covariance operator \frac{1}{100} \sum_{i=1}^{100} (X_i - \bar{X})\,{}^t(X_i - \bar{X}).

We refer to Barrientos et al. (2010) for more discussion of the importance and motivation of this choice. Finally, the operator β is chosen according to the two parameters:
q: the order of the derivative of the functional data;
q1: the order of the eigenvalue associated with the functional eigenvector \rho_1.
Similarly, an analogous choice can also be made for the operator δ.
[Figure: responses versus predicted values]

[Figure: responses versus predicted values]
Figure 6.3 – Results for \beta(x, x') = \langle \rho_2,\, x - x' \rangle, where (q1, q2) = (2, 1)
The prediction error over the test sample is:

\frac{1}{50} \sum_{i=501}^{550} \left( Y_i - \hat{\theta}(X_i) \right)^2 = 0.27.
6.2

In this section, we keep the same data set as in the previous section and we compare the estimator

\hat{f}_{(h_K,h_H)}(x, y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H(h_H^{-1}(y - Y_i))}{h_H \sum_{i,j=1}^{n} W_{ij}(x)}

with the true conditional density

f^x(y) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{1}{2} (y - r(x))^2 \right).
This illustration is motivated by the fact that the conditional density can be used for other purposes (testing the multi-modality of the data, estimating the hazard function, etc.), and not only as a preliminary step towards estimating the conditional mode. It is therefore very interesting to show the applicability of this method without having the conditional mode as the fundamental objective. The main challenge is to find selection criteria different from those proposed in the previous case. For this, we propose to use ideas similar to those seen in the second chapter for the kernel method. In other words, the natural criterion for choosing the parameters h_K and h_H is based on minimizing the errors:

d_1(\hat{f}_{(h_K,h_H)}, f) = \int \left( \hat{f}_{(h_K,h_H)}(x,y) - f(x,y) \right)^2 W_1(x) W_2(y)\, dP_X(x)\,dy,

d_2(\hat{f}_{(h_K,h_H)}, f) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{f}_{(h_K,h_H)}(X_i,Y_i) - f(X_i,Y_i) \right)^2 \frac{W_1(X_i) W_2(Y_i)}{f(X_i,Y_i)},

d_3(\hat{f}_{(h_K,h_H)}, f) = \int E\left( \hat{f}_{(h_K,h_H)}(x,y) - f(x,y) \right)^2 W_1(x) W_2(y)\, dP_X(x)\,dy.
In practice, these errors are approximated by the cross-validation criterion:

CV(h_K, h_H) = \frac{1}{n} \sum_{i=1}^{n} W_1(X_i) \int \left[ \hat{f}^{(i)}_{(h_K,h_H)}(X_i, y) \right]^2 W_2(y)\,dy - \frac{2}{n} \sum_{i=1}^{n} \hat{f}^{(i)}_{(h_K,h_H)}(X_i, Y_i)\, W_1(X_i) W_2(Y_i),

where

\hat{f}^{(k)}_{(h_K,h_H)}(X_k, y) = \frac{\sum_{i,j \ne k} W_{ij}(X_k)\, H(h_H^{-1}(y - Y_i))}{h_H \sum_{i,j \ne k} W_{ij}(X_k)}.
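The criterion CV(h_K, h_H) can be implemented by a plain grid search. The sketch below is an illustrative adaptation, not the authors' code: the helper loo_density stands for the leave-one-out estimator \hat{f}^{(i)}, the rectangle-rule integration is our simplification, and W_1 = W_2 ≡ 1 is assumed:

```python
import numpy as np

def cv_criterion(loo_density, Y, hK, hH, y_grid):
    """Leave-one-out cross-validation score for the bandwidth pair (hK, hH)."""
    n = len(Y)
    dy = y_grid[1] - y_grid[0]
    # squared term: (1/n) sum_i of the integral of f_hat^{(i)}(X_i, y)^2 dy
    sq = sum(np.sum(np.array([loo_density(i, y, hK, hH) for y in y_grid]) ** 2) * dy
             for i in range(n)) / n
    # fit term: (2/n) sum_i f_hat^{(i)}(X_i, Y_i)
    fit = 2.0 * sum(loo_density(i, Y[i], hK, hH) for i in range(n)) / n
    return sq - fit

def select_by_cv(loo_density, Y, hK_grid, hH_grid, y_grid):
    """Minimize the CV score over a grid of candidate bandwidth pairs."""
    scores = [(cv_criterion(loo_density, Y, hK, hH, y_grid), hK, hH)
              for hK in hK_grid for hH in hH_grid]
    _, hK, hH = min(scores)
    return hK, hH
```

The double loop is quadratic in the grid sizes; in our experiments the minimization is restricted to a small set of candidate pairs, which keeps the cost moderate.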
This is, in fact, an adaptation of the study carried out in Chapter 2 on the kernel estimator of the conditional density. With this technique, we obtain satisfactory simulation results, but the asymptotic optimality of the method remains to be proved (it is therefore an open question). In what follows, we keep the same operators β and δ, and we assume that W_1 = W_2 ≡ 1. Finally, we use the 119 observations (X_i, Y_i) to compute the estimator of f(y|X_{120}) for y in the interval [0.9 min_{i=1,...,119}(Y_i), 1.1 max_{i=1,...,119}(Y_i)]. The results of our investigations appear in Figure 6.4.
In order to highlight the crucial role of the smoothing parameters in this estimation, we perturb this choice by considering two pairs of arbitrary values: (1) a pair of values smaller than the optimal parameters (h_K, h_H) = (0.29, 1.40), and (2) a pair of values larger than those provided by our criterion, namely (h_K, h_H) = (0.66, 3.40). We note that the pair of parameters that is optimal with respect to our criterion provides clearly better estimation results, leading to a mean squared error MSE = 0.002, whereas case (1) yields MSE = 0.01 and case (2) yields MSE = 0.006.
6.3

In this section, we use a real data set in order to achieve two goals. The first is to demonstrate the practical applicability of our technique, and the second is to compare the estimation of the conditional mode by the kernel method with that obtained by the local polynomial estimation method. To do this, we consider the mass spectrometric curves of 197 pieces of meat, and we set as our objective the prediction of the fat content Y of a piece of meat from its mass spectrometric curve X. These curves X are presented in Figure 6.5.
More precisely, in what follows, we compare prediction via the conditional mode using the two quantities \hat{\theta}_{FLM} (mode estimation by the functional locally linear estimation method) and \hat{\theta}_{KM} (mode estimation by the kernel estimation method). For this, we split the 197 observations, available on the website of the STAPH working group at Paul Sabatier University (Toulouse 3), as follows:
1. http://www.math.univ-toulouse.fr/staph/npfda/
[Figure 6.4: three panels of curves plotted against Time]

[Figure 6.5: spectrometric curves; Absorbances versus wavelengths (850–1050)]
For the operator δ, we take the derivative-based semi-metric:

\delta(x, x') = d(x, x') = \sqrt{ \int \left( x^{(q)}(t) - x'^{(q)}(t) \right)^2 dt }.

It should be recalled that this metric with q = 2 can be considered the best suited to this type of data. We also note here that the operator β associated with the same values as before, i.e. q = 2 and q1 = 69, gives optimal prediction results with respect to the error:

MSE(FLM) = \frac{1}{27} \sum_{i=171}^{197} \left( Y_i - \hat{\theta}_{FLM}(X_i) \right)^2.
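The derivative-based semi-metric above can be approximated from discretized curves by finite differences. This is a minimal sketch under the assumption of an equally spaced design (the function name and the discretization scheme are ours):

```python
import numpy as np

def deriv_semimetric(x1, x2, q=2, dt=1.0):
    """d(x, x') = sqrt( integral of (x^{(q)}(t) - x'^{(q)}(t))^2 dt ), via finite differences."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    for _ in range(q):
        diff = np.diff(diff) / dt       # one more derivative of the difference curve
    return float(np.sqrt(np.sum(diff ** 2) * dt))
```

Curves differing only by a polynomial of degree less than q are at distance zero, which is precisely why d is a semi-metric rather than a metric.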
Moreover, to compute \hat{\theta}_{KM}, we use the routine of Ferraty and Vieu (2006), downloadable from the STAPH website http://www.math.univ-toulouse.fr/staph/npfda/ (cf. the footnote for the web address), with the same metric d. Using a quadratic kernel, we obtain the results shown in Figure 6.6. From the results obtained, we can clearly conclude that the local polynomial estimation method is significantly better and more efficient than the kernel estimation method: indeed, we obtain MSE(FLM) = 3.84 against MSE(KM) = 5.42. Moreover, our results are also comparable to those of other prediction tools such as regression and the conditional median, for which the prediction error is 3.5 for regression and 3.44 for the conditional median (cf. Ferraty and Vieu, 2006).

[Figure 6.6: prediction results for the two estimation methods (kernel method, MSE = 5.42); Responses versus predicted values]
Chapter 7
Conclusion and Perspectives

7.1 Conclusion

In this thesis, we have carried out a global study of the estimation of the conditional density when the explanatory variable is functional. Two estimation methods were considered: the first is the kernel method, while the second is the local polynomial estimation method.
For the first method, we studied a crucial question in nonparametric kernel estimation: the problem of choosing the smoothing parameter. On this subject, we proposed an automatic selection method for the two smoothing parameters. The asymptotic optimality of our method is obtained under standard conditions in functional statistics. In practice, this method is efficient, very easy to implement, and runs quickly. Moreover, our selection criterion can be used for other nonparametric models related to the conditional density. Thus, our contribution gives a relevant answer to the question of Ferraty and Vieu (2006), and it also opens up perspectives on many research questions (cf. Chapter 2).
In the second part, we considered another approach to the estimation of the conditional density when the data are functional. The proposed estimator is a generalization to the functional case of the local linear estimator introduced by Fan and Gijbels (1996). As asymptotic results, we established the almost complete convergence rate (pointwise and uniform) and we gave the asymptotically exact expression of the quadratic error of this estimator. The expressions of the convergence rates obtained have the same form as for the kernel estimator, in which the two functional structures are well exploited: the dimensionality of the model appears in the bias part, while the dimensionality of the functional space of the explanatory variable is made explicit in the dispersion part. As in the previous case, these asymptotic results are obtained under very classical conditions in nonparametric functional statistics, and the estimators are easy to use in practice. Moreover, the importance of this second part is also expressed by the large number of research perspectives it offers. In the following section, we list some examples of these perspectives.
7.2 Perspectives

To conclude the work of this thesis, we present below some possible future developments aimed at improving and extending our results.
One of them concerns the cross-validation criterion

CV(h_K, h_H) = \frac{1}{n} \sum_{i=1}^{n} W_1(X_i) \int \left[ \hat{f}^{(i)}_{(h_K,h_H)}(X_i, y) \right]^2 W_2(y)\,dy - \frac{2}{n} \sum_{i=1}^{n} \hat{f}^{(i)}_{(h_K,h_H)}(X_i, Y_i)\, W_1(X_i) W_2(Y_i),

where

\hat{f}^{(k)}_{(h_K,h_H)}(X_k, y) = \frac{\sum_{i,j \ne k} W_{ij}(X_k)\, H(h_H^{-1}(y - Y_i))}{h_H \sum_{i,j \ne k} W_{ij}(X_k)}.

The study of the asymptotic optimality of this method has not been carried out, but it constitutes a short-term research perspective.
Chapter 8
General bibliography
Benko, M. (2006). Functional Data Analysis with Applications in Finance. PhD thesis, Humboldt University, Berlin.
Benko, M., Härdle, W. and Kneip, A. (2006). Common Functional Principal Components. SFB 649 Discussion Paper SFB649DP2006-010, Humboldt University, Berlin, Germany.
Bogachev, V.I. (1999). Gaussian measures. Math surveys and monographs, 62, Amer. Math.
Soc.
Bosq, D. (1991). Modelization, nonparametric estimation and prediction for continuous time processes. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990), 509–529, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 19, Kluwer Acad. Publ., Dordrecht.

Bosq, D. (2000). Linear Processes in Function Spaces. Lecture Notes in Statistics, 149, Springer-Verlag, New York.
Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13(3), 571–591.

Cardot, H., Crambes, C. and Sarda, P. (2004). Spline estimation of conditional quantiles for functional covariates. C. R. Acad. Sci. Paris, 339(2), 141–144.
Cardot, H., Crambes, C. and Sarda, P. (2004a) Conditional Quantiles with Functional Covariates : an Application to Ozone Pollution Forecasting. Contributed paper in Compstat
Prague 2004 Proceedings 769-776.
Cardot, H. and Sarda, P. (2005). Quantile regression when the covariates are functions. J. Nonparametr. Stat., 17(7), 841–856.

Cardot, H. and Sarda, P. (2005a). Estimation in generalized linear models for functional data via penalized likelihood. J. Multivariate Anal., 92(1), 24–41.
Cardot, H., Crambes, C. and Sarda, P. (2006). Springer, New York.
Dereich, S. (2003). High resolution coding of stochastic processes and small ball probabilities. PhD thesis.

Deville, J. C. (1974). Méthodes statistiques et numériques de l'analyse harmonique. Ann. Insee, 15, 3–101.
El Ghouch, A. and Genton, M. (2009). Local polynomial quantile regression with parametric features. J. Amer. Statist. Assoc., 104(488), 1416–1429.

Ezzahrioui, M. (2007). Prévision dans les modèles conditionnels en dimension infinie. Thèse de Doctorat, Université du Littoral Côte d'Opale.

Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, 3–18.
Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, 998–1004.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, London.
Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity
measures in nonlinear dynamical systems. Biometrika, 83, Pages 189206.
Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.
Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, Pages 819834.
Ferraty, F., Goia, A. et Vieu, P. (2002) Rgression non-paramtrique pour des variables alatoires fonctionnelles mlangeantes. (French) [Nonparametric regression for mixing functional
random variables] C. R. Math. Acad. Sci. Paris 334 (3) 217-220.
Ferraty F., Goia A. and Vieu P. (2002b) Functional nonparametric model for time series : a
fractal approach for dimension reduction. Test, 11, (2) 317-344
Ferraty, F., Mas, A. and Vieu, P. (2007). Advances in nonparametric regression for functional data. Aust. N. Z. J. Stat., 49, 1–20.
Ferraty, F., Rabhi, A. et Vieu, P. (2008) Estimation non-paramtrique de la fonction de
hasard avec variable explicative fonctionnelle. Rom. J. Pure & Applied Math. 53 (1) 1-18.
Ferraty, F., Tadj, A., Laksaci, A. and Vieu, P. (2010). Rate of uniform consistency for
nonparametric estimates with functional variables. J. of Statist. Plan. and Inf., 140, Pages
335352 .
Ferraty, F. and Vieu, P. (2000) Dimension fractale et estimation de la rgression dans des
espaces vectoriels semi-norms C. R. Math. Acad. Sci. Paris, 330, 403-406.
Ferraty, F. et Vieu, P. (2002) The functional nonparametric model and application to spectrometric data. Comput. Statist. 17 (4) 545-564.
Ferraty, F., Laksaci, A. and Vieu, P. (2006) Estimating some characteristics of the conditional
distribution in nonparametric functional models. Stat. Inference Stoch. Process 9 (1) 47-76.
Ferraty, F. and Vieu, P. (2006a). Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, New York.
Gannoun, A., Saracco, J. and Yu, K. (2003). Nonparametric prediction by conditional median and quantiles. J. of Statist. Plan. and Inf., 117, No. 2, Pages 207223.
Gao, F. and Li, W. V. (2007). Small ball probabilities for the Slepian Gaussian fields. Trans. Amer. Math. Soc., 359(3), 1339–1350 (electronic).
Gasser, T., Hall, P. and Presnell, B. (1998). Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol., 60(4), 681–691.

Geffroy, J. (1974). Sur l'estimation d'une densité dans un espace métrique. C. R. Acad. Sci. Paris, 278, 1449–1452.
Hall, P., Wolff, R. C. and Yao, Q. (1999). Methods for estimating a conditional distribution function. J. Amer. Statist. Assoc., 94, 154–163.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge, UK.

Härdle, W. (1991).
Härdle, W., Janssen, P. and Serfling, R. (1988). Strong consistency rates for estimators of conditional functionals. Ann. Statist., 16(4), 1428–1449.
Härdle, W., Lütkepohl, H. and Chen, R. (1997). A review of nonparametric time series analysis. Inter. Statist. Rev., 65, 73–85.
Härdle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13(4), 1465–1481.
Hassani, S., Sarda, P. and Vieu, P. (1995). Approche non paramétrique en théorie de la fiabilité : revue bibliographique. Rev. Statist. Appl., 35, 27–41.
Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. Ann. Statist., 13, 435–475.
Hedli-Griche, S. (2008). Estimation de l'opérateur de régression pour des données fonctionnelles et des erreurs corrélées. PhD thesis.

Holmstrome, I. (1961). On a method for parametric representation of the state of the atmosphere. Tellus, 15, 127–149.
Hyndman, R.J. (1995). Highest-density forecast regions for non-linear and non-normal time
series models. J. Forecast., 14, Pages 431441.
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996). Estimating and visualizing
conditional densities. J. Comput. Graph. Statist., 5, Pages 315336.
Hyndman, R.J. and Yao, Q. (1998). Nonparametric estimation and symmetry tests for conditional density functions. Working paper 17/98, Department of Econometrics and Business
Statistics, Monash University.
Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, Pages 259278.
Jank, W. and Shmueli, G. (2006). Functional Data Analysis in Electronic Commerce Research. Statistical Science, 21(2), 155–166.
Kawasaki, Y. et Ando, T. (2004) Functional data analysis of the dynamics of yield curves.
COMPSTAT 2004|Proceedings in Computational Statistics 1309-1316, Physica, Heidelberg.
Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol., 27(4), 429–450.
8. Bibliographie générale
Applied analysis. Prentice Hall.
Lecoutre, J.P. and Ould-Saïd, E. (1995). Hazard rate estimation for mixing and censored processes. J. Nonparametric Statist., 5, 83-89.
Laukaitis, A. and Rackauskas, A. (2002). Functional data analysis of payment systems. Nonlinear Analysis: Modelling and Control, 7(2), 53-68.
Li, W.V. and Shao, Q.M. (2001). Gaussian processes: inequalities, small ball probabilities and applications. In: C.R. Rao and D. Shanbhag (eds.), Stochastic Processes, Theory and Methods. Handbook of Statistics, 19, North-Holland, Amsterdam.
Lifshits, M.A., Linde, W. and Shi, Z. (2006). Small deviations of Riemann-Liouville processes in Lq-spaces with respect to fractal measures. Proc. London Math. Soc., (3) 92(1), 224-250.
Louani, D. (1998). On the asymptotic normality of the function and its derivatives under censoring. Comm. Statist., Theory and Methods, 27, 2909-2924.
Louani, D. and Ould-Saïd, E. (1999). Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis. J. Nonparametric Statist., 11(4), 413-442.
Meiring, W. (2005). Analysis Study of Evidence of the Quasi-Biennial Oscillation, Time Trends and Solar Cycle in Ozonesonde Observations. Technical report.
Molenaar, P. and Boomsma, D. (1987). The genetic analysis of repeated measures: the Karhunen-Loève expansion. Behavior Genetics, 17, 229-242.
Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Statist., 33, No. 2, 774-805.
Müller, H.-G., Sen, R. and Stadtmüller, U. (2007). Functional data analysis for volatility process. Submitted.
Nadaraya, E.A. (1965). On estimation of density functions and regression curves. Theory Prob. Appl., 10, 186-190.
Ouassou, I. and Rachdi, M. (2009). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1, No. 2, 233-250.
Ould-Saïd, E. (1993). Estimation non paramétrique du mode conditionnel. Application à la prévision. C. R. Acad. Sci. Paris, Sér. I, 316, 943-947.
Ould-Saïd, E. (1997). A note on ergodic processes prediction via estimation of the conditional mode function. Scand. J. Statist., 24, 231-239.
Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametric Statist., 17, 797-806.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat., 33, 1065-1076.
Preda, C. (2007). Regression models for functional data by reproducing kernel Hilbert spaces methods. J. Statist. Plann. Inference, 137(3), 829-840.
Quintela-Del-Rio, A. and Vieu, P. (1997). A nonparametric conditional mode estimate. J. Nonparametric Statist., 8, 253-266.
Rachdi, M. and El Methni, M. (2011). Local weighted average estimation of the regression operator for functional data. Comm. Stat., Theory and Methods (in press).
Rachdi, M. and Sabre, R. (2000). Consistent estimates of the mode of the probability density function in nonparametric deconvolution problems. Statistics & Probability Letters, 47, 105-114.
Rachdi, M. and Vieu, P. (2005). Sélection automatique du paramètre de lissage pour l'estimation non-paramétrique de la régression pour des données fonctionnelles. C. R. Math. Acad. Sci. Paris, 341(6), 365-368.
Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. J. Statist. Plann. and Inf., 137, 2784-2801.
Ramsay, J.O. (1982). When the data are functions. Psychometrika, 47(4), 379-396.
Ramsay, J.O. (2000a). Differential equation models for statistical functions. Canad. J. Statist., 28(2), 225-240.
Ramsay, J.O. (2000b). Functional components of variation in handwriting. Journal of the American Statistical Association, 95, 9-15.
J. R. Statist. Soc. B, 60, 351-363.
Ramsay, J.O., Munhall, K.G., Gracco, V.L. and Ostry, D.J. (1996). Functional data analysis of lip motion. J. Acoust. Soc. Am., 99, 3718-3727.
Ramsay, J. and Silverman, B. (1997). Functional Data Analysis. Springer.
Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14, 1-17.
Rice, J. and Silverman, B. (1991). Estimating the mean and the covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B, 53, 233-243.
Rio, E. (1990). faiblement dépendants.
Rossi, F., Delannay, N., Conan-Guez, B. and Verleysen, M. (2005c). Representation of functional data in neural networks. Neurocomputing, 64, 183-210.
Roussas, G. (1968). On some properties of nonparametric estimates of probability density functions. Bull. Soc. Math. Grèce (N.S.), 9(1), 29-43.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist., 9, 65-78.
South African Statist. J.
Probability Letters.
Kernel Regression.
Schimek, M. (2000). Smoothing and Regression: Approaches, Computation and Application. Ed. M.G. Schimek, Wiley Series in Probability and Statistics.
Schumaker, M. (1981). Wiley.
Stone, C.J. (1984). An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist., 12, 1285-1297.
Stone, C.J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist., 22, No. 1, 118-184.
Stute, W. (1985). Conditional empirical processes. Ann. Statist., 14, 638-647.
Annals of Mathematical Statistics.
J. Multivariate Anal., 26(4).
Kernel Smoothing.
applications.
vol