No.6
Edited by
J. Bentley
E. Coffman
R.L.Graham
D.Kuck
N. Pippenger
Luc Devroye
School of Computer Science
McGill University
Montreal H3A 2K6
Canada
Devroye, Luc.
Lecture notes on bucket algorithms.
Devroye, Luc:
Lecture notes on bucket algorithms / Luc Devroye. -
NE:GT
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without prior
permission of the copyright owner.
ISBN 978-0-8176-3328-8
TABLE OF CONTENTS
0. INTRODUCTION.
3. MULTIDIMENSIONAL BUCKETING.
3.1. Main theorem.
3.2. Sorting and searching.
3.3. The traveling salesman problem.
3.4. Closest point problems.
REFERENCES.
INDEX.
PREFACE
Hashing algorithms scramble data and create pseudo-uniform data distributions.
Bucket algorithms operate on raw untransformed data which are partitioned
into groups according to membership in equi-sized d-dimensional hyperrectangles,
called cells or buckets. The bucket data structure is rather sensitive to
the distribution of the data. In these lecture notes, we attempt to explain the
connection between the expected time of various bucket algorithms and the
distribution of the data. The results are illustrated on standard searching, sorting
and selection problems, as well as on a variety of problems in computational
geometry and operations research.
The notes grew partially from a graduate course on probability theory in
computer science. I wish to thank Elizabeth Van Gulick for her help with the
manuscript, and David Avis, Hanna Ayukawa, Vasek Chvatal, Beatrice Devroye,
Hossam El Gindy, Duncan McCallum, Magda McCallum, Godfried Toussaint and
Sue Whitesides for making the School of Computer Science at McGill University
such an enjoyable place. The work was supported by NSERC Grant A3456 and
by FCAC Grant EQ-1679.
INTRODUCTION
It is not a secret that methods based upon the truncation of data have good
expected time performance. For example, for nice distributions of the data,
searching is often better done via a hashing data structure instead of via a search
tree. The speed one observes in practice is due to the fact that the truncation
operation is a constant time operation.
Hashing data structures have not received a lot of attention in the 1970's
because they cannot be fit into the comparison-based computational model. For
example, there is no generally accepted lower bound theory for algorithms that
can truncate real numbers in constant time. The few analyses that are available
(see Knuth (1973), Gonnet (1981, 1984) and the references found there) relate to
the following model: the data points are uniformly distributed over either $[0,1]$ or
$\{1, \ldots, M\}$. The uniform model is of course motivated by the fact that it is often
possible to find a good hash function $h(\cdot)$, i.e. a function of the data points which
distributes the data evenly over its range. In the vast majority of the cases, $h(\cdot)$
is not a monotone function of its argument when the argument is an integer or a
real number. Non-monotone functions have the undesirable side-effect that the
data are not sorted. Although this is not important for searching, it is when the
data need to be listed in sorted order rather frequently. If the data form a data
base, i.e. each data point can be considered as a point in $R^d$ with $d > 1$, then
range queries can be conveniently handled if the data are hashed via monotone
functions. There is an ever increasing number of applications in computational
geometry (see the general survey articles by Toussaint (1980, 1982), where
applications in pattern recognition are highlighted, and the survey article on bucket
methods by Asano, Edahiro, Imai, Iri and Murota (1985)) and computer graphics,
in which the data points should preserve their relative positions because of
the numerous geometrical operations that have to be carried out on them. Points
that are near one another should stay near. In geographic data processing,
the cellular organization is particularly helpful in storing large amounts of data
such as satellite data (see the survey article by Nagy and Wagle, 1979). Many
tests in statistics are based upon the partition of the space in equal intervals,
and the counts of the numbers of points in these intervals. Among these, we cite
the popular chi-square test, and the empty cell test. See for example Kolchin,
Sevast'yanov and Chistyakov (1978) and Johnson and Kotz (1977) for applications
in statistics. In economic surveys and management science, the histogram
is a favorite tool for visualizing complex data. The histogram is also a
superb tool for statisticians in exploratory data analysis. In all these examples,
the order in the data must be preserved.
Figure 0.1.
are drawn from this distribution, the number of colliding values increases
steadily. In fact, if $n$ independent identically distributed random vectors are
considered with any atomic distribution, then $N/n \to 0$ almost surely as
$n \to \infty$, where $N$ is the number of different values. Meaningful asymptotics are
only possible if either the atomic distribution varies with $n$, or the distribution is
non-atomic. There is another key argument in favor of the use of densities: they
provide a compact description of the distribution, and are easily visualized or
plotted.
When independent random vectors with a common density are partitioned
by means of a d-dimensional grid, the number of grid locations (or buckets) with
at least 2 points has a distribution which depends upon the density in question.
The density affects the frequency of collisions of data points in buckets. For
example, if the density is very peaked, the buckets near the peak are more likely
to contain a large number of points. We want to investigate how this crowding
affects the performance of bucket or grid algorithms.
Throughout this set of notes, we will consider a d-dimensional array of
equi-sized rectangles (which we will call a grid), and within each rectangle, points are
kept in a chain (or linked list). The number of rectangles will be denoted by $m$,
and the data size by $n$. We will not consider infinite grids such as
$\{[i, i+1) \mid i \text{ integer}\}$ because infinite arrays cannot be stored. However, because
data may grow not only in size but also in value as $n \to \infty$, we will consider at
times grid sizes $m$ that are data value dependent. In any case, $m$ is usually a
function of $n$.
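The grid just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the points lie in $[0,1)^d$, each axis is cut into $k$ equal slabs (so $m = k^d$), Python lists stand in for the chains, and the function name `build_grid` is hypothetical rather than taken from the notes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def build_grid(points: Sequence[Point], k: int, d: int) -> List[List[Point]]:
    """Store points of [0,1)^d in a d-dimensional grid with k slabs per axis.

    The grid is a flat array of m = k**d chains; a Python list plays the
    role of the linked list kept in each rectangle.
    """
    m = k ** d
    buckets: List[List[Point]] = [[] for _ in range(m)]
    for p in points:
        index = 0
        for x in p:
            # Truncation: constant-time mapping of a coordinate to its slab.
            cell = min(int(x * k), k - 1)
            index = index * k + cell
        buckets[index].append(p)
    return buckets


pts = [(0.10, 0.10), (0.15, 0.05), (0.90, 0.90)]
grid = build_grid(pts, k=2, d=2)
cardinalities = [len(chain) for chain in grid]  # the bucket counts N_1, ..., N_m
```

Because the cell index is obtained by truncation only, points that are close in space land in the same or in neighboring cells, which is the order-preserving property stressed above.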
Figure 0.2.
2d Grid
help us in the assessment of the expected time performance for particular values
of $n$.
The point is that we do not wish to give an exhaustive description of known
results in the field, or to present a list of exotic applications. We start very
slowly on standard problems such as one-dimensional sorting and searching, and
will move on to multidimensional applications towards the end of the notes.
These applications are in the areas of computational geometry, operations
research (e.g. the traveling salesman problem) and pattern recognition (e.g. the
all-nearest neighbor problem).
In chapter 1, we have the simplest of all possible settings: the random variables
$X_1, \ldots, X_n$ have a common density $f$ on $[0,1)$, and $[0,1)$ is divided into
$$P(C - E(C) \ge \epsilon) \le \frac{Var(C)}{\epsilon^2 + Var(C)}$$
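The one-sided bound $P(C - E(C) \ge \epsilon) \le Var(C)/(\epsilon^2 + Var(C))$ (the Chebyshev-Cantelli inequality) can be checked numerically on a toy case. The sketch below is an illustration only: a binomial random variable stands in for $C$, and the function names are hypothetical.

```python
from math import comb


def cantelli_bound(variance: float, eps: float) -> float:
    """Upper bound Var / (eps^2 + Var) on P(X - E(X) >= eps)."""
    return variance / (eps * eps + variance)


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Toy stand-in for C: X binomial(4, 1/2), so E(X) = 2 and Var(X) = 1.
exact = binomial_upper_tail(4, 0.5, 3)   # P(X - 2 >= 1)
bound = cantelli_bound(1.0, 1.0)         # Var / (eps^2 + Var) with eps = 1
```

The bound needs only the variance, which is why it pairs naturally with a variance analysis of the cost.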
$$E(C) \sim \frac{n}{2} \int_0^1 e^{-f} < \frac{n}{2}$$
The detailed analysis of chapter 1 is well worth the effort. The development
given there can be mimicked in more complicated contexts. It would of course be
unwise to do so in these notes. Rather, from chapter 2 on, we will look at various
problems, and focus our attention on expected values only. From chapter 2
onwards, the chapters are independent of each other, so that interested readers
can immediately skip to the subject of their choice.
In chapter 2, the data $X_1, \ldots, X_n$ determine the buckets: the interval
$[\min X_i, \max X_i]$ is partitioned into $n$ equal intervals. This introduces
additional dependence between the bucket cardinalities. The new factor working
against us is the size of the tail of the distribution. Infinite tails force $\min X_i$
and $\max X_i$ to diverge, and if the rate of divergence is uncontrolled, we could
actually have a situation in which the sizes of the intervals increase with $n$ in
some probabilistic sense. The study of $E(D_s)$, $E(C)$ and other quantities
requires auxiliary results from the theory of order statistics. Under some
conditions on $f$, including $\int f^2 < \infty$, we will for example see that
i.e. the asymptotic coefficient of $n$ is the expected range of the data (this measures
the heaviness of the tail of $f$) times $\int f^2$, the measure of peakedness.
Unless $f$ vanishes outside a compact set, it is impossible to have
$E(C) = O(n)$.
In chapter 3, we look at multidimensional problems in general. The applications
are so different that a good treatment is only possible if we analyze
$$\sum_{i=1}^m g(N_i)$$
where $g(\cdot)$ is a "work function", typically a convex positive function. The main
result of the chapter is that for $m = n$, the expected value of this sum is $O(n)$
if and only if $f$ has compact support, and
$$\int g(f) < \infty .$$
$\max_i N_i$
Figure 0.3.
Binary trie for points distributed on [0,1].
Chapter 1
$$A_i = \left[\frac{i-1}{m}, \frac{i}{m}\right), \quad 1 \le i \le m .$$
The quantities of interest to us here are those that matter in sorting and searching.
If sorting is done by performing a selection sort within each bucket and
concatenating the buckets, then the total number of element comparisons is
$$C = \sum_{j=1}^m \frac{N_j (N_j - 1)}{2} = \frac{1}{2}(T - n)$$
where
$$T = \sum_{j=1}^m N_j^2 .$$
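The identity $C = \frac{1}{2}(T - n)$ can be verified directly by counting comparisons. The sketch below is a minimal illustration, assuming data in $[0,1)$ and $m$ equal buckets; the function name `bucket_sort` and its return convention are hypothetical.

```python
import random
from typing import List, Tuple


def bucket_sort(xs: List[float], m: int) -> Tuple[List[float], int, List[int]]:
    """Sort values of [0,1) with m equal buckets and a selection sort per bucket.

    Returns (sorted values, number of element comparisons C, bucket counts N_j).
    """
    buckets: List[List[float]] = [[] for _ in range(m)]
    for x in xs:
        buckets[min(int(x * m), m - 1)].append(x)  # truncation to a bucket
    comparisons = 0
    out: List[float] = []
    for b in buckets:
        # Selection sort always uses N(N-1)/2 comparisons on N elements.
        for i in range(len(b)):
            smallest = i
            for j in range(i + 1, len(b)):
                comparisons += 1
                if b[j] < b[smallest]:
                    smallest = j
            b[i], b[smallest] = b[smallest], b[i]
        out.extend(b)  # concatenation preserves order: the bucket map is monotone
    return out, comparisons, [len(b) for b in buckets]


rng = random.Random(1)
data = [rng.random() for _ in range(200)]
result, C, counts = bucket_sort(data, m=50)
T = sum(N * N for N in counts)
```

Here `C` equals `(T - len(data)) // 2` for every input, so the study of the sorting cost reduces to the study of $T$.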
Figure 1.1.
Bucket structure with n=17 points, m=12 buckets.
The other work takes time proportional to $m$, and is not random. Selection sort
was only chosen here for its simplicity. It is clear that for quadratic
comparison-based sorting methods, we will eventually have to study $T$.
To search for an element present in the data, assuming that all elements are
equally likely to be queried, takes on the average
$$D_u = \sum_{i=1}^m N_i \int_{A_i} f = \sum_{i=1}^m N_i p_i$$
where only comparisons with non-empty cells in the data structure are counted.
$D_u$ will be called the AUST (Average Unsuccessful Search Time), and $p_i$ is the
integral of $f$ over $A_i$.
The properties of this simple bucket structure for sorting and searching have
been studied by MacLaren (1966), Dobosiewicz (1978) and Akl and Meijer (1982).
In this chapter, we will unravel the dependence upon $f$. To get a rough idea of
the dependence, we will start with the expected values of the quantities defined
above.
Theorem 1.1.
Let $f$ be an arbitrary density on $[0,1]$. Then, even if $\int f^2 = \infty$,
Figure 1.2.
Density with low value for square integral; density with high value for square integral.
Theorem 1.1 sets the stage for this paper. We see for example that
$E(T) = O(n)$ if and only if $\int f^2 < \infty$. Thus, for hashing with chaining, $\int f^2$
measures to some extent the influence of $f$ on the data structure: it is an
indicator of the peakedness of $f$. In the best case ($\int f^2 < \infty$), we have linear
expected time behavior for sorting, and constant expected time behavior for
searching. This fact was first pointed out in Devroye and Klincsek (1981). Under
stricter conditions on $f$ ($f$ bounded, etc.), the given expected time behavior was
established in a series of papers; see e.g. Dobosiewicz (1977), Weide (1978), Meijer
and Akl (1980) and Akl and Meijer (1982). Theorem 1.1 gives a characterization
of the densities with $\int f^2 = \infty$ in terms of quantities that are important in
computer science. It also provides us with the form of the "best" density. Because
$\int f^2 \ge (\int f)^2 = 1$ (Jensen's inequality), and $\int f^2 = 1$ for the uniform density
on $[0,1]$, we see that all the expected values in Theorem 1.1 are minimal for the
uniform density.
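As a small numerical illustration of $\int f^2 \ge 1$, the square integral of a few densities on $[0,1]$ can be approximated with a midpoint rule. The example densities ($f = 1$, $f(x) = 2x$, $f(x) = 3x^2$) and the helper name `square_integral` are illustrative assumptions, not taken from the notes.

```python
from typing import Callable


def square_integral(f: Callable[[float], float], steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f^2 over [0,1]."""
    h = 1.0 / steps
    return sum(f((k + 0.5) * h) ** 2 for k in range(steps)) * h


uniform = square_integral(lambda x: 1.0)         # exact value 1
linear = square_integral(lambda x: 2.0 * x)      # exact value 4/3
quadratic = square_integral(lambda x: 3.0 * x * x)  # exact value 9/5
```

The more peaked the density, the larger the value, in line with the reading of $\int f^2$ as a measure of peakedness.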
Theorem 1.1 does not give the rate of increase of $E(T)$ as a function of $n$
when $\int f^2 = \infty$. However, even though $T = \sum_{i=1}^m N_i^2$ can reach its maximal
value $n^2$ (just set $N_1 = n$, $N_2 = \cdots = N_m = 0$), we have $E(T) = o(n^2)$ for all
densities $f$. Thus, hashing with chaining, when used for even the most peaked
density, must dramatically improve the expected time for sorting and searching
when $n$ is large.
Lemma 1.1.
m
(II) For all r > I, n r E p/::;
i=l
r -1
.E .E p{-l) .
m
E (nPi r ( ~) n If r, and p{ = 0 (
i=l 1=1 1=1
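$$\sum_{i=1}^m (np_i)^r = \sum_{i=1}^m \left(\frac{n}{m}\right)^r \left(m \int_{A_i} f\right)^r \le \left(\frac{n}{m}\right)^r \sum_{i=1}^m m \int_{A_i} f^r = \left(\frac{n}{m}\right)^r m \int f^r .$$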
(iii) follows from (ii) and a small additional argument: the upper bound in
(ii) $\sim (1/c)^{r-1} n \int f^r$. Furthermore, by Fatou's Lemma and the Lebesgue density
theorem (see Lemma 5.10 for one version of this theorem), we have
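$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^m (np_i)^r = \liminf_{n \to \infty} \frac{1}{n} \left(\frac{n}{m}\right)^r \sum_{i=1}^m \left(m \int_{A_i} f\right)^r$$
$$= \liminf_{n \to \infty} \frac{1}{n} \left(\frac{n}{m}\right)^r m \int f_n^r \quad (\text{where } f_n(x) = m p_i \text{ for } x \in A_i)$$
$$\ge \liminf_{n \to \infty} \left(\frac{n}{m}\right)^{r-1} \int \liminf_{n \to \infty} f_n^r$$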
Figure 1.3.
Density f and its histogram approximation
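The histogram approximation $f_n(x) = m p_i$ for $x \in A_i$ can be computed exactly when the distribution function is known. The sketch below uses the illustrative density $f(x) = 2x$ (distribution function $x^2$) and checks the Jensen-type fact $\int f_n^r \le \int f^r$; the helper names are hypothetical.

```python
from typing import Callable, List


def cell_probabilities(cdf: Callable[[float], float], m: int) -> List[float]:
    """p_i = integral of f over A_i = [(i-1)/m, i/m), computed from the cdf."""
    return [cdf((i + 1) / m) - cdf(i / m) for i in range(m)]


def histogram_power_integral(p: List[float], r: float) -> float:
    """Integral over [0,1] of f_n^r, where f_n = m * p_i on a cell of width 1/m."""
    m = len(p)
    return sum((m * pi) ** r for pi in p) / m


p = cell_probabilities(lambda x: x * x, m=10)   # f(x) = 2x on [0,1]
second = histogram_power_integral(p, 2)         # compare with int f^2 = 4/3
third = histogram_power_integral(p, 3)          # compare with int f^3 = 2
```

For this density the two values work out to $4/3 - 1/(3m^2)$ and $2 - 1/m^2$, so both stay below the corresponding integrals of $f$ and approach them as $m$ grows.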
The second half of (iii) follows from (i) and the inequality
$$\sum_{i=1}^m p_i^r \le \max_i p_i \cdot \sum_{i=1}^m p_i^{r-1} .$$
$$E(T) = E\left[\sum_{i=1}^m N_i^2\right] = \sum_{i=1}^m \left(n^2 p_i^2 + n p_i (1 - p_i)\right) = (n^2 - n) \sum_{i=1}^m p_i^2 + n$$
$$\le \sum_{i=1}^m (np_i)^2 + n \sim \frac{n}{c} \int f^2 + n$$
by Lemma 1.1 (iii). Also, by Lemma 1.1 (iii), $\sum_{i=1}^m p_i^2 = o(1)$, so that
$E(T) = o(n^2)$. All the other statements in the Theorem follow from the
relations:
$$C = \frac{1}{2} \sum_{i=1}^m (N_i^2 - N_i) = \frac{1}{2}(T - n) ,$$
$$D_s = \frac{1}{n} \sum_{i=1}^m \frac{1}{2}(N_i^2 + N_i) = \frac{1}{2} + \frac{T}{2n} ,$$
and
$$D_u = \sum_{i=1}^m p_i N_i \quad \left(E(D_u) = n \sum_{i=1}^m p_i^2\right) .$$
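These relations make the expected costs computable from the cell probabilities alone, using $E(T) = (n^2 - n)\sum p_i^2 + n$. The following sketch is an illustration only; the function name `expected_costs` and the two example densities (uniform, and $f(x) = 2x$ with $p_i = (2i-1)/m^2$) are assumptions made for the example.

```python
from typing import List, Tuple


def expected_costs(p: List[float], n: int) -> Tuple[float, float, float, float]:
    """Return (E(T), E(C), E(D_s), E(D_u)) for n points and cell probabilities p."""
    s2 = sum(pi * pi for pi in p)
    e_t = (n * n - n) * s2 + n      # E(T) = (n^2 - n) sum p_i^2 + n
    e_c = (e_t - n) / 2             # C = (T - n) / 2
    e_ds = 0.5 + e_t / (2 * n)      # D_s = 1/2 + T / (2n)
    e_du = n * s2                   # E(D_u) = n sum p_i^2
    return e_t, e_c, e_ds, e_du


n = 1000
uniform_p = [1.0 / n] * n                              # m = n, uniform density
peaked_p = [(2 * i - 1) / (n * n) for i in range(1, n + 1)]  # f(x) = 2x, m = n
uniform_costs = expected_costs(uniform_p, n)
peaked_costs = expected_costs(peaked_p, n)
```

With $m = n$ the ratio $E(T)/n$ is close to $1 + \int f^2$: about 2 for the uniform density and about $1 + 4/3$ for $f(x) = 2x$.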
Theorem 1.2.
$$C/n \to \frac{1}{2c} \int f^2 \quad \text{in probability} ;$$
and
$$D_u \to \frac{1}{c} \int f^2 \quad \text{in probability} .$$
The proof of the Theorem uses Poissonization to handle the fact that
$N_1, \ldots, N_m$ are dependent random variables. For some properties of the Poisson
distribution used here, we refer to section 5.1. We proceed now by extracting
a key Lemma:
Lemma 1.2.
Let $\int f^2 < \infty$. Let $N_i$ be Poisson($np_i$) random variables, $1 \le i \le m$.
Then
$$\lim_{K \to \infty} \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^m E(Y_i) = 0$$
where $Y_i$ is either
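Thus, we need not consider (ii). We will deal with (iii) first.
$$\le \frac{1}{n} \sum_{i=1}^m \left(\frac{n^2}{m} \int_{A_i} f^2 + n \int_{A_i} f\right) I_{p_i \ge K/n} \quad \text{(Jensen's inequality)}$$
$$= \int_0^1 \left(\frac{n}{m} f^2 + f\right) I_{f_n \ge Km/n} .$$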
Now, $n/m \to 1/c$. Also, $I_{f_n \ge Km/n} \le I_{f \ge Kc/2}$ for almost all $x$ for which
$f(x) > 0$, and all $n$ large enough (this uses the fact that $f_n \to f$ for almost
all $x$; see section 5.3). Since $\int f^2 < \infty$, we thus have by the Lebesgue
dominated convergence theorem,
$$\limsup_{n \to \infty} \int_0^1 \left(\frac{n}{m} f^2 + f\right) I_{f_n \ge Km/n} \le \int_0^1 \left(\frac{1}{c} f^2 + f\right) I_{f \ge Kc/2}$$
A simple application of (iii) shows that the first term on the right-hand side has a
limit supremum that is $o(1)$ as $L \to \infty$. Thus, we should choose $L$ in such a
way that $L \to \infty$ as $K \to \infty$. The second term on the right-hand side is
1
:E E j2(np,), e'" Ii!
n j =1 J?,fj{
,,2,.2+1&,; <L
This tends to 0 as $K \to \infty$ when we choose $L = K^{1/4}$. The proof of Lemma 1.2
is complete.
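Using Theorem 5.5, we have
$$P(N'' < n) = P\left(\frac{N'' - n''}{n''} < -\frac{n^{3/4}}{n''}\right) \le 2 \exp\left(-\frac{n^{3/2}}{2 n'' \left(1 + \frac{n^{3/4}}{n''}\right)}\right) .$$
Thus, for all $n$ large enough,
$$P\left(\frac{T}{n} > (1+\epsilon)\left(1 + \frac{1}{c} \int f^2\right)\right) \le o(1) + P\left(\frac{T''}{n''} > \left(1 + \frac{\epsilon}{2}\right)\left(1 + \frac{1}{c} \int f^2\right)\right) .$$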
Similarly,
all $n$ large enough. Now, all the probabilities involving $T'$ and $T''$ are $o(1)$
if both $T'/n'$ and $T''/n''$ tend to $1 + \frac{1}{c} \int f^2$ in probability. Thus, the
statements about $T$, $C$ and $D_s$ are valid if we can show the statement about $T$
where $T = \sum_{i=1}^m N_i^2$ and $N_1, \ldots, N_m$ are independent Poisson random
variables with parameters $np_i$, $1 \le i \le m$.
First, we note that by Lemma 1.1,
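$$|T - E(T)| \le \left|\sum_{j=1}^m (N_j^2 - E(N_j^2)) I_{|N_j^2 - E(N_j^2)| \ge K}\right| + \left|\sum_{j=1}^m (N_j^2 - E(N_j^2)) I_{|N_j^2 - E(N_j^2)| < K}\right| = |I| + II .$$
$$E(|I|) \le \sum_{j=1}^m \left[E(N_j^2 I_{N_j^2 \ge K/2}) + E(N_j^2) I_{E(N_j^2) \ge K/2} + E(N_j^2) P(N_j^2 \ge K/2)\right]$$
$$II = \left|\sum_{j=1}^m Y_j\right| \le \left|\sum_{j=1}^m (Y_j - E(Y_j))\right| + \left|\sum_{j=1}^m E(Y_j)\right| = III + IV .$$
$$IV = |E(I)| ,$$
and
$$E(IV) \le E(|I|) .$$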
Now, first choose $K$ large enough so that $\limsup_{n \to \infty} E(|I|)/n < \epsilon$, where $\epsilon$ is an
arbitrary positive number. (This can be done in view of Lemma 1.2.) Now, we
need only show that for every $K$, $E(III)/n \to 0$. But this is an immediate
consequence of the fact that the $Y_j - E(Y_j)$ terms are independent zero mean
bounded random variables (see e.g. section 16 of Loève (1963)).
This completes the first part of the proof of Theorem 1.2. The argument for
$D_u$ is left as an exercise: first, argue again by Poissonization that it suffices to
consider independent $N_i$'s that are Poisson($np_i$) distributed. Then note that we
need only show that $\sum_{i=1}^m p_i (N_i - np_i) \to 0$ in probability.
1.3. VARIANCE.
The results obtained so far are more qualitative than quantitative: we know
now for example that $E(T)$ grows as $n(1 + \frac{1}{c} \int f^2)$ and that $|T - E(T)|/n$ tends
to 0 in probability and in the mean. Yet, we have not established just how close
$T$ is to $E(T)$. Thus, we should take our analysis a step further and get a more
refined result. For example, we could ask how large $Var(T)$ is. Because of the
relations between $C$, $D_s$ and $T$, we need only consider $Var(T)$ as
$Var(C) = Var(T)/4$ and $Var(D_s) = Var(T)/(4n^2)$. $Var(D_u)$ is treated
separately.
Theorem 1.3.
A. For all f , we have
We note that for all $f$, $(\int f^2)^2 \le \int f^3$ (Jensen's inequality), and that equality
is reached for the uniform density on $[0,1]$. Thus, once again, the uniform
density minimizes the "cost", now measured in terms of variances. In fact, for
4 6
the uniform density, we have Var (Du) = 0, all n, and Var (T )=2n-4--+-
n n2
when $c = 1$, $m = n$.
For the proof of Theorem 1.3, the reader should consult section 5.1 first. We
note here that the Poissonization trick of section 1.2 is no longer of any use
because the variance introduced by it, say, $Var(T' - T)$ for $n' = n$ (see notation
of the proof of Theorem 1.1), grows as $n$, and is thus asymptotically
nonnegligible.
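where we used the fact that $E(N_i^2) = n(n-1)p_i^2 + np_i$. Using various expressions
from Lemma 5.1, we have
$$E(T^2) = \sum_i \left[np_i + 7n(n-1)p_i^2 + 6n(n-1)(n-2)p_i^3 + n(n-1)(n-2)(n-3)p_i^4\right]$$
$$+ \sum_{i \ne j} \left[n(n-1)(n-2)(n-3)p_i^2 p_j^2 + n(n-1)(n-2)(p_i p_j^2 + p_i^2 p_j) + n(n-1)p_i p_j\right] .$$
Subtracting $E^2(T)$, the terms with $i \ne j$ contribute to $Var(T)$ the amount
$$\sum_{i \ne j} \left[(-4n^3 + 10n^2 - 6n)p_i^2 p_j^2 + (-2n^2 + 2n)(p_i^2 p_j + p_i p_j^2) + (-n)p_i p_j\right] .$$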
By Lemma 1.1, we have for all constants $r \ge 1$, $\sum p_i^r \sim (nc)^{-(r-1)} \int f^r$.
Thus, if $\int f^2 < \infty$,
so that $Var(T)/n \to \infty$. This concludes the proof of the first half of Theorem
1.3.
We have $E(D_u) = \sum n p_i^2 \sim \frac{1}{c} \int f^2$, and
Thus,
$\int f^3 = \infty$ but $\int f^2 < \infty$, this is still true. If both integrals are infinite, we need
an additional argument. For example, let $J$ be the collection of indices for which
$p_i > a/m$, where $a > 0$ is a constant. We have, by the inequality
$(u+v)^2 \le 2u^2 + 2v^2$,
$$Var(D_u) \ge n \sum_J p_i^3 - 2n \left(\sum_J p_i^2\right)^2 + n \sum_{J'} p_i^3 - 2n \left(\sum_{J'} p_i^2\right)^2$$
valid for all $f$. Unfortunately, this bound requires large values of $\epsilon$ to be useful.
By restricting ourselves to smaller classes of densities, we can obtain smaller
upper bounds.
For example, by the Chebyshev-Cantelli inequality and $E(D_u) \le c_n^{-1} \int f^2$,
we have
Theorem 1.4.
Assume that $\sup f \le C < \infty$. For all $\epsilon > 0$, we have
where
$$A(\epsilon) = \sup_{r > 0} \left( r \epsilon \int f^2 - \frac{r^2}{2} \int f^3 \, e^{rC} \right) > 0 .$$
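$$P\left(D_u \ge (1+\epsilon) \frac{1}{c_n} \int f^2\right) \le E\left(\exp\left(-t(1+\epsilon) \frac{1}{c_n} \int f^2 + t \sum_{i=1}^m N_i p_i\right)\right)$$
$$= \exp\left(-t \frac{1}{c_n} \int f^2 (1+\epsilon)\right) \left(\sum_{i=1}^m p_i \exp(t p_i)\right)^n .$$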
Let us recall the definition of the function $f_n$ from Lemma 1.1. Using the fact
that $e^u - 1 \le u + \frac{u^2}{2} e^u$ for $u > 0$, we have the following chain of equalities
and inequalities (where the first expression is equal to the last expression of the
chain given above):
$$\le \exp\left(-t c_n^{-1} (1+\epsilon) \int f^2\right) \cdot \left(1 + \frac{t}{m} \int f_n^2 + \frac{t^2}{2m^2} \int f_n^3 \exp\left(\frac{t}{m} f_n\right)\right)^n$$
$$\le \exp\left(-t c_n^{-1} \epsilon \int f^2 + n \frac{t^2}{2m^2} \int f^3 \exp\left(\frac{t}{m} C\right)\right) .$$
Here we used the inequality $(1+u) \le \exp(u)$, and the fact that
$\int f_n^s \le \int f^s$ for all $s \ge 1$ (Lemma 1.1). The first half of the Theorem follows
from the choice $t = rm$. Now, as $\epsilon \downarrow 0$, we see that the supremum is reached
for $r = r(\epsilon) > 0$, and that $A(\epsilon)$ is asymptotic to the value
$$\sup_{r > 0} \left( r \epsilon \int f^2 - \frac{r^2}{2} \int f^3 \right) .$$
The latter supremum, for each $\epsilon > 0$, is reached for $r = \epsilon \int f^2 / \int f^3$.
Resubstitution gives the desired solution, $A(\epsilon) \sim \frac{1}{2} \epsilon^2 (\int f^2)^2 / \int f^3$.
When $\epsilon \uparrow \infty$, it is easy to see that the supremum in the expression for $A(\epsilon)$
is reached for $r(\epsilon) \uparrow \infty$. By standard functional iterations, applied to the
equation $r(\epsilon) = \frac{1}{C} \log(\epsilon \int f^2 / (r(\epsilon) \int f^3))$, we see that $A(\epsilon) \sim$ the value of the
expression to be optimized, at $r = \frac{1}{C} \log(\epsilon \int f^2 / (\int f^3 \, \frac{1}{C} \log \epsilon))$, which gives us our
solution.
Remark.
The inequality of Theorem 1.4 for $\epsilon_n \downarrow 0$, $n \epsilon_n^2 \uparrow \infty$, is called a moderate
deviation inequality. It provides us with good information about the tail of
the distribution of $D_u$ for values of the order of magnitude of the mean of $D_u$
plus a few standard deviations of $D_u$. On the other hand, when $\epsilon_n$ is constant
or tends to $\infty$, we have large deviation inequalities. As a rule, these should
give good information about the extreme tail of the distribution, where the
central limit theorem is hardly at work. For example, it appears from the form of
the inequality that the extreme tail of $D_u$ drops off at the rate of the tail of the
Poisson distribution.
Step 1.
Let $A_i = [\frac{i-1}{n}, \frac{i}{n})$, $1 \le i \le n$. For each $A_i$, keep a linked list of $X_j$'s
falling in it. Let $N_i$ be the cardinality of $A_i$.
Step 2.
For $i = 1$ to $n$ do: if $N_i \ge 1$, divide $A_i$ into $N_i$ equal intervals $A_{ij}$, and
keep for each $A_{ij}$ linked lists of the data points in it. Let $N_{ij}$ be the
cardinality of $A_{ij}$.
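The two steps above can be sketched as follows. This is a minimal illustration assuming data in $[0,1)$, with Python lists standing in for the linked lists; the function names `double_bucketing` and `total_squares` are hypothetical.

```python
from typing import List


def double_bucketing(xs: List[float]) -> List[List[List[float]]]:
    """Build the two-level structure: n first-level cells A_i, and each
    non-empty A_i divided into N_i equal sub-cells A_ij."""
    n = len(xs)
    # Step 1: n equal intervals on [0,1).
    first: List[List[float]] = [[] for _ in range(n)]
    for x in xs:
        first[min(int(x * n), n - 1)].append(x)
    # Step 2: split each non-empty cell into as many sub-cells as it has points.
    structure: List[List[List[float]]] = []
    for i, chain in enumerate(first):
        ni = len(chain)
        sub: List[List[float]] = [[] for _ in range(ni)]
        left, width = i / n, 1.0 / n
        for x in chain:
            j = int((x - left) / width * ni)
            sub[max(0, min(j, ni - 1))].append(x)  # guard against rounding
        structure.append(sub)
    return structure


def total_squares(structure: List[List[List[float]]]) -> int:
    """Sum of N_ij^2 over all defined sub-cells."""
    return sum(len(cell) ** 2 for sub in structure for cell in sub)


s = double_bucketing([0.01, 0.10, 0.20, 0.55])
t = total_squares(s)
```

In this small example three points share the first-level cell $[0, 1/4)$, and the second level separates them into three sub-cells, which is the peak-flattening effect the structure is designed for.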
Figure 1.4.
$$T = \sum_{i=1}^n \sum_{j=1}^{N_i} N_{ij}^2 ,$$
and
$$D_u = \sum_{i=1}^n \sum_{j=1}^{N_i} p_{ij} N_{ij}$$
where all the summations $\sum_{j=1}^{N_i}$ for $N_i = 0$ must be omitted, and
$p_{ij} = \int_{A_{ij}} f$ when $A_{ij}$ is defined. We note that the first division is into $n$
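Theorem 1.5.
If $\int f^2 < \infty$, then the double bucketing structure gives
$$E(T)/n \to 1 + \int_0^1 e^{-f}, \quad E(C)/n \to \frac{1}{2} \int_0^1 e^{-f}, \quad E(D_s) \to 1 + \frac{1}{2} \int_0^1 e^{-f},$$
and
$$E(D_u) \to 1 .$$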
Thus, the limit of $E(T)/n$ is uniformly bounded over all such $f$. In other
words, double bucketing has the effect of eliminating all peaks in densities with
$\int f^2 < \infty$. Let us also note in passing that the lower bound for $E(T)/n$ is
reached for the uniform density on $[0,1]$, and that the upper bound can be
approached by considering densities that are uniform on
$[0, \frac{1}{K}]$ ($\int_0^1 e^{-f} = 1 - \frac{1}{K} + \frac{1}{K} e^{-K}$) and letting $K \to \infty$. The fact that the
properties of the double bucketing structure are basically independent of the density $f$
was observed independently by Tamminen (1985). The same is a fortiori true for
N-trees (Ehrlich (1981), Van Dam, Frenk and Rinnooy Kan (1983), Tamminen
(1983)).
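$$E(T) = \sum_{i=1}^n E\left(I_{N_i \ge 1} \sum_{j=1}^{N_i} \left[(N_i^2 - N_i)(p_{ij}/p_i)^2 + N_i p_{ij}/p_i\right]\right)$$
$$= \sum_{i=1}^n E(N_i) + \sum_{i=1}^n E\left((N_i^2 - N_i) \sum_{j=1}^{N_i} (p_{ij}/p_i)^2\right) .$$
Since the $N_i$ numbers $p_{ij}/p_i$ sum to one, $\sum_j (p_{ij}/p_i)^2 \ge 1/N_i$, so that
$$E(T) \ge n + \sum_{i=1}^n E((N_i - 1)^+) \quad (\text{where } u^+ = \max(u, 0))$$
$$= n + \sum_{i=1}^n E(N_i - 1) + \sum_{i=1}^n P(N_i = 0)$$
$$= n + \sum_{i=1}^n P(N_i = 0)$$
$$= n + \sum_{i=1}^n (1 - p_i)^n \quad \left(\text{where } p_i = \int_{A_i} f\right)$$
$$\ge n + \sum_{i=1}^n \exp(-np_i/(1 - p_i)) \quad (\text{because } 1 - u \ge \exp(-u/(1-u)),\ 0 \le u < 1) .$$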
1
$$E(T) = n + \sum_{i=1}^n E(V_i') + \sum_{i=1}^n E(V_i'')$$
where
$$V_i' = (N_i^2 - N_i) \sum_{j=1}^{N_i} (p_{ij}/p_i)^2 \, I_{N_i \le K}$$
and
$$V_i'' = (N_i^2 - N_i) \sum_{j=1}^{N_i} (p_{ij}/p_i)^2 \, I_{N_i > K} .$$
The statements about $E(T)$, $E(C)$ and $E(D_s)$ in Theorem 1.5 are proved if we
can show that
$$\lim_{K \to \infty} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E(V_i') = \int_0^1 e^{-f}$$
and
$$\lim_{K \to \infty} \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E(V_i'') = 0 .$$
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E(V_i') = \lim_{n \to \infty} \int_0^1 g_n = \int_0^1 \lim_{n \to \infty} g_n$$
provided that the limit of $g_n$ exists almost everywhere. Consider now a sequence
of couples $(i, j)$ such that $x \in A_{ij} \subseteq A_i$ for all $n$. We have by Lemma 5.11,
$n N_i p_{ij} \to f(x)$ for almost all $x$, uniformly in $N_i$, $1 \le N_i \le K$. From this, we
conclude that
Consider only those $x$'s for which $f(x) > 0$, and Lemma 5.11 applies.
Clearly, $N_i$ tends in distribution to $Z$ where $Z$ is a Poisson($f(x)$) random
variable (this follows from $np_i \to f(x)$ (Chow and Teicher (1978, p. 36-37))). Since
$(N_i - 1)^+ I_{N_i \le K}$ forms a sequence of bounded random variables, we also have
convergence of the moments, and thus,
Define the function $h_n(x) = E(N_i^2 I_{N_i > K})$, $x \in A_i$, and the function
$h(x) = E(Z^2 I_{Z > K})$ where $Z$ is Poisson($f(x)$) distributed. We know that
$h_n(x) \le E(N_i^2) \le np_i + (np_i)^2 = f_n(x) + f_n^2(x) \to f(x) + f^2(x)$, almost
all $x$; and that $\int f_n + \int f_n^2 \to \int f + \int f^2$. Thus, by an extension of the Lebesgue
dominated convergence theorem, we have
provided that the almost everywhere limit of $h_n$ exists. For almost all $x$, $N_i$
tends in distribution to $Z$. Thus, for such $x$,
$$|h_n - h| \le \sum_{j=1}^\infty j^2 \, |P(N_i = j) - P(Z = j)| \to 0$$
(see e.g. Simons and Johnson, 1971). But $\int_0^1 h \to 0$ as $K \to \infty$ since
$\int_0^1 E(Z^2) = \int_0^1 (f + f^2) < \infty$, and $E(Z^2 I_{Z > K}) \to 0$ for almost all $x$. This concludes
the proof of
$$\limsup_{K \to \infty} \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E(V_i'') = 0 .$$
$$E(D_u) = E\left(\sum_{i=1}^n \sum_{j=1}^{N_i} p_{ij} N_{ij}\right) = E\left(\sum_{i=1}^n p_i N_i \sum_{j=1}^{N_i} (p_{ij}/p_i)^2\right) .$$
$$E(D_u) \ge E\left(\sum_{i=1}^n p_i N_i / N_i\right) = \sum_{i=1}^n p_i = 1 .$$
Also, if we follow the treatment to obtain an upper bound for $E(T)$, we come
across terms $V_i'$ and $V_i''$ in which $(N_i^2 - N_i)$ is now replaced by $p_i N_i$.
Mimicking the Poisson approximation arguments for $E(T)$, we obtain
$\limsup_{n \to \infty} E(D_u) \le 1$ when $\int f^2 < \infty$. This concludes the proof of Theorem 1.5.
Chapter 2
$$M_n = \min_{1 \le i \le n} X_i ,$$
$$p_i = \int_{x_i}^{x_{i+1}} f , \quad 1 \le i \le m ,$$
$$p = \int_{M_n}^{M_n^*} f$$
Figure 2.1.
Bucket structure with 13 points $X_i$ and 4 buckets; area $p_i$.
Theorem 2.1.
Let $f$ be a density on $R^1$ with $\int f^2 < \infty$. Then
and
Theorem 2.1 shows that there is a close relation between $E(T)$ and the
range $R_n$. For densities with no tails, we have a generalization of Theorem 1.1.
It is noteworthy that $1 + \frac{s}{c} \int f^2$, the limit value of $E(T)/n$, is scale invariant.
When $s = \infty$, it is not clear at all how $E(\min(R_n c_n^{-1} \int f^2, n))$ varies with $n$.
For example, is this quantity close to $E(R_n) c_n^{-1} \int f^2$ (which is easier to handle)?
Thus, to apply Theorem 2.1 in concrete examples, some results are needed for
$R_n$. Some of these are stated in Lemma 2.1.
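The data-dependent buckets of this chapter, and the scale invariance just noted, can be illustrated with a short sketch. It assumes that the interval $[\min X_i, \max X_i]$ is cut into $m$ equal buckets and, as a simplification, that the two extreme points are counted in the end buckets; the function name `range_buckets` is hypothetical.

```python
from typing import List, Tuple


def range_buckets(xs: List[float], m: int) -> Tuple[float, List[int]]:
    """Partition [min, max] of the data into m equal buckets.

    Returns (R_n, bucket counts). Simplification: the minimum and the
    maximum are counted in the first and last bucket.
    """
    lo, hi = min(xs), max(xs)
    r = hi - lo
    counts = [0] * m
    for x in xs:
        j = 0 if r == 0 else min(int((x - lo) / r * m), m - 1)
        counts[j] += 1
    return r, counts


data = [0.0, 1.0, 2.0, 10.0]
r1, counts1 = range_buckets(data, m=2)
# Rescaling and shifting the data changes R_n but not the bucket counts.
r2, counts2 = range_buckets([3.0 * x + 7.0 for x in data], m=2)
t1 = sum(c * c for c in counts1)
```

A single far-out point stretches the range and crowds the remaining points into few buckets, which is the tail effect that the results on $R_n$ quantify.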
We will work with the following quantities: $X = X_1$ has density $f$ and
distribution function $F(x) = P(X \le x) = 1 - G(x)$; the integrals
$$E(|X|) = \int_0^\infty G(t) \, dt + \int_{-\infty}^0 F(t) \, dt$$
Lemma 2.1
Let $\delta > 0$ be arbitrary. Then:
(iv) $E(R_n) = \infty$ for all $n \ge 2$ if and only if $E(R_n) = \infty$ for some $n > 2$
if and only if $E(|X|) = \infty$.
$$\limsup_{x \to \infty} |x| P(|X| > x) > 0 .$$
(viii) Are equivalent:
$$\liminf_{n \to \infty} E(\min(R_n, \delta n))/n > 0 \text{ for all } \delta > 0 ;$$
$$\liminf_{x \to \infty} |x| P(|X| > x) > 0 .$$
Lemma 2.1 in conjunction with Theorem 2.1 gives us quite a bit of information
about $E(T)$. For example, we have
Theorem 2.2.
If $\int f^2 < \infty$, then are equivalent:
$s < \infty$.
(And if $s < \infty$, this lim inf is equal to this lim sup. Its value is $1 + \frac{s}{c} \int f^2$.)
Theorem 2.2 follows from Lemma 2.1 (i), (ii) and Theorem 2.1. In Devroye
and Klincsek (1980), one finds a slightly stronger result: $E(T) = O(n)$ if and only
if $s < \infty$ and $\int f^2 < \infty$. In the next chapter, this will be generalized to $R^d$, so
we don't have to bother with an $R^1$ version of it here.
We also have
Theorem 2.3.
If $\int f^2 < \infty$, then condition (vi) of Lemma 2.1 implies that
Theorems 2.2 and 2.3 cover all the small-tailed distributions with little
oscillation in the tails. In Akl and Meijer (1982) the upper bound part of Theorem
2.3 was obtained for bounded densities. The actual limiting expression of $E(T)$
shows the interaction between the effect of the peaks ($\int f^2$) and the effect of the
tails ($E(R_n)$). Note that $E(R_n) \int f^2$ is a scale-invariant and translation-invariant
quantity: it is solely determined by the shape of the density. It is
perhaps interesting to see when condition (vi) of Lemma 2.1 is valid.
and
(Gnedenko, 1943). In that case we can take $a_n = \inf(x: G(x) \le \frac{1}{n})$ where
$G(x) = P(X \ge x)$, or in short, $a_n = G^{-1}(\frac{1}{n})$ (De Haan, 1975, pp. 117). We note
that (i) is equivalent to $\int_0^\infty G(t) \, dt < \infty$, $\int_x^\infty G(t) \, dt / (x G(x)) \to 0$ as $x \to \infty$; or to
we know that (iii) holds (Geffroy, 1958; De Haan, 1975, Theorem 2.9.2). Condition
(iv) comes close to being best possible because if $f$ is nonincreasing and positive
for all $x$, then (iii) implies (iv) (De Haan, 1975, Theorem 2.9.2).
$$\limsup_{n \to \infty} \frac{E(T)}{n E(R_n) \frac{1}{c} \int f^2} \le 1 .$$
Thus, good upper bounds for $E(R_n)$ give us good upper bounds for $E(T)/n$.
For example, we have
$$E(R_n) \le E\left(\max_i X_i^+ - \min_i X_i^-\right)$$
Thus, depending upon the heaviness of the tail of $X$, we obtain upper bounds for
$E(T)$ that increase as $n^{1+1/r}$. We can do better when the moment generating
function of $X$ is finite in a neighborhood of the origin, i.e.
where we took $r = \log n$. For the $t$'s in the interval $[0, \epsilon)$, we have as
$n \to \infty$,
$E(R_n) = O(\log n)$,
and thus
Theorem 2.2 treats densities with compact support, while Theorem 2.3 covers
quite a few densities with finite moment. We will now skip over some densities
in a gray area: some have a finite first moment but do not satisfy (vi) of
Lemma 2.1, and some have infinite first moment $E(|X|)$, but have relatively
small tails. The worst densities are described in Theorem 2.4:
Theorem 2.4.
Let $\int f^2 < \infty$. Then
(i) $\limsup_{n \to \infty} E(T)/n^2 > 0$ if and only if $\limsup_{x \to \infty} |x| P(|X| > x) > 0$;
(ii) $\liminf_{n \to \infty} E(T)/n^2 > 0$ if and only if $\liminf_{x \to \infty} |x| P(|X| > x) > 0$;
(Note that $T \le n^2$ for all densities, and that statement (i) implies
$E(|X|) = \infty$.)
and
$$E(D_u) \sim E\left(n \sum_{i=1}^m p_i^2 / p\right) .$$
Nearly all that was said about $E(T)$ remains easily extendible to $E(C)$, $E(D_s)$
and $E(D_u)$. For example, if $s < \infty$,
and
If $s = \infty$, we have $E(C)/n \sim E(D_s) \sim E(T)/(2n)$ and $E(D_u) \sim E(T)/n$.
We finally note that the quantity $s \int f^2$ is scale invariant and that for all
densities it is at least equal to 1, in view of
$$1 = \left(\int_{\text{support of } f} f\right)^2 \le \int f^2 \int_{\text{support of } f} dx = s \int f^2 .$$
2.2. PROOFS.
Also, in all cases, $s \ge R_n$, and we are done. Fact (iii) is proved as (ii).
For item (iv), we note that $E(R_n) \le 2nE(|X|)$, that $E(R_n) \uparrow$ and that
$E(R_2) = E(|X_1 - X_2|) \ge \inf_x E(|X - x|) = \infty$ when $E(|X|) = \infty$.
To show (v), it suffices to prove that $E(\max(|X_1|, \ldots, |X_n|)) = o(n)$.
Let $|X_1|$ have distribution function $F$ on $[0, \infty)$. Then for all $\epsilon > 0$,
We will now prove (vi). Since $\min(R_n, \delta n) \le R_n$, we need only show that
$\liminf_{n \to \infty} E(\min(R_n, \delta n))/E(R_n) \ge 1$ for all $\delta > 0$. Let us define $x^+ = \max(x, 0)$,
$x^- = \min(x, 0)$, $R^+ = \max(X_1^+, \ldots, X_n^+)$, $R^- = \min(X_1^-, \ldots, X_n^-)$. We will
show that $E(R_n - \min(R_n, \delta n))/E(R_n) \to 0$ for all $\delta > 0$ and all nondegenerate
distributions with $s = \infty$ (for otherwise, the statement is trivially true). Clearly,
it suffices to show that for all $\delta > 0$, $E(R^+ - \min(R^+, \delta n))/E(R_n) \to 0$. If $X^+$
has finite support, we see that this follows from (ii). Thus, we need only consider
the case that $X^+$ has infinite support. Now, $E(R_n) \ge E((R^+ - X) I_{R^+ > 0})$
$$\int_{\delta n}^\infty \left(1 - (1 - G(t))^n\right) dt \Big/ \int_0^\infty \left(1 - (1 - G(t))^n\right) dt \to 0 .$$
$$\frac{1}{2} \min(nu, 1) \le 1 - (1 - u)^n \le \min(nu, 1) , \quad \text{all } n \ge 1,\ u \in [0,1] .$$
$$e^{-t} \le 1 - \frac{t}{2} \quad \text{for } t \in [0,1] .$$
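The sandwich $\frac{1}{2}\min(nu, 1) \le 1 - (1-u)^n \le \min(nu, 1)$ is easy to confirm numerically on a grid of values. The sketch below is only such a spot check (not a proof); the function name `sandwich_holds` and the small tolerance for floating-point rounding are assumptions of the example.

```python
def sandwich_holds(n: int, u: float, tol: float = 1e-12) -> bool:
    """Check (1/2) min(nu, 1) <= 1 - (1 - u)^n <= min(nu, 1) up to rounding."""
    middle = 1.0 - (1.0 - u) ** n
    upper = min(n * u, 1.0)
    return 0.5 * upper - tol <= middle <= upper + tol


# Spot check for n = 1..50 and u on a grid of [0,1].
all_ok = all(
    sandwich_holds(n, k / 200.0)
    for n in range(1, 51)
    for k in range(0, 201)
)
```

The upper bound is Bernoulli's inequality; the lower bound uses $1 - (1-u)^n \ge 1 - e^{-nu}$ together with $e^{-t} \le 1 - t/2$ on $[0,1]$.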
Thus, if $a_n = \inf(x: G(x) \le \frac{1}{n})$ and $n$ is so large that
$a_n > 0$, we have
(and this in turn follows of course from the fact that $\int_0^\infty G(t) \, dt < \infty$ implies
$tG(t) \to 0$ as $t \to \infty$). This concludes the proof of (vi).
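We will now prove (vii) and (viii) for $R^+$ and $\limsup_{x \to \infty}$ (or $\liminf_{x \to \infty}$)
$xG(x) > 0$. The extension of the result to $R_n$ is left as an exercise. For
$\epsilon \in (0, \delta)$ we have the following chains of inequalities:
$$\frac{1}{n} E(\min(R^+, \delta n)) = \frac{1}{n} \int_0^{\delta n} \left(1 - (1 - G(t))^n\right) dt = \frac{1}{n} \left(\int_0^{\epsilon n} + \int_{\epsilon n}^{\delta n}\right)$$
$$\le \frac{1}{n} \left(\epsilon n + n \int_{\epsilon n}^{\delta n} G(t) \, dt\right) \le \epsilon + \delta n G(\epsilon n) = \epsilon + \frac{\delta}{\epsilon} \, \epsilon n G(\epsilon n) ;$$
and
$$\frac{1}{n} E(\min(R^+, \delta n)) \ge \frac{1}{n} \int_0^{\delta n} \left(1 - e^{-nG(t)}\right) dt \ge \delta \left(1 - e^{-nG(\delta n)}\right) .$$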
We are left with the proof of Theorem 2.1. This will be taken care of in
small steps. From the observation that conditional on $M_n$, $M_n^*$, the $N_i$'s are
binomially distributed with parameters $n-2$, $p_i/p$, we deduce the following:
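Lemma 2.2.
(i) $T < n^2$.
(iii) $E(T \mid M_n, M_n^*) \ge (n-2)^2 \sum_{i=1}^m p_i^2$.
$$E(T \mid M_n, M_n^*) = n - 2 + \left[(n-2)^2 - (n-2)\right] \sum_{i=1}^m (p_i/p)^2$$
$$\sum_{i=1}^m p_i^2 = \sum_{i=1}^m \left(\int_{x_i}^{x_{i+1}} f \Big/ (R_n/m)\right)^2 (R_n/m)^2 \le (R_n/m) \int_{M_n}^{M_n^*} f^2 .$$
$$\sum_{i=1}^m p_i^2 = \sum_{i=1}^m (x_{i+1} - x_i) \left(\int_{x_i}^{x_{i+1}} f \Big/ (x_{i+1} - x_i)\right)^2$$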
II
(where f (a ,x) = Inf Jf IIY-z D
z ~:.:ill z
M. '
= 1.... R n J f 2(Rn 1m ,x).
m M.
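Find values $A(\delta)$ and $A^*(\delta)$ such that $\int_{-\infty}^{A(\delta)} f^2 = \frac{\delta}{3}\int f^2$, $\int_{A^*(\delta)}^{\infty} f^2 = \frac{\delta}{3}\int f^2$, and a
value $B(\delta)$ such that
Thus, if $A$ is the event $[M_n < A(\delta), M_n^* > A^*(\delta)]$ and $B$ is the event
$[R_n/m \le B(\delta)]$, we have on $A \cap B$, for $a = R_n/m$,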
M. ' 00
Thus,
We also have
m
E Pi 2 ~ IAnB , G(c)
i=1
where
II
G (8) = sup
A (6)::;z <II <A '(6)
(Jz f )2 .
J-I..l/~
$$\sum_{i=1}^m p_i^2 \ge I_A \min\Big((1-\delta)\frac{1}{m} R_n \int f^2 ,\ C(\delta)\Big) = I_A Z(R_n)$$
Let us take expectations on both sides of this inequality. For arbitrary $\epsilon > 0$ we
have
The proof is complete if we can show that the last probability is $o(1)$ for every
$\epsilon > 0$. Let $U_1$, $U_2$ be independent uniform [0,1] random variables, and note that
$p$ is distributed as $U_1^{1/n} U_2^{1/(n-1)}$. Thus,
$$\le 2(1+\epsilon)^{-(n-1)/4} ,$$
when $m \sim cn$. We also have in those cases, by the proof of Theorem 2.1 (ii),
for arbitrary $\epsilon > 0$. Here $c_n = m/n$. When we sort, there is an additional cost
of the form $Am$ for some constant $A > 0$ due to the time needed to initialize
and concatenate the buckets. If $E(R_n) \to \infty$, it is easy to see that in the upper
bound,
$$E(T) \le n\,\frac{E(R_n)}{c_n} \int f^2\ (1+o(1))$$
provided that $E(R_n)/c_n \to \infty$. If we balance the two contributions to the cost
of sorting with respect to $m$, then we will find that it is best to let $m$ increase
at a faster-than-linear pace. For example, consider the minimization of the cost
function
$$Am + \frac{n\,E(R_n) \int f^2}{(m/n)} .$$
This is minimized for
$$m = n\sqrt{\frac{E(R_n)}{A} \int f^2} .$$
If we had picked $m \sim cn$, then the main contribution to the sorting time would
have come from the selection sort, and it would have increased as a constant
times $nE(R_n)$. The balancing act reduces this to about $n\sqrt{E(R_n)}$, albeit at
some cost: the space requirements increase at a superlinear rate too. Furthermore,
for the balancing to be useful, one has to have a priori information about $E(R_n)$.
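The balancing step can be written out explicitly. In the sketch below, $K = E(R_n)\int f^2$ is a shorthand introduced here only for this derivation, $m$ is treated as a continuous variable, and the $(1+o(1))$ factor is ignored.

```latex
\[
\varphi(m) = A m + \frac{n^2 K}{m}, \qquad
\varphi'(m) = A - \frac{n^2 K}{m^2} = 0
\;\Longrightarrow\;
m = n\sqrt{K/A},
\]
\[
\varphi\bigl(n\sqrt{K/A}\bigr) = A n \sqrt{K/A} + \frac{n^2 K}{n\sqrt{K/A}} = 2n\sqrt{AK}.
\]
```

At the minimizer both contributions are equal to $n\sqrt{AK}$, which is the sense in which the two costs are balanced.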
Let us consider a few examples. For the normal distribution, we would
optimally need
and obtain
m ~ n .
V
1_1_
2A
log n ,
Am ~ E (T) ~ n J~ log n
Similarly, for all distributions with finite $\int |x|^r f(x)\,dx$, $\int f^2(x)\,dx$, we can
choose $m$ such that
$$Am \sim E(T) \le C n^{1+\frac{1}{2r}}$$
$$D_s = \frac{T}{2n} + \frac{1}{2} .$$
$$\limsup_{n\to\infty} E(D_s) \le \frac{3}{2} .$$
We stress again that the idea of a superlinear number of buckets seems more useful
in problems in which a lot of preprocessing is allowed, such as in ordinary
searching and in data base query problems.
Chapter 3
MULTIDIMENSIONAL BUCKETING.
The cells will be called $A_1, \ldots, A_m$, and $N_i$ will denote the number of $X_j$'s
in cell $A_i$. Thus, to determine all the cell memberships takes time proportional
to $n$. Within each cell, the data are stored in a linked list for the time being.
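The cell structure can be sketched in a few lines. The sketch below assumes, for illustration only, that the data lie in $[0,1]^d$ and that each side is cut into `cells_per_side` equal pieces; the names `build_grid` and `total_work` are hypothetical, and Python lists stand in for the linked lists.

```python
from collections import defaultdict


def build_grid(points, cells_per_side):
    """Assign points of [0,1]^d to equal-sized cells in time proportional to n.

    Assumption of this sketch: the data live in the unit cube and every
    side is cut into `cells_per_side` equal pieces.
    """
    grid = defaultdict(list)  # cell index tuple -> list of points in that cell
    for x in points:
        # Clamp so that a coordinate equal to 1.0 falls in the last cell.
        key = tuple(min(int(c * cells_per_side), cells_per_side - 1) for c in x)
        grid[key].append(x)
    return grid


def total_work(grid, g):
    """Sum of g(N_i) over the nonempty cells; empty cells add g(0) = 0."""
    return sum(g(len(cell)) for cell in grid.values())
```

Only nonempty cells are stored here, which is a design choice of the sketch; an array of $m$ lists would match the text more literally at the price of $O(m)$ initialization.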
Figure 3.1.
8 by 8 grid with 64 points. Cell $A_i$ has $N_i = 3$ points.
The cell structure has been used with some success in computational
geometry (see for example, Shamos (1978), Weide (1978), Bentley, Weide and Yao
(1980), and Asano, Edahiro, Imai, Iri and Murota (1985)). Often it suffices to
travel to each cell once and to do some work in the $i$-th cell that takes time
$g(N_i)$ for some function $g$ (or at least, is bounded from above by $a\,g(N_i)$ and
from below by $b\,g(N_i)$ for some appropriate constants $a, b$: this slightly more
general formulation will not be pursued here for the sake of simplicity).
For example, one heuristic for the traveling salesman problem would be as
follows: sort the points within each cell according to their y-coordinate, join
these points, then join all the cells that have the same x-coordinate, and finally
join all the long strips at the ends to obtain a traveling salesman path (see e.g.
Christofides (1976) or Papadimitriou and Steiglitz (1976)). It is clear that the
work here is $O(n) + \sum_{i=1}^m g(N_i)$ for $g(u) = u^2$ or $g(u) = u\log(u+1)$ depending
upon the type of sorting algorithm that is used. The same serpentine path
construction is of use in minimum-weight perfect planar matching heuristics (see e.g.
Iri, Murota, and Matsui 1981, 1983).
If we need to find the two closest points among $X_1, \ldots, X_n$ in $[0,1]^d$, it
clearly suffices to consider all pairwise distances $d(X_i, X_j)$ for $X_i$ and $X_j$ at most
$a_d$ (a constant depending upon $d$ only) cells apart, provided that the grid is
constructed by cutting each side of $[0,1]^d$ into $n' = \lfloor n^{1/d} \rfloor$ equal pieces. Using the
inequality $(u_1+u_2+\cdots+u_k)^2 \le 2^{k-1}(u_1^2+\cdots+u_k^2)$, it is not hard to see that the
total work here is bounded from above by $O(n)$ plus a constant times $\sum_{i=1}^m N_i^2$.
Figure 3.2.
Range search problem: report all points in the
intersection of A and B. Grid to be used in solution is also shown (8 by 8 grid, 64 points).
characteristics. If for example we want to retrieve all points for which the
coordinates are between certain threshold values, then we can speak of an orthogonal
range query. In the survey articles of Bentley and Friedman (1979) and Asano,
Edahiro, Imai, Iri and Murota (1985), some comparisons between cell structures
and other structures for the range search problem are made. The range search
problem has one additional parameter, namely the number of points retrieved.
Query time is usually measured in terms of the number of retrieved points plus a
function of $n$. If most queries are large, then it makes sense to consider large
cells. In other words, the cell size should not only depend upon $n$ and $f$, but
also on the expected size of the query rectangle (see e.g. Bentley, Stanat and
Williams, 1977). In addition, new distributions must be introduced for the location
and size of the query rectangle, thus complicating matters even further. For
these reasons, the range search problem will not be dealt with any further in this
collection of notes. The traveling salesman problem is briefly dealt with in
section 3.3, and in section 3.4, we will look at some closest point problems in
computational geometry. The latter problems differ in that the time taken by the
algorithm is no longer a simple sum of a univariate function of cell cardinalities, but
a sum of a multivariate function of cell cardinalities (usually of the cardinality of
a central cell and the cardinalities of some neighboring cells). In the entire
chapter, we will deal with a work function $g$. Initially, the time of an algorithm
is given by
$$T = \sum_{i=1}^m g(N_i)$$
Remark.
We would like to point out that (i) and (ii) imply that $g$ is continuous and
that $g(0) = 0$. Examples of functions $g(\cdot)$ satisfying the listed conditions are
$g(u) = u^r$, some $r \ge 1$, and $g(u) = u\log(u+1)$.
Theorem 3.1.
Let $f$ be an arbitrary density on $R^d$. Then the following are equivalent:
In the proof, we will use the symbols $p_i = \int_{A_i} f$, $C = \bigcup_{i=1}^m A_i$, $p = \int_C f$. The
following fact will be needed a few times: given $C$,
where $Y_i$ is a binomial $(n-2d, p_i)$ random variable, $W_i$ is a binomial $(n, p_i/p)$
random variable, and "<" denotes "is stochastically smaller than", i.e.
Proof of A.
Let $C_0$ be the smallest closed rectangle covering the support of $f$, and let
$f_n(x)$ be the function defined by the relations: $f_n(x) = 0$, $x \notin C$,
$f_n(x) = (n-2d)p_i$, $x \in A_i$. We have
$$E(T) = \sum_{i=1}^m E(g(N_i)) = \sum_{i=1}^m E(E(g(N_i) \mid C))$$
$$\ge \sum_{i=1}^m E(E(g(Y_i) \mid C))$$
$$\ge \sum_{i=1}^m E\Big(\frac{1}{2}\, g\big((n-2d)p_i - \sqrt{(n-2d)p_i}\big)\Big)$$
(by Lemma 5.4, if we agree to let $g(u) = 0$ for $u \le 0$)
Figure 3.3.
where the inner limit infimum is with respect to a.e. convergence. Now, for
almost all $\omega \in \Omega$ (where $(\Omega, F, P)$ is our probability space with probability
element $\omega$), we have $C \to C_0$ and thus $\lambda(C) \to \lambda(C_0)$. But then, by Lemma 5.11,
for almost all $(x, \omega) \in R^d \times \Omega$, we have $f_n(x) \to f(x)$. Thus, the Fatou lower
bound given above is
Proof of B.
$$E(T) \le \sum_{i=1}^m E(E(g(W_i + 2d) \mid C)) \le \sum_{i=1}^m E(E(g(2W_i) \mid C) + g(4d))$$
$$\le m\,g(4d) + 2^k \sum_{i=1}^m E(E(g(W_i) \mid C))$$
$$\le m\,g(4d) + 2^k \sum_{i=1}^m \big(a\,E(g(np_i/p)) + a\,g(1)\big)$$
where $a$ is the constant of Lemma 5.4 (and depends upon $k$ only). Thus, to
show that $E(T) = O(n)$, we need only show that $\sum_{i=1}^m E(g(np_i/p)) = O(n)$.
Now,
The last term is uniformly bounded in $n$ as we will now prove. First, we have
$g(n) \le n^k g(1)$. We will show that $P(p < 1/2) \le 2d \exp(-n/(4d))$ for all $n$.
Because the function $u^k e^{-u}$, $u > 0$, is uniformly bounded, we see that
$\sup_n g(n) P(p < 1/2) < \infty$. Indeed,
$$[p < 1/2] \subseteq \bigcup_{j=1}^d\ [p_j < 1 - 1/(2d)]$$
where $p_j$ is the integral of $f$ over all $x$'s whose $j$-th component lies between
the minimal and maximal $j$-th components of all the $X_i$'s. But by the
probability integral transform, when $U_1, \ldots, U_n$ are independent uniform [0,1]
random variables,
$$\sum_{i=1}^m E(g(2np_i)) = \sum_{i=1}^m E\Big(g\Big(2n\lambda(A_i) \int_{A_i} f / \lambda(A_i)\Big)\Big)$$
$$\le \sum_{i=1}^m E\Big(\int_{A_i} g(2n\lambda(A_i) f) / \lambda(A_i)\Big)$$
Proof of C.
By a bound derived in the proof of A and by the second inequality of
Lemma 5.4, we need only show that
when $f$ does not have compact support. By our assumptions on $g$, $(n-2d)$ can
be replaced by $n$. We may assume without loss of generality that the first
component of $X_1$ has unbounded support. Let $(a_1, b_1), \ldots, (a_d, b_d)$ be $\epsilon$ and $1-\epsilon$
quantiles of all the marginal distributions where $\epsilon \in (0,1/2)$ is chosen such that
$B = \times_{j=1}^d (a_j, b_j)$ satisfies $\int_B f = \frac{1}{2}$.
Let $Q$ be the collection of $A_j$'s intersecting
q -Z
~ E(- E g LnPi 'J ) ~ E(-Zg(-
q 1 E LnPi 'J ))
m q AJEQ m q AjEQ
q n 1 q n n
~ E(-Z g(--1)) ~ E(Z (- - - ) g(--1)/(--1)).
m 2q 2 n 2q 2q
where we used Jensen's inequality. Since $g(u)/u \uparrow \infty$, we need only show that
for any constant $M$, however large,
Now, let $U$, $V$ be the minimum and the maximum of the first components of
$X_1, \ldots, X_n$. When $Z = 1$, we have
d -I
<
I ~ + 2) ,
q m-d-
V-U
mild
and thus
d-l
_P(2m-d- > n )
4(M +1) .
The second term of the last expression is $o(1)$ for obvious reasons. The third
term is $o(1)$ since $m \sim n$ and $V - U \to \infty$ in probability as $n \to \infty$. The last
term is $o(1)$ since $m \sim n$. This concludes the proof of C.
$$\int f \log^+ f < \infty .$$
The latter condition is satisfied for all but the most peaked densities. These
results generalize those of Devroye and Klincsek (1981). We should mention here
that if we first transform arbitrary data by a mapping $h : R^1 \to [0,1]$ that is
continuous and monotone, construct buckets on [0,1], and then carry out a
subsequent sort within each bucket as described above, then often $E(T) = O(n)$: in
other words, with little extra effort, we gain a lot in expected time. The ideal
Distribution function F
Figure 3.4.
The conditions on $f$ mentioned above are satisfied for all bounded densities
$f$. It is a nice exercise to verify that if a transformation
$$h(x) = x/(1+|x|)$$
is used and $f(x) \le a \exp(-b|x|^c)$ for some $a, b, c > 0$, then the density of the
transformed data remains bounded. Thus, for the large class of densities with
exponentially dominated tail, we can sort the transformed data in average time
$O(n)$ by any of the bucket-based methods discussed above.
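A minimal sketch of sorting through such a transformation follows. It assumes $m = n$ buckets by default and an insertion sort within each bucket; because $h(x) = x/(1+|x|)$ takes values in $(-1,1)$, the sketch rescales it affinely to $(0,1)$ before bucketing, which is a choice made here for illustration and is not taken from the text. The function name is hypothetical.

```python
def transformed_bucket_sort(data, m=None):
    """Bucket sort of real numbers after the monotone map h(x) = x / (1 + |x|).

    Assumptions of this sketch: m defaults to n buckets, and h is rescaled
    from (-1, 1) to (0, 1) by u = (h + 1) / 2 before bucketing.
    """
    n = len(data)
    if n == 0:
        return []
    m = m or n
    buckets = [[] for _ in range(m)]
    for x in data:
        u = (x / (1.0 + abs(x)) + 1.0) / 2.0  # monotone in x, lies in (0, 1)
        buckets[min(int(u * m), m - 1)].append(x)
    out = []
    for b in buckets:
        # Insertion sort inside the bucket: quadratic in the bucket size.
        for i in range(1, len(b)):
            key, k = b[i], i - 1
            while k >= 0 and b[k] > key:
                b[k + 1] = b[k]
                k -= 1
            b[k + 1] = key
        out.extend(b)  # buckets are already in increasing order of h
    return out
```

Since the map is monotone, concatenating the sorted buckets from left to right yields the sorted original data; no inverse transformation is needed because the untransformed values are stored.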
Figure 3.5.
A nonlinear transformation useful for distributions
with unbounded support (uniform interval widths are mapped to nonuniform interval widths).
for some constants $a, b > 0$ when $\int f \log(f+1) < \infty$. Hence, if $f$ is any
density with a finite moment generating function in a small neighborhood of the
origin, we obtain $E(T) = O(n \log\log n)$. Examples of such densities are the
exponential and normal densities. This extends an interesting observation
reported in Akl and Meijer (1982).
Figure 3.6.
The planar graph point location problem:
return the set in the partition to which the query point belongs.
Figure 3.7.
The rectangular point location problem.
edges intersecting the north, south, east and west boundaries (sorted), and the
region of the partition containing the north-west corner vertex of the bucket.
This assumes that all regions are numbered beforehand, and that we are to
return a region number. Partition each bucket in a number of horizontal slabs,
where the slab boundaries are defined by the locations of the vertices and the
points where the edges cut the east and west boundaries. For each slab, set up a
linked list of conditions and region numbers, corresponding to the regions visited
when the slab is traversed from left to right. (Note that no two edges cross in our
graph.) It is important to recall that the number of edges in a planar graph is
$O(n)$, and that the number of regions in the partition is thus also $O(n)$. One
can verify that the data structure described above can be set up in worst case
time $O(n^{3/2})$ when $m \sim cn$ for some constant $c$. The expected set-up time is
$O(n)$ in many cases. This statement uses techniques similar to those needed to
analyze the expected search time. We are of course mainly interested in the
expected search time. It should come as no surprise that the expected search time
decreases with increasing values of $m$. If $m$ increases linearly in $n$, the expected
search time is $O(1)$ for many distributions. Those are the cases of interest to us.
If $m$ increases faster than $n$, the expected search time, while still $O(1)$, has a
smaller constant. Unfortunately, the space requirements become unacceptable
because $O(\max(m,n))$ space is needed for the given data structure. On the
positive side, note that the space requirements are $O(n)$ when $m$ increases at most
as $O(n)$.
Figure 3.8.
The slab method described above is due to Dobkin and Lipton (1976), and
differs slightly from the method described in Edahiro, Kokubo and Asano (1983).
The time taken to find the region number for a query point $X$ in a given bucket
is bounded by the number of slabs. To see this, note that we need to find the
slab first, and then travel through the slab from left to right. Thus, the expected
time is bounded by $\sum_{i=1}^m p_i s_i$, where $s_i$ denotes the number of slabs in the $i$-th
bucket, $p_i$ is the probability that $X$ belongs to the $i$-th bucket, and the expected
time is with respect to the distribution of $X$, but is conditional on the data. But
$E(s_i) \le np_i + E(a_i)$, where $a_i$ is the number of edges crossing the boundary of
the $i$-th bucket. Without further assumptions about the distribution of the data
points and the edges, any further analysis seems difficult, because $E(a_i)$ is not
necessarily a quantity with properties determined by the behavior of $f$ in or near
the $i$-th bucket. Assume next that $X$ is uniformly distributed. Then, the
expected time is bounded by
$$= \frac{n}{m} + \frac{E(a)}{m}$$
where $E(a)$ is the expected value of the overall number of edge-bucket boundary
crossings. $E(a)$ can grow much faster than $m$: just consider a uniform density
on $[0,1]^2$. Sort the points from left to right, and connect consecutive points by
edges. This yields about $n$ edges of expected length close to 1/3 each. $E(a)$
should be close to a constant times $n\sqrt{m}$. Also, for any planar graph,
$a \le \gamma n\sqrt{m}$ where $\gamma$ is a universal constant. Thus, it is not hard to check that
the conditional expected search time is in the worst case bounded by
$$\frac{n}{m} + \gamma\frac{n}{\sqrt{m}} .$$
This is $O(1)$ when $m$ increases as $O(n^2)$. Often, we cannot afford this because of
space or set-up time limitations. Nevertheless, it is true that even if $m$ increases
linearly with $n$, then the expected search time is $O(1)$ for certain probabilistic
models for putting in the edges. Help can be obtained if we observe that an edge
of length $\ell$ cuts at most $2(2+\ell\sqrt{m})$ buckets, and thus leads to at most twice
that number of edge-boundary crossings. Thus, the expected time is bounded by
where $e$ is the total number of edges and $L_j$ is the length of the $j$-th edge. Since
$e = O(n)$, and $m \sim cn$ (by assumption), this gives $O(1)$ provided that
$$\sum_{j=1}^e E(L_j) = O(\sqrt{m}) .$$
In other words, we have obtained a condition which depends upon the expected
lengths of the edges only. For example, the condition is satisfied if the data
points have an arbitrary density $f$ on $[0,1]^2$, and each point is connected to its
nearest neighbor: this is because the expected lengths of the edges grow roughly
as $1/\sqrt{n}$. The condition is also satisfied if the points are all connected to points
that are close to them in the ordinary sense, such as for example in a road map.
Figure 3.9.
The point enclosure problem: report
all rectangles to which query point belongs.
Figure 3.10.
The Euclidean traveling salesman problem:
find the shortest path through all cities.
yields a tour which is at worst 3/2 times the length of the optimal tour. Other
heuristics can be found in Karp (1977) (with additional analysis in Steele (1981))
and Supowit, Reingold and Plaisted (1983). We are not concerned here with the
costs of these heuristic tours as compared, for example, to the cost of the optimal
tours, but rather with the time needed to construct the tours. For iid points in
$[0,1]^2$, the expected value of the cost of the optimal tour is asymptotic to
$\beta\sqrt{n}\int\sqrt{f}$ where $\beta > 0$ is a universal constant (Steele, 1981). For the uniform
distribution, this result goes back to Beardwood, Halton and Hammersley (1959),
where it is shown that $0.61 \le \beta \le 0.92$.
For the ETSP in $[0,1]^2$, we can capture many bucket-based heuristics in the
following general form. Partition $[0,1]^2$ into $m$ equal cubes of side $1/\sqrt{m}$ each.
Typically, $m$ increases in proportion to $n$ for simple heuristics, and $m = o(n)$
when the expected cost of the heuristic tour is to be optimal in some sense (see
Karp (1977) and Supowit, Reingold and Plaisted (1983)). The bucket data
structure is set up (in time $O(n+m)$). The cells are traversed in serpentine fashion,
starting with the leftmost column, the second column, etcetera, without ever
lifting the pen or skipping cells. The points within the buckets are all connected by
a tour which is of one of three possible types:
A. Random tour. The points are connected as they are stored in the linked lists.
B. Sorted tour. All points are sorted according to y coordinates, and then
linked up.
C. Optimal tour. The optimal Euclidean traveling salesman tour is found.
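Scheme B can be sketched as follows. The sketch assumes points in $[0,1]^2$ and a $k \times k$ grid with $k = \lfloor\sqrt{m}\rfloor$; sorting in decreasing y order inside the columns that are traversed downwards is a choice made here so that the path never doubles back, and the function name is hypothetical. The result is an open path visiting every point once.

```python
import math


def serpentine_sorted_tour(points, m):
    """Visit all points of [0,1]^2 column by column in serpentine order.

    Assumptions of this sketch: a k x k grid with k = floor(sqrt(m)), and
    points sorted by y inside each cell in the direction of travel.
    """
    k = max(1, math.isqrt(m))
    cells = [[[] for _ in range(k)] for _ in range(k)]  # cells[col][row]
    for (x, y) in points:
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        cells[col][row].append((x, y))
    tour = []
    for col in range(k):
        going_up = (col % 2 == 0)  # even columns bottom-to-top, odd ones top-to-bottom
        rows = range(k) if going_up else range(k - 1, -1, -1)
        for row in rows:
            tour.extend(sorted(cells[col][row], key=lambda p: p[1],
                               reverse=not going_up))
    return tour
```

With a comparison sort inside each cell, the work per cell is of the order of $N \log(N+1)$, which is the cost attributed to the sorted tour.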
Figure 3.11.
Serpentine cell traversal.
Figure 3.12.
A sorted tour.
The time costs of A, B, C for a bucket with $N$ points are bounded respectively
by
$CN$,
$CN \log(N+1)$,
and
$CN 2^N$
for constants $C$. For the optimal tour, a dynamic programming algorithm is used
(Bellman, 1962). The $m$ tours are then linked up by traversing the cells in
serpentine order. We are not concerned here with just how the individual tours are
linked up. It should for example be obvious that two sorted tours are linked up
by connecting the northernmost point of one tour with the southernmost point of
the adjacent tour, except when an east-west connection is made at the U-turns in
the serpentine. It is easy to see that the total cost of the between-cell
connections is $O(\sqrt{m})$, and that the total cost of the tours is $O(n/\sqrt{m})$ for all three
schemes. For schemes A and B therefore, it seems important to make $m$
proportional to $n$ so that the total cost is $O(\sqrt{n})$, just as for the optimal tour. In
scheme C, as pointed out in Karp (1977) and Supowit, Reingold and Plaisted
(1983), if $m$ increases at a rate that is slightly sublinear ($o(n)$), then we can
come very close to the globally optimal tour cost because within the buckets
small optimal tours are constructed. The expected time taken by the algorithm
is bounded by
$$O(n+m) + E\Big(\sum_{i=1}^m CN_i\Big) ,$$
$$O(n+m) + E\Big(\sum_{i=1}^m CN_i \log(N_i+1)\Big) ,$$
and
$$O(n+m) + E\Big(\sum_{i=1}^m CN_i 2^{N_i}\Big)$$
respectively.
Theorem 3.2.
For the methods A, B, C for constructing traveling salesman tours, the
expected time required is bounded by $O(n+m)$ plus, respectively
(A) $Cn$
where $\psi(u)$ is the functional generating function for the density $f$ on $[0,1]^2$.
Remark.
The functional generating function for a density $f$ on $[0,1]^2$ is defined
by
which explains the name. Note that the Taylor series is not necessarily
convergent, and that $\psi$ is not necessarily finite: it is finite for all bounded densities with
compact support, and for a few unbounded densities with compact support. For
example, If f :$ f * on [0,1)2, then 'ljJ(u) :$ J7"e
1 u," ,u > 0. Thus, the bound
In (C) becomes
1 (l+..!!..)'"
2Cn--e m
f *
(In fact, by a direct argument, we can obtain the better bound $2Cn e^{\frac{n}{m} f^*}$.) Note
that in the paper of Supowit et al. (1983), $m$ is allowed to be picked arbitrarily
close to $n$ (e.g. $m = n/\log\log\log n$). As a result, the algorithm based on (C)
has nearly linear expected time. Supowit et al. (1983) provide a further
modification of algorithm (C) which guarantees that the algorithm runs in nearly
linear time in the worst case.
$$E(N_i \log(N_i+1)) = E\Big(\sum_{j=1}^n B_j \log\Big(\sum_{j=1}^n B_j + 1\Big)\Big)$$
$$= n\,E\Big(B_1 \log\Big(B_1 + \sum_{j=2}^n B_j + 1\Big)\Big)$$
$$= np_i\,E\Big(\log\Big(2 + \sum_{j=2}^n B_j\Big)\Big)$$
where $B_1, \ldots, B_n$ are iid Bernoulli $(p_i)$ random variables. Also, since
$p_i \log(2+(n-1)p_i)$ is a convex function of $p_i$, another application of Jensen's
inequality yields the upper bound
$$n \int_{A_i} f \log\Big(2 + \frac{n-1}{m} f\Big)$$
which is all that is needed to prove the statement for (B). For (C), we argue
similarly, and note that
$$E(N_i 2^{N_i}) = E\Big(\Big(\sum_{j=1}^n B_j\Big)\Big(\prod_{j=1}^n 2^{B_j}\Big)\Big)$$
$$= n\,E\Big(B_1 2^{B_1} \prod_{j=2}^n 2^{B_j}\Big)$$
$$= 2np_i(1+p_i)^{n-1}$$
$$\le 2np_i\,e^{(n-1)p_i}$$
$$\le 2n \int_{A_i} f\,e^{\frac{n-1}{m} f} \quad \text{(Jensen's inequality)}.$$
(i) the close pairs problem: identify all pairs of points within distance $r$ of
each other;
(ii) the isolated points problem: identify all points at least distance $r$ away
from all other points;
(iii) the Euclidean minimal spanning tree problem;
(iv) the all-nearest-neighbor problem: for each point, find its nearest
neighbor;
(v) the closest pair problem: find the minimum distance between any two
points.
Figure 3.13.
Close pairs graph (the distance $r$ in the definition is indicated).
These problems are sometimes called closest point problems (Shamos and
Hoey, 1975; Bentley, Weide and Yao, 1980). What complicates matters here is
the fact that the time needed to find a solution is not merely a function of the
form
$$\sum_{i=1}^m g(N_i)$$
as in the case of one-dimensional sorting. Usually, the time needed to solve these
problems is of the form
$$\sum_{i=1}^m g(N_i, N_i^*)$$
$$\sum_{i=1}^m g(N_i + N_i^*)$$
where $g$ is another function. The overlap between buckets implicit in the terms
$N_i + N_i^*$ does not matter because the expected value of a sum is the sum of
expected values. Our goal here is to obtain the correct asymptotic order and
constant. Throughout this section too, $X_1, \ldots, X_n$ are independent random
vectors with density $f$ on $[0,1]^d$.
Figure 3.14.
All nearest neighbor graph at left. This graph is a subgraph
of the minimal spanning tree, shown at right.
that satisfy $\|X_i - X_j\| \le r$, and can thus be used for clustering too. The
problem of the identification of these pairs is called the close pairs problem.
Figure 3.15.
Central cell and neighboring buckets; the indicated cell dimension is $r/\sqrt{2}$.
j :N. =1 j :A J neighbor of A.
= E,Nj E, 1
j j :N. =1, and A; neighbor of A J
= 'Yd n.
The grid initialization takes time $\Omega(r^{-d})$ and $O(\max(r^{-d}, 1))$. In particular, the
entire algorithm is $O(n)$ in the worst case whenever $rn^{1/d} \ge c > 0$ for some
constant $c$. For $r$ much smaller than $n^{-1/d}$, the algorithm is not recommended
because nearly all the points are isolated points; the bucket size should be made
dependent upon $n$ instead.
Figure 3.16.
Finding the maximal gap in a sequence of
n points by dividing the range into n+1 intervals.
If we organize the data into a bucket structure with $n+1$ intervals, no two points
within the same bucket can define the maximal gap. Therefore, it is not
necessary to store more than two points for each bucket, namely the maximum and
the minimum. To find the maximal gap, we travel from left to right through the
buckets, and select the maximum of all differences between the minimum of the
current bucket and the last maximum seen until now. This algorithm is due to
Gonzalez (1975).
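The maximal gap procedure translates almost directly into code. The sketch below assumes that the $n+1$ intervals partition the range from the smallest to the largest data point into equal pieces; the function name is hypothetical.

```python
def maximal_gap(xs):
    """Largest difference between consecutive order statistics, in linear time.

    Assumption of this sketch: the n + 1 equal intervals span [min(xs), max(xs)].
    """
    n = len(xs)
    if n < 2:
        return 0.0
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return 0.0
    m = n + 1
    mins = [None] * m  # smallest point seen in each bucket
    maxs = [None] * m  # largest point seen in each bucket
    for x in xs:
        j = min(int((x - lo) / (hi - lo) * m), m - 1)
        if mins[j] is None or x < mins[j]:
            mins[j] = x
        if maxs[j] is None or x > maxs[j]:
            maxs[j] = x
    best, last_max = 0.0, None
    for j in range(m):  # travel from left to right through the buckets
        if mins[j] is None:
            continue  # empty bucket
        if last_max is not None:
            best = max(best, mins[j] - last_max)
        last_max = maxs[j]
    return best
```

Only two values are kept per bucket, so the space is $O(n)$ and no sorting is performed at any stage.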
Let us turn now to the close-pairs problem. The time needed for reporting
all close pairs is of the order of
$$V = \sum_{i=1}^m N_i^2 + \sum_{i=1}^m\ \sum_{j: A_j \text{ neighbor of } A_i} N_i N_j$$
where the first term accounts for listing all pairs that share the same bucket, and
the second term accounts for all distance computations between points in
neighboring buckets.
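A minimal sketch of a bucket-based close pairs search follows. It assumes cells of side $r$, so that two points within distance $r$ of each other lie in the same cell or in cells whose indices differ by at most one in every coordinate; this particular cell size, the half-neighborhood trick used to avoid reporting a pair twice, and the function name are choices made here for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def close_pairs(points, r):
    """Report all index pairs (i, j), i < j, with Euclidean distance <= r.

    Assumptions of this sketch: nonnegative coordinates and cells of side r.
    """
    if not points:
        return []
    d = len(points[0])
    grid = defaultdict(list)
    for idx, x in enumerate(points):
        grid[tuple(int(c // r) for c in x)].append(idx)
    # Half of the 3^d - 1 neighbor offsets, so each pair of cells is seen once.
    offsets = [o for o in product((-1, 0, 1), repeat=d) if o > (0,) * d]
    pairs = []
    for key, members in grid.items():
        # Pairs sharing the same bucket.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                if math.dist(points[i], points[j]) <= r:
                    pairs.append((i, j))
        # Pairs in neighboring buckets.
        for o in offsets:
            nb = tuple(k + s for k, s in zip(key, o))
            if nb not in grid:
                continue
            for i in members:
                for j in grid[nb]:
                    if math.dist(points[i], points[j]) <= r:
                        pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

The two nested loops mirror the two terms of the work measure: pairs within a bucket, and distance computations between points in neighboring buckets.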
For this problem, let us consider a grid of $m$ buckets. This at least
guarantees that the initialization or set-up time is $O(n+m)$. The expected value of our
performance measure $V$ is
$$E(V) = \sum_{i=1}^m \big(np_i + n(n-1)p_i^2\big) + \sum_{i=1}^m\ \sum_{j: A_j \text{ neighbor of } A_i} n(n-1)p_i p_j$$
and it is the last term which causes some problems because we do not have a full
double sum. Also, when $p_i = \int_{A_i} f$ is large, $p_j$ is likely to be large too since $A_i$
and $A_j$ are neighboring buckets. The asymptotics for $E(V)$ are obtained in the
next theorem. There are 3 situations when $m = n$:
The upper bound in the theorem is valid in all three cases. In fact, Theorem 3.3
also covers the situation that $m \ne n$: $m$ and/or $r$ are allowed to vary with $n$
in an arbitrary manner.
Theorem 3.3.
Let $\gamma = \gamma(r, d, m)$ be the number of neighboring buckets of a particular
bucket in a grid of size $m$ defined on $[0,1]^d$, where $r$ is used in the definition of
neighbor. Then
$$E(V) \le n + \frac{n^2}{m}(\gamma+1)\int f^2 .$$
If $m \to \infty$, $n \to \infty$, $r \to 0$,
If $mr^d \to \beta \in (0,\infty)$, then $\gamma$ oscillates but remains bounded away from 0 and $\infty$
in the tail. In that case,
$$E(V) = O(n)$$
$$E(V) = n + \frac{n^2}{m}\, 3^d \int f^2\, (1+o(1)) .$$
$$f_n(x) = \frac{1}{\lambda(A(x))} \int_{A(x)} f , \quad x \in [0,1]^d ,$$
$$g_n(x) = \frac{1}{\lambda(B(x))} \int_{B(x)} f , \quad x \in [0,1]^d .$$
$$\sum_{i=1}^m\ \sum_{j: A_j \text{ neighbor of } A_i} p_i p_j\, n(n-1)$$
$$\le n^2 \sum_{i=1}^m \lambda(A_i)\lambda(B_i) f_n(x_i) g_n(x_i) \quad (\text{for any } x_1 \in A_1, \ldots, x_m \in A_m)$$
$$= n^2 \sum_{i=1}^m \int_{A_i} \lambda(B_i) f_n(x) g_n(x)\,dx$$
Since $f_n$ and $g_n$ are probably very close to each other, the integral in the last
expression is probably very close to $\int f_n^2$. Therefore, little will be lost if the
integral is bounded from above by the Cauchy-Schwarz inequality:
< J A(A1( X
)) (
A(x)
J f 2)dx J A(B\ x )) ( J B(x)
f 2)dx
(Jensen's Inequality)
m m
~ Jf 2 ~ Jf 2
;=1A, ;=1A,
:s J 1
A(B (x))
( J f 2) dx
B(x)
(Jensen' s Inequality)
= E J ,,/A(A;)
;=1 A,
1 ~
j:A J neighborofA,A J
Jf 2(y )dy dx
= E ,,/A(A
;=1
1
1) A J
Jf 2(y ) [ ~
;:A, neighbor of A J A,
JdX) dy
;=1 A J
=J[2.
when $m \to \infty$, $r \to 0$. This concludes the proof of the first two statements of
the theorem. The remainder of the theorem is concerned with the size of $\gamma$ as a
function of $r$ and $m$, and follows from elementary geometric principles.
We note for example that when $m \to \infty$, $mr^d \to 0$, the optimal choice for
$m$ would be a constant times $n\sqrt{\int f^2}$; at least, this would minimize
$Cm + E(V)$ asymptotically, where $C$ is a given constant. The minimizing
value is a constant times $n\sqrt{\int f^2}$. The only situation in which $E(V)$ is not
$O(n)$ for $m \sim cn$ is when $nr^d \to \infty$, i.e. each bucket has very many data
points. It can be shown that the expected number of close pairs grows as a
constant times $n^2 r^d$, and this provides a lower bound for $E(V)$. Thus, the
bound for $E(V)$ obtained in Theorem 3.3 has an optimal asymptotic rate.
Figure 3.17.
Spiral search for nearest neighbor.
then the shifted grid method takes expected time $O(n)$ if and only if $\int f^2 < \infty$.
Rabin (1976) chooses a small subset for which the closest pair is found. The
corresponding minimal distance is then used to obtain the overall closest pair in
linear expected time. It is perhaps interesting to note that not much is gained
over worst-case time under our computational model, since there exist algorithms
which can find the closest pair in worst case time $O(n \log\log n)$ (Fortune and
Hopcroft, 1979).
Chapter 4
The expected value of the worst possible search time for an element in a
bucket data structure is equal to the expected value of $M_n = \max_{1 \le i \le m} N_i$ times a
constant. This quantity differs from the worst-case search time, which is the
largest possible value of $\max_{1 \le i \le m} N_i$ over all possible data sets, i.e. $n$. In a sense,
the maximal cardinality has taken over the role of the height in tree structures.
Its main importance is with respect to searching. Throughout the chapter, it is
crucial to note the dependence of the maximal cardinality upon the density $f$ of
the data points $X_1, \ldots, X_n$, which for the sake of simplicity are assumed to
take values on $[0,1]^d$. The grid has $m \sim cn$ cells for some constant $c > 0$,
unless we specify otherwise.
In section 4.1, we look at the properties of $M_n$, and in particular of $E(M_n)$
following analysis given in Devroye (1985). This is then generalized to $E(g(M_n))$
where $g$ is a nonlinear work function (see section 4.3). Such nonlinear functions
of $M_n$ are important when one particular bucket is selected for further work, as
for example in a bucket-based selection algorithm (section 4.2). Occasionally, the
maximal cardinality can be useful in the analysis of bucket algorithms in which
certain operations are performed on a few buckets, where buckets are selected by
the data points themselves. In section 4.4, we will illustrate this on extremal
point problems in computational geometry.
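The dependence of the maximal cardinality on the density can be explored by simulation. The sketch below is restricted, for illustration, to one-dimensional data on [0,1]; the function names, the number of trials and the seed are arbitrary choices made here, and `sampler` is any function that draws one data point from a given `random.Random` instance.

```python
import random


def max_cardinality(n, m, sampler, rng):
    """M_n for one sample of n points of [0,1] thrown into m equal buckets."""
    counts = [0] * m
    for _ in range(n):
        x = sampler(rng)
        counts[min(int(x * m), m - 1)] += 1
    return max(counts)


def estimate_expected_max(n, m, sampler, trials=20, seed=0):
    """Monte Carlo estimate of E(M_n); trials and seed are arbitrary defaults."""
    rng = random.Random(seed)
    return sum(max_cardinality(n, m, sampler, rng) for _ in range(trials)) / trials


# Example samplers: a bounded (uniform) density and an unbounded one.
uniform = lambda rng: rng.random()
peaked = lambda rng: rng.random() ** 4  # density (1/4) x^(-3/4), unbounded at 0
```

Calling `estimate_expected_max(n, n, uniform)` and `estimate_expected_max(n, n, peaked)` for a few values of `n` gives a rough empirical picture of how differently $M_n$ behaves for bounded and unbounded densities.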
where $\Gamma$ is the gamma function. For example, when $n = 40320$, $E(M_n)$ is near
7.35 (Gonnet, 1981, table V). In other words, $E(M_n)$ is very small for all
practical values of $n$. Additional information is given in Larson (1982). The situation
studied by Gonnet pertains mainly to hashing with separate chaining when a
perfect hash function is available. As we know, order-preserving hash functions lead
to non-uniform distributions over the locations, and we will see here how $E(M_n)$
depends upon $f$. This is done in two steps. First we will handle the case of
bounded $f$, and then that of unbounded $f$.
Theorem 4.1.
Assume that $f^* = \mathrm{ess\,sup}\, f < \infty$ (note: $\lambda\{x : f(x) > f^*\} = 0$;
$\lambda\{x : f(x) > f^* - \epsilon\} > 0$ for all $\epsilon > 0$). Then, if $m \sim cn$ for some
$c > 0$,
$$E(M_n) \sim \frac{\log n}{\log\log n}$$
and, in particular,
$$E(M_n) = \frac{\log n}{\log\log n} + \frac{\log n}{(\log\log n)^2}\Big(\log\log\log n + \log\Big(\frac{f^* e}{c}\Big) + o(1)\Big) .$$
and
where $I$ is the indicator function, and where $n(1+\epsilon)$ and $n(1-\epsilon)$ should be read
as "the smallest integer at least equal to ...". By Lemma 5.8,
$$nP(|N-n| \ge n\epsilon) \le \frac{4}{n\epsilon^4} ,$$
c (n)
:S E (Mn(l+f))C (n (Ht))
C
((
n I+E
))
1
+ 0 (-) -b (n (Ht)) + (b (n (Ht)) -b (n )).
n
Now, b (n (I+E))-b (n )=0 (1), and, for n large enough, C (n )2: C (n (Ht))
log n
2:c (n) 2: C (n )/(Ht/log n).
log(n (Ht))
Thus,
b (n (Ht))+o (1)
E (Mn(l+f)) 2: C (n (I+t))(I+t/log n)
b (n (HE))+O (1)
C (n (HE))
Similarly, it can be shown that $E(M_n) \le (b(n)+o(1))/c(n)$, and combining
this gives us our theorem.
$$P(M_n^* < h) = \prod_{i=1}^m P(N_i^* < h)$$
$$\le \prod_{i=1}^m \big(1 - P(N_i^* = h)\big)$$
$$\le \exp\Big(-\sum_{i=1}^m P(N_i^* = h)\Big)$$
h
'!!:"'f -....!!..f.
m n
JA e m
.!!:...(f * -2€)
m
-....!!..f·
> e m J dx
A, 1/.-1 1::5:'
[ 8- J dX)
A,I/.-/I>'
-'!!:"f'
> em (6-0(1».
I
Thus,
n f •
P(Mn * < h) :::; exp [ - ~ [ : ) (f
h
* _2E)h e -iii (6--0 (1» .
1
8 = log m - h log h + h - -log(21Th )
2
+ 0 (1) + h loge
f * -2E ) - n
- f * + log 6
c m
b (n )-1J f * -2E
= log n +
c (n)
(1 + loge
c
))
(where t = 1
log b - -log(21T) - - - + log c +
f * 0 (1))
2 c
= log n +
b (n )-1J
log n (l+log(
f * -2E )
(log log n )2 c
1 1
- - log log n + - log log log n + t + 0 (1)
2 2
[ 1 + loge f *c- 2 € ) - log log n + log log log n + 0 (1)) + log log n)
= logn
log log n
(IOg( f*-2€)
c
-IOg(L)
c
+1J+ 0 (1»)
log n
> -", (all n large enough)
3 log log n
", log n
~ h (1-exp(-exp(- ))) ~ h (1-exp(-exp(log log n )))
3 log log n
This concludes the proof of the lower bound, since $\eta > 0$ is arbitrary.
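$$P(M_n^* \ge k) \le \sum_{i=1}^n P(N_i^* \ge k) \le n \sum_{j \ge k} \frac{c^j e^{-c}}{j!} \le n\,\frac{c^k}{k!}\,\frac{k+1}{k+1-c}$$
Thus,
$$E(M_n^*) \le h + \sum_{k=h}^\infty P(M_n^* \ge k) \le h + \sum_{k=h}^\infty n\,\frac{c^k}{k!}\,\frac{k+1}{k+1-c}$$
$$\le h + n\,\frac{c^h}{h!}\Big(\frac{h+1}{h+1-c}\Big)^2 .$$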
and that
( h +1 )2 = 1 + ~ + 0 ( 1.) .
h+1-e h +1 h
Therefore,
b (n )+17+0 (1)
e (n)
log n
E(Mn)~ -...:::...-
log log n
have support contained in $[0,1]$ but not in $[0, 1-\epsilon]$ or $[\epsilon, 1]$ for any $\epsilon > 0$.
When $f$ is unbounded, the theorem gives very little information about $E(M_n)$. Actually, the behavior of $E(M_n)$ depends upon a number of quantities that make a general statement all but impossible. In fact, any slow rate of convergence that is $o(n)$ is achievable for $E(M_n)$. Since $N_i$ is binomial $(n, p_i)$ where $p_i$ is the integral of $f$ over the $i$-th bucket, we have
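Theorem 4.2.
Let $q = \max_{1 \le j \le m} m p_j$. Then
$$\frac{n}{m} q \le E(M_n) \le \frac{n}{m} q + \frac{1}{t}\left(\log m + \frac{n}{m} q \left(e^t - t - 1\right)\right) = \frac{\log m}{t} + \frac{n}{m} q\, \frac{e^t - 1}{t}, \quad \text{all } t > 0,\ m > 3.$$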
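The two sides of Theorem 4.2 are easy to evaluate for given $n$, $m$ and $q$. The sketch below (function name and the grid of $t$ values are our own choices, not part of the theorem) returns the lower bound $(n/m) q$ together with the upper bound $\log m / t + (n/m) q (e^t - 1)/t$ minimized over a finite grid of $t$.

```python
import math

def theorem_4_2_bounds(n, m, q, t_grid=None):
    """Return (lower, upper) for E(M_n) as in Theorem 4.2.

    The upper bound is minimized over a finite grid of t values only,
    so it is a valid (possibly slightly suboptimal) upper bound.
    """
    base = n * q / m
    if t_grid is None:
        t_grid = [0.01 * k for k in range(1, 2001)]  # t in (0, 20]
    upper = min(
        math.log(m) / t + base * (math.exp(t) - 1.0) / t for t in t_grid
    )
    return base, upper

if __name__ == "__main__":
    # Uniform density (q = 1) with as many buckets as points.
    print(theorem_4_2_bounds(10 ** 6, 10 ** 6, 1.0))
```

Any fixed $t$ gives a valid bound; the grid search only illustrates how much can be gained by tuning $t$ to $n$, $m$ and $q$.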
n
Mn ~ m?-X npj
I
+ m~x I
Ui = -
m
q + U.
~ q(e'-t-l)
<em
Thus,
E(Mn):S :q 1
Theorem 4.2 shows that there are many possible cases to be considered with respect to the rates of increase of $q$ and $m$. Assume that $m \sim cn$, which is the standard case. Then
so that
log m log n
E (Mn) ~ (1+0 (1)) -...:..;..::;:...=.-
log log n
log [ 10: ; ]
(note that this choice of $t$ almost minimizes the upper bound). Thus, Theorem 4.2 provides a considerable short-cut over Theorem 4.1 if one is only interested in first terms.
A third case occurs when $q = o(\log n)$, but $q$ is not necessarily very small. In that case, for the same choice of $t$ suggested above, we have
The only case not covered yet is when $q \ge a \log n$ for some constant $a > 0$. It is easy to see that by taking $t$ constant, both the upper and lower bound for $E(M_n)$ vary in proportion to $q$. Since obviously the bounds implicit in Theorem 4.1 remain valid when $q \to \infty$, we see that the only case in which there might be a discrepancy between the rate of increase of upper and lower bounds is our "third" case.
$$\left(\frac{\int_{A_i} f}{\lambda(A_i)}\right)^r \le \frac{\int_{A_i} f^r}{\lambda(A_i)} \quad \text{(Jensen's inequality).}$$
Thus,
$$q = \max_{1 \le i \le m} m p_i \le m^{1/r} \left(\int f^r\right)^{1/r}.$$
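As a worked illustration of the bound $q \le m^{1/r} (\int f^r)^{1/r}$, consider the unbounded density $f(x) = 1/(2\sqrt{x})$ on $(0,1]$ (our own example, not taken from the text). Its distribution function is $\sqrt{x}$, so that $p_i = \sqrt{i/m} - \sqrt{(i-1)/m}$ and $q = m p_1 = \sqrt{m}$, while $\int f^r = 2^{-r}/(1 - r/2)$ for $1 \le r < 2$. The sketch below compares the two sides.

```python
import math

def q_for_inverse_sqrt_density(m):
    """q = max_i m * p_i for f(x) = 1/(2 sqrt(x)) on (0, 1] with m equal buckets."""
    probs = [math.sqrt(i / m) - math.sqrt((i - 1) / m) for i in range(1, m + 1)]
    return m * max(probs)

def jensen_bound(m, r):
    """m^(1/r) * (integral of f^r)^(1/r) for the same density, 1 <= r < 2."""
    if not 1.0 <= r < 2.0:
        raise ValueError("the r-th moment of f is finite only for r < 2")
    norm_r = (2.0 ** (-r) / (1.0 - r / 2.0)) ** (1.0 / r)
    return m ** (1.0 / r) * norm_r

if __name__ == "__main__":
    m = 1024
    print("q =", q_for_inverse_sqrt_density(m))
    for r in (1.0, 1.5, 1.9):
        print("r =", r, "bound =", jensen_bound(m, r))
```

Here $q$ grows like $\sqrt{m}$, and the bound approaches that rate as $r$ increases towards 2, which is the largest exponent for which $\int f^r$ is finite in this example.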
The less outspoken the peakedness of $f$ is (i.e. the smaller $\int f^r$), the smaller the bound. For densities $f$ with extremely small infinite peaks, the functional generating function is finite: $\psi(u) = \int e^{uf} < \infty$, some $u > 0$. For such densities, even better bounds are obtainable as follows:
~ m "p(u).
Thus,
The value of $u$ for which the upper bound is minimal is typically unknown. If we keep $u$ fixed, then the upper bound is $O(\log m)$, and we are almost in the domain in which $E(M_n) \sim \log n / \log\log n$. If $\psi(u) < \infty$ for all $u > 0$ then we can find a subsequence $u_m \uparrow \infty$ such that $\psi(u_m) \le m$ for all $m$. It is easy to see that the maximum of the $m p_i$'s is $o(\log m)$, so that $E(M_n) \le \frac{\log n}{\log((\log n)/q)} (1 + o(1))$. If $\psi(\log\log m) \le m^{O(1)}$, then $E(M_n) = O\left(\frac{\log n}{\log\log\log n}\right)$. Thus, the functional generating function aids in the establishment of simple verifiable conditions for different domains of behavior of $E(M_n)$.
Here $M_n$ is the maximal cardinality in any of the buckets in the small grids. Intuitively, this can be seen as follows: for the original grid, $M_n$ is very close to $\log n / \log\log n$. For the buckets containing about $\log n / \log\log n$ elements, we obtain an estimate of $E(M_n)$ for the maximal cardinality in its sub-buckets by applying the results of this section after replacement of $n$ by $\log n / \log\log n$. Thus, as a tool for reducing the maximal cardinality in the bucket data structure, double bucketing is quite efficient although not perfect (because $E(M_n) \to \infty$).
$$P\left(\max_{1 \le i \le m} N_i \le x\right) \le \prod_{i=1}^{m} P(N_i \le x) \le \exp\left(-\sum_{i=1}^{m} P(N_i > x)\right), \quad x \ge 0.$$
Case 1. If $\frac{n}{m \log m} \to 0$ as $n \to \infty$, then
Urn P(Mn = r) = 1 - e -x ,
n-+oo
Case 2. $\frac{n}{m \log m} \to x \in (0, \infty)$.
Case 3. If $\frac{n}{m \log m} \to \infty$, then $M_n / (\frac{n}{m}) \to 1$ in probability.
Case 1 is by far the most important case because usually $m \sim cn$. In cases 2 and 3, the asymptotic distribution of $M_n$ is no longer bi-atomic because $M_n$ spreads its mass out more. In fact, in case 3, $M_n$ is with high probability equal to the value of the maximal cardinality if we were to distribute the $n$ points evenly (not randomly!) over the $m$ buckets! The difference $M_n - \frac{n}{m}$ is $\sim \sqrt{2 \frac{n}{m} \log m}$ in probability provided that $m > n^\epsilon$ for some $\epsilon > 0$.
$$V = \alpha n + \beta \max_{1 \le j \le m} N_j + \gamma m,$$
$$V = \alpha n + \beta \max_{1 \le j \le m} N_j \log(N_j + 1) + \gamma m,$$
or
$$V = \alpha n + \beta \max_{1 \le j \le m} N_j^2 + \gamma m,$$
depending upon whether an $n \log n$ or a quadratic sort is used. To obtain a good estimate for $E(V)$, we need good estimates for $E(M_n \log(M_n + 1))$ and $E(M_n^2)$, i.e. for expected values of nonlinear functions of $M_n$. This provides some of the motivation for the analysis of section 4.3. In this section, we will merely apply Theorem 4.2 in the design of a fast selection algorithm when a linear worst-case algorithm is used within buckets. The main result is given in Theorem 4.3: this theorem applies to all bounded densities on $[0,1]$ without exception. It is for this reason that we have to appeal, once again, to the Lebesgue density theorem in the proof.
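The selection scheme whose cost is modelled by $V$ can be sketched as follows. This is a minimal illustration under our own assumptions: the data are taken to lie in $[0,1]$, the name `bucket_select` and the default $m \approx \sqrt{n}$ are ours, and the within-bucket step uses a sort as a stand-in for a linear worst-case selection routine such as that of Blum et al. (1973).

```python
import math

def bucket_select(xs, k, m=None):
    """Return the k-th smallest (1-based) element of xs, assumed to lie in [0, 1]."""
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(xs)")
    if m is None:
        m = max(1, math.isqrt(n))
    # Set-up: distribute the points over m equal-width buckets (cost ~ alpha * n).
    buckets = [[] for _ in range(m)]
    for x in xs:
        j = min(m - 1, int(x * m))
        buckets[j].append(x)
    # Scan the buckets until the one holding the k-th smallest is found (cost ~ gamma * m).
    seen = 0
    for b in buckets:
        if seen + len(b) >= k:
            # Selection inside one bucket (cost ~ beta * N_j <= beta * M_n).
            # sorted() is a stand-in for a linear worst-case selection algorithm.
            return sorted(b)[k - seen - 1]
        seen += len(b)
    raise AssertionError("unreachable: the bucket counts add up to n")

if __name__ == "__main__":
    import random
    data = [random.random() for _ in range(10000)]
    print(bucket_select(data, 5000))
```

The three cost components $\alpha n$, $\gamma m$ and $\beta \max_j N_j$ correspond to the three commented steps, which is why the size of the fullest bucket governs the random part of the running time.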
Theorem 4.3.
Define for positive $\alpha, \beta, \gamma$,
$$V = \alpha n + \beta \max_{1 \le j \le m} N_j + \gamma m,$$
where $X_1, \ldots, X_n$ are iid random variables with bounded density $f$ on $[0,1]$: $f(x) \le f^* < \infty$ for all $x$. Then, for any $q, m$:
$$\alpha n + \gamma m + \beta \frac{n}{m} q \le E(V)$$
where s
If we choose
then
1 1
and, In fact
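Lemma 4.1.
For any bounded density $f$ on $[0,1]^d$, and for any sequence $m \to \infty$,
$$q = \max_{1 \le i \le m} m p_i \to f^* = \operatorname{ess\,sup} f.$$
$$f_m(x) = m p_i, \quad x \in A_i,$$
$$f^* \ge q = \max_x f_m(x) = \operatorname{ess\,sup} f_m \ge \left(\int f_m^r\right)^{1/r} \quad \text{(any } r\text{)},$$
and thus
$$\liminf_{m \to \infty} q^r \ge \int \liminf_{m \to \infty} f_m^r \quad \text{(Fatou's lemma)}$$
$$= \int f^r \quad \text{(Lemma 5.10)}$$
$$\ge (f^* - \epsilon)^r$$
by choice of $r = r(\epsilon)$, for arbitrary $\epsilon > 0$. This concludes the proof of the Lemma.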
We continue now with the proof of Theorem 4.3. The starting point is the bound given immediately following the proof of Theorem 4.2. The choice of $t$ is asymptotically optimal when $nq/(m \log m) \to \infty$. Since $q \ge 1$ in all cases, this follows if $n/(m \log m) \to \infty$, which is for example satisfied when $m \sim \sqrt{n}$, a choice that will be convenient in this proof. The upper and lower bounds for $E(V)$, ignoring lower order terms, are thus roughly $\alpha n + \gamma m + \beta \frac{n}{m} q$. Because $q \to f^*$ (Lemma 4.1), the choice $m = \lfloor \sqrt{\frac{\beta}{\gamma} n f^*} \rfloor$ is again asymptotically optimal. Resubstitution of this choice for $m$ gives us our result.
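The trade-off behind the choice of $m$ can be checked numerically. Ignoring lower order terms and replacing $q$ by $f^*$, the cost is roughly $\alpha n + \gamma m + \beta n f^*/m$; minimizing $\gamma m + \beta n f^*/m$ over $m$ gives $m = \sqrt{\beta n f^*/\gamma}$. The sketch below assumes this reading of the cost; the function names are ours.

```python
import math

def approximate_cost(n, m, alpha, beta, gamma, f_star):
    """Approximate expected cost alpha*n + gamma*m + beta*(n/m)*f_star."""
    return alpha * n + gamma * m + beta * n * f_star / m

def best_bucket_count(n, beta, gamma, f_star):
    """Integer part of the minimizer sqrt(beta * n * f_star / gamma), at least 1."""
    return max(1, math.floor(math.sqrt(beta * n * f_star / gamma)))

if __name__ == "__main__":
    n, alpha, beta, gamma, f_star = 10 ** 6, 1.0, 1.0, 1.0, 4.0
    m = best_bucket_count(n, beta, gamma, f_star)
    print(m, approximate_cost(n, m, alpha, beta, gamma, f_star))
```

With this choice both $\gamma m$ and $\beta n f^*/m$ are of order $\sqrt{n}$, so the dominant term of the cost is $\alpha n$.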
(v) $g$ is convex.
(vi) $g$ is regularly varying at infinity, i.e. there exists a constant $\rho \ge 0$ such that for all $u \in R$,
$g(x) = x^r$, $r \ge 1$;
$g(x) = 1 + x \log(1+x)$.
For the properties of regularly varying functions, see Seneta (1976) and Dehaan (1975) for example.
The main result of this section is:
Theorem 4.4.
Let $g$ be a work function satisfying (i-iv, vi), let $X_1, \ldots, X_n$ be iid random vectors with bounded density $f$ on $[0,1]^d$, and let the grid have $m \sim cn$ buckets as $n \to \infty$ for some constant $c > 0$. Then, for $\alpha_n$ as given above,
as n -+ 00.
log n )
E (g (an Mn)) ~ g ( an (Ho (1)).
log log n
E (g (M )) ~ g[ log n ).
n log log n
00
00
00
00 _.!:...q -v log(...!!!!..)
~g(U)+ J (a+bOl;vB)me me enqOlndv
u /a.
by Lemma 5.5. If we can show that the integral is $o(1)$, then we have
by conditions (iv) and (vi) on $g$. Since $\epsilon$ was arbitrary, we have shown the upper bound in the theorem. By convexity of $g$, the lower bound follows easily from Theorem 4.1, Jensen's inequality and (vi):
$$\sim g\left(\alpha_n \frac{\log n}{\log\log n}\right).$$
This leaves us with the proof of the statement that the second term is $o(1)$. Note that $q \le f^*$, and that the bound of Lemma 5.5 remains valid if $q$ is formally replaced by $f^*$. It suffices to show that
u /a.
n
--q
-....!...log(~) e m
an mu 8 2 8 -1 e a. ea. nq
um
loge )
e an nq
The first of these two terms is asymptotically dominant. It is easily seen that the first term is
Note that $\frac{m}{nq}$ remains bounded away from 0 and $\infty$. Trivial calculations show that for our choice of $u$, the last expression is $o(1)$.
Consider finally all the statements involving the condition $g(u) \ge b^* u^{s+1}$. It is clear that if the upper bounds for the integral are $o(g(u))$ instead of $o(1)$, then we are done. Thus, it suffices that the integrals are $o(u^{s+1})$, or $o(\alpha_n^{s+1})$. This follows if
which is satisfied for $u = (1+\epsilon) \frac{\log n}{\log\log n}$.
Theorem 4.4 is useful because we can basically take the expected value inside $g$. Recall that by Jensen's inequality $E(g(M_n)) \ge g(E(M_n))$ whenever $g$ is convex. The opposite inequality is provided in Theorem 4.4, i.e. $E(g(M_n))$ is $1 + o(1)$ times larger than $g(E(M_n))$, mainly because $M_n$ concentrates its probability mass near $E(M_n)$ as $n \to \infty$.
The conditions on $g$ may appear to be a bit restrictive. Note however that all conditions are satisfied for most work functions found in practice. Furthermore, if $g$ is sufficiently smooth, then $g'(x) \le a + b x^s$ and $g(x) \ge b^* x^{s+1}$ can both be satisfied simultaneously.
A last word about Theorem 4.4. We have only treated bounded densities and grids of size $m \sim cn$. The reader should have no difficulty at all to generalize the techniques for use in other cases. For lower bounds, apply Jensen's inequality and lower bounds for $E(M_n)$, and for upper bounds, use the inequalities given in the proof of Theorem 4.4.
Figure 4.1.
The convex hull and the outer layer of a cloud of points.
We will refer in this short section to only two outer boundaries: the convex hull (the collection of all $X_i$'s having the property that at least one hyperplane through $X_i$ puts all $n-1$ remaining points at the same side of the hyperplane), and the outer layer, also called the set of maximal vectors (the collection of all $X_i$'s having the property that at least one quadrant centered at $X_i$ contains no $X_j$, $j \ne i$). Once again, we will assume that $X_1, \ldots, X_n$ have a common density $f$ on $[0,1]^d$. A grid of size $m$ is constructed in one of two ways, either by partitioning $[0,1]^d$ or by partitioning the smallest closed rectangle covering $X_1, \ldots, X_n$. The second grid is of course a data-dependent grid. We will go through the mechanics of reducing the analysis for the second grid to that of the first grid. The reduction is that given in Devroye (1981). For simplicity, we will consider only $d = 2$.
Figure 4.2.
Cell marking procedure.
For the outer layer in $R^2$, we find the leftmost nonempty column of rectangles, and mark the northernmost occupied rectangle in this column. Let its row number be $j$ (row numbers increase when we go north). Having marked one or more cells in column $i$, we mark one or more cells in column $i+1$ as follows: (i) mark the cell at row number $j$, the highest row number marked up to that point; (ii) mark all rectangles between row number $j$ and the northernmost occupied rectangle in column $i+1$ provided that its row number is at least $j+1$. In this manner a "staircase" of at most $2\sqrt{m}$ rectangles is marked. Also, any point that is a maximal vector for the north-west quadrant must be in a marked rectangle. We repeat this procedure for the three other quadrants so that eventually at most $8\sqrt{m}$ cells are marked. Collect all points in the marked cells, and find the outer layer by using standard algorithms. The naive method for example takes quadratic time (compare each point with all other points). One can do better by first sorting according to y-coordinates. In an extra pass through the sorted array, the outer layer is found by keeping only partial extrema in the x-direction. If heapsort or mergesort is used, the time taken to find the outer layer of $n$ elements is $O(n \log n)$ in the worst case.
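The two standard methods mentioned above (the naive quadratic comparison and the sort followed by one pass) can be sketched as follows. This is only an illustration of the final step applied to a collected point set: it does not implement the cell marking, it assumes that all points are distinct, and the function names are ours.

```python
def maxima_sorted(points):
    """Points with no other point weakly to the north-east, via sort + one pass."""
    best_x = float("-inf")
    result = []
    # Visit points from north to south (ties: east first) and keep
    # only those that set a new record in the x-direction.
    for p in sorted(points, key=lambda p: (-p[1], -p[0])):
        if p[0] > best_x:
            result.append(p)
            best_x = p[0]
    return result

def outer_layer(points):
    """Union of the maximal vectors for the four quadrants, O(n log n)."""
    layer = set()
    for sx in (1, -1):
        for sy in (1, -1):
            flipped = [(sx * x, sy * y) for x, y in points]
            for x, y in maxima_sorted(flipped):
                layer.add((sx * x, sy * y))
    return layer

def outer_layer_naive(points):
    """Quadratic method: compare each point with all other points."""
    layer = set()
    for p in points:
        for sx in (1, -1):
            for sy in (1, -1):
                dominated = any(
                    q != p
                    and sx * (q[0] - p[0]) >= 0
                    and sy * (q[1] - p[1]) >= 0
                    for q in points
                )
                if not dominated:
                    layer.add(p)
    return layer

if __name__ == "__main__":
    import random
    pts = [(random.random(), random.random()) for _ in range(500)]
    print(len(outer_layer(pts)), len(outer_layer_naive(pts)))
```

In the bucket algorithm either routine would be applied only to the points collected from the marked cells, which is why the number of such points determines the random part of the running time.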
Figure 4.3.
Finding the outer layer points for the north-west quadrant.
Thus, returning to the data-independent grid, we see that the outer layer can be found in time bounded by
where $c_0, c_1, c_2, c_3 > 0$ are constants and $B$ is the collection of indices of marked cells. The random component does not exceed $c_2 (8\sqrt{m} M_n)^2$ and $c_3 8\sqrt{m} M_n \log(1 + 8\sqrt{m} M_n)$ respectively. Clearly, these bounds are extremely crude. From Theorem 4.4, we recall that when $m \sim cn$, $f$ is bounded, $E(M_n^2) \sim \left(\frac{\log n}{\log\log n}\right)^2$, and $E(M_n \log(1 + M_n)) \sim \log n$. Thus, the expected time is $O\left(n \left(\frac{\log n}{\log\log n}\right)^2\right)$ in the former case, and $c_0 m + c_1 n + O(\sqrt{n} \log n)$ in the latter case. In the latter case, we observe that the contribution of the outer layer algorithm is asymptotically negligible compared to the contribution of the bucket data structure set-up. When we try to get rid of the boundedness condition on $f$, we could argue as follows: first of all, not much is lost by replacing $\log(\sum_{j \in B} N_j + 1)$ by $\log(n+1)$ because $\sum_{j \in B} N_j = \Omega(\sqrt{m})$ and $m \sim cn$.
Thus,
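$$E\left(\sum_{j \in B} N_j \log\left(\sum_{j \in B} N_j + 1\right)\right) \le E\left(\sum_{j \in B} N_j\right) \log(n+1) \le 8\sqrt{m}\, \log(n+1) \left(\frac{\log m}{t} + \frac{n}{m} q\, \frac{e^t - 1}{t}\right) \quad \text{(all } t > 0\text{)}$$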
$\int f^{2+\epsilon} < \infty$ for some $\epsilon > 0$ (Remark 4.1). See however the important remark below.
2/3
m [~.nqIOg(n+1»)
2c
.
2
Plugging this back into our condition for the use of the bound, we note that it is satisfied in all cases since $nq \to \infty$. The bound becomes
* log n)
2/3
m [~nf
2c 2
and for densities with $\mu_r = (\int f^r)^{1/r} < \infty$, we can take
m m
1.
r /J r log n I2/3
2r
C3 ) 3r-2
m [ - - n /J log n .
2c 2 r
This yields useful choices for $r > 2$. Using $q \le \mu_r m^{1/r}$, we obtain the further bound
$$c_1 n + O\left((n \log n)^{\frac{2r}{3r-2}}\right).$$
The main conclusion is that if $m$ is growing slower than $n$, then for certain large classes of densities, the asymptotically most important component in the expected time complexity is $c_1 n$. For example, when $\int f^4 < \infty$, we have $c_1 n + O((n \log n)^{4/5})$.
Of course, the same algorithm and discussion can be used for finding the convex hull of $X_1, \ldots, X_n$ because for arbitrary points there exist simple $O(n \log n)$ and $O(n^2)$ worst-case algorithms (see Graham (1972), Shamos (1978), Preparata and Hong (1977) and Jarvis (1973)) and all convex hull points are outer layer points. In this form, the algorithm was suggested by Shamos (1979).
Figure 4.4.
Points are ordered according to angular coordinates
for use in Graham's convex hull algorithm , bucket algorithm.
The bucket data structure can be employed in unexpected ways. For example, to find the convex hull in $R^2$, it suffices to transform $X_1 - x, \ldots, X_n - x$ into polar coordinates where $x$ is a point known to belong to the interior of the convex hull of $X_1, \ldots, X_n$ (note: we can always take $x = X_1$). The points are sorted according to polar angles by a bucket sort as described in chapter 2. This yields a polygon $P$. All vertices of $P$ are visited in clockwise fashion and pushed on a stack. The stack is popped when a non-convex-hull point is identified. In this manner, we can construct the convex hull from $P$ in linear time. The stack algorithm is based upon ideas first developed by Graham (1972). It is clear that the expected time of the convex hull algorithm is $O(n)$ if $\int g^2 < \infty$ or $\int g \log^+ g < \infty$ where $g$ is the density of the polar angle of $X_i - x$, $i \ge 1$. For example, when $X_1, \ldots, X_n$ have a radially symmetric density $f$, and $x$ is taken to be the origin, then $g$ is the uniform density on $[0, 2\pi]$, and the algorithm takes $O(n)$ expected time. When $x$ itself is a random vector, one must be careful before concluding anything about the finiteness of $\int g^2$. In any case, $g$ is bounded whenever $f$ is bounded and has compact support.
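The polar-angle method can be sketched as follows. The sketch makes several assumptions of its own: the interior point $x$ is taken to be the centroid of the data, the points are in general position (distinct polar angles, not all collinear), the polygon is traversed counterclockwise rather than clockwise, and the scan is started at the lowest point, which is always a hull vertex. Each bucket is sorted with the built-in sort as a stand-in for the within-bucket sort of chapter 2.

```python
import math

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def bucket_sort_by_angle(points, center):
    """Sort points by polar angle around center using n equal-width angle buckets."""
    n = len(points)
    buckets = [[] for _ in range(n)]
    for p in points:
        theta = math.atan2(p[1] - center[1], p[0] - center[0])  # in [-pi, pi]
        j = min(n - 1, int((theta + math.pi) / (2.0 * math.pi) * n))
        buckets[j].append((theta, p))
    ordered = []
    for b in buckets:
        b.sort()  # stand-in for the within-bucket sort
        ordered.extend(p for _, p in b)
    return ordered

def convex_hull(points):
    """Convex hull vertices in counterclockwise order (general position assumed)."""
    n = len(points)
    center = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    ordered = bucket_sort_by_angle(points, center)
    # Rotate the angular order so that it starts at a known hull vertex.
    start = min(range(n), key=lambda i: (ordered[i][1], ordered[i][0]))
    ordered = ordered[start:] + ordered[:start]
    stack = []
    for p in ordered:
        # Pop while the last two stacked points and p fail to make a left turn.
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()
        stack.append(p)
    # Close the polygon: the turn back to the starting vertex must also be a left turn.
    while len(stack) >= 3 and cross(stack[-2], stack[-1], stack[0]) <= 0:
        stack.pop()
    return stack

if __name__ == "__main__":
    import random
    pts = [(random.random(), random.random()) for _ in range(1000)]
    print(len(convex_hull(pts)))
```

Since every point is pushed once and popped at most once, the scan takes linear time, so the total cost is governed by the bucket sort of the polar angles.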
The results about $E(M_n)$, albeit very helpful, lead sometimes to rather crude upper bounds. Some improvement is possible along the lines of Theorem 4.5 (Devroye, 1985).
Theorem 4.5.
Let $X_1, \ldots, X_n$ be independent random vectors with common density $f$ on $[0,1]^2$, let the grid have $m$ cells, and let $q = \max(m p_1, \ldots, m p_m)$. Then, if $B$ is the collection of indices of marked cells in the extremal cell marking algorithm,
$$E\left(\sum_{j \in B} N_j\right) \le 8\sqrt{m}\, \frac{\frac{n}{m} q}{1 - e^{-\frac{n}{m} q}}.$$
E(ENj ) ~ (8+0(1))
1
jEB
1-e c
and
for all r 2: 1.
n
npj npj -q
E(Nj ) ~ ____ < _~_ < __
m_ _
l-(l-pj)n l_e-npi _..!!...q
l-e m
The first inequality follows trivially from this. The second inequality is obvious, and the third inequality is based upon the fact that $q \le m^{1/r} (\int f^r)^{1/r}$.
In the proof of Theorem 4.5, we have not used the obvious inequality $\sum_{i \in B} N_i \le 8\sqrt{m} M_n$. If we find the outer layer or the convex hull by an $O(n \log n)$ worst-case time method, then under the conditions of Theorem 4.5, with $m \sim cn$, the expected time is bounded by
$$O(n) + O(\sqrt{n}\, q) \log n$$
and this does not improve over the bound obtained when the crude inequality was used. For example, we cannot guarantee linear expected time behavior when $\int f^2 < \infty$, but only when a stronger condition such as $\int f^{2+\epsilon} < \infty$ (some $\epsilon > 0$) holds. (We can of course always work on $m$, see remark 4.6).
There is, however, a further possible improvement along the lines of an outer layer algorithm of Machii and Igarashi (1984). Here we either find the outer layers in all cells $A_i$, $i \in B$, or sort all points in the individual cells. Then, in another step, the outer layer can be found in time linear in the number of points to be processed. Thus, there are three components in the time complexity: $n + m$ (set-up), $\sum_{i \in B} N_i \log(N_i + 1)$ (or $\sum_{i \in B} N_i^2$) (sorting), and $\sum_{i \in B} N_i$ (final outer layer). It should be clear that a similar strategy works too for the convex hull. The principle is well-known: divide-and-conquer. It is better to delegate the work to the individual buckets, in other words. For example, we always have
E (E Ni log(Nj + 1»
iEB
and, If we use a more refined bound from the proof of Theorem 4.5 combined
wIth Lemma 5.6,
E (E Ni log(Ni + 1»
iEB
For example, when $m \to \infty$, $n/m \to \infty$, $f \le f^* < \infty$, the bound is
~ ~ f * log(.!!:....) .
Vm m
~ Vn _8_ q log q ,
JC
~ N j 10g(Nj + 1)
iEB
or
(~ Ni ) 10g(~Ni + 1)
iEB jEB
or
otherwise. All these terms are bounded from above by $g(\alpha_n M_n)$ where $\alpha_n$ is an integer, $g$ is a work function and $M_n = \max_{1 \le j \le m} N_j$. Unfortunately, our analysis of $M_n$ and $g(M_n)$ does not apply here because the grid is data-dependent. The dependence is very weak though, and nearly all the results given in this section remain valid if $f$ has rectangular support $[0,1]^2$. (Note: the rectangular support of $f$ is the smallest rectangle $R$ with the property that $\int_R f = 1$.) To keep things simple, we will only be concerned with an upper bound for $E(g(\alpha_n M_n))$ that is of the correct order of increase in $n$. In other words, we will not be concerned with the asymptotic constant. This case can easily be dealt with via a "shifted grid" argument (Devroye, 1981). Partition $[0,1]^2$ (or $[0,1]^d$ for that matter) into a grid of size $m/2^d$ with member cells $B_i$. Then consider for each $(j_1, \ldots, j_d) \in \{0,1\}^d$ the shifted grid with member cells $B_i(j_1, \ldots, j_d)$, $1 \le i \le \frac{m}{2^d}$, where the shift vector is
Figure 4.5.
Illustration of the shifted grid argument (original grid and shifted grid).
The key observation is that every $A_j$ in the original data-dependent grid is contained in some $B_k(j_1, \ldots, j_d)$. Thus,
where $M_n^*(j_1, \ldots, j_d)$ is the maximal cardinality for the $(j_1, \ldots, j_d)$ grid.
Thus,
Each individual term on the right hand side is for a data-independent grid, for which we can derive several types of inequalities. Thus, typically, the expected value of the right hand side is about $2^d$ times the expected value of one term. For example, if $f$ is bounded and $m \sim cn$, then for $\alpha_n$, $g$ as in Theorem 4.4, the expected value of the right hand side is $\le (1 + o(1))\, 2^d\, g\left(\alpha_n \frac{\log n}{\log\log n}\right)$.
Chapter 5
where $\sum_{j=1}^{k} p_j = 1$ and all $p_j$'s are nonnegative. $Y_1$ is said to be binomial $(n, p_1)$.
Lemma 5.1. [Moments of the multinomial distribution; see e.g. Johnson and Kotz, 1969]
For integer $r, s \ge 1$:
Thus,
$$E(Y_i Y_j) = n(n-1) p_i p_j,$$
and
$$E\left(\exp\left(\sum_{j=1}^{k} t_j Y_j\right)\right) = \left(\sum_{j=1}^{k} p_j \exp(t_j)\right)^n.$$
Lemma 5.3. [Uniform bounds for the moments of a binomial random variable.]
If $Y$ is binomial $(n, p)$ and $r > 0$ is a constant, then there exist $a, b > 0$ only depending upon $r$ such that
$$E(Y^r) \le a (np)^r + b.$$
$$E(Y^r) \le (np+1)^r + E(Y^r I_{Y \ge np+1}) \le (np+1)^r + E(Z^r I_{Z \ge np+1})$$
Because $(u+v)^r \le 2^{r-1}(u^r + v^r)$, the first two terms in the last sum are not greater than $a (np)^r + b$ for some constants $a, b$ only depending upon $r$. The last sum can be bounded from above by
$$\sum_{k > np + r} \left(\frac{k}{k-r}\right)^r \frac{(np)^{k-r}}{(k-r)!}\, e^{-np} (np)^r.$$
Lemma 5.4.
Let $g(u)$ be a nonnegative nondecreasing function on $[0, \infty)$, and let $Y$ be binomial $(n, p)$. Then if $g(u) = 0$ on $(-\infty, 0)$,
$$E(g(Y)) \ge \frac{1}{2}\, g\left(np - \sqrt{np}\right).$$
If $p \in [0, \frac{1}{4}]$, we have $E(g(Y)) \ge \frac{1}{2}\, g(\lfloor np \rfloor)$.
Thus,
$$E(g(Y)) \ge \frac{1}{2}\, g\left(np - \sqrt{np(1-p)}\right) \ge \frac{1}{2}\, g\left(np - \sqrt{np}\right).$$
The second inequality follows directly from Theorem 2.1 in Slud (1977). Next,
$$P\left(\max_{1 \le i \le m} N_i \ge x\right) \le m\, e^{-\frac{n}{m} q} \left(\frac{e n q}{m x}\right)^x$$

$$= e^{-x \log\left(\frac{x}{np}\right) + np\left(\frac{x}{np} - 1\right)}$$
where we took $e^t = \frac{x}{np}$, since this choice minimizes the upper bound. Note that the upper bound remains valid when $B$ is binomial $(n, p')$, $p' \le p$. For the multinomial distribution, we apply Bonferroni's inequality.
$$\sum_{i=1}^{n} (np) \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)} \log(i+1)$$
$$= np\, E(\log(Z+2))$$
$$\le np \log(E(Z) + 2)$$
$$\le np \log(np + 2).$$
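The resulting inequality $E(Y \log(Y+1)) \le np \log(np+2)$ for a binomial $(n, p)$ random variable $Y$ can be checked numerically by summing over the binomial probabilities. The sketch below is only a sanity check with function names of our own choosing, not a proof.

```python
import math

def binomial_pmf(n, p, i):
    """P(Y = i) for Y binomial (n, p)."""
    return math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)

def expected_y_log_y_plus_1(n, p):
    """Exact value of E(Y log(Y + 1)) obtained by direct summation."""
    return sum(
        binomial_pmf(n, p, i) * i * math.log(i + 1) for i in range(1, n + 1)
    )

def jensen_upper_bound(n, p):
    """The bound np log(np + 2)."""
    return n * p * math.log(n * p + 2.0)

if __name__ == "__main__":
    for n, p in ((10, 0.5), (100, 0.05), (200, 0.9)):
        print(n, p, expected_y_log_y_plus_1(n, p), jensen_upper_bound(n, p))
```

The middle step in the chain is Jensen's inequality applied to the concave logarithm, which is why the bound holds for every $n$ and $p$.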
$$\le 2 e^{\lambda(e^t - 1 - t - t\epsilon)}$$
where we used the fact that $e^{-t} \le e^t - 2t$. The exponent $e^t - 1 - t(1+\epsilon)$ is
Here we used the Taylor's series with remainder term to obtain the last inequality.
$$P(|Y - \lambda| \ge \lambda\epsilon) \le \frac{4}{\lambda^2 \epsilon^4}, \quad \text{all } \epsilon > 0.$$
Lemma 5.10.
Let $\mathcal{A}$ be the class of all rectangles containing the origin of $R^d$, and with sides $s_1, \ldots, s_d$ satisfying $a_i \le s_i \le b_i$ for some fixed positive numbers $a_i \le b_i$, $1 \le i \le d$.
There exists a set $D \subseteq R^d$ such that $\lambda(D^c) = 0$ ($D^c$ is the complement of $D$) and
Let C be a fixed rectangle of R d with sides C 11 . . • 1 cd. Let {An} be a
sequence of rectangles tending to C as n ~ 00. Let A n be the collection of all
translates of An that cover the origin. Then, for any sequence of positive
numbers rn ! 0,
CHAPTERS 13S
The set on which the convergence takes place does not depend upon the choice of
the sequences An and rn •
as n -+ 00.
$$\int |f_n - f| = 2 \int (f - f_n)^+ \to 0,$$
where we used the almost everywhere convergence of $f_n$ to $f$ and the Lebesgue dominated convergence theorem.
REFERENCES
A.V. Aho, J.E. Hopcroft, and J.D. Ullman, Data Structures and Algorithms, Addison-Wesley, Reading, Mass. (1983).
S.G. Akl and H. Meijer, "Recent advances in hybrid sorting algorithms," Utilitas Mathematica 21 pp. 325-343 (1982).
S.G. Akl and H. Meijer, "On the average-case complexity of bucketing algorithms," Journal of Algorithms 3 pp. 9-13 (1982).
D.C.S. Allison and M.T. Noga, "Selection by distributive partitioning," Information Processing Letters 11 pp. 7-8 (1980).
T.W. Anderson and S.M. Samuels, "Some inequalities among binomial and Poisson probabilities," pp. 1-12 in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1965).
T. Asano, M. Edahiro, H. Imai, M. Iri, and K. Murota, "Practical use of bucketing techniques in computational geometry," pp. 0-0 in Computational Geometry, ed. G.T. Toussaint, North-Holland (1985).
J. Beardwood, J.H. Halton, and J.M. Hammersley, "The shortest path through many points," Proceedings of the Cambridge Philosophical Society 55 pp. 299-327 (1959).
R. Bellman, "Dynamic programming treatment of the travelling salesman problem," Journal of the ACM 9 pp. 61-63 (1962).
J.L. Bentley, "Solutions to Klee's rectangle problems," Technical Report, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA. (1977).
J.L. Bentley, D.F. Stanat, and E.H. Williams, "The complexity of fixed-radius near neighbor searching," Information Processing Letters 6 pp. 209-212 (1977).
J.L. Bentley and J.H. Friedman, "Data structures for range searching," ACM Computing Surveys 11 pp. 397-409 (1979).
J.L. Bentley, B.W. Weide, and A.C. Yao, "Optimal expected-time algorithms for closest point problems," ACM Transactions on Mathematical Software 6 pp. 563-580 (1980).
J.L. Bentley and D. Wood, "An optimal worst-case algorithm for reporting intersections of rectangles," IEEE Transactions on Computers C-29 pp. 571-577 (1980).
M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, and R.E. Tarjan, "Time bounds for selection," Journal of Computer and System Sciences 7 pp. 448-461 (1973).
D. Cheriton and R.E. Tarjan, "Finding minimum spanning trees," SIAM Journal on Computing 5 pp. 724-742 (1976).
H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Annals of Mathematical Statistics 23 pp. 493-507 (1952).
Y.S. Chow and H. Teicher, Probability Theory, Springer-Verlag, New York, N.Y. (1978).
N. Christofides, "Worst-case analysis of a new heuristic for the traveling salesman problem," Symposium on Algorithms and Complexity, Department of Computer Science, Carnegie-Mellon University (1976).
L. Dehaan, "On Regular Variation and Its Application to the Weak Convergence of Sample Extremes," Mathematical Centre Tracts 32, Mathematisch Centrum, Amsterdam (1975).
L. Devroye and T. Klincsek, "Average time behavior of distributive sorting algorithms," Computing 26 pp. 1-7 (1981).
L. Devroye, "On the average complexity of some bucketing algorithms," Computers and Mathematics with Applications 7 pp. 407-412 (1981).
L. Devroye, "On the expected time required to construct the outer layer of a set of points," Information Processing Letters 0 pp. 0-0 (1985).
L. Devroye, "Expected time analysis of algorithms in computational geometry: a survey," pp. 0-0 in Computational Geometry, ed. G.T. Toussaint, North-Holland (1985).
L. Devroye, "The expected length of the longest probe sequence when the distribution is not uniform," Journal of Algorithms 6 pp. 1-9 (1985).
L. Devroye and F. Machell, "Data structures in kernel density estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence 7 pp. 360-366 (1985).
D. Dobkin and R.J. Lipton, "Multidimensional searching problems," SIAM Journal on Computing 5 pp. 181-186 (1976).
W. Dobosiewicz, "Sorting by distributive partitioning," Information Processing Letters 7 pp. 1-6 (1978).
(1975).
M.I. Shamos and J.L. Bentley, "Optimal algorithms for structuring geographic data," Proceedings of the Symposium on Topological Data Structures for Geographic Information Systems, pp. 43-51 (1977).
M.I. Shamos, "Computational Geometry," Ph.D. Dissertation, Yale University, New Haven, Connecticut (1978).
M.I. Shamos, 1979.
INDEX