Kobe University,
1-1, Rokkodai, Nada-ku, Kobe 657-8501 Japan,
Email: ky@kobe-u.ac.jp
Figure 1: An example of the data structures in TLAESA. [figure omitted]

2 TLAESA

TLAESA uses two kinds of data structures: the distance matrix and the search tree. The distance matrix stores the distances from each object to some selected objects. The search tree manages all objects hierarchically. During the execution of the search algorithm, the search tree is traversed and the distance matrix is used to avoid exploring some branches.

2.1 Data Structures

We explain the data structures in TLAESA. Let P be the set of all objects and B be a subset consisting of selected objects called base prototypes. The distance matrix M is a two-dimensional array that stores the distances between all objects and the base prototypes. The search tree T is a binary tree such that each node t corresponds to a subset S_t ⊆ P. Each node t has a pointer to a representative object p_t ∈ S_t called a pivot, a pointer to a left child node l, a pointer to a right child node r, and a covering radius r_t. The covering radius is defined as

    r_t = max_{p ∈ S_t} d(p, p_t).    (1)

The pivot p_r of r is defined as p_r = p_t. On the other hand, the pivot p_l of l is chosen so that

    p_l = argmax_{p ∈ S_t} d(p, p_t).    (2)

Hence, we have the following equality:

    r_t = d(p_t, p_l).    (3)

S_t is partitioned into two disjoint subsets S_r and S_l as follows:

    S_r = {p ∈ S_t | d(p, p_r) < d(p, p_l)},
    S_l = S_t − S_r.    (4)

2.2 Selection of Base Prototypes

Base prototypes are representative objects that are used to avoid some explorations of the tree. Ideally, each base prototype should be as far away as possible from the other base prototypes. In (Micó et al. 1994), a greedy algorithm is proposed for this selection: it chooses the object that maximizes the sum of distances from the base prototypes that have already been selected. In (Micó & Oncina 1998), another algorithm is proposed, which chooses the object that maximizes the minimum distance to the previously selected base prototypes. (Micó & Oncina 1998) shows that the latter algorithm is more effective than the former, so we use the latter algorithm for the selection of base prototypes.

The search efficiency depends not only on the selection of base prototypes but also on their number. There is a trade-off between the search efficiency and the size of the distance matrix, i.e. the memory capacity. The experimental results in (Micó et al. 1994) show that the optimal number of base prototypes depends on the dimensionality dm of the space. For example, the optimal numbers are 3, 16 and 24 for dm = 2, 4 and 8, respectively. The experimental results also show that the optimal number does not depend on the number of objects.

2.3 Search Algorithm

The search algorithm follows the branch and bound strategy. It traverses the search tree T in depth-first order. The distance matrix M is consulted whenever a node is visited, in order to avoid unnecessary traversal of the tree T. Distances are computed only when a leaf node is reached.

Given a query object q, the distances between q and the base prototypes are computed first. These results are stored in an array D. The object in B that is closest to q is selected as the nearest-neighbour candidate p_min, and the distance d(q, p_min) is recorded as d_min. Then the traversal of the search tree T starts at the root node. Whenever a non-leaf node t is reached, the lower bound for its left child node l is calculated. The lower bound of the distance between q and an object x is defined as

    g_x = max_{b ∈ B} |d(q, b) − d(b, x)|.    (5)
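As an illustration, the preprocessing step above (Eqs. 1–5) can be sketched in Python. This is not the authors' code: the metric d, the object set P, the base prototypes B and the choice of the first object as the root pivot are assumptions made for the example, and the tree is stored as nested (pivot, radius, left, right) tuples.

```python
def build_tlaesa(P, d, B):
    """Return the distance matrix M[b][p] = d(b, p) and the search tree.

    Each tree node is a tuple (pivot, covering_radius, left, right);
    leaves have left = right = None.
    """
    M = {b: {p: d(b, p) for p in P} for b in B}

    def build(S, pivot):
        if len(S) == 1:
            return (pivot, 0.0, None, None)          # leaf: radius 0
        # Left pivot: the object farthest from the current pivot (Eq. 2)
        p_l = max(S, key=lambda p: d(p, pivot))
        r_t = d(pivot, p_l)                          # covering radius (Eq. 3)
        # The right child keeps the pivot; split by the nearer pivot (Eq. 4)
        S_r = [p for p in S if d(p, pivot) < d(p, p_l)]
        S_l = [p for p in S if p not in S_r]
        return (pivot, r_t, build(S_l, p_l), build(S_r, pivot))

    return M, build(P, P[0])   # assumption: first object as root pivot

def lower_bound(D, M, x, B):
    """g_x = max_{b in B} |d(q, b) - d(b, x)| (Eq. 5), with D[b] = d(q, b)."""
    return max(abs(D[b] - M[b][x]) for b in B)

# Toy example: objects on the real line with d(a, b) = |a - b|
d = lambda a, b: abs(a - b)
P = [0.0, 1.0, 4.0, 9.0]
B = [0.0, 9.0]
M, root = build_tlaesa(P, d, B)
D = {b: d(3.5, b) for b in B}      # query q = 3.5
g4 = lower_bound(D, M, 4.0, B)     # never exceeds the true distance d(3.5, 4.0)
```

Note that by construction the recursion always terminates for distinct objects: p_l never falls into S_r, and the pivot never falls into S_l.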
We explain the pruning process. Fig. 3 shows the pruning situation. Let t be the current node. If the inequality

    d_min + r_t < d(q, p_t)    (6)

is satisfied, then no object in S_t is closer to q than p_min, and the traversal of node t is unnecessary. Since g_pt ≤ d(q, p_t), Eq. (6) can be replaced with

    d_min + r_t < g_pt.    (7)

Figs. 4 and 5 show the details of the search algorithm (Micó et al. 1996).

Figure 3: Pruning process. [figure omitted: the query q lies outside the ball of covering radius r_t around the pivot p_t]

procedure NN_search(q)
    t ← root of T
    d_min ← ∞, g_pt ← 0
    for b ∈ B do
        D[b] ← d(q, b)
        if D[b] < d_min then
            p_min ← b, d_min ← D[b]
        end if
    end for
    g_pt ← max_{b ∈ B} |D[b] − M[b, p_t]|
    search(t, g_pt, q, p_min, d_min)
    return p_min

Figure 4: Algorithm for an NN search in TLAESA.

if t is a leaf then
    if g_pt < d_min then
        d ← d(q, p_t)    {distance computation}
        if d < d_min then
            p_min ← p_t, d_min ← d
        end if
    end if
else
    r is the right child of t
    l is the left child of t
    g_pr ← g_pt
    g_pl ← max_{b ∈ B} |D[b] − M[b, p_l]|
    if g_pl < g_pr then
        if d_min + r_l > g_pl then
            search(l, g_pl, p_min, d_min)
        end if
        if d_min + r_r > g_pr then
            search(r, g_pr, p_min, d_min)
        end if
    else
        if d_min + r_r > g_pr then
            search(r, g_pr, p_min, d_min)
        end if
        if d_min + r_l > g_pl then
            search(l, g_pl, p_min, d_min)
        end if
    end if
end if

Figure 5: A recursive procedure for an NN search in TLAESA.
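The procedures of Figs. 4 and 5 can be sketched in Python as follows. This is an illustrative reimplementation under assumptions, not the authors' code: tree nodes are (pivot, radius, left, right) tuples, M holds the precomputed distance matrix, and the figure's symmetric if/else (visit the more promising child first) is condensed into a sorted loop. Note that d_min is shared between the two recursive calls, so an improvement found in the first child immediately tightens the pruning test of Eq. (7) for the second.

```python
def nn_search(q, d, B, M, root):
    D = {b: d(q, b) for b in B}           # distances to base prototypes
    p_min = min(B, key=lambda b: D[b])    # initial candidate (Fig. 4)
    d_min = D[p_min]

    def g(x):                             # lower bound g_x (Eq. 5)
        return max(abs(D[b] - M[b][x]) for b in B)

    def search(t, g_t):
        nonlocal p_min, d_min
        pivot, r_t, left, right = t
        if left is None:                  # leaf: compute an actual distance
            if g_t < d_min:
                dist = d(q, pivot)
                if dist < d_min:
                    p_min, d_min = pivot, dist
            return
        g_r = g_t                         # the right child keeps the pivot
        g_l = g(left[0])                  # bound for the left pivot
        # Visit the more promising child first; the pruning test of
        # Eq. (7) is re-evaluated with the current d_min for each child.
        for child, g_c in sorted([(left, g_l), (right, g_r)],
                                 key=lambda cg: cg[1]):
            if d_min + child[1] > g_c:    # explore unless Eq. (7) prunes
                search(child, g_c)

    search(root, g(root[0]))
    return p_min, d_min

# Toy example on the real line with a hand-built tree (hypothetical data)
d = lambda a, b: abs(a - b)
P = [0.0, 1.0, 4.0, 9.0]
B = [0.0, 9.0]
M = {b: {p: d(b, p) for p in P} for b in B}
leaf = lambda p: (p, 0.0, None, None)
root = (0.0, 9.0, leaf(9.0),
        (0.0, 4.0, leaf(4.0),
         (0.0, 1.0, leaf(1.0), leaf(0.0))))
nearest, dist = nn_search(3.5, d, B, M, root)    # → (4.0, 0.5)
```

On this toy tree the query 3.5 reaches only the leaf 4.0; the other leaves are pruned by Eq. (7) once d_min has dropped to 0.5, so a single distance computation suffices after the base-prototype distances.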
Figure 13: The number of distance computations in 1-NN searches, plotted against the number of objects (curves: TLAESA, Proposed). [plot omitted]

Figure 14: The number of distance computations in 10-NN searches, plotted against the number of objects. [plot omitted]

Figure 15: Error rate in 10-NN searches. [plot omitted]

Figure 16: Relation of the number of distance computations to the value of α in 10-NN searches (curves: Ak-LAESA, Proposed). [plot omitted]

Figure 17: The distribution of the approximate solutions found by Ak-LAESA against the optimal solutions. [plot omitted]

Figure 19: The distribution of the approximate solutions found by the proposed method with α = 0.9 against the optimal solutions. [plot omitted]
Figure 18: The distribution of the approximate solutions found by the proposed method with α = 0.5 against the optimal solutions (axes: distance to the k-th optimal solution vs. distance to the k-th approximate solution). [plot omitted]

very low error rate. Moreover, the accuracy of its approximate solutions is superior to that of Ak-LAESA.

6 Conclusions

In this paper, we proposed some improvements of TLAESA. In order to reduce the number of distance computations in TLAESA, we changed the search algorithm from depth-first to best-first order and the tree structure from a binary tree to a multiway tree. In 1-NN and 10-NN searches in an 8-dimensional space, the proposed method reduced the number of distance computations by about 40%. We then proposed a selection method for the root object of the search tree. This improvement is very simple but effectively reduces the number of accesses to the distance matrix. Finally, we extended our method to an approximate k-NN search algorithm that can guarantee the quality of its solutions: the distances of the approximate solutions of the proposed method are bounded by 1/α times those of the optimal solutions. Experimental results show that the proposed method can reduce the number of distance computations with a very low error rate by selecting an appropriate value of α, and that the accuracy of its solutions is superior to that of Ak-LAESA. From these viewpoints, the method presented in this paper is very effective when distance computations are time-consuming.

References

Ciaccia, P., Patella, M. & Zezula, P. (1997), M-tree: An efficient access method for similarity search in metric spaces, in 'Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB'97)', pp. 426–435.

Hjaltason, G. R. & Samet, H. (2003), 'Index-driven similarity search in metric spaces', ACM Transactions on Database Systems 28(4), 517–580.

Micó, L. & Oncina, J. (1998), 'Comparison of fast nearest neighbour classifiers for handwritten character recognition', Pattern Recognition Letters 19(3-4), 351–356.

Micó, L., Oncina, J. & Carrasco, R. C. (1996), 'A fast branch & bound nearest neighbour classifier in metric spaces', Pattern Recognition Letters 17(7), 731–739.

Micó, M. L., Oncina, J. & Vidal, E. (1994), 'A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements', Pattern Recognition Letters 15(1), 9–17.

Moreno-Seco, F., Micó, L. & Oncina, J. (2002), 'Extending LAESA fast nearest neighbour algorithm to find the k-nearest neighbours', Lecture Notes in Computer Science - Lecture Notes in Artificial Intelligence 2396, 691–699.

Moreno-Seco, F., Micó, L. & Oncina, J. (2003), 'A modification of the LAESA algorithm for approximated k-NN classification', Pattern Recognition Letters 24(1-3), 47–53.

Navarro, G. (2002), 'Searching in metric spaces by spatial approximation', The VLDB Journal 11(1), 28–46.

Rico-Juan, J. R. & Micó, L. (2003), 'Comparison of AESA and LAESA search algorithms using string and tree-edit-distances', Pattern Recognition Letters 24(9-10), 1417–1426.

Vidal, E. (1986), 'An algorithm for finding nearest neighbours in (approximately) constant average time', Pattern Recognition Letters 4(3), 145–157.

Yianilos, P. N. (1993), Data structures and algorithms for nearest neighbor search in general metric spaces, in 'SODA '93: Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms', pp. 311–321.