
Improvements of TLAESA Nearest Neighbour Search Algorithm

and Extension to Approximation Search


Ken Tokoro Kazuaki Yamaguchi Sumio Masuda

Kobe University,
1-1, Rokkodai, Nada-ku, Kobe 657-8501 Japan,
Email: ky@kobe-u.ac.jp

Abstract

Nearest neighbour (NN) searches and k nearest neighbour (k-NN) searches are widely used in pattern recognition and image retrieval. An NN (k-NN) search finds the closest object (closest k objects) to a query object. Although the definition of the distance between objects depends on the application, its computation is generally complicated and time-consuming. It is therefore important to reduce the number of distance computations. TLAESA (Tree Linear Approximating and Eliminating Search Algorithm) is one of the fastest algorithms for NN searches. This method reduces distance computations by using a branch and bound algorithm. In this paper we improve both the data structure and the search algorithm of TLAESA. The proposed method greatly reduces the number of distance computations. Moreover, we extend the improved method to an approximation search algorithm which ensures the quality of solutions. Experimental results show that the proposed method is efficient and finds an approximate solution with a very low error rate.

Keywords: Nearest Neighbour Search, k Nearest Neighbour Search, TLAESA, Approximation Search, Distance Computation.

1 Introduction

NN and k-NN searches are techniques which find the closest object (closest k objects) to a query object from a database. They are widely used in pattern recognition and image retrieval. Examples of their application to handwritten character recognition appear in (Rico-Juan & Micó 2003) and (Micó & Oncina 1998). In this paper we consider NN (k-NN) algorithms that can work in any metric space. For any x, y, z in a metric space, the distance function d(·, ·) satisfies the following properties:

    d(x, y) = 0 ⇔ x = y,
    d(x, y) = d(y, x),
    d(x, z) ≤ d(x, y) + d(y, z).

Although the definition of the distance depends on the application, its calculation is generally complicated and time-consuming. We refer to the calculation of d(·, ·) as a distance computation.

For NN and k-NN searches in metric spaces, some methods that can manage a large set of objects efficiently have been introduced (Hjaltason & Samet 2003). They fall into two groups. The methods in the first group manage objects with a tree structure such as the vp-tree (Yianilos 1993), M-tree (Ciaccia, Patella & Zezula 1997), sa-tree (Navarro 2002) and so forth. The methods in the second group manage objects with a distance matrix, which stores the distances between objects. The two groups differ in their approach to fast searching. The former aims at reducing the computational work of the search process by organizing the objects effectively. The latter works toward reducing the number of distance computations, because their cost generally dominates all other calculations. In this paper we consider the latter approach.

AESA (Approximating and Eliminating Search Algorithm) (Vidal 1986) is one of the fastest algorithms for NN searches in the distance matrix group. Its number of distance computations is bounded by a constant, but its space complexity is quadratic. LAESA (Linear AESA) (Micó, Oncina & Vidal 1994) was introduced in order to reduce this large space complexity. Its space complexity is linear and its search performance is almost the same as that of AESA. Although LAESA is more practical than AESA, it is impractical for a large database because the calculations other than distance computations increase. TLAESA (Tree LAESA) (Micó, Oncina & Carrasco 1996) is an improvement of LAESA that reduces the time complexity to sublinear. It uses two kinds of data structures: a distance matrix and a binary tree, called the search tree.

In this paper, we propose some improvements of the search algorithm and the data structures of TLAESA in order to reduce the number of distance computations. The search algorithm follows the best first order. The search tree is transformed from a binary tree into a multiway tree. We also improve the selection of the root object in the search tree. These improvements are simple but very effective. We then show how to perform a k-NN search in the improved TLAESA. Moreover, we propose an extension to an approximation search algorithm that can ensure the quality of solutions.

This paper is organized as follows. In section 2, we describe the details of the search algorithm and the data structures of TLAESA. In section 3, we propose some improvements of TLAESA. In section 4, we present an extension to an approximation search algorithm. In section 5, we show some experimental results. Finally, in section 6, we conclude this paper.

Copyright © 2006, Australian Computer Society, Inc. This paper appeared at the Twenty-Ninth Australasian Computer Science Conference (ACSC2006), Hobart, Tasmania, Australia, January 2006. Conferences in Research and Practice in Information Technology, Vol. 48. Vladimir Estivill-Castro and Gill Dobbie, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
2 TLAESA

TLAESA uses two kinds of data structures: the distance matrix and the search tree. The distance matrix stores the distances from each object to some selected objects. The search tree manages all objects hierarchically. During the execution of the search algorithm, the search tree is traversed and the distance matrix is used to avoid exploring some branches.

2.1 Data Structures

We explain the data structures in TLAESA. Let P be the set of all objects and B be a subset consisting of selected objects called base prototypes. The distance matrix M is a two-dimensional array that stores the distances between all objects and the base prototypes. The search tree T is a binary tree such that each node t corresponds to a subset St ⊆ P. Each node t has a pointer to a representative object pt ∈ St, called a pivot, a pointer to a left child node l, a pointer to a right child node r, and a covering radius rt. The covering radius is defined as

    rt = max_{p ∈ St} d(p, pt).                       (1)

The pivot pr of r is defined as pr = pt. On the other hand, the pivot pl of l is chosen so that

    pl = argmax_{p ∈ St} d(p, pt).                    (2)

Hence, we have the following equality:

    rt = d(pt, pl).                                   (3)

St is partitioned into two disjoint subsets Sr and Sl as follows:

    Sr = {p ∈ St | d(p, pr) < d(p, pl)},
    Sl = St − Sr.                                     (4)

Note that if t is a leaf node, St = {pt} and rt = 0. Fig. 1 shows an example of the data structures.

Figure 1: An example of the data structures in TLAESA.

2.2 Construction of the Data Structures

We first explain the construction process of the search tree T. The pivot pt of the root node t is selected at random and St is set to P. The pivot pl of the left child node and the covering radius rt are defined by Eqs. (2) and (3). The pivot pr of the right child node is set to pt. St is partitioned into Sr and Sl by Eq. (4). These operations are repeated recursively until |St| = 1.

The distance matrix M is constructed by selecting the base prototypes. This selection is important because base prototypes are representative objects which are used to avoid some explorations of the tree. Ideally, each base prototype should be as far away as possible from the others. In (Micó et al. 1994), a greedy algorithm is proposed for this selection: it chooses the object that maximizes the sum of distances to the base prototypes already selected. In (Micó & Oncina 1998), another algorithm is proposed, which chooses the object that maximizes the minimum distance to the previously selected base prototypes. (Micó & Oncina 1998) shows that the latter algorithm is more effective than the former, so we use the latter algorithm for the selection of base prototypes.

The search efficiency depends not only on the selection of the base prototypes but also on their number. There is a trade-off between the search efficiency and the size of the distance matrix, i.e. the memory requirement. The experimental results in (Micó et al. 1994) show that the optimal number of base prototypes depends on the dimensionality dm of the space; for example, the optimal numbers are 3, 16 and 24 for dm = 2, 4 and 8, respectively. The results also show that the optimal number does not depend on the number of objects.

2.3 Search Algorithm

The search algorithm follows the branch and bound strategy. It traverses the search tree T in depth first order. The distance matrix M is consulted whenever a node is visited, in order to avoid unnecessary traversal of the tree T. Distances are computed only when a leaf node is reached.

Given a query object q, the distances between q and the base prototypes are computed first. These results are stored in an array D. The object in B closest to q is selected as the nearest neighbour candidate pmin, and the distance d(q, pmin) is recorded as dmin. Then the traversal of the search tree T starts at the root node. Whenever a node t that is not a leaf is reached, the lower bound for its left child node l is calculated. The lower bound of the distance between q and an object x is defined as

    gx = max_{b ∈ B} |d(q, b) − d(b, x)|.             (5)

See Fig. 2. Recall that d(q, b) was precomputed before the traversal and stored in D. In addition, the value d(b, x) was computed during the construction process and stored in the distance matrix M. Therefore, gx is calculated without any actual distance computations. The lower bound gx is not the actual distance d(q, x), so it does not guarantee that the number of visited nodes becomes minimal. However, this evaluation is very cheap, which makes fast searching possible. The search process visits the left child node l if gpl ≤ gpr, or the right child node r if gpl > gpr. When a leaf node is reached, the distance is computed, and both pmin and dmin are updated if that distance is less than dmin.

Figure 2: Lower bound.
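The maximin selection of base prototypes described above can be sketched as follows (the function name and data layout are ours; the strategy is the one the paper attributes to Micó & Oncina 1998):

```python
def select_base_prototypes(objects, dist, m):
    """Maximin selection sketch: after an arbitrary first choice, each
    new base prototype maximizes its minimum distance to those already
    chosen, pushing the prototypes as far apart as possible."""
    base = [objects[0]]                      # arbitrary first prototype
    # distance from every object to its nearest chosen prototype so far
    min_d = {x: dist(x, base[0]) for x in objects}
    while len(base) < m:
        nxt = max(objects, key=lambda x: min_d[x])   # maximin choice
        base.append(nxt)
        for x in objects:
            min_d[x] = min(min_d[x], dist(x, nxt))
    return base
```

Keeping the per-object minimum distance incrementally makes the selection run in O(|P| · m) distance evaluations, all of which happen once at construction time.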
Figure 3: Pruning process.

procedure NN_search(q)
 1: t ← root of T
 2: dmin = ∞, gpt = 0
 3: for b ∈ B do
 4:   D[b] = d(q, b)
 5:   if D[b] < dmin then
 6:     pmin = b, dmin = D[b]
 7:   end if
 8: end for
 9: gpt = max_{b ∈ B} |D[b] − M[b, pt]|
10: search(t, gpt, q, pmin, dmin)
11: return pmin

Figure 4: Algorithm for an NN search in TLAESA.

procedure search(t, gpt, q, pmin, dmin)
 1: if t is a leaf then
 2:   if gpt < dmin then
 3:     d = d(q, pt)   {distance computation}
 4:     if d < dmin then
 5:       pmin = pt, dmin = d
 6:     end if
 7:   end if
 8: else
 9:   r ← right child of t
10:   l ← left child of t
11:   gpr = gpt
12:   gpl = max_{b ∈ B} |D[b] − M[b, pl]|
13:   if gpl < gpr then
14:     if dmin + rl > gpl then
15:       search(l, gpl, q, pmin, dmin)
16:     end if
17:     if dmin + rr > gpr then
18:       search(r, gpr, q, pmin, dmin)
19:     end if
20:   else
21:     if dmin + rr > gpr then
22:       search(r, gpr, q, pmin, dmin)
23:     end if
24:     if dmin + rl > gpl then
25:       search(l, gpl, q, pmin, dmin)
26:     end if
27:   end if
28: end if

Figure 5: A recursive procedure for an NN search in TLAESA.

We now explain the pruning process. Fig. 3 shows the pruning situation. Let t be the current node. If the inequality

    dmin + rt < d(q, pt)                              (6)

is satisfied, then no object in St is closer to q than pmin, and the traversal into node t is unnecessary. Since gpt ≤ d(q, pt), Eq. (6) can be replaced with the test

    dmin + rt < gpt,                                  (7)

which implies Eq. (6) and is therefore also safe. Figs. 4 and 5 show the details of the search algorithm (Micó et al. 1996).
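The lower bound of Eq. (5) is the only quantity the traversal of Figs. 4 and 5 needs at interior nodes, and it touches only precomputed values. A minimal sketch (our own helper, with D and one matrix column passed as plain lists):

```python
def lower_bound(D, M_col_x):
    """g_x = max_b |d(q,b) - d(b,x)| (Eq. 5). D[b] holds the
    precomputed d(q,b); M_col_x[b] holds d(b,x) from the distance
    matrix. By the triangle inequality g_x <= d(q,x), so d(q,x) is
    bounded from below without any new distance computation."""
    return max(abs(D[b] - dbx) for b, dbx in enumerate(M_col_x))
```

For example, on a line with q = 0 and base prototypes at 2 and 7, the object x = 5 has D = [2, 7] and matrix column [3, 2], giving g_x = 5, which here happens to equal the true distance d(q, x).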

3 Improvements of TLAESA

In this section, we propose some improvements of TLAESA in order to reduce the number of distance computations.

3.1 Tree Structure and Search Algorithm

If we could evaluate the lower bounds g in ascending order of their values, the search algorithm would run very fast. However, this is not guaranteed in TLAESA, since the evaluation order is decided by the tree structure. We show such an example in Fig. 6. In this figure, u, v and w are nodes. If gpv < gpw, it is desirable that v be evaluated before w. But if gpv > gpu, w might be evaluated before v.

Figure 6: A case in which the search algorithm in TLAESA does not work well.

We propose the use of a multiway tree and a best first search instead of a binary tree and a depth first search. During a best first search, we can preferentially traverse the node whose subset is most likely to contain the closest object. Moreover, the multiway tree lets us evaluate more nodes at a time. The search tree in TLAESA has many nodes that point to the same object; in the proposed structure, we merge such nodes into one. Each node t corresponds to a subset St ⊆ P and has a pivot pt, a covering radius rt = max_{p ∈ St} d(p, pt), and pointers to its child nodes.

We show the method to construct the tree structure in Fig. 7. We first select the pivot pt of the root node t at random and set St to P. Then we execute the procedure makeTree(t, pt, St) in Fig. 7.

We now explain the search process in the proposed structure. The proposed method maintains a priority queue Q that stores triples (node t, lower bound gpt, covering radius rt) in increasing order of gpt − rt. Given a query object q, we calculate the distances between q and the base prototypes and store their values in D. Then the search process starts at the root of T. The following steps are repeated until Q becomes empty. When t is a leaf node, the distance d(q, pt) is computed if gpt < dmin. If t is not a leaf node, then for each child node t' that satisfies the inequality

    gpt < rt' + dmin,                                 (8)

the lower bound gpt' is calculated and the triple (t', gpt', rt') is added to Q. Figs. 8 and 9 show the details of the algorithm.
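The best-first loop with the queue ordered by gpt − rt can be sketched as follows. This is a simplification of Figs. 8 and 9 under our own assumptions: nodes carry `pivot`, `radius` and `children` attributes, and `lower_bound(q, x)` implements Eq. (5) from the precomputed values.

```python
import heapq
import itertools

class Node:
    def __init__(self, pivot, radius, children=()):
        self.pivot, self.radius, self.children = pivot, radius, list(children)

def best_first_nn(root, q, prototypes, dist, lower_bound):
    """Sketch of the proposed best-first NN search: the priority queue
    orders nodes by g - r, so the subtree most likely to contain the
    nearest neighbour is expanded first; Eq. (8) filters children."""
    tie = itertools.count()                      # heap tie-breaker
    p_min = min(prototypes, key=lambda b: dist(q, b))
    d_min = dist(q, p_min)                       # closest base prototype first
    g = lower_bound(q, root.pivot)
    heap = [(g - root.radius, next(tie), root, g)]
    while heap:
        _, _, t, g = heapq.heappop(heap)
        if not t.children:                       # leaf node
            if g < d_min:
                d = dist(q, t.pivot)             # the only real computation
                if d < d_min:
                    p_min, d_min = t.pivot, d
        else:
            for c in t.children:
                if g < c.radius + d_min:         # Eq. (8)
                    gc = lower_bound(q, c.pivot)
                    heapq.heappush(heap, (gc - c.radius, next(tie), c, gc))
    return p_min, d_min
```

The counter in the heap entries only breaks ties between equal keys; it keeps the traversal deterministic without making nodes themselves comparable.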
procedure makeTree(t, pt, St)
 1: t' ← new child node of t
 2: if |St| = 1 then
 3:   pt' = pt and St' = {pt'}
 4: else
 5:   pt' = argmax_{p ∈ St} d(p, pt)
 6:   St' = {p ∈ St | d(p, pt') < d(p, pt)}
 7:   St = St − St'
 8:   makeTree(t', pt', St')
 9:   makeTree(t, pt, St)
10: end if

Figure 7: Method to construct the proposed tree structure.

procedure NN_search(q)
 1: t ← root of T
 2: dmin = ∞, gpt = 0
 3: for b ∈ B do
 4:   D[b] = d(q, b)
 5:   if D[b] < dmin then
 6:     pmin = b, dmin = D[b]
 7:   end if
 8: end for
 9: gpt = max_{b ∈ B} |D[b] − M[b, pt]|
10: Q ← {(t, gpt, rt)}
11: while Q is not empty do
12:   (t, gpt, rt) ← first element of Q
13:   search(t, gpt, q, pmin, dmin)
14: end while
15: return pmin

Figure 8: Proposed algorithm for an NN search.

procedure search(t, gpt, q, pmin, dmin)
 1: if t is a leaf then
 2:   if gpt < dmin then
 3:     d = d(q, pt)   {distance computation}
 4:     if d < dmin then
 5:       pmin = pt, dmin = d
 6:     end if
 7:   end if
 8: else
 9:   for each child t' of t do
10:     if gpt < rt' + dmin then
11:       gpt' = max_{b ∈ B} |D[b] − M[b, pt']|
12:       Q ← Q ∪ {(t', gpt', rt')}
13:     end if
14:   end for
15: end if

Figure 9: A procedure used in the proposed algorithm for an NN search.

3.2 Selection of the Root Object

We focus on the base prototypes in order to reduce node accesses. The lower bound of the distance between a query q and a base prototype b is

    gb = max_{b' ∈ B} |d(q, b') − d(b', b)|
       = d(q, b).

This value is not an estimated distance but the actual distance. If we can use actual distances in the search process, we can evaluate more accurately which nodes are close to q. This means that the search is performed efficiently if many base prototypes are visited at an early stage; in other words, it is desirable to place more base prototypes in the upper part of the search tree. Thus, in the proposed algorithm, we choose the first base prototype b1 as the root object.

3.3 Extension to a k-NN Search

LAESA was developed to perform NN searches, and (Moreno-Seco, Micó & Oncina 2002) extended it so that k-NN searches can be executed. In this section, we extend the improved TLAESA to a k-NN search algorithm. The extension requires only simple modifications of the algorithm described above. We use a priority queue V for storing the k nearest neighbour candidates and modify the definition of dmin. V stores pairs (object p, distance d(q, p)) in increasing order of d(q, p). dmin is defined as

    dmin = ∞          if |V| < k,
    dmin = d(q, c)    if |V| = k,                     (9)

where c is the object of the kth pair in V.

We show the details of the k-NN search algorithm in Figs. 10 and 11. The search strategy essentially follows the algorithm in Figs. 8 and 9, but the k-NN search algorithm uses V instead of pmin.

procedure k-NN_search(q, k)
 1: t ← root of T
 2: dmin = ∞, gpt = 0
 3: for b ∈ B do
 4:   D[b] = d(q, b)
 5:   if D[b] < dmin then
 6:     V ← V ∪ {(b, D[b])}
 7:     if |V| = k + 1 then
 8:       remove the (k + 1)th pair from V
 9:     end if
10:     if |V| = k then
11:       (c, d(q, c)) ← kth pair of V
12:       dmin = d(q, c)
13:     end if
14:   end if
15: end for
16: gpt = max_{b ∈ B} |D[b] − M[b, pt]|
17: Q ← {(t, gpt, rt)}
18: while Q is not empty do
19:   (t, gpt, rt) ← first element of Q
20:   search(t, gpt, q, V, dmin, k)
21: end while
22: return the k objects in V

Figure 10: Proposed algorithm for a k-NN search.

(Moreno-Seco et al. 2002) shows that the optimal number of base prototypes depends not only on the dimensionality of the space but also on the value of k, and that the number of distance computations increases as k increases.

4 Extension to an Approximation Search

In this section, we propose an extension to an approximation k-NN search algorithm which ensures the
quality of solutions. Consider the procedure in Fig. 11. We replace its 4th line with

    if d < α · dmin then

and its 17th line with

    if gpt < rt' + α · dmin then

where α is a real number such that 0 < α ≤ 1. The pruning process becomes more aggressive as these conditions become tighter.

procedure search(t, gpt, q, V, dmin, k)
 1: if t is a leaf then
 2:   if gpt < dmin then
 3:     d = d(q, pt)   {distance computation}
 4:     if d < dmin then
 5:       V ← V ∪ {(pt, d(q, pt))}
 6:       if |V| = k + 1 then
 7:         remove the (k + 1)th pair from V
 8:       end if
 9:       if |V| = k then
10:         (c, d(q, c)) ← kth pair of V
11:         dmin = d(q, c)
12:       end if
13:     end if
14:   end if
15: else
16:   for each child t' of t do
17:     if gpt < rt' + dmin then
18:       gpt' = max_{b ∈ B} |D[b] − M[b, pt']|
19:       Q ← Q ∪ {(t', gpt', rt')}
20:     end if
21:   end for
22: end if

Figure 11: A procedure used in the proposed algorithm for a k-NN search.

The proposed method ensures the quality of solutions: the approximation ratio to an optimal solution can be bounded using α. Let a be the nearest neighbour object and a' be the nearest neighbour candidate object. If our method misses a and gives a' as the answer, then a was eliminated at a moment when

    ga ≥ α · dmin                                     (10)

held, and since dmin never increases during the search, ga ≥ α · d(q, a'). Since ga ≤ d(q, a), we obtain the following inequality:

    d(q, a') ≤ (1/α) · d(q, a).                       (11)

Thus, the approximate solution is at most 1/α times the optimal solution.

5 Experiments

In this section we show some experimental results and discuss them. We tested on an artificial set of random points in the 8-dimensional Euclidean space, using the Euclidean distance as the distance function. We evaluated the number of distance computations and the number of accesses to the distance matrix in 1-NN and 10-NN searches.

5.1 The Optimal Number of Base Prototypes

We first determined experimentally the optimal number of base prototypes. The number of objects was fixed at 10000. We executed 1-NN and 10-NN searches for various numbers of base prototypes and counted the number of distance computations. Fig. 12 shows the results.

Figure 12: Relation of the number of distance computations to the number of base prototypes.

From this figure, we chose the numbers of base prototypes shown in Table 1.

              1-NN   10-NN
    TLAESA      40      80
    Proposed    25      60

Table 1: The optimal number of base prototypes.

The values for the proposed method are smaller than those for TLAESA. This means that the proposed method achieves better performance with a smaller distance matrix. We used the values in Table 1 in the following experiments.

5.2 Evaluation of Improvements

We tested the effects of our improvements described in sections 3.1 and 3.2. We counted the number of distance computations in 1-NN and 10-NN searches for various numbers of objects. The results are shown in Figs. 13 and 14. As in TLAESA, the number of distance computations in the proposed method does not depend on the number of objects. In both 1-NN and 10-NN searches, it is about 60% of the number of distance computations in TLAESA. Thus our improvements are very effective.

In the search algorithms of TLAESA and the proposed method, various calculations other than distance computations are performed. The cost of the major part of these calculations is proportional to the number of accesses to the distance matrices, so we counted the number of such accesses. We examined the following two cases:

(i) TLAESA vs. TLAESA with the improved selection of the root object.

(ii) The proposed method with only the improvements of the tree structure and search algorithm vs. the proposed method with the improved selection of the root object as well.

In case (i), the number of accesses to the distance matrix is reduced by 12% in 1-NN searches and by 4.5% in 10-NN searches. In case (ii), it is reduced by 6.8% in 1-NN searches and by 2.7% in 10-NN searches.
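To make the role of α in section 4 tangible, here is a toy flat-scan version (our own illustration; the paper's actual algorithm is the tree search of Fig. 11) that applies the same α-relaxed elimination and therefore enjoys the same guarantee d(q, a') ≤ d(q, a)/α from Eq. (11):

```python
import math
import random

def approx_scan_nn(q, objects, prototypes, alpha, dist):
    """Toy alpha-relaxed elimination over a flat object list: an object
    is skipped when its lower bound g_x >= alpha * d_min. If the true
    nearest neighbour a is ever skipped, d(q,a) >= g_a >= alpha * d_min,
    and d_min never increases, so the answer a' obeys
    d(q,a') <= d(q,a) / alpha."""
    D = {b: dist(q, b) for b in prototypes}
    p_min = min(prototypes, key=lambda b: D[b])
    d_min = D[p_min]
    for x in objects:
        # dist(b, x) plays the role of the precomputed matrix M here
        g = max(abs(D[b] - dist(b, x)) for b in prototypes)
        if g < alpha * d_min:            # alpha-relaxed test
            d = dist(q, x)
            if d < d_min:
                p_min, d_min = x, d
    return p_min, d_min
```

With α = 1 this degenerates to exact AESA-style elimination; smaller α skips more objects while the ratio bound of Eq. (11) still holds.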
Figure 13: The number of distance computations in 1-NN searches.

Figure 14: The number of distance computations in 10-NN searches.

Figure 15: Error rate in 10-NN searches.

Figure 16: Relation of the number of distance computations to the value of α in 10-NN searches.

Thus the improved selection of the root object is effective.

5.3 Evaluation of Approximation Search

We tested the performance of the approximation search algorithm. We compared the proposed method to Ak-LAESA, the approximation search algorithm proposed in (Moreno-Seco, Micó & Oncina 2003). Each time a distance is computed in Ak-LAESA, the nearest neighbour candidate is updated and its value is stored. When the nearest neighbour object is found, the best k objects are chosen from the stored values. In Ak-LAESA, the number of distance computations of a k-NN search is therefore exactly the same as that of an NN search.

To compare the proposed method with Ak-LAESA, we examined how many objects of the approximate solutions exist in the optimal solutions. Thus, we define the error rate E as follows:

    E[%] = |{xi | xi ∉ Opt, i = 1, 2, ..., k}| / k × 100     (12)

where {x1, x2, ..., xk} is the set of k objects obtained by an approximation algorithm and Opt is the set of the k objects closest to the query object.

Fig. 15 shows the error rate as the value of α is varied in 10-NN searches, and Fig. 16 shows the relation of the number of distance computations to the value of α in 10-NN searches. In the range α ≥ 0.5, the proposed method shows a lower error rate than Ak-LAESA. In particular, the error rate of the proposed method is almost 0 in the range α ≥ 0.9. The two figures show that we can control the error rate and the number of distance computations by changing the value of α. For example, the proposed method with α = 0.9 removes about 28.6% of the distance computations while its error rate is almost 0.

Then we examined the accuracy of the approximate solutions. We used α = 0.5 for the proposed method because the error rate of the proposed method with α = 0.5 is equal to that of Ak-LAESA. We performed 10-NN searches 10000 times for each method and examined the distribution of the kth approximate solution against the kth optimal solution. We show the results in Figs. 17 and 18. In each figure, the x axis represents the distance between a query object q and the kth object of the optimal solution, and the y axis shows the distance between q and the kth object of the approximate solution. A point near the line y = x indicates that the kth approximate solution is very close to the kth optimal solution. In Fig. 17, many points are widely scattered; in the worst case, some approximate solutions reach about 3 times the optimal solution. From these figures, we can see that the accuracy of the solutions of the proposed method is superior to that of Ak-LAESA. We also show the result with α = 0.9 in Fig. 19, where most points lie near the line y = x.

Though Ak-LAESA can drastically reduce the number of distance computations, its approximate solutions are often far from the optimal solutions. On the other hand, the proposed method can reduce the number of distance computations to some extent with
Figure 17: The distribution of the approximate solutions by Ak-LAESA against the optimal solutions.

Figure 18: The distribution of the approximate solutions by the proposed method with α = 0.5 against the optimal solutions.

Figure 19: The distribution of the approximate solutions by the proposed method with α = 0.9 against the optimal solutions.

very low error rate. Moreover, the accuracy of its approximate solutions is superior to that of Ak-LAESA.

6 Conclusions

In this paper, we proposed some improvements of TLAESA. In order to reduce the number of distance computations in TLAESA, we changed the search algorithm from depth first order to best first order and the tree structure from a binary tree to a multiway tree. In 1-NN and 10-NN searches in an 8-dimensional space, the proposed method reduced the number of distance computations by about 40%. We then proposed a selection method for the root object of the search tree. This improvement is very simple but effective in reducing the number of accesses to the distance matrix. Finally, we extended our method to an approximation k-NN search algorithm that can ensure the quality of solutions. The approximate solutions of the proposed method are at most 1/α times the optimal solutions. Experimental results show that the proposed method can reduce the number of distance computations with a very low error rate by selecting an appropriate value of α, and that the accuracy of its solutions is superior to that of Ak-LAESA. From these viewpoints, the method presented in this paper is very effective when distance computations are time-consuming.

References

Ciaccia, P., Patella, M. & Zezula, P. (1997), M-tree: An efficient access method for similarity search in metric spaces, in 'Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB'97)', pp. 426–435.

Hjaltason, G. R. & Samet, H. (2003), 'Index-driven similarity search in metric spaces', ACM Transactions on Database Systems 28(4), 517–580.

Micó, L. & Oncina, J. (1998), 'Comparison of fast nearest neighbour classifiers for handwritten character recognition', Pattern Recognition Letters 19(3-4), 351–356.

Micó, L., Oncina, J. & Carrasco, R. C. (1996), 'A fast branch & bound nearest neighbour classifier in metric spaces', Pattern Recognition Letters 17(7), 731–739.

Micó, M. L., Oncina, J. & Vidal, E. (1994), 'A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements', Pattern Recognition Letters 15(1), 9–17.

Moreno-Seco, F., Micó, L. & Oncina, J. (2002), 'Extending LAESA fast nearest neighbour algorithm to find the k-nearest neighbours', Lecture Notes in Computer Science - Lecture Notes in Artificial Intelligence 2396, 691–699.

Moreno-Seco, F., Micó, L. & Oncina, J. (2003), 'A modification of the LAESA algorithm for approximated k-NN classification', Pattern Recognition Letters 24(1-3), 47–53.

Navarro, G. (2002), 'Searching in metric spaces by spatial approximation', The VLDB Journal 11(1), 28–46.

Rico-Juan, J. R. & Micó, L. (2003), 'Comparison of AESA and LAESA search algorithms using string and tree-edit-distances', Pattern Recognition Letters 24(9-10), 1417–1426.

Vidal, E. (1986), 'An algorithm for finding nearest neighbours in (approximately) constant average time', Pattern Recognition Letters 4(3), 145–157.

Yianilos, P. N. (1993), Data structures and algorithms for nearest neighbor search in general metric spaces, in 'SODA '93: Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms', pp. 311–321.
