
Vertical Fragmentation

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/1


Bond Energy Algorithm
Input: The AA (attribute affinity) matrix
Output: The clustered affinity matrix CA, which is a permutation of AA
• Initialization: Place and fix one of the columns of AA in CA.
• Iteration: Place the remaining n-i columns in the remaining i+1 positions in
the CA matrix. For each column, choose the placement that makes the largest
contribution to the global affinity measure.
• Row order: Order the rows according to the column ordering.



Bond Energy Algorithm
“Best” placement? Define contribution of a placement:

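$$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$

where

$$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$$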



BEA – Example
Consider the following AA matrix and the corresponding CA matrix where
A1 and A2 have been placed. Place A3:

Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
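
The three contributions above can be checked with a minimal Python sketch. The AA matrix below is not printed in this part of the slides; it is recovered from the rows of the final CA matrix shown in this example. The function names `bond` and `cont` simply mirror the slide notation, and treating a boundary or not-yet-placed neighbour (A0, A4) as `None` with a bond of 0 is an assumption that matches the arithmetic shown.

```python
# AA matrix in attribute order A1..A4 (recovered from the final CA matrix).
AA = [
    [45, 0, 45, 0],   # A1
    [0, 80, 5, 75],   # A2
    [45, 5, 53, 3],   # A3
    [0, 75, 3, 78],   # A4
]
A1, A2, A3, A4 = range(4)


def bond(x, y):
    """Sum over z of aff(Az, Ax) * aff(Az, Ay); a missing neighbour gives 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in AA)


def cont(i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)


print(cont(None, A3, A1))  # ordering (0-3-1): 8820
print(cont(A1, A3, A2))    # ordering (1-3-2): 10150
print(cont(A2, A3, None))  # ordering (2-3-4), A4 not yet placed: 1780
```

The middle position gives the largest contribution, which is why A3 ends up between A1 and A2.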
BEA – Example
• Therefore, the CA matrix has the form

      A1   A3   A2
A1    45   45    0
A2     0    5   80
A3    45   53    5
A4     0    3   75

• When A4 is placed, the final form of the CA matrix (after row organization) is

      A1   A3   A2   A4
A1    45   45    0    0
A3    45   53    5    3
A2     0    5   80   75
A4     0    3   75   78
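
As a rough illustration of the whole procedure, the sketch below runs the initialization, iteration, and row-ordering steps on the same 4×4 AA matrix (recovered from the CA matrix above). The helper names `bea_order` and `clustered_matrix`, the choice to fix A1 and A2 first, and the leftmost-wins tie-breaking are assumptions made for this example, not part of the slides.

```python
def bond(aa, x, y):
    """Bond between columns x and y; None stands for an empty boundary."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea_order(aa, initial=(0, 1)):
    """Return a column order; `initial` columns are placed and fixed first."""
    order = list(initial)
    for k in range(len(aa)):
        if k in order:
            continue
        best_pos, best_gain = 0, None
        # Try every gap, including the two ends of the current order.
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = cont(aa, left, k, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, k)
    return order


def clustered_matrix(aa, order):
    """Apply the same permutation to columns and rows."""
    return [[aa[r][c] for c in order] for r in order]


AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
order = bea_order(AA)
print([f"A{i + 1}" for i in order])  # ['A1', 'A3', 'A2', 'A4']
for row in clustered_matrix(AA, order):
    print(row)
```

With these inputs the sketch reproduces the column order A1, A3, A2, A4 and the final CA matrix given above.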
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An}
into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that
there are no (or minimal) applications that access both (or more
than one) of the sets?

        A1  A2  …  Ai | Ai+1  …  Am
A1                    |
A2         TA         |
…                     |
Ai                    |
----------------------+-------------
Ai+1                  |
…                     |     BA
Am                    |



VF – Algorithm
Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes

$$CTQ \times CBQ - COQ^2$$
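
To make the objective concrete, here is a small sketch that evaluates it at every split point along the diagonal of a given attribute order. The four queries and the order [A1, A5, A4, A2, A3] are taken from the worked exercise at the end of these slides; the names `split_quality` and `best_diagonal_split` are illustrative, and each query is assumed to contribute its total frequency as its access count.

```python
# (attributes used, total access frequency) for q1..q4 of the exercise.
QUERIES = [
    ({"A1", "A2", "A3"}, 21),
    ({"A3", "A4"}, 24),
    ({"A2", "A4", "A5"}, 90),
    ({"A3", "A5"}, 11),
]


def split_quality(top, bottom, queries):
    """Return CTQ * CBQ - COQ^2 for the fragments `top` (TA) and `bottom` (BA)."""
    ctq = cbq = coq = 0
    for attrs, freq in queries:
        if attrs <= top:        # query touches only TA
            ctq += freq
        elif attrs <= bottom:   # query touches only BA
            cbq += freq
        else:                   # query touches both fragments
            coq += freq
    return ctq * cbq - coq ** 2


def best_diagonal_split(order, queries):
    """Try the m-1 split points along the diagonal and keep the best one."""
    best = None
    for i in range(1, len(order)):
        z = split_quality(set(order[:i]), set(order[i:]), queries)
        if best is None or z > best[0]:
            best = (z, order[:i], order[i:])
    return best


order = ["A1", "A5", "A4", "A2", "A3"]
print(best_diagonal_split(order, QUERIES))
# (-441, ['A1'], ['A5', 'A4', 'A2', 'A3'])
```

Every split of this particular workload leaves one side with no exclusive queries, so the product term is 0 and the quality reduces to the negated square of COQ.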



VF – Algorithm
Two problems:
• Cluster forming in the middle of the CA matrix
➡ Shift a row up and a column left and apply the algorithm to find the “best”
partitioning point
➡ Do this for all possible shifts
➡ Cost $O(m^2)$

• More than two clusters
➡ m-way partitioning
➡ try 1, 2, …, m–1 split points along diagonal and try to find the best point for
each of these
➡ Cost $O(2^m)$



VF – Correctness
A relation R, defined over attribute set A and key K, generates the vertical
partitioning FR = {R1, R2, …, Rr}.
• Completeness
➡ The following should be true for A:

$$A = \bigcup A_{R_i}$$
• Reconstruction
➡ Reconstruction can be achieved by

$$R = \bowtie_K R_i, \quad \forall R_i \in F_R$$

• Disjointness
➡ TID's are not considered to be overlapping since they are maintained by the
system
➡ Duplicated keys are not considered to be overlapping
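
The three correctness rules can be checked mechanically on a toy relation. The sketch below is illustrative only: the relation EMP, its attributes, its rows, and the helper names are hypothetical, and the join is a plain in-memory key join rather than anything a DBMS would run.

```python
def is_complete(attrs, fragments):
    """Completeness: the union of the fragment attribute sets equals A."""
    return set().union(*fragments) == set(attrs)


def is_disjoint(fragments, key):
    """Disjointness: non-key attributes appear in one fragment only."""
    seen = set()
    for frag in fragments:
        non_key = set(frag) - set(key)
        if seen & non_key:
            return False
        seen |= non_key
    return True


def project(rows, attrs):
    """Vertical fragment: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]


def join_on_key(left, right, key):
    """Reconstruction step: join two fragments on the key K."""
    index = {tuple(r[k] for k in key): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in key))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical relation and fragmentation, for illustration only.
KEY = ["ENO"]
ATTRS = ["ENO", "ENAME", "TITLE", "SAL"]
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Engineer", "SAL": 40000},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst", "SAL": 34000},
]
FRAGMENTS = [["ENO", "ENAME"], ["ENO", "TITLE", "SAL"]]

pieces = [project(EMP, frag) for frag in FRAGMENTS]
rebuilt = pieces[0]
for piece in pieces[1:]:
    rebuilt = join_on_key(rebuilt, piece, KEY)

print(is_complete(ATTRS, FRAGMENTS), is_disjoint(FRAGMENTS, KEY), rebuilt == EMP)
```

Because the key ENO is repeated in both fragments and excluded from the disjointness test, the fragmentation passes all three checks.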



Extra Stuff



The basic steps of this clustering algorithm are:
i. Create an attribute affinity matrix in which each entry indicates the affinity
between the two associated attributes. The entries in this similarity matrix are
based on the frequency of common usage of attribute pairs.

ii. The BEA then converts this similarity matrix to a BOND matrix in which the
entries represent a type of nearest neighbor bonding based on probability of
co-access. The BEA algorithm rearranges rows or columns so that similar
attributes appear close together in the matrix.

iii. Finally, the designer draws boxes around regions in the matrix with high
similarity.
The resulting matrix, modified from, is illustrated in Figure. The two shaded
boxes represent the attributes that have been grouped together into two
clusters.



Vertical Splitting
Bond Energy Algorithm

Given the following access characteristics and access frequencies for Q1,...,Q4, calculate the optimal
vertical splitting using the Bond Energy Algorithm (BEA).
Steps:
1. Prepare an affinity matrix.
2. Apply the BEA.
3. Perform vertical splitting by maximizing the split quality.

Access characteristics (1 = the query uses the attribute):

      Name  Family  Age  Position  Location
Q1     1      1      1      0         0
Q2     0      0      1      1         0
Q3     0      1      0      1         1
Q4     0      0      1      0         1

Access frequencies per site:

      Site A  Site B  Site C
Q1      20       1       0
Q2      10       5       9
Q3      80       1       9
Q4       2       5       4
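Solution:
Total access frequency of each query (summed over the three sites):
q1: 21
q2: 24
q3: 90
q4: 11

Attributes used by each query (A1 = Name, A2 = Family, A3 = Age, A4 = Position, A5 = Location),
with the total frequency:
q1: A1 A2 A3 21
q2: A3 A4 24
q3: A2 A4 A5 90
q4: A3 A5 11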
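
Step 1 (preparing the affinity matrix) can be sketched as follows. The sketch assumes, as the totals above do, that each query execution counts as one access, so aff(Ai, Aj) is the summed frequency of all queries that use both Ai and Aj. The names `USE`, `FREQ`, and `affinity_matrix` are illustrative.

```python
ATTRS = ["A1", "A2", "A3", "A4", "A5"]  # Name, Family, Age, Position, Location

# Access characteristics: 1 if the query uses the attribute.
USE = {
    "q1": [1, 1, 1, 0, 0],
    "q2": [0, 0, 1, 1, 0],
    "q3": [0, 1, 0, 1, 1],
    "q4": [0, 0, 1, 0, 1],
}

# Access frequencies at Site A, Site B, Site C.
FREQ = {
    "q1": [20, 1, 0],
    "q2": [10, 5, 9],
    "q3": [80, 1, 9],
    "q4": [2, 5, 4],
}


def affinity_matrix(use, freq, n):
    """aff(Ai, Aj) = sum of total frequencies of queries using both Ai and Aj."""
    aa = [[0] * n for _ in range(n)]
    for query, row in use.items():
        total = sum(freq[query])  # frequency summed over all sites
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += total
    return aa


AA = affinity_matrix(USE, FREQ, len(ATTRS))
for name, row in zip(ATTRS, AA):
    print(name, row)
```

For example, aff(A2, A4) is 90 because only q3 uses both Family and Position, while the diagonal entry aff(A2, A2) is 21 + 90 = 111 because q1 and q3 both use Family.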

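Affinity matrix:

      A1   A2   A3   A4   A5
A1    21   21   21    0    0
A2    21  111   21   90   90
A3    21   21   56   24   11
A4     0   90   24  114   90
A5     0   90   11   90  101

Place attributes (A5 and A3 are already placed, in the order [A5, A3]).
The contribution values in this trace are bond(left, new) + bond(new, right) - bond(left, right),
i.e. without the factor 2 of the cont formula; this does not change which position wins.
place A1
contribution at pos 0 = 2121
contribution at pos 1 = -1598
contribution at pos 2 = 2058
attribute A1 is placed at pos 0: [A1, A5, A3]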
place A2
contribution at pos 0 = 3213
contribution at pos 1 = 28503
contribution at pos 2 = 28732
contribution at pos 3 = 7098
attribute A2 is placed at pos 2: [A1, A5, A2, A3]

place A4
contribution at pos 0 = 2394
contribution at pos 1 = 27987
contribution at pos 2 = 29157
contribution at pos 3 = 28716
contribution at pos 4 = 6960
attribute A4 is placed at pos 2: [A1, A5, A4, A2, A3]

resulting order: [A1, A5, A4, A2, A3]

find fragments:
split at [A1, A2, A3, A4] | [A5]
accesses frag1 alone: 45
accesses frag2 alone: 0
accesses frag1 and frag2: 101
split quality = -10201

split at [A1, A2, A3] | [A4, A5]


accesses frag1 alone: 21
accesses frag2 alone: 0
accesses frag1 and frag2: 125
split quality = -15625

split at [A1, A3] | [A2, A4, A5]


accesses frag1 alone: 0
accesses frag2 alone: 90
accesses frag1 and frag2: 56
split quality = -3136

split at [A1] | [A2, A3, A4, A5]


accesses frag1 alone: 0
accesses frag2 alone: 125
accesses frag1 and frag2: 21
split quality = -441

split at [A1, A2, A3, A5] | [A4]


accesses frag1 alone: 32
accesses frag2 alone: 0
accesses frag1 and frag2: 114
split quality = -12996

split at [A1, A3, A5] | [A2, A4]


accesses frag1 alone: 11
accesses frag2 alone: 0
accesses frag1 and frag2: 135
split quality = -18225

split at [A1, A5] | [A2, A3, A4]


accesses frag1 alone: 0
accesses frag2 alone: 24
accesses frag1 and frag2: 122
split quality = -14884

split at [A1, A3, A4, A5] | [A2]


accesses frag1 alone: 35
accesses frag2 alone: 0
accesses frag1 and frag2: 111
split quality = -12321

split at [A1, A4, A5] | [A2, A3]
accesses frag1 alone: 0
accesses frag2 alone: 0
accesses frag1 and frag2: 146
split quality = -21316

split at [A1, A2, A4, A5] | [A3]


accesses frag1 alone: 90
accesses frag2 alone: 0
accesses frag1 and frag2: 56
split quality = -3136

optimal split(s) (sq = -441):


[A1] | [A2, A3, A4, A5]
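
The ten splits listed above appear to be exactly the splits obtained by treating the resulting order [A1, A5, A4, A2, A3] as circular and cutting it into two contiguous arcs, which is one way to realize the "shift" idea from the VF algorithm slide. The sketch below enumerates them under that assumption and recomputes the split quality; the function names are illustrative and the query data are the totals from this solution.

```python
# (attributes used, total access frequency) for q1..q4.
QUERIES = [
    ({"A1", "A2", "A3"}, 21),
    ({"A3", "A4"}, 24),
    ({"A2", "A4", "A5"}, 90),
    ({"A3", "A5"}, 11),
]


def split_quality(frag1, frag2, queries):
    """(accesses to frag1 alone) * (accesses to frag2 alone) - (accesses to both)^2."""
    only1 = only2 = both = 0
    for attrs, freq in queries:
        if attrs <= frag1:
            only1 += freq
        elif attrs <= frag2:
            only2 += freq
        else:
            both += freq
    return only1 * only2 - both ** 2


def cyclic_splits(order):
    """All ways to cut the circular order into two contiguous arcs."""
    m = len(order)
    seen, result = set(), []
    for start in range(m):
        for length in range(1, m):
            part = frozenset(order[(start + k) % m] for k in range(length))
            rest = frozenset(order) - part
            pair = frozenset((part, rest))
            if pair not in seen:  # each split is reachable from two arcs
                seen.add(pair)
                result.append((part, rest))
    return result


ORDER = ["A1", "A5", "A4", "A2", "A3"]
splits = cyclic_splits(ORDER)
scored = [(split_quality(a, b, QUERIES), a, b) for a, b in splits]
best_sq = max(sq for sq, _, _ in scored)
best = [(sorted(a), sorted(b)) for sq, a, b in scored if sq == best_sq]
print(len(splits), best_sq, best)
# 10 -441 [(['A1'], ['A2', 'A3', 'A4', 'A5'])]
```

Every candidate has a negative quality because no split leaves exclusive queries on both sides, so the winner is simply the split that the fewest accesses straddle (21, from q1).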
