Lecture12 - Vertical Fragmentation - II

Vertical Fragmentation
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/1

Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA, which is a permutation of AA
1. Initialization: Place and fix one of the columns of AA in CA.
2. Iteration: Place the remaining n-i columns in the remaining i+1 positions in
the CA matrix. For each column, choose the placement that makes the most
contribution to the global affinity measure.
3. Row order: Order the rows according to the column ordering.

Bond Energy Algorithm
“Best” placement? Define contribution of a placement:
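$$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$
where
$$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$$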

BEA – Example
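Consider the following AA matrix and the corresponding CA matrix where
A1 and A2 have been placed. Place A3:
AA (values recovered from the final CA matrix of this example):
      A1   A2   A3   A4
A1    45    0   45    0
A2     0   80    5   75
A3    45    5   53    3
A4     0   75    3   78
Ordering (0-3-1):
cont(A0,A3,A1) = 2bond(A0,A3) + 2bond(A3,A1) – 2bond(A0,A1)
= 2*0 + 2*4410 – 2*0 = 8820
Ordering (1-3-2):
cont(A1,A3,A2) = 2bond(A1,A3) + 2bond(A3,A2) – 2bond(A1,A2)
= 2*4410 + 2*890 – 2*225 = 10150
Ordering (2-3-4):
cont(A2,A3,A4) = 2bond(A2,A3) + 2bond(A3,A4) – 2bond(A2,A4)
= 2*890 + 2*0 – 2*0 = 1780
Here A0 and A4 stand for the empty columns at the left and right boundary of
the partially filled CA matrix, so every bond involving them is 0.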
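
The three contributions above can be checked with a short Python sketch. The AA values are recovered from the final CA matrix of this example by restoring the original attribute order; the function names and the use of `None` for the empty boundary columns are assumptions of this sketch, not part of the slides.

```python
# AA matrix in attribute order A1..A4 (recovered from the example's CA matrix).
AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]

def bond(x, y):
    """Sum over z of aff(Az, Ax) * aff(Az, Ay); None marks an empty boundary column."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in AA)

def cont(i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

A1, A2, A3 = 0, 1, 2  # zero-based column indices
print(cont(None, A3, A1))  # ordering (0-3-1) -> 8820
print(cont(A1, A3, A2))    # ordering (1-3-2) -> 10150
print(cont(A2, A3, None))  # ordering (2-3-4) -> 1780
```

The largest value belongs to ordering (1-3-2), so A3 goes between A1 and A2.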

BEA – Example
• Therefore, the CA matrix has the form
      A1   A3   A2
A1    45   45    0
A2     0    5   80
A3    45   53    5
A4     0    3   75
• When A4 is placed, the final form of the CA matrix (after row organization)
is
      A1   A3   A2   A4
A1    45   45    0    0
A3    45   53    5    3
A2     0    5   80   75
A4     0    3   75   78
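
The whole placement loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the first two columns (A1, A2) are fixed as in the example, ties go to the leftmost position, and the function name `bea_order` is hypothetical.

```python
def bea_order(aa, initial=(0, 1)):
    """Return a column order for aa using bond-energy placement (a sketch)."""
    n = len(aa)

    def bond(x, y):
        # None marks an empty boundary column, whose bond with anything is 0.
        if x is None or y is None:
            return 0
        return sum(aa[z][x] * aa[z][y] for z in range(n))

    def cont(i, k, j):
        return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

    order = list(initial)
    for k in range(n):
        if k in order:
            continue
        padded = [None] + order + [None]
        # Position p means "insert k between padded[p] and padded[p + 1]".
        best = max(range(len(order) + 1),
                   key=lambda p: cont(padded[p], k, padded[p + 1]))
        order.insert(best, k)
    return order

AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]
order = bea_order(AA)
# Row organization: reorder the rows the same way as the columns.
CA = [[AA[r][c] for c in order] for r in order]
print([f"A{i + 1}" for i in order])  # ['A1', 'A3', 'A2', 'A4']
for row in CA:
    print(row)
```

Run on the example, the sketch reproduces the column order A1 A3 A2 A4 and the final CA matrix shown above.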
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An}
into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that
there are no (or minimal) applications that access both (or more
than one) of the sets?
[Figure: the CA matrix with A1 … Am along both axes, split at a point on the
diagonal into an upper-left block TA (A1 … Ai) and a lower-right block BA
(Ai+1 … Am).]

VF – Algorithm
Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes
$$CTQ \times CBQ - COQ^2$$
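
For a single candidate split, the three totals and the objective can be computed as in the following Python sketch. The query sets and access counts are taken from the worked exercise at the end of these notes; the function name `split_quality` and the data layout are assumptions made for illustration.

```python
def split_quality(top, bottom, queries):
    """Return (CTQ, CBQ, COQ, objective) for the partition top | bottom.

    queries maps a name to (set of accessed attributes, access count);
    every accessed attribute is assumed to lie in top or bottom.
    """
    ctq = cbq = coq = 0
    for attrs, freq in queries.values():
        if attrs <= top:
            ctq += freq      # application accesses only TA
        elif attrs <= bottom:
            cbq += freq      # application accesses only BA
        else:
            coq += freq      # application accesses both TA and BA
    return ctq, cbq, coq, ctq * cbq - coq ** 2

queries = {
    "q1": ({"A1", "A2", "A3"}, 21),
    "q2": ({"A3", "A4"}, 24),
    "q3": ({"A2", "A4", "A5"}, 90),
    "q4": ({"A3", "A5"}, 11),
}
print(split_quality({"A1", "A2", "A3", "A4"}, {"A5"}, queries))
# (45, 0, 101, -10201)
```

The squared COQ term penalizes splits that force many accesses to touch both fragments, which is why the objective is often negative.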

VF – Algorithm
Two problems:
• Cluster forming in the middle of the CA matrix
➡ Shift a row up and a column left and apply the algorithm to find the “best”
partitioning point
➡ Do this for all possible shifts
➡ Cost $O(m^2)$
• More than two clusters
➡ m-way partitioning
➡ Try 1, 2, …, m–1 split points along the diagonal and try to find the best
point for each of these
➡ Cost $O(2^m)$

VF – Correctness
A relation R, defined over attribute set A and key K, generates the vertical
partitioning FR = {R1, R2, …, Rr}.
• Completeness
➡ The following should be true for A:
$$A = \bigcup_{i=1}^{r} A_{R_i}$$
• Reconstruction
➡ Reconstruction can be achieved by
$$R = \bowtie_K R_i, \quad \forall R_i \in F_R$$
• Disjointness
➡ TID's are not considered to be overlapping since they are maintained by the
system
➡ Duplicated keys are not considered to be overlapping
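
The three correctness rules can be illustrated on a toy relation. Everything in this Python sketch is hypothetical (the relation, its key `eno`, and the helper names); it only shows what each rule checks, with the key excluded from the disjointness test as stated above.

```python
from functools import reduce

KEY = "eno"  # hypothetical key attribute K
R = [
    {"eno": 1, "name": "Ann", "title": "Engineer"},
    {"eno": 2, "name": "Bob", "title": "Analyst"},
]
# Attribute sets of the vertical fragments R1 and R2; the key is duplicated.
FRAGMENT_ATTRS = [{"eno", "name"}, {"eno", "title"}]

def project(rows, attrs):
    """Vertical fragment: keep only the given attributes of every tuple."""
    return [{a: row[a] for a in attrs} for row in rows]

def join_on_key(left, right):
    """Join two fragments on the key attribute."""
    index = {row[KEY]: row for row in right}
    return [{**l, **index[l[KEY]]} for l in left if l[KEY] in index]

fragments = [project(R, attrs) for attrs in FRAGMENT_ATTRS]

# Completeness: the union of the fragments' attributes equals A.
complete = set().union(*FRAGMENT_ATTRS) == set(R[0])
# Disjointness: non-key attributes appear in exactly one fragment.
non_key = [attrs - {KEY} for attrs in FRAGMENT_ATTRS]
disjoint = all(a.isdisjoint(b)
               for i, a in enumerate(non_key) for b in non_key[i + 1:])
# Reconstruction: joining all fragments on K gives back R.
reconstructed = reduce(join_on_key, fragments)

print(complete, disjoint, reconstructed == R)  # True True True
```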

Extra Stuff

The basic steps of this clustering algorithm are:
i. Create an attribute affinity matrix in which each entry indicates the affinity
between the two associated attributes. The entries in the similarity matrix are
based on the frequency of common usage of attribute pairs.
ii. The BEA then converts this similarity matrix to a BOND matrix in which the
entries represent a type of nearest neighbor bonding based on probability of
co-access. The BEA algorithm rearranges rows or columns so that similar
attributes appear close together in the matrix.
iii. Finally, the designer draws boxes around regions in the matrix with high
similarity.
The resulting matrix, modified from, is illustrated in Figure. The two shaded
boxes represent the attributes that have been grouped together into two
clusters.

Vertical Splitting
Bond Energy Algorithm
Given the following access characteristics and access frequencies for Q1,...,Q4, calculate the optimal
vertical splitting using the Bond Energy Algorithm (BEA).
Steps:
1. Prepare an affinity matrix.
2. Apply the BEA.
3. Perform vertical splitting by maximizing the split quality.

Access characteristics (1 = the query accesses the attribute):
      Name  Family  Age  Position  Location
Q1      1      1     1       0         0
Q2      0      0     1       1         0
Q3      0      1     0       1         1
Q4      0      0     1       0         1

Access frequencies per site:
      Site A  Site B  Site C
Q1      20       1       0
Q2      10       5       9
Q3      80       1       9
Q4       2       5       4
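Solution:
Total access frequency of each query (summed over the three sites):
q1: 21
q2: 24
q3: 90
q4: 11
Attributes accessed by each query (A1 = Name, …, A5 = Location) and total frequency:
q1: A1 A2 A3 21
q2: A3 A4 24
q3: A2 A4 A5 90
q4: A3 A5 11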
Affinity matrix:
      A1   A2   A3   A4   A5
A1    21   21   21    0    0
A2    21  111   21   90   90
A3    21   21   56   24   11
A4     0   90   24  114   90
A5     0   90   11   90  101
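
The affinity matrix can be recomputed from the two input tables: aff(Ai, Aj) is the sum of the total frequencies of all queries that access both Ai and Aj. The Python sketch below does exactly that; the variable names are assumptions made for illustration.

```python
ATTRS = ["A1", "A2", "A3", "A4", "A5"]  # Name, Family, Age, Position, Location
USE = {  # USE[q][j] == 1 if query q accesses attribute j
    "Q1": [1, 1, 1, 0, 0],
    "Q2": [0, 0, 1, 1, 0],
    "Q3": [0, 1, 0, 1, 1],
    "Q4": [0, 0, 1, 0, 1],
}
FREQ = {  # accesses at Site A, Site B, Site C
    "Q1": [20, 1, 0],
    "Q2": [10, 5, 9],
    "Q3": [80, 1, 9],
    "Q4": [2, 5, 4],
}

# Total frequency of each query over all sites.
total = {q: sum(sites) for q, sites in FREQ.items()}

n = len(ATTRS)
# aff(Ai, Aj): total frequency of the queries that use both attributes.
AA = [[sum(total[q] for q, use in USE.items() if use[i] and use[j])
       for j in range(n)]
      for i in range(n)]

print(total)  # {'Q1': 21, 'Q2': 24, 'Q3': 90, 'Q4': 11}
for name, row in zip(ATTRS, AA):
    print(name, row)
```

The diagonal entry aff(Ai, Ai) is simply the total frequency of every query that touches Ai, for example 21 + 90 = 111 for A2.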
Place attributes:
place A1
contribution at pos 0 = 2121
contribution at pos 1 = -1598
contribution at pos 2 = 2058
attribute A1 is placed at pos 0: [A1, A5, A3]
place A2
attribute A2 is placed at pos 2: [A1, A5, A2, A3]
place A4
contribution at pos 0 = 2394
contribution at pos 1 = 27987
contribution at pos 2 = 29157
contribution at pos 3 = 28716
contribution at pos 4 = 6960
attribute A4 is placed at pos 2: [A1, A5, A4, A2, A3]
resulting order: [A1, A5, A4, A2, A3]
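
The three contributions reported for A1 can be checked by hand, using affinity values computed from the query totals q1 to q4. The figures in the log match the contribution formula without its constant factor 2; this is an observation from the numbers rather than something the notes state, and dropping the factor scales every candidate equally, so the chosen position is unaffected. With the intermediate order [A5, A3]:

$$
\begin{aligned}
\mathrm{bond}(A_1, A_5) &= 21\cdot 0 + 21\cdot 90 + 21\cdot 11 + 0\cdot 90 + 0\cdot 101 = 2121\\
\mathrm{bond}(A_1, A_3) &= 21\cdot 21 + 21\cdot 21 + 21\cdot 56 + 0\cdot 24 + 0\cdot 11 = 2058\\
\mathrm{bond}(A_5, A_3) &= 0\cdot 21 + 90\cdot 21 + 11\cdot 56 + 90\cdot 24 + 101\cdot 11 = 5777\\
\text{pos 0:}\quad & \mathrm{bond}(A_1, A_5) = 2121\\
\text{pos 1:}\quad & \mathrm{bond}(A_5, A_1) + \mathrm{bond}(A_1, A_3) - \mathrm{bond}(A_5, A_3) = 2121 + 2058 - 5777 = -1598\\
\text{pos 2:}\quad & \mathrm{bond}(A_3, A_1) = 2058
\end{aligned}
$$

Position 0 gives the largest value, which agrees with the placement [A1, A5, A3].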
find fragments:
split at [A1, A2, A3, A4] | [A5]
accesses frag1 alone: 45
accesses frag1 and frag2: 101
split quality = -10201
split at [A1, A2, A3] | [A4, A5]
split at [A1, A3] | [A2, A4, A5]
split at [A1] | [A2, A3, A4, A5]
split at [A1, A2, A3, A5] | [A4]
split at [A1, A3, A5] | [A2, A4]
split at [A1, A5] | [A2, A3, A4]
split at [A1, A3, A4, A5] | [A2]
split at [A1, A4, A5] | [A2, A3]
split at [A1, A2, A4, A5] | [A3]
optimal split(s) (sq = -441):
[A1] | [A2, A3, A4, A5]
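
The ten candidate splits in the log appear to be every two-way split whose parts are contiguous when the resulting order is treated as cyclic, which is what shifting rows and columns achieves. Under that assumption, the following Python sketch enumerates them and scores each one; the function names are hypothetical.

```python
QUERIES = [  # (attributes accessed, total access frequency)
    ({"A1", "A2", "A3"}, 21),
    ({"A3", "A4"}, 24),
    ({"A2", "A4", "A5"}, 90),
    ({"A3", "A5"}, 11),
]
ORDER = ["A1", "A5", "A4", "A2", "A3"]  # resulting BEA order

def quality(frag1, frag2):
    """Split quality: (frag1 alone) * (frag2 alone) - (both)^2."""
    only1 = sum(f for attrs, f in QUERIES if attrs <= frag1)
    only2 = sum(f for attrs, f in QUERIES if attrs <= frag2)
    both = sum(f for _, f in QUERIES) - only1 - only2
    return only1 * only2 - both ** 2

def cyclic_splits(order):
    """Yield each two-way split whose parts are contiguous on the cyclic order."""
    m = len(order)
    seen = set()
    for start in range(m):                 # every shift of the order
        rotated = order[start:] + order[:start]
        for cut in range(1, m):            # every split point for that shift
            part = frozenset(rotated[:cut])
            rest = frozenset(rotated[cut:])
            key = frozenset((part, rest))  # ignore which side is listed first
            if key not in seen:
                seen.add(key)
                yield part, rest

results = {split: quality(*split) for split in cyclic_splits(ORDER)}
best = max(results.values())
for (part, rest), sq in results.items():
    if sq == best:
        print(sorted(part), "|", sorted(rest), "sq =", sq)
# ['A1'] | ['A2', 'A3', 'A4', 'A5'] sq = -441
```

Trying every shift and every split point is the quadratic search described earlier for clusters that form in the middle of the matrix.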
