Solving Quadratic Assignment Problems by Genetic Algorithms with GPU Computation: A Case Study


Shigeyoshi Tsutsui
Department of Management and Information Science, Hannan University
5-4-33 Amamihigashi, Matsubara, Osaka 580-8502, Japan
tsutsui@hannan-u.ac.jp

Noriyuki Fujimoto
Graduate School of Science, Osaka Prefecture University
1-1 Gakuen-Cho, Naka-ku, Sakai-Shi, Osaka, 599-8531, Japan
fujimoto@mi.s.osakafu-u.ac.jp

ABSTRACT

  
        

     
     

            
   !    "  
 "
        #
  
   $  
    %&'('
!
  )*+,         
           -  
&./      
  
   
   0   1'2 
0  $
.   "       3 
4* "
   '  5 67, 

Categories and Subject Descriptors

' * + 89
 '  
:;   < " =  /
 "  <
>?
/ @ ( 4 3 8 

 :; =
    >(  
 "   
General Terms

 

Keywords

 
 " (   
 "  

    "  = "  =



1. INTRODUCTION

  
     
 
       

$   


 $  0   
     
 


    

   = A *BBB 83:" 


 
 
9      
$; 4      " *
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GECCO'09, July 8-12, 2009, Montréal, Québec, Canada.
Copyright 2009 ACM 978-1-60558-505-5/09/07 ...$5.00.

  9    "  3 

    '   "     


  "     9    
  
 !      
  $ 
 
    
$
  <
   

       "      
"    
      
 /      

 $
       
 C

     

 $    <


 

     $   
  $ /'/(
" $  0  
  
     $  "    
   $   
     
 "    
      
         $  
" 
 
    $"   "  


 
  " 
   

 
" 
 $       
      


  "      
 
'   $"        

     
     
  
     =(
   

 D
    
  ; !"        
   $ 
    
    
      
  <   
        
    $    E
$ <
 "
       F
     
    "       
        3 4 
        
  0 
     $"     " 
      3  4* "
  
'  = 5 67, 
" 
      

  
 $
'      "     
 
   
    $

  <
 *  "  <
 3  
         
   
 ' <
 G" C      $
  ! $" <
 ,

   

2. A BRIEF REVIEW ON GPU COMPUTATION AND ITS APPLICATION TO EVOLUTIONARY COMPUTATION

'    " =(       


  $ 
    ! 4 

  =(      
 #



   + 
 
   


 C
    47H2   $
I    "  C
   


    &./ &./   0  


$  
 <"
      =(  
 
$   &./   
  


  &./"  $
     
$
  


=(           
   '   "  
   
 =(      ! * '  =( 
"     
$;     
0
  
0         
0  
  $ 4" *"  3       
0  
     $      $ 4
 * #
  C
  
  
9 $ 
0  
  0  

     
      $     $
    
$        

0
   F
 $   $ ?
"  C       
0  
 4B*G <"    4B*G     "  
        
0   

&./     $    4,6 2J
?"
 

 &./"   $ 


$    
4BB  4,B 
  8*3:  &./ 
$

        F


       $
 $ 

    


 
          

 $
  
  
 $  
  
$      
    


   
0 
   

%&'(' 
    46*    
0  
 * 
0  
 8*3:
&./     $     7GH2 8*3:  
 
 
  
 
   $

 

 

 0  
   $
 + H2  
 8*3:
2.2 Applications of GPU Computation to Evolutionary Computation

   $ 


    
 9

 9  8*3: ! C"


 
0 
$   " *BB+ 86:" 
 
  $ 
 K H" *BB+ 84B:"  
    $ ?$
  " *BB+ 844:" #<
$ $ 
   
 
/ 0" *BB5 8*B:"  
  
 
 <

 " *BB+ 8*+:        
 
  I   
   
     C
 
 
 =( 

 $ !L *BB+ 85" 7: 

 
    
 84+" 43" **" 45:"   

2.1 GPU Computation

[Figure 1: CUDA architecture. The GPU contains several multiprocessors, each with its own processors and 16KB of shared memory; all multiprocessors access the device memory (VRAM).]

     $

 

          9


  

    
<    =(
      $

 9 " 


$   
  "
   2    M    $ 8*: 1 
      $    

   
$    1  K
2 " *BB+ 84G: 1  K ? *BB+  
<'/( =NN  $      
  $N 

  

   
    
  84,: .    *BB+     
 F
 $         8*7:
D K 2  *BB+     $  


   
  833:
        
 
 !0   *BB5  $     
 4 *,  ,   
  
 8,:  
  " 
  
  
   
         I 1 
(
) 3( D " / 1 K D "
*BB7 
   $   
  ? 
 
 

  83G: ? C  

  

  $ 
  =
$  
  $      
  " =$   *BB+   ( 
   
  (" 
    
      $   


 8G:
' "       
  
   "   O  
     

E       
   
 
  
         
   E    
  F

$  
C
   
 
 /   " 


    /  C
  <'/(  !"
F
        
   
$     $  
 / 47H2
 !
  )*+,"  C

[Figure 2: a CUDA program and its execution. Labels recovered from the figure follow.]

On CPU:

    int main()
    {
        ...
        memory copy from CPU to GPU;
        ...
        dim3 grid(m, 1, 1); dim3 block(n, 1, 1);
        kernel<<< grid, block >>>();
        ...
        memory copy from GPU to CPU;
        ...
    }

On GPU:

    __global__ void kernel()
    {
        code dependent on block ID and thread ID
    }

The kernel launch generates and executes a grid of m blocks (block 0 to block m-1), each containing n threads (thread 0 to thread n-1). The blocks are allocated to multiprocessor 1 to multiprocessor p of the GPU, all of which access the VRAM.
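The role of the block ID and thread ID in Figure 2 can be illustrated with a small serial emulation. This is only a sketch in Python, not CUDA code: the helper `launch` and the kernel `fill_global_ids` are hypothetical names, the grid and block sizes are arbitrary, and a real GPU would run the threads in parallel rather than in nested loops.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Serially emulate kernel<<<grid, block>>>(args) for a 1-D grid and 1-D blocks."""
    for block_idx in range(grid_dim):        # block 0 .. block m-1
        for thread_idx in range(block_dim):  # thread 0 .. thread n-1
            kernel(block_idx, thread_idx, block_dim, *args)


def fill_global_ids(block_idx, thread_idx, block_dim, out):
    # "Code dependent on block ID and thread ID": every thread derives a
    # unique global index and writes exactly one slot of the output.
    global_id = block_idx * block_dim + thread_idx
    out[global_id] = (block_idx, thread_idx)


m, n = 3, 4                    # hypothetical grid size (blocks) and block size (threads)
result = [None] * (m * n)
launch(fill_global_ids, m, n, result)
print(result[5])               # the slot written by block 1, thread 1
```

The index computation `block_idx * block_dim + thread_idx` is the usual way to give each thread its own piece of data when all blocks have the same size.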


3. IMPLEMENTATION OF PARALLEL GA FOR GPU COMPUTATION

'  84*:"     


    
 
 0     
   
    
 
 ! 
   
     "  
  

! 
   
    "
    O    
   
   
  
  
   " " 
 $      
     
 
 
   
    O 
 
     
      

  
 
  
     

  
$    $      
  
  
      
$


 0    
       
    ' $"        


     $   
$   

    
$   
 
!  3    C      
   G  
 C   O C 
    
 '  " $  $
  
 9    $  $
  
$
    9 "   PQ*" 4" G" 3R  
 
$ 4     
 *" 
$ *   
 
 4" 
$ 3     
 G"  
$


3.1 Quadratic Assignment Problem (QAP)

[Figure 3: a QAP example with four facilities (facility 1 to facility 4) and four locations (location 1 to location 4), shown together with the flow matrix f_ij, the distance matrix d_ij, and an assignment \phi of facilities to locations. The cost of the assignment shown is

\[ \mathrm{cost}(\phi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \, d_{\phi(i)\phi(j)} = 1524 \]
]
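The QAP cost, the sum over all facility pairs of the flow between them multiplied by the distance between their assigned locations, can be computed directly. The sketch below is illustrative only: the function name `qap_cost` and the 3 x 3 matrices are made up for the example (they are not the matrices of Figure 3), and indices are 0-based where the paper counts from 1.

```python
def qap_cost(flow, dist, perm):
    """Return the sum over i, j of flow[i][j] * dist[perm[i]][perm[j]].

    perm[i] is the location assigned to facility i (0-based).
    """
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


# Hypothetical symmetric instance with three facilities and three locations.
flow = [[0, 3, 1],
        [3, 0, 2],
        [1, 2, 0]]
dist = [[0, 5, 4],
        [5, 0, 6],
        [4, 6, 0]]

identity = [0, 1, 2]   # facility i placed at location i
swapped = [1, 0, 2]    # facilities 0 and 1 exchange locations

print(qap_cost(flow, dist, identity))
print(qap_cost(flow, dist, swapped))
```

Evaluating one assignment this way takes O(n^2) multiplications, which is why the cost evaluation dominates the running time of a GA for the QAP.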

 

G     
 3" 
$ !    
"   
  

   # 4  4,*G
         8*5: 
 
            

   <
 4   
  
   
    
  0  
1'2 
0  $  84:

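[Figure 4: the base GA model. For each individual I_i of the population I_1, I_2, ..., I_N, another parent is selected randomly and crossover and mutation are applied. The resulting individuals I'_1, I'_2, ..., I'_N are held in the working pool W, and pairwise selection keeps the better individual.]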
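One generation of the model in Figure 4 might be sketched as follows. This reflects a reading of the figure labels only (random second parent, crossover and mutation, the better of parent and child survives). The crossover and mutation operators here are simple stand-ins chosen so that children remain valid permutations; they are assumptions, not necessarily the operators used in the paper, and all function names and the random instance are hypothetical.

```python
import random


def qap_cost(flow, dist, perm):
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def crossover(p1, p2, rng):
    # Stand-in operator (assumption): keep positions where both parents agree
    # and shuffle the remaining values of p1 among the remaining positions.
    differ = [k for k in range(len(p1)) if p1[k] != p2[k]]
    values = [p1[k] for k in differ]
    rng.shuffle(values)
    child = list(p1)
    for k, v in zip(differ, values):
        child[k] = v
    return child


def mutate(perm, rng):
    # Stand-in operator (assumption): swap two randomly chosen positions.
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]


def next_generation(pop, flow, dist, rng):
    new_pop = []
    for i, parent in enumerate(pop):
        j = rng.choice([k for k in range(len(pop)) if k != i])  # another parent
        child = crossover(parent, pop[j], rng)
        mutate(child, rng)
        # Pairwise selection: the lower-cost one of parent and child survives.
        new_pop.append(min(parent, child, key=lambda p: qap_cost(flow, dist, p)))
    return new_pop


rng = random.Random(0)
n = 6  # hypothetical problem size
flow = [[0 if i == j else rng.randint(1, 9) for j in range(n)] for i in range(n)]
dist = [[0 if i == j else rng.randint(1, 9) for j in range(n)] for i in range(n)]
population = [rng.sample(range(n), n) for _ in range(8)]
offspring = next_generation(population, flow, dist, rng)
```

Because each slot of the new population depends only on its own parent and one randomly chosen partner, the slots can be processed independently, which is what makes this kind of model attractive for parallel execution.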


3.2 GA Model for GPU Computation for QAP
3.2.1 The base GA model for QAP

D 
         
 

   
  =
 '
   "     $


    
    
 $ 
 =I 8*6" *4" 34:      
  #( 8*,: ?"    $   

 $ 
  !  G 
         
 1   
  D           
    0     
  
 "     0   0 $  
E      $  
     
     "      ;
< 4 <  
        
< * # 
     
< 3 ! 
       " 
      
   $  $ 
     "
      
      
< G ! 
  " $      $ 
< , # 
     
< 7 ! 
 "
 
     '  
  "  
   
< 5 '
   
      
< + '   
  "   
  I"   < 3

$" 
     E 
   ?"         $  

     ' < 7" 


 

  0     
   * ?
" 

       
  
     C     
               
  
  $ <
     

 $    " 
 



 C
     $ 0  
 

   $ / 846: ! 
     "        
  
  
          

$

       $ 
 
       $    
  0  #%' I.   $ D$   83*:
!
 "    $  
   0  "  "   
 I) 8*G:
  $ 
 /) 8+: $ 
        =    
 /)  0 
   I) 
   "   C   <
 G"  
/)  !  "     
      $
    
  C
  <   
    
    
$      


$ 
 
     $
 
 $   8*6" *4" 34:  
 

        
  <  *I

?"  $  
 
 F
 $"
     $  
   ! C
"        
 83B:      $
 0   
  
     $

         
  
      F
    
 "      E
     
"    $  $ 
 
    $

3.2.2 Parallel GA model for GPU Computation

  %&'(' !


  )*+, 
    
 $  3B 
 /  
 /  +
 
 <  47H2    $
     
   F
 $" 
            4*+ 
 
3 * 3 ! 
   "       

   3 * 4 <"  
    
  
   3 * 4    $  
 /
D 9              
   <"            

   
   9 $    
    =(      
            
 
         

   ;
4 '   
 "      -  "
$      &./   
* !     $ 
 " 
 / 

   
   " 
 
 
     &./    $" 
     
  !  3  

  "  9 $
  
      $  &./
3    
        
 

G  "     
 
0   $ 
  
    



,  


      

 9
    
 "       
     

 

  =(  "   


    9
 ;

[Figure 5 labels: 8 individuals; each individual consists of at most 14 blocks of length 4.]

    $         $"


         $  $   
"
  $    
     

    *,, ?"     $
     
   $ F

       $ 
1            
  "
   
      $ <"    
    0        
        
  
 /  
  D   $ 47H2  $  / 
C            
  $  &./

 " 
 
 4*+       
0  47  +  
       ,7
!  ,     
0  J 4*+  
  

 

   '   


 "    
      $    
 $   
    9C  ,7     
=  $"     "   
 
      
  $   $
D  
 C  O C   

   $ 


   $
  

 


    $ 
    
"
            


[Figure 5 annotation: 16 x 8 threads load/store 8 individuals concurrently. Each thread accesses 4 chars at the same time. Some threads have no work if the problem size of the QAP is small.]

4 /C    $ (


 
$" C

   $ 
"  

 D
  $ 7GH2
   $ 
 1
$ 
&./ 

  4BB  4,B      C



    

 I 
  " 
$   $ 

  
        0
 O
   


 < $  L 0  $


 


 $
* /C           /
 =( 

   
$  &./ 


 $   

 
 
       
 


0 847:     $ E
$"  C

   $
     
3 =
 

  &./ &./     


 F
 $     $ 


 $         C



         
 
 


      $  
  3*" 7G" 
4*+ $ 8*3: I  $ C  


 

    4B     


 



3.2.3 Implementation Details for GPU Computation

[Figure 5: memory layout of the individuals. Each individual is an array of L elements of type unsigned char; L is a multiple of 4 and at most 56. All individuals are stored in VRAM and are processed in parallel by one thread block of 16 x 8 threads. One thread block repeats the process 16 times for its allocated 128 individuals.]
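The layout described by the Figure 5 labels can be emulated to see how the padding and the 4-char accesses fit together. This is a serial Python sketch under stated assumptions: the mapping of the 16 x 8 threads to individuals (16 consecutive threads per individual) is an interpretation of the figure, and the names `padded_length`, `pack`, and `copy_pass` are hypothetical.

```python
WORD = 4                      # chars accessed by one thread at a time
MAX_LEN = 56                  # upper bound on L given in the figure
THREADS_PER_INDIVIDUAL = 16   # assumption: 16 x 8 threads = 16 threads per individual
INDIVIDUALS_PER_PASS = 8      # individuals loaded or stored concurrently


def padded_length(n):
    """Round the problem size n up to a multiple of 4, as required for L."""
    length = -(-n // WORD) * WORD
    if length > MAX_LEN:
        raise ValueError("problem size too large for this layout")
    return length


def pack(perm):
    """Store a permutation as unsigned chars, zero-padded to length L."""
    length = padded_length(len(perm))
    return bytearray(perm) + bytearray(length - len(perm))


def copy_pass(src, dst):
    """Emulate one pass of 16 x 8 threads; return the number of idle threads."""
    idle = 0
    for t in range(THREADS_PER_INDIVIDUAL * INDIVIDUALS_PER_PASS):
        ind, word = divmod(t, THREADS_PER_INDIVIDUAL)
        lo = word * WORD
        if lo >= len(src[ind]):   # this thread has no 4-char block to move
            idle += 1
            continue
        dst[ind][lo:lo + WORD] = src[ind][lo:lo + WORD]
    return idle


n = 30  # e.g. a size-30 instance, padded to L = 32
src = [pack([(k + s) % n for k in range(n)]) for s in range(INDIVIDUALS_PER_PASS)]
dst = [bytearray(len(ind)) for ind in src]
idle_threads = copy_pass(src, dst)
```

With L at most 56 there are at most 14 blocks of 4 chars per individual, so under this mapping at least two of the 16 threads per individual are idle, and more are idle for smaller instances, which matches the remark in the figure that some threads have no work.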
 

    


D            

        

4. EXPERIMENTS
4.1 Experimental Conditions

'   $"    = 


    '  =
5 67, 
    %&'(' !
  )*+, 
I<  D  )    %&'(' 

 & 4+4 *B ! =(  
 " /

 & <  *BB,   #   


  JI*  =( * 4 <(H  
D    
  1'2 837:  
 
 3 4
 
9   G
@   $  
  
"     
 C"  
 
"    0  
 83G: '  C
 "     +  
 
 
9
             *B  GB@
*B " *, " 03B" 03B " 3B " 3, " 37 " 
GB
4B      
  
 D 
 
 $      9  
   

      = 



0 D

     3 * *  4  7 
  4      6       
        
    

  
9      C C
 
 
     4, 
   *B " 3B 
  
*, "  4*B 
   03B" 03B " 3B " 3, "
37 " 
$

 3 * *       <


       







$   &./"  &./   $ 


 
 '   $   
 

   
 "   
 $    =" 4  * 4   
$
  
       
   <

3 '
               

   4*+    #$ ,BB  
 "      
     - *

              2  $
,S       $ C
    
   $ ,B       
        
 $   
   =  4  * D    
<<#  
   =
 '    "
      PB 4
'  
 " D   
"   
4  * =
 "  
 ?
"   "  O     
$    $        
      E
  
  
  
 




4.2 Results

.      4 2  


  
 " M   E
 
    4  *  
  
G 4" 4           
              = 
   " *   
 
 
4 D
      4    
*    C
 03B

4 ' 


   "  -      
   
 
 -  
 $ =   
   


  '      


    F
       $
*           
! 
 "    0 $ 
0     
3     
  
    $"     " 9   
       

G '   $"    $  $ 
 
 ?
"  
 
  
  

    "  
 
 

    
 

,     
      
"  
    $  
 




!  $ C


"     0  
   
 9      
!"    $  $   

   
 $       
!"            $  
  <
 "
  =( 
 
 $ 
   &./ "   
 $  
 

     D    $

 
     '  $
 "

     $ $"   0  
    F
   
6. ACKNOWLEDGMENTS

% M     




    =
  I *B "  C
" 
  
      P
     "      B B7G 4 
*           &
   4  *  B G*+  B G**" 

$ 
  7 5  7 7   
4  *" 
$      
 
  E       

        * 6  4* 7    *"

    9     

  
    $

5. CONCLUSIONS

   
 
" 
 $  
      
 '  
"        
 
   
       

       
 
  0       $"  
   "       3 
4* "
   '  = 5 67, 

" 
       
  
 $
      0;

D  0   0 ( /  "  I I


0 
  $"   
   

  
  $  $  / 
$  # 
 " =" <" <

  
  $
 T       <
 9
.
  
46,BB466      $    
U <
  24+5BBB,+   T <
$ 
   <


7. REFERENCES

84: 1'2    


      $"
*BB6 ;JJ    J
8*: D 2 " < ?  " D 1  "   D


  
    


   < " *BB+
83: # = A #F
   

   

  H 
 
 " *BBB
8G: =$ " 1 "  1 "  /$" 
' 1 $ .     
   

   ' 
  
  4B  
 
   

 $
 " *BB+
8,: H !0" D "  / D # $

 
  
  
'### '    <$" ***" *BB5

Table 1: Results of GPU computation and of CPU computation with a single thread

          GPU computation                 CPU, single thread: GA-1        CPU, single thread: GA-2        Speedup ratio to CPU
Instance  Total     #OPT  T avg   std     Total     #OPT  T avg   std     Total     #OPT  T avg   std     GA-1  GA-2
          population      (sec)           population      (sec)           population      (sec)
tai20b    128×30×1  10    0.064   0.005   128×30×1  10    0.428   0.039   128×30×1  10    0.422   0.042   6.7   6.6
tai25b    128×30×1  10    0.169   0.015   128×30×1  10    1.386   0.135   128×30×1  10    1.286   0.145   8.2   7.6
kra30a    128×30×5  10    2.002   1.741   128×30×2  -     9.651   4.541   128×30×4  -     11.870  3.115   4.8   5.9
kra30b    128×30×5  -     1.332   0.732   128×30×5  -     23.399  11.492  128×30×4  -     16.745  11.164  17.6  12.6
tai30b    128×30×3  10    0.947   0.576   128×30×3  10    22.649  6.830   128×30×1  10    7.203   6.274   23.9  7.6
tai35b    128×30×4  10    2.510   0.740   128×30×3  10    22.649  6.830   128×30×1  10    7.203   6.274   9.0   2.9
ste36b    128×30×4  10    3.337   1.056   128×30×4  10    33.274  13.062  128×30×2  10    14.675  3.836   10.0  4.4
tai40b    128×30×1  10    1.088   0.087   128×30×1  -     6.016   0.486   128×30×1  10    5.811   0.482   5.5   5.3

#OPT : the number of runs in which the algorithm succeeded in finding the optimal solution
T avg : the average time, in seconds, to find the optimal solution in successful runs
std : standard deviation of T avg
- : value missing in the source text
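The speedup columns of the table appear to be the ratio of the CPU average time to the GPU average time for the same instance. The short check below recomputes that ratio from the T avg values printed in the table and compares it with the printed speedups; the dictionary layout and the function name `speedup` are illustrative choices, and the ratio definition is inferred from the numbers rather than quoted from the text.

```python
# instance: (GPU T avg, CPU GA-1 T avg, CPU GA-2 T avg, printed GA-1 speedup, printed GA-2 speedup)
rows = {
    "tai20b": (0.064, 0.428, 0.422, 6.7, 6.6),
    "tai25b": (0.169, 1.386, 1.286, 8.2, 7.6),
    "kra30a": (2.002, 9.651, 11.870, 4.8, 5.9),
    "kra30b": (1.332, 23.399, 16.745, 17.6, 12.6),
    "tai30b": (0.947, 22.649, 7.203, 23.9, 7.6),
    "tai35b": (2.510, 22.649, 7.203, 9.0, 2.9),
    "ste36b": (3.337, 33.274, 14.675, 10.0, 4.4),
    "tai40b": (1.088, 6.016, 5.811, 5.5, 5.3),
}


def speedup(t_gpu, t_cpu):
    """Speedup of the GPU run relative to a CPU run (assumed definition)."""
    return t_cpu / t_gpu


computed = {name: (speedup(gpu, ga1), speedup(gpu, ga2))
            for name, (gpu, ga1, ga2, _, _) in rows.items()}

for name, (s1, s2) in computed.items():
    print(f"{name}: GA-1 {s1:.1f}x, GA-2 {s2:.1f}x")
```

Every recomputed ratio agrees with the printed value to within rounding at one decimal place, so the smallest speedup is about 2.9 (tai35b, GA-2) and the largest about 23.9 (tai30b, GA-1).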

87: % !L (  C


 
 
 =( 

  
 1"
4+G" *BB+
85: % !L ! C
 
 
!
 ++BB ) ' 
     ** '###
'      (  

<$ '(<" *BB+
8+: (    
   
"
  
      D$
 
 $" 46+6
86: < $" = H " 
H       
  
  
 
0 

 
 ' 
     , '.
D0   .
   . < 
..<" *BB+
84B: H   < H ' 
     *BB6
= 
    < 
9
(
 " *BB6
844: ?$"  =$0"  ." ! ' " . /$"
 / L  2 
    $  


    
 '

     **      

 
  <
 " *BB+
84*: H   / 2
0   
    
  
 


#
 
" *," 46,5
843: < 1    %$  <  

   
  ' 
   
 '### '     K ( 

 <$ '(<"  " *BB6
84G: D 1   D 2      
 
    

  '

     44 # = 
 
 
  < " *BB+
84,: D 1    ?    

       
 
  < =   !  !  "
/    
 " 4*4*" *BB+

847: # 1 " T %
0" < I  " 
T / $ %  ;   9 
 

 

 '### /
" *+*" *BB+
845: U 1" # V "  ) < 
   
0      '

     '### '     K
(  
 <$ '(< 
" *BB6
84+:  /
  H 2    

 

 9

 ;  
 
C
  
 ' 
    
'    /
 
  = <


 '  
  $" *BB+
846: < / 
     

   ' 
     <C
'    = 
   
 
/  H " 466,
8*B: < / 0 = 
     F
 
  

 
$ $ ' 
  
  '### '    = 
  < 

  = 
 '=<= *BB5" *BB5
8*4: & /    =     $ 
   
     '###
 
   H    ( #   "
44," 4666
8**: < /0    
 $ 
 '

     + '    / ? 

 =  =  <


&#=."  1%=< ,337" *BB+
8*3: %&'('" *BB6
;JJ  
J J 
8*G: ' I" ( <"  T ?   $ 

    
   ' 
     *
'    = 
   
  1
#  
 '
" 46+5
8*,: / 0 " < "  . H ( 
$
"  "   
  
  ' 
      


8*7:
8*5:
8*+:

8*6:
83B:

# $ = = 


 #==I
*BB5 =/" *BB5
( .  " & / $"  = ! 
     +B  ' 
  
  44 # = 
   

  < " *BB+


< <     
 C
  T    =/" *3" 4657
/ <
 " T &W  " < 
" 
( /X  =E
  
  

 
 ; 
  
 

  ' 
     *BB+
 
 
=  " *BB+
<Y  ? ? /C   $ !
  = <$" 476" *BBB
Z    

  *BBG
;JJ
 

J J
  J [


834: <    


 $   
 
      $


 ' 
     <C '   
= 
    = $ I  <
'  
 < " *BB+
83*: ( D$" <0"  ( !$
<
        ;
  
  
    '

    3 '    = 
 
 
  /  H " 46+6
833:  D  D 2 1  1   

     
M C C 37B '

     '### =   # $
= *BB+ =#=MB+" *BB+
83G: / D  D  $   

  
  
   '

    '### =   # $
= *BB7 =#=MB7" *BB7
