Sunteți pe pagina 1din 9

High Performance Multipliers in QuickLogic FPGAs

Introduction
Performing a hardware multiply is necessary in any system that contains Digital Signal Processing (DSP) functionality such as filtering, modulation, or video processing. Often there is an off-the-shelf component that the engineer can use to solve his problem. However, in some designs, the expense of a dedicated DSP chip is not justified and so the use of a Field Programmable Gate Array (FPGA) is more viable. Another reason for using a programmable device to perform hardware multiplication is if the system parameters are non-standard and would require expensive circuitry outside the DSP, or running over-powered DSP chips for a large and small operand combinations (a 12- by 4-bit multiplier, for example). Using an FPGA in cases like these gives complete flexibility, high performance, and lower cost over an off-the-shelf solution. This application note describes various techniques available to the designer for performing high-speed signed multiply operations in QuickLogic FPGAs. A brief discussion on the basics of digital multiplication will be introduced, followed by discussions on how to implement the signed multiplier to meet your design requirements. Example implementations will be shown, from the slower, purely combinatorial version to the fully pipe-lined version that runs at 200MHz. The example designs shown in this application note are done in schematic as well as in Verilog and VHDL. These are generic designs that can be easily modified to suit various design needs. These are not hard macros. Therefore they can be modified by users. All timing information presented in this application note is derived from worst case simulations performed within QuickWorks, QuickLogics integrated Windowsbased design software.

Twos Complement
Before starting with signed multiplication, a quick review of the 2s complement system of signed number representation for a binary number would be helpful in understanding the derivation of the algorithm. Basically, in the 2s complement system, the left most bit (MSB) indicates the sign of the number, with 0 being positive, 1 being negative. To obtain a negative number, simply subtract the corresponding positive number from 2 n, where n is the number of bits of the original number. For example, to obtain the 4 bit signed number -4, take 24(b10000), and subtract from it the corresponding positive number, 4(b0100), the result is (b1100), -4 in the 2s complement representation. Alternatively, it can be obtained by subtracting 2(n-1) from the corresponding positive number to obtain the negative number. Using the same example, to obtain the number -4, take the number 4(b0100) and subtract 2(4-1)(b1000)from it. The result is (b1100). The multiplier algorithm discussed in this application note takes advantage of the latter method. One important note on using 2s complement number representation is the need for sign extension. That is, to obtain a negative number using a greater number of bits, simply repeat the sign bit to the left until the desired number of bits are filled. For example, to extend the number -4 (b1100) from 4 bits to 8 bits, the resultant sign extended number would be (b11111100).

Binary Multiply Basics


Similar to the familiar long hand decimal multiplication, binary multiplication involves the addition of shifted versions of the multiplicand based on the value and position of each of the multiplier bits. As a matter of fact, its much simpler to perform binary multiplication than decimal multiplication. The value of each digit of a binary number can only be 0 or 1, thus, depending on the value of the multiplier bit, the partial products can only be a copy of the multiplicand, or 0. In digital logic, this is simply an AND function. Figure 1 shows the multiplication of two unsigned 4 bit numbers as an example.

a3 X b3

a2 b2

a1 b1

a0 b0

a3b0 a2b0 a1b0 a0b0 a3b1 a2b1 a1b1 a0b1 a3b2 a2b2 a1b2 a0b2 a3b3 a2b3 a1b3 a0b3 a0b0 a1b0+a0b1 a2b0+a1b1+a0b2+carry a3b0+a2b1+a1b2+a0b3+carry a3b1+a3b2+a1b3+carry a3b2+a2b3+carry a3b3+carry carry Product P, p0 p1 p2 p3 p4 p5 p6 p7

Figure 1. Basic Binary Multiplication

Baugh-Wooley Algorithm
The Baugh-Wooley algorithm for the unsigned binary multiplication is based on the concept shown in Figure 1. The algorithm specifies that all possible AND terms are created first, and then sent through an array of half-adders and full-adders with the carry-outs chained to the next most significant bit at each level of addition. For signed multiplication (by utilizing the properties of the twos complement system) the Baugh-Wooley algorithm can implement signed multiplication in almost the same way as the unsigned multiplication shown above. The algorithm for the signed multiplication is listed in Figure 2.

P=

a[m-1]b[n-1]2(m+n-2)

i = n-2

NOT(a[m-1]b[i])2(i+m-1)

i=0

+ +

j = m-2

NOT(a[j]b[n-1])2(j+n-1)

j=0

i = m-2, j = n-2

a[i]b[j]2(i+j)

i = 0, j = 0

2(n-1) + 2(m-1) - 2(m+n-1)

Figure 2. Baugh-Wooley algorithm for signed multiplication Although the equation in Figure 2 looks complicated, the hardware implementation is actually very similar to the unsigned implementation. The difference is that except for the AND term involving the MSB of both the multiplicand and multiplier, all other AND terms involving the MSBs of the multiplicand or the multiplier are inverted before being feed into the adder array. The last two addition terms in the equation are easy to implement, involving only the addition of a 1 to the appropriate adder stage. The last term in the equation is a subtraction, which can be implemented with a XOR gate in the last stage. The architecture block diagram in Figure 3 illustrates the flow for the multiplication of two 4-bit signed numbers using the equation in Figure 2 (m= n=4). The mathematics involved in deriving the equation in Figure 2 are left as an exercise for the reader.
a0b3 a0b2 a0b1 a0b0

HA
a1b3

a1b2

HA

a1b1

HA

a1b0

FA
a2b3

a2b2

FA

a2b1

FA

a2b0

FA
a3b3

a3b2

FA

a3b1

FA

a3b0

HA

HA

FA

HA

HA

HA

p7

p6

p5

p4

p3

p2

p1

p0

Figure 3: Architectural block diagram of the Baugh-Wooley multiply algorithm.

Implementing an 8-bit signed Baugh-Wooley multiplier


To implement an 8-bit signed Baugh-Wooley multiplier, set m=n=8 for the equation in Figure 2. Solve the equation to obtain the AND terms and the arrangement of the adder array. The array approach of the algorithm allows clean hardware implementation that clearly shows the data flow. The schematic in Figure 4 shows the combinatorial implementation of the 8-bit multiplier. Since QuickLogics pASIC2 and pASIC 3 families of FPGAs can implement a full adder in a single logic cell, the combinatorial 8-bit signed multiplier takes up less than 110 logic cells, less than

20% of the logic resources available in the QL2009 device targeted. This implementation of the multiplier can run at 25MHz.
A[7] B[0] A[6] B[1] A[6] B[0] A[5] B[1] A[5] B[0] A[4] B[1] A[4] B[0] A[3] B[1] A[3] B[0] A[2] B[1] A[2] B[0] A[1] B[1] A[1] B[0] A[0] B[1] B[0] Cin AND2i0 A[0]

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[1] A[6] B[2] A[5] B[2]

SUM Cout A[4] B[2]

SUM

Cout A[3] B[2]

SUM Cout A[2] B[2]

SUM

Cout A[1] B[2]

SUM Cout A[0] B[2]

SUM

Cout C[0]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[2] A[6] B[3] A[5] B[3]

SUM

Cout A[4] B[3]

SUM Cout A[3] B[3]

SUM

Cout A[2] B[3]

SUM Cout A[1] B[3]

SUM

Cout A[0] B[3]

SUM Cout C[1]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[3] A[6] B[4] A[5] B[4]

SUM

Cout A[4] B[4]

SUM

Cout A[3] B[4]

SUM Cout A[2] B[4]

SUM

Cout A[1] B[4]

SUM Cout A[0] B[4]

SUM

Cout C[2]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[4] A[6] B[5] A[5] B[5]

SUM

Cout A[4] B[5]

SUM

Cout A[3] B[5]

SUM

Cout A[2] B[5]

SUM Cout A[1] B[5]

SUM

Cout A[0] B[5]

SUM Cout C[3]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[5] A[6] B[6] A[5] B[6]

SUM

Cout A[4] B[6]

SUM

Cout A[3] B[6]

SUM

Cout A[2] B[6]

SUM

Cout A[1] B[6]

SUM Cout A[0] B[6]

SUM

Cout C[4]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[6] A[6] B[7] A[5] B[7]

SUM

Cout A[4] B[7]

SUM

Cout A[3] B[7]

SUM

Cout A[2] B[7]

SUM

Cout A[1] B[7]

SUM

Cout A[0] B[7]

SUM Cout C[5]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

A[7] B[7]

Cout Cin

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout Cin

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

P[15]

P[11]

P[10]

P[14]

P[13]

P[12]

P[9]

P[8]

P[7]

P[6]

P[5]

P[4]

P[3]

P[2]

Figure 4: Combinatorial 8-bit Signed Multiplier

In addition to the schematic multiplier illustrated in Figure 4, the synthesis-friendly logic cells in the pASIC2 and pASIC3 families of FPGAs are also well suited for high performance Verilog and VHDL implementations. These HDL versions of the multiplier have comparable performance to the schematic version, while offering the ease of use of HDLs, with powerful features such as scaleable width for design reusability. To verify the performance of the multiplier, the Path Analyzer static timing analysis tool included with QuickWorks and QuickTools were used to obtain the flip-flop-to-flip-flop delay (in order to get the true performance of the multiplier, the inputs and outputs were registered). The Silos III Verilog simulator included in QuickWorks was used to perform both pre-layout functional simulation and post-layout timing simulation. A self-checking Verilog test fixture is used to randomly generate up to 400 input vectors, and automatically notify the user when a failure occurs. A VHDL version of the test fixture is also included for users who wish to use their own VHDL simulator to perform the simulation.

P[1]

P[0]

The results for different versions of the combinatorial 8-bit signed multiplier are listed in the Comparison of Methods Presented section later in the application note.

Speeding Up the Multiplier


Looking at the schematic in Figure 4, it is obvious that the performance of the multiplier can be increased by reducing the carry delay through the half-adder array. Upon closer inspection, one will see that the half-adder array is nothing more than a ripple adder. Since the half-adder array consists of 8 stages, think of it as an 8 bit ripple adder. By replacing the ripple adder with, for example, a carry-select adder, several nanoseconds can be saved. However, although several fast adder options exists, the increase in performance by replacing the ripple adder is limited to no more than 15% due to the delays through the AND/ADD array. To gain higher performance, the multiplier needs to be pipelined.

Pipelining for Performance


Throughput is often a greater concern than one-cycle response in DSP designs. In this case, the engineer may opt to tolerate some latency in the multiply operation in return for a faster maximum clock rate. This is accomplished in a multiplier by breaking the carry-chains up and inserting flip-flops at strategic locations. Care must be taken to ensure that all inputs to an adder are created from signals at the same stage in the pipeline (i.e., the registers must effect a clean break across the matrix of adders). Once the maximum acceptable latency is determined, the placement of the registers needs to be carefully considered. The delay should be evenly distributed across the pipeline, so each stage has the same delay as the other stages. The path delays can be analyzed with the Path Analyzer in QuickWorks or QuickTools. The locations to insert the pipeline registers can then be determined by the distribution of the path delays. The example in Figure 5 shows the 8-bit signed multiplier in Figure 4 with a single pipeline stage inserted.

A[7] B[0]

A[6] B[1]

A[6] B[0]

A[5] B[1]

A[5] B[0]

A[4] B[1]

A[4] B[0]

A[3] B[1]

A[3] B[0]

A[2] B[1]

A[2] B[0]

A[1] B[1]

A[1] B[0]

A[0] B[1]

B[0]

Cin

Cin

Cin

Cin

Cin

Cin

Cin

A[0] AND2i0

Cout A[7] B[1] A[6] B[2] A[5] B[2]

SUM Cout A[4] B[2]

SUM

Cout A[3] B[2]

SUM Cout A[2] B[2]

SUM

Cout A[1] B[2]

SUM Cout A[0] B[2]

SUM

Cout C[0]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[2] A[6] B[3] A[5] B[3]

SUM

Cout A[4] B[3]

SUM Cout A[3] B[3]

SUM

Cout A[2] B[3]

SUM Cout A[1] B[3]

SUM

Cout A[0] B[3]

SUM Cout C[1]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[3] A[6] B[4] A[5] B[4]

SUM

Cout A[4] B[4]

SUM

Cout A[3] B[4]

SUM Cout A[2] B[4]

SUM

Cout A[1] B[4]

SUM Cout A[0] B[4]

SUM

Cout C[2]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[4] A[6] B[5] A[5] B[5]

SUM

Cout A[4] B[5]

SUM

Cout A[3] B[5]

SUM

Cout A[2] B[5]

SUM Cout A[1] B[5]

SUM

Cout A[0] B[5]

SUM Cout C[3]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout A[7] B[5] A[6] B[6] A[5] B[6]

SUM

Cout A[4] B[6]

SUM

Cout A[3] B[6]

SUM

Cout A[2] B[6]

SUM

Cout A[1] B[6]

SUM Cout A[0] B[6]

SUM

Cout C[4]

SUM

Cin

Cin

Cin

Cin

Cin

Cin

Cin

Cout CLK

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM Cout

SUM

DA[7] DB[6]

DA[6] DB[7]

DA[5] DB[7]

DA[4] DB[7]

DA[3] DB[7]

DA[2] DB[7]

DA[1] DB[7]

Cin

Cin

Cin

Cin

Cin

Cin

DA[0] DB[7]

Cin

DA[7] DB[7]

Cout Cin

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout Cin

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

Cout

SUM

P[5]

P[15]

P[11]

P[10]

P[9]

P[4]

P[1]

P[14]

P[13]

P[12]

P[8]

P[7]

P[6]

P[3]

Figure 5: Single Stage Pipelined 8-Bit Signed Multiplier With just a single stage of pipeline inserted, the multiplier now runs in excess of 35MHz. The Verilog and VHDL test fixtures for the pipelined version of the multiplier are very similar to the test fixtures for the combinatorial version, except the addition of pipeline registers to the expected result that are used to check against the output.

P[2]

P[0]

Fully Pipelined 200MHz Multiplier


If the latency is acceptable, the multiplier can be fully pipelined by adding registers between every adder . This would give a latency of 2n-1 clock cycles, where n is the number of bits of the inputs. The schematic in Figure 6 shows a fully pipe-lined version of the multiplier that can run up to 200MHz. Notice that in order to control the fan out to no more than 4, a fan out register array is used. This design utilizes 390 logic cells, taking up approximately 75% of the QL2007 device targeted. In order to obtain the 200MHz performance, strict control of the design is needed, therefore only the schematic version of the design can run at 200MHz. The HDL versions of the fully pipelined design run at slower clock rate.
X[0] QD QD QD QD QD QD Y[0] QD QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

Xe1 QD

Xd1 QD QD

Xc1 QD

Xb1 QD

Xa1

X[2] QD

Yg1 QD

Yf1

Ye1 QD

Yd1 QD QD

Yc1

Yb1 QD

Ya1 QD

Xe2 QD

Xd2 QD QD

Xc2 QD

Xb2 QD

Xa2 QD

X[3] QD

Yg2 QD

Yf2

Ye2 QD

Yd2 QD QD

Yc2

Yb2 QD

Ya2 QD

Xe3 QD

Xd3 QD QD

Xc3 QD

Xb3 QD

Xa3 QD QD

X[4] QD

Yg3 QD

Yf3

Ye3 QD

Yd3 QD QD

Yc3

Yb3 QD

Ya3 QD

Xe4 QD

Xd4 QD QD

Xc4 QD

Xb4 QD

Xa4 QD QD QD

X[5] QD

Yg4 QD

Yf4

Ye4 QD

Yd4 QD QD

Yc4

Yb4 QD

Ya4 QD

Xe5 QD

Xd5 QD QD

Xc5 QD

Xb5 QD

Xa5 QD QD QD QD

X[6] QD

Yg5 QD

Yf5

Ye5 QD

Yd5 QD QD

Yc5

Yb5 QD

Ya5 QD

Xe6 QD

Xd6 QD QD

Xc6 QD

Xb6 QD

Xa6 QD QD QD QD QD

X[7] QD

Yg6 QD QD

Yf6

Ye6 QD

Yd6 QD QD

Yc6

Yb6 QD

Ya6 QD

Xe7

Xd7

Xc7

Xb7

Xa7

Yh7

Yg7

Yf7

Ye7

Yd7

Yc7

Yb7

Ya7 Xa0 Ya0 clk P[15:0]

QD

QD

QD

QD

QD

QD

QD

200 MHz 8x8 Multiplier Signed, Pipelined

QD

QD

Xd0

Xc0

Xb0

QD

Xa0

X[1]

Yg0

Yf0

Ye0

Yd0

Yc0

Yb0

Ya0

Y[1]

Y[2]

Y[3]

Y[4]

Y[5]

Y[6]

Y[7]

Xd1 Ya6 C S C S

Xc1 Ya5 C S

Xc1 Ya4 C S

Xb1 Ya3 C S

Xb1 Ya2 C S

Xa1 Ya1 C S

Xa1 Ya0

AND2i0

Xd0 Ya7

Xd0 Ya6

Xb0 Ya3

Xb0 Ya2

Xa0 Ya1

Xc0 Ya5

Xc0 Ya4

Xe1 Yb7 C S

Xd2 Yb6 C S

Xc2 Yb5 C S

Xc2 Yb4 C S

Xb2 Yb3 C S

Xb2 Yb2 C S

Xa2 Yb1 C S

Xa2 Yb0

QD

Xe2 Yc7 C S

Xd3 Yc6 C S

Xc3 Yc5 C S

Xc3 Yc4 C S

Xb3 Yc3 C S

Xb3 Yc2 C S

Xa3 Yc1 C S

Xa3 Yc0

QD

QD

Xe3 Yd7 C S

Xd4 Yd6 C S

Xc4 Yd5 C S

Xc4 Yd4 C S

Xb4 Yd3 C S

Xb4 Yd2 C S

Xa4 Yd1 C S

Xa4 Yd0

QD

QD

QD

Xe4 Ye7 C S

Xd5 Ye6 C S

Xc5 Ye5 C S

Xc5 Ye4 C S

Xb5 Ye3 C S

Xb5 Ye2 C S

Xa5 Ye1 C S

Xa5 Ye0

QD

QD

QD

QD

Xe5 Yf7 C S

Xd6 Yf6 C S

Xc6 Yf5 C S

Xc6 Yf4 C S

Xb6 Yf3 C S

Xb6 Yf2 C S

Xa6 Yf1 C S

Xa6 Yf0

QD

QD

QD

QD

QD

Xe6 Yg7 C S

Xd7 Yg6 C S

Xc7 Yg5 C S

Xc7 Yg4 C S

Xb7 Yg3 C S

Xb7 Yg2 C S

Xa7 Yg1 C S

Xa7 Yg0

QD

QD

QD

QD

QD

QD

Xe7 Yh7 C S C S C S C S C S C S C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

C S

C S

C S

C S

C S

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

C S

C S

C S

C S

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

C S

C S

C S

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

C S

C S

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

C S

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

C S

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

P[15]

P[14]

P[13]

P[12]

P[11]

P[10]

P[9]

P[8]

P[7]

P[6]

P[5]

P[4]

P[3]

P[2]

P[1]

Figure 6: 200MHz Fully Pipelined 8-bit Signed Multiplier

P[0]

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

QD

clk

X[7:0]

Y[7:0]

Comparison of Methods Presented


The table below shows the results of worst-case simulations (Vcc = 4.75V, Ta = 70C) of the QL2009-2PQ208C. All simulations were performed in QuickWorks Silos III Verilog simulator with the test fixtures included with the design files. All designs are based on the Baugh-Wooley adder tree algorithm. As with any type of design, the choice of algorithm usually involves a trade-off of some kind: either increased latency vs. reduced throughput or size vs. speed. It is up to the designer to chose the best implementation that meets the design requirement. Multiplier Area(Logic Cells) Speed Latency Combinatorial 8-bit Signed (Schematic) 110 25MHz Combinatorial 8-bit Signed (HDL) 112 21.7MHz 1-Stage Pipelined 8-bit Signed(Schematic) 110 35MHz 2 clock cycles 1-Stage Pipelined 8-bit Signed (HDL) 110 34MHz 2 clock cycles Fully Pipelined 8-bit Signed(Schematic) 390 200MHz 16 clock cycles Fully Pipelined 8-bit Signed(HDL) N/A N/A Table 1: Speed and size comparison of the methods presented in this paper

Written By: John Birkner - V.P. CAE, QuickLogic Jim Jian - Customer Engineering, QuickLogic Kevin Smith - Field Applications Engineer, QuickLogic

Further recommended reading 1. J. McCanney and J. McWhirter, Completely Iterative, Pipelined Multiple Array Suitable for VLSI, IRE Proceedings, Vol. 129. No. 2, April, 1982 R. Lyon, 2s Complement Pipelined Multipliers, IEEE Transactions on Communications, April, 1976. Paul J. Song and Giovanni De Micheli Circuit and Architecture Trade-offs for High-Speed Multiplication, IEEE Journal of Solid-State Circuits, Vol. 26, No. 9, September 1991.

2. 3.

QuickWorks is a registered trademark of QuickLogic corporation. Pentium is a registered trademark of Intel corporation. Windows 95 is a registered trademark of Microsoft corporation. Synplify-Lite is a registered trademark of Synplicity, Inc.

Appendix A: Deriving the Algorithm


For two's compliment numbers A and B defined as: A = SUM(a[i]2**i for i=0 to m-2) -a[m-1]2**(m-1) B = SUM(b[j]2**j for j=0 to n-2) -b[n-1]2**(n-1) the product, P, of A and B is derived as follows: P = = + AB a[m-1]b[n-1]2**(m+n-2) SUM(a[i]2**i for i=0 to m-2) b[n-1]2**(n-1) SUM(b[j]2**j for j=0 to n-2) a[m-1]2**(m-1) SUM(a[i]b[j]2**(i+j) for i=0 to m-2,for j=0 to n-2)

Recognizing that addition is more convenient than subtraction, and utilizing the equality: SUM(c[k]2**k for k=0 to n-1) = (2**n)-1-SUM(not(c[k]2**k) for k=0 to n-1) then the equation for P can be converted to the following form: P = + a[m-1]b[n-1]2**(m+n-2) 2**(n-1)(2**(m-1)-1-SUM(not(a[i]b[n-1])2**i for i=0 to m-2)) 2**(m-1)(2**(n-1)-1-SUM(not(b[j]a[m-1])2**j for j=0 to n-2)) SUM(a[i]b[j]2**(i+j) for i=0 to m-2,for j=0 to n-2)

Simplifying this equation yields: P = + + + + + a[m-1]b[n-1]2**(m+n-2) SUM(not(a[i]b[n-1])2**(i+n-1) for i=0 to m-2) SUM(not(b[j]a[m-1])2**(j+m-1) for j=0 to n-2) SUM(a[i]b[j]2**(i+j) for i=0 to m-2,for j=0 to n-2) 2**(n-1) 2**(m-1) 2**(m+n-1)

S-ar putea să vă placă și