EE800 DFP

Decimal Floating-Point Arithmetic
Dongdong Chen
EE800, U of S
Objectives
IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
DFP number formats
DFP number encoding
DFP arithmetic operations
DFP rounding modes
DFP exception handling
Objectives (cont.)
Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)
DFP adder/subtracter
DFP multiplier
DFP divider
DFP transcendental function computation
Background
"Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now." Is that true?
Introduction
Decimal arithmetic is still essential for specific applications:
Numbers in commercial databases are decimal.
Decimal is used extensively in commercial applications.
Surveys of commercial databases report that numeric data are stored as decimal fixed-point or floating-point numbers.
How is decimal computation processed?
Current approach: compute in software (or in binary) and convert the result back to decimal representation, which causes problems:
Introduction (cont.)
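Errors from decimal-binary conversion
Example 1: represent 0.1 in DFP or BFP.
Decimal representation (BCD code): 0.0001 (exact)
Binary representation: 0.0001100110011... (binary, repeating forever); truncated to 0.00011 (binary) it is only 0.09375, about 0.09
Example 2: telephone billing, cost 0.70, tax 5%
BFP arithmetic: 0.69999... × 1.05 = 0.734999..., which rounds to 0.73
DFP arithmetic: 0.70 × 1.05 = 0.7350, which rounds to 0.74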
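Both conversion examples can be reproduced with Python's standard `decimal` module. This is an illustrative sketch: it assumes the bill is rounded to cents with ties away from zero (`ROUND_HALF_UP`), and `bill_binary` and `bill_decimal` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Example 1: the binary double used for 0.1 is not 0.1; the decimal value is exact.
binary_tenth = Decimal(0.1)        # exact value of the double nearest to 0.1
decimal_tenth = Decimal("0.1")     # exact decimal 0.1


def bill_binary() -> Decimal:
    """Example 2 in binary floating point, then round the resulting double to cents."""
    return Decimal(0.70 * 1.05).quantize(CENT, rounding=ROUND_HALF_UP)


def bill_decimal() -> Decimal:
    """Example 2 in decimal arithmetic (0.70 * 1.05 = 0.7350), then round to cents."""
    return (Decimal("0.70") * Decimal("1.05")).quantize(CENT, rounding=ROUND_HALF_UP)


print(binary_tenth)                   # 0.1000000000000000055511151231257827021181583404541015625
print(bill_binary(), bill_decimal())  # 0.73 0.74
```

The binary product lands just below 0.735, so rounding to cents goes down; the decimal product is exactly 0.7350 and rounds up.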
Decimal integer, fixed-point or floating-point? Decimal hardware or software solutions?

Current Research
DFP arithmetic is defined in IEEE 754-2008.
IBM computing systems include DFP support (in hardware on POWER6 and z10):
IBM POWER6, z9, z10
Intel includes a DFP software solution in its systems:

Intel DFP software computation library
DFP arithmetic IP blocks:

Basic DFP arithmetic IPs: DFP adder/subtracter, multiplier, divider, square root, etc.
Transcendental DFP arithmetic IPs:

DFP CORDIC, logarithm, antilogarithm, reciprocal, etc.
DFP Arithmetic in IEEE 754-2008
Review of BFP arithmetic in IEEE 754-2008
How the new DFP arithmetic is defined in IEEE 754-2008
BFP Floating-point representation

Representation:
sign, exponent, significand (or mantissa):
$(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$

More bits for the significand give more accuracy.
More bits for the exponent increase the range.
IEEE 754 floating point standard:

single precision: 8-bit exponent, 23-bit significand
double precision: 11-bit exponent, 52-bit significand
BFP floating-point Number

Leading 1 bit of significand is implicit
Example: if the significand is 0110101100, the actual significand is 1.0110101100
This is called a normalized number; there is exactly one non-zero digit to the left of the point.
This gives a unique representation of each number. We also get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.
Exponent
Exponent is biased to make sorting easier
All 0s is the smallest exponent and all 1s is the largest (both are reserved for special values).
The actual exponent is e - 127 for single precision and e - 1023 for double precision, i.e., a bias of 127 and 1023 respectively.
By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers.
If e = 1000 0011 (decimal 131), the actual exponent is 131 - 127 = 4.
If e = 0101 1101 (decimal 93), the actual exponent is 93 - 127 = -34.
BFP Floating-Point Formats

Short (32-bit) format: sign 1 bit; exponent 8 bits, bias = 127, range -126 to 127; significand 23 bits for the fractional part (plus a hidden 1 in the integer part)
Long (64-bit) format: sign 1 bit; exponent 11 bits, bias = 1023, range -1022 to 1023; significand 52 bits for the fractional part (plus a hidden 1 in the integer part)
BFP Floating-Point Formats (cont.)

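Positive and negative zero: sign 0 or 1, biased exponent 00000000, fraction 0:
0/1 00000000 00000000000000000000000
Positive and negative infinity: sign 0 or 1, biased exponent 11111111, fraction 0:
0/1 11111111 00000000000000000000000
Range of normalized single-precision numbers:
Negative overflow: below $-(2 - 2^{-23}) \times 2^{127}$
Expressible negative numbers: $-(2 - 2^{-23}) \times 2^{127}$ to $-2^{-126}$
Negative underflow: between $-2^{-126}$ and 0
Positive underflow: between 0 and $2^{-126}$ (subnormal numbers partly fill both underflow regions)
Expressible positive numbers: $2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$
Positive overflow: above $(2 - 2^{-23}) \times 2^{127}$
A biased exponent of all 1s (unbiased exponent 128) with a nonzero fraction is called Not-a-Number (NaN).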

Example
Summary: FP representation $(-1)^{\text{sign}} \times (1 + \text{fraction}) \times 2^{\text{exponent} - \text{bias}}$
Example:
decimal: $-0.75 = -3/4 = -3/2^2$
binary: $-0.11_2 = -1.1_2 \times 2^{-1}$
floating point: biased exponent = -1 + 127 = 126 = 01111110
IEEE single precision: 1 01111110 10000000000000000000000
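The field layout in this example can be checked with a short Python sketch based on the standard `struct` module. `float32_fields` and `float32_value` are hypothetical helper names, and `float32_value` only handles normalized numbers.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x stored as IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 32 bits as an unsigned int
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent
    fraction = bits & 0x7FFFFF        # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction


def float32_value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized number: (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


s, e, f = float32_fields(-0.75)
print(f"{s} {e:08b} {f:023b}")   # 1 01111110 10000000000000000000000
print(float32_value(s, e, f))     # -0.75
```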
DFP Number Representation

Representation:
sign, exponent, significand (or mantissa): $(-1)^{\text{sign}} \times \text{significand} \times 10^{\text{exponent}}$
More digits for the significand give more accuracy.
More bits for the exponent increase the range.
DFP formats:
decimal32: DFP storage format, encoded in 32 bits
decimal64: DFP computational format, encoded in 64 bits
decimal128: DFP computational format, encoded in 128 bits
DFP Number format
1-bit sign (S), defined the same as in the BFP format
(w+5)-bit combination field (G), divided into two subfields:
5 bits (G0...G4) encode the 2 MSBs of the exponent and the most significant digit (MSD) of the significand, or signal NaN or infinity;
w bits (G5...G(w+4)) form the exponent continuation; appended as a suffix to the 2 exponent MSBs derived from G0...G4, they give the (w+2)-bit nonnegative biased exponent.
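A minimal sketch of how the five leading combination bits G0...G4 could be decoded in the DPD layout described above. `decode_combination_prefix` is a hypothetical name, and the bit ordering (G0 as the most significant of the five bits) is an assumption of this sketch.

```python
def decode_combination_prefix(g: int):
    """Decode G0..G4 (G0 in bit 4, G4 in bit 0) of an IEEE 754-2008 DPD number.

    Returns ("finite", exponent_msbs, msd), ("infinity",) or ("nan",).
    """
    if not 0 <= g < 32:
        raise ValueError("G0..G4 is a 5-bit field")
    if g >> 3 != 0b11:                              # G0G1 != 11
        return ("finite", g >> 3, g & 0b111)        # exponent MSBs = G0G1, MSD = G2G3G4 (0..7)
    if (g >> 1) & 0b11 != 0b11:                     # G0G1 = 11 and G2G3 != 11
        return ("finite", (g >> 1) & 0b11, 8 + (g & 1))  # exponent MSBs = G2G3, MSD = 8 or 9
    return ("infinity",) if g & 1 == 0 else ("nan",)     # 11110 = Inf, 11111 = NaN


print(decode_combination_prefix(0b01000))   # ('finite', 1, 0)
print(decode_combination_prefix(0b11110))   # ('infinity',)
```

The 30 finite patterns cover every pair (exponent MSBs in 0..2, MSD in 0..9) exactly once, which is why the biased exponent can never start with 11.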
DFP Exponent
Exponent is biased to make sorting easier
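The biased exponent E is stored in binary format (not decimal).
The actual exponent is E - 101 for decimal32, E - 398 for decimal64 and E - 6176 for decimal128.
The range of the actual exponent e is $e_{\min} - q + 1 \le e \le e_{\max} - q + 1$, where q is the precision in digits.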
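The biases and exponent ranges above follow from the precision q and emax of each format. The sketch below derives them; the parameters (q = 7, 16, 34 and emax = 96, 384, 6144, with emin = 1 - emax) are the IEEE 754-2008 values for decimal32, decimal64 and decimal128, and `DFPFormat` is a hypothetical helper.

```python
from typing import NamedTuple


class DFPFormat(NamedTuple):
    name: str
    q: int       # precision in digits
    emax: int    # largest exponent of the d.ddd... form

    @property
    def emin(self) -> int:
        return 1 - self.emax

    @property
    def exponent_range(self) -> tuple[int, int]:
        """Range of the integer-coefficient exponent: [emin - q + 1, emax - q + 1]."""
        return self.emin - self.q + 1, self.emax - self.q + 1

    @property
    def bias(self) -> int:
        """Bias that maps the smallest exponent to a stored value of 0."""
        return -self.exponent_range[0]


FORMATS = (DFPFormat("decimal32", 7, 96),
           DFPFormat("decimal64", 16, 384),
           DFPFormat("decimal128", 34, 6144))

for fmt in FORMATS:
    print(fmt.name, "bias", fmt.bias, "exponent range", fmt.exponent_range)
```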
DFP Number format (cont.)

J×10-bit trailing significand (T) field:
Densely packed decimal (DPD) encoding: each 3-digit decimal number is encoded into a 10-bit binary number (a declet); DPD can be converted to binary-coded decimal (BCD).
Binary integer decimal (BID) encoding: the decimal significand is encoded as a binary integer.
Non-normalized decimal significand: for example, $(-1)^0 \times 0.00900 \times 10^{2}$ and $(-1)^0 \times 0.09000 \times 10^{1}$ represent the same value; such a set of representations is called a DFP cohort.
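To contrast BID with DPD, here is a sketch of BID encoding for decimal64. It assumes the coefficient is below 2^53, where the layout is sign (1 bit), biased exponent (10 bits), coefficient (53 bits); larger coefficients use a different layout that is not shown. `encode_bid64` is a hypothetical name. The last line encodes two members of one cohort, 900 × 10^-3 and 9000 × 10^-4 (both 0.9).

```python
DEC64_BIAS = 398


def encode_bid64(sign: int, coefficient: int, exponent: int) -> int:
    """Sketch: decimal64 BID encoding, only for coefficients below 2**53."""
    if not 0 <= coefficient < 2**53:
        raise ValueError("this sketch covers only coefficients below 2**53")
    biased = exponent + DEC64_BIAS
    if not 0 <= biased < 3 * 2**8:        # biased exponents 0..767
        raise ValueError("exponent out of range for decimal64")
    return (sign << 63) | (biased << 53) | coefficient


print(hex(encode_bid64(1, 835, -2)))      # 0xb180000000000343
# Two members of the same cohort (both equal 0.9) get different bit patterns:
print(hex(encode_bid64(0, 900, -3)), hex(encode_bid64(0, 9000, -4)))
```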
Parameters in DFP Format
Example
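Summary: DFP representation $(-1)^{\text{sign}} \times \text{significand} \times 10^{\text{exponent} - \text{bias}}$
Convert $-835 \times 10^{-2}$ (i.e., -8.35) to decimal64:
Sign bit: 1 for negative, 0 for positive (here 1).
Exponent: -2 + 398 = 396 (10-bit 0110001100).
Significand: 835 (50-bit DPD trailing significand: the last declet is 10 0011 1101 = 0x23D, the other four declets are 0).
Encoding of the 5 MSBs (G0...G4) of the combination field: 01000 (exponent MSBs 01, MSD 0).
Decimal64: 1 01000 10001100 00...00 1000111101 = A2 30 00 00 00 00 02 3D (hex)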
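The conversion above can be reproduced with a sketch of DPD encoding. `dpd_encode` follows the DPD declet mapping, where the row is selected by which of the three digits are 8 or 9, and `encode_dpd64` packs sign, G0...G4, exponent continuation and trailing significand. Both names are hypothetical, and only finite numbers are handled.

```python
DEC64_BIAS = 398


def dpd_encode(n: int) -> int:
    """Encode three decimal digits (0..999) into a 10-bit DPD declet p q r s t u v w x y."""
    if not 0 <= n <= 999:
        raise ValueError("a declet holds three decimal digits")
    d2, d1, d0 = n // 100, n // 10 % 10, n % 10
    a, b, c, d = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1   # hundreds digit
    e, f, g, h = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1   # tens digit
    i, j, k, m = (d0 >> 3) & 1, (d0 >> 2) & 1, (d0 >> 1) & 1, d0 & 1   # units digit
    # a, e, i are 1 exactly when the corresponding digit is 8 or 9.
    rows = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in rows[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet


def encode_dpd64(sign: int, coefficient: int, exponent: int) -> int:
    """Pack a finite decimal64 number in DPD: S | G0..G4 | exponent continuation | T."""
    if not 0 <= coefficient < 10**16:
        raise ValueError("decimal64 holds at most 16 digits")
    biased = exponent + DEC64_BIAS                # 10-bit biased exponent, 0..767
    if not 0 <= biased < 3 * 2**8:
        raise ValueError("exponent out of range for decimal64")
    msd, trailing = divmod(coefficient, 10**15)   # MSD goes into G0..G4, 15 digits into T
    exp_msbs = biased >> 8
    if msd < 8:
        g = (exp_msbs << 3) | msd                 # G0G1 = exponent MSBs, G2G3G4 = MSD
    else:
        g = 0b11000 | (exp_msbs << 1) | (msd & 1) # 11, exponent MSBs, last bit of MSD
    t = 0
    for pos in range(5):                          # five declets, least significant first
        t |= dpd_encode(trailing // 10 ** (3 * pos) % 1000) << (10 * pos)
    return (sign << 63) | (g << 58) | ((biased & 0xFF) << 50) | t


print(hex(dpd_encode(835)))              # 0x23d
print(hex(encode_dpd64(1, 835, -2)))     # 0xa23000000000023d
```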
DFP special values

Not-a-Number (NaN): G0...G4 = 11111.
Infinity: G0...G4 = 11110; the sign of the infinity is given by the sign bit.
Overflow: occurs when the absolute value of a result is larger than that of the largest DFP number, $|v_{\max}| = (10^q - 1) \times 10^{e_{\max} - q + 1}$.
Underflow: occurs when a nonzero (inexact) result is smaller in magnitude than the smallest normal number $10^{e_{\min}}$; the smallest positive DFP number is $|v_{\min}| = 10^{e_{\min} - q + 1}$.
Subnormal: a DFP number whose absolute value is at least $10^{e_{\min} - q + 1}$ but less than $10^{e_{\min}}$ is subnormal.
Normal: the remaining exponent values and significands represent normal numbers.
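The decimal64 limits can be explored with Python's `decimal` module, whose `Context` carries the precision and exponent limits. This sketch assumes q = 16, emax = 384 and emin = -383 for decimal64; `classify` is a hypothetical helper around `Decimal.number_class`.

```python
from decimal import Context, Decimal

Q, EMAX = 16, 384                  # decimal64 precision q and emax
EMIN = 1 - EMAX                    # -383
ctx = Context(prec=Q, Emax=EMAX, Emin=EMIN)

v_max = Decimal(f"{10**Q - 1}E{EMAX - Q + 1}")   # (10^q - 1) x 10^(emax - q + 1)
min_normal = Decimal(f"1E{EMIN}")                 # 10^emin
v_min = Decimal(f"1E{EMIN - Q + 1}")              # 10^(emin - q + 1), smallest subnormal


def classify(x: Decimal) -> str:
    """Return '+Normal', '+Subnormal', ... for x under the decimal64-like context."""
    return x.number_class(context=ctx)


print(v_max, classify(v_max))            # 9.999999999999999E+384 +Normal
print(min_normal, classify(min_normal))  # 1E-383 +Normal
print(v_min, classify(v_min))            # 1E-398 +Subnormal
```

`Context.Etiny()` returns emin - q + 1, the exponent of the smallest subnormal.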
DFP Arithmetic Operations

Basic DFP arithmetic operations
Two decimal-specific DFP operations:
SameQuantum(DFP1, DFP2); Quantize(DFP1, DFP2)
DFP comparison operations do not distinguish between redundant representations of the same number
DFP conversion operations:
DFP to BFP conversion (correctly rounded); DFP to integer conversion
Recommended DFP operations
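Python's `decimal` module, which implements IBM's General Decimal Arithmetic Specification, offers analogues of the two decimal-specific operations: `Decimal.same_quantum` and `Decimal.quantize`. This is an illustrative sketch; `compare_total` is shown only to contrast ordinary comparison with a total order that does see the exponent.

```python
from decimal import Decimal

a, b = Decimal("2.50"), Decimal("2.5")
print(a == b)                 # True: comparison ignores which cohort member is used
print(a.same_quantum(b))      # False: the exponents are -2 and -1
print(a.compare_total(b))     # -1: a total order, unlike ordinary comparison, sees the exponent

# Quantize: round or pad the first operand to the exponent of the second.
print(Decimal("1.23456").quantize(Decimal("0.01")))   # 1.23
print(Decimal("7").quantize(Decimal("0.00")))         # 7.00
```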

DFP Numbers Cohort

Because the decimal significand is not normalized, one value can have several representations; together they form a DFP cohort.
The standard defines the preferred exponent (quantum) for each operation.
Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for that operation.
Inexact operation results: the cohort member with the least possible exponent is used, to get the maximum number of significant digits.
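The cohort selection can be observed with Python's `decimal` module set to 16 digits (decimal64 precision). This sketch assumes that module's rules, based on IBM's General Decimal Arithmetic Specification, match the IEEE 754-2008 preferred exponents for these operations.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 16                                       # decimal64 precision
    exact_sum = Decimal("1.20") + Decimal("3.4")        # preferred exponent min(-2, -1) = -2
    exact_product = Decimal("1.20") * Decimal("3.4")    # preferred exponent -2 + (-1) = -3
    exact_quotient = Decimal(1) / Decimal(4)            # preferred exponent 0; 0.25 needs -2
    inexact_quotient = Decimal(1) / Decimal(3)          # inexact: least exponent, all 16 digits

print(exact_sum, exact_product, exact_quotient, inexact_quotient)
# 4.60 4.080 0.25 0.3333333333333333
```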
DFP Rounding Modes

Five rounding-direction attributes:
roundTiesToEven, roundTiesToAway, roundTowardPositive, roundTowardNegative, roundTowardZero
Correct rounding vs. faithful rounding: IEEE 754-2008 requires correctly rounded results for all basic DFP arithmetic operations, and DFP operations must support all of the rounding modes.
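The five rounding-direction attributes can be tried out with the rounding constants of Python's `decimal` module. The mapping below is an assumption of this sketch; in particular, `ROUND_HALF_UP` rounds ties away from zero, which corresponds to roundTiesToAway. `round_all` is a hypothetical helper that rounds to an integer under every mode.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# IEEE 754-2008 rounding-direction attribute -> Python decimal rounding constant
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,     # ties go away from zero
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}


def round_all(x: str) -> dict[str, str]:
    """Round x to an integer under each rounding-direction attribute."""
    value = Decimal(x)
    return {name: str(value.quantize(Decimal("1"), rounding=mode)) for name, mode in MODES.items()}


for x in ("2.5", "-2.5", "3.5"):
    print(x, round_all(x))
```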
DFP Exception Handling

Invalid operation: an operand is a signaling NaN, or the operation is 0 × Inf, the square root of a negative operand, etc.; the default result is a quiet NaN.
Division by zero: the dividend is a finite nonzero number and the divisor is zero; the default result is +Inf or -Inf.
Overflow: the magnitude of a result exceeds the largest finite number representable in the format of the operation.
Underflow: the magnitude of a nonzero result is below $10^{e_{\min}}$.
Inexact: the correctly rounded result of an operation differs from the infinitely precise result.
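The default results and status flags can be observed with Python's `decimal` module by disabling all traps. This sketch assumes a decimal64-like context (16 digits, emax = 384, emin = -383); `run` and `CASES` are hypothetical names.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

SIGNALS = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)


def run(op):
    """Evaluate op(ctx) with all traps off; return the result and the signalled exceptions."""
    ctx = Context(prec=16, Emax=384, Emin=-383, traps=[])   # decimal64-like, non-trapping
    result = op(ctx)
    return result, {s.__name__ for s in SIGNALS if ctx.flags[s]}


CASES = {
    "0 x Inf": lambda c: c.multiply(Decimal(0), Decimal("Infinity")),   # invalid -> NaN
    "sqrt(-1)": lambda c: c.sqrt(Decimal(-1)),                          # invalid -> NaN
    "1 / 0": lambda c: c.divide(Decimal(1), Decimal(0)),                # division by zero -> Inf
    "9E384 x 10": lambda c: c.multiply(Decimal("9E384"), Decimal(10)),  # overflow -> Inf
    "1E-383 / 3": lambda c: c.divide(Decimal("1E-383"), Decimal(3)),    # underflow (subnormal)
    "1 / 3": lambda c: c.divide(Decimal(1), Decimal(3)),                # inexact
}

for label, op in CASES.items():
    result, flags = run(op)
    print(f"{label:12} -> {result}  {sorted(flags)}")
```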
DFP Addition/Subtraction
DFP Add/Sub Data flow
DFP Addition
Step 1: equalize the exponents
Add the significands only when the exponents are the same.
The number with the smaller exponent has its decimal point shifted to the left (its significand moves right), and the number with the larger exponent has its decimal point shifted to the right (its significand moves left into leading-zero positions).
Rewriting the operand with the smaller exponent can lose its least significant digits, so keep a guard digit, a round digit and a sticky digit for that operand.
DFP addition
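Step 2: add the significands (7-digit significands; the digits in parentheses are shifted out of the 7-digit field):
0099999 × 10^1 + 0016234 × 10^-3
= 0999990 × 10^0 + 0000016(234) × 10^0
= 1000006(234) × 10^0
Step 3: Normalize the result if necessary.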
DFP addition
Step 4: Round the number if needed: 1000006(234) × 10^0 becomes 1000006 × 10^0 (the discarded digits 234 are less than half a unit, so the result rounds down).
Step 5: Repeat Step 3 if the result is no longer normalized.
The final result is 1000006; the exact answer is 1000006.234.
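The five steps can be sketched in Python on (coefficient, exponent) pairs. This does not model the hardware datapath: for simplicity it aligns the operands exactly using Python's unbounded integers and then rounds with a guard digit, a round digit and a sticky digit under roundTiesToEven. It handles only nonnegative operands, and `dfp_add` and `round_half_even` are hypothetical names.

```python
def round_half_even(coeff: int, exp: int, p: int) -> tuple[int, int]:
    """Round a nonnegative coefficient to at most p digits with roundTiesToEven.

    Uses a guard digit, a round digit and a sticky digit (OR of all digits below).
    """
    digits = str(coeff)
    k = len(digits) - p                    # number of digits that do not fit
    if k <= 0:
        return coeff, exp                  # already fits: exact result
    kept = int(digits[:p])
    guard = int(digits[p])                          # first discarded digit
    rnd = int(digits[p + 1]) if k > 1 else 0        # second discarded digit
    sticky = any(d != "0" for d in digits[p + 2:])  # any nonzero digit further right
    if guard > 5 or (guard == 5 and (rnd or sticky or kept % 2 == 1)):
        kept += 1                          # round up
    exp += k
    if kept == 10 ** p:                    # Step 5: rounding carried into a new digit
        kept //= 10
        exp += 1
    return kept, exp


def dfp_add(c1: int, e1: int, c2: int, e2: int, p: int = 7) -> tuple[int, int]:
    """Add two nonnegative DFP operands given as (coefficient, exponent) pairs."""
    e = min(e1, e2)                        # Step 1: equalize the exponents
    aligned1 = c1 * 10 ** (e1 - e)
    aligned2 = c2 * 10 ** (e2 - e)
    total = aligned1 + aligned2            # Step 2: add the significands
    return round_half_even(total, e, p)    # Steps 3-5: round and renormalize


print(dfp_add(99999, 1, 16234, -3))        # (1000006, 0), as in the example above
```

Aligning exactly and rounding once is a convenient reference model; a hardware adder instead keeps only the guard, round and sticky digits of the shifted operand.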
Guard Digits
To help minimize rounding errors, the intermediate steps of an operation store guard digits: additional internal digits that increase the precision of the intermediate result. In the previous example, one extra digit is kept. IEEE 754-2008 requires correctly rounded results; keeping one guard digit, one round digit and one sticky digit is enough to round correctly.
DFP add/sub
General Description: Addition
Example: Addition
Example: Addition (cont.)
DFU: IBM POWER6 and z10
High-Performance Implementation
[12] A. Vázquez and E. Antelo, "A High-Performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," in Proc. ARITH-19, Portland, June 8-10, 2009.
Evaluation Results and Comparison
[Proposed]: A. Vázquez and E. Antelo, "A High-Performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," in Proc. ARITH-19, Portland, June 8-10, 2009.
DFP Multiplication
Scheme of decimal multiplier

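x = 1963, y = 8145; each partial product x·y_i is formed as the sum of two easy multiples of x:
x·y0 (y0 = 5): 5x + 0 = 9815 + 0000
x·y1 (y1 = 4): 5x - x = 9815 - 1963
x·y2 (y2 = 1): x + 0 = 1963 + 0000
x·y3 (y3 = 8): 10x - 2x = 19630 - 3926
x·y = 1963 × 8145 = 15988635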
Partial product generation
Generate X·Y_i for Y_i ∈ {1, 2, 3, ..., 7, 8, 9}; each X·Y_i is in carry-save format.
Partial product generation

Solid circles: BCD sum (digit)
Hollow circles: carry (bit)
n-digit radix-10 CSA
m-digit radix-10 counter

Carry Save Adder Tree
CSA Tree to Generate Multiplication Result
Flowchart of DFP Multiplier
Architecture of DFP Multiplier
Exception Detection & Handling

Invalid operation
sNaN operand: pass the significand of the sNaN
0 × Inf: produce a qNaN with significand 0
Overflow (and Inexact)

IEIP SLA > Emax Increase SLA until all LZs removed
Underflow (and possibly Inexact)

IEIP SLA < Emin Decrease SLA until 0, then shift right
Inexact
Implementation Highlights
Leverage the operands' leading-zero counts (LZCs)
SC, SLA, and IESIP
Handle NaNs with minimal overhead

No dataflow modification: coerce the multiplicand or multiplier to 1
Support gradual underflow

No dataflow modification: simply extend the number of iterations
Simple, control-based rounding scheme

Synthesis Results
64-bit (16-digit) operands, DPD encoded; LSI Logic gflxp 0.11-µm CMOS (FO4 = 55 ps); Synopsys Design Compiler
Results:
                 Fixed-point                    Floating-point
Area             119,653 µm²                    237,607 µm²
Delay            14.72 FO4                      15.45 FO4
Critical path    4:2 compressor (accumulator)   128-bit barrel shifter
Applicability to Parallel Designs

IE and IP shift generation
Rounding scheme
NaN handling
Exception detection and handling
On-the-fly sticky bit generation... NO
Sequential vs. Parallel

Sequential
Less area Potentially better cycle time
Parallel
Less latency Higher throughput
DFP Division
DFP Division Data Flow

[Figure: DFP division data flow for two 64-bit operands: unpacking into sign (1 bit), combination field (5 bits), exponent field (8 bits) and significand field (50 bits); DPD-to-BCD conversion; sign logic; exponent subtraction and bias addition; mantissa division; normalization and exponent adjustment; rounding control and rounding; BCD-to-DPD conversion; packing]
Unpack the decimal floating-point numbers
Check for zeros and infinity
Subtract exponents
Divide mantissas
Normalize and detect overflow and underflow
Perform rounding
Replace sign
Pack
Unpacking and Sign Logic

[Figure: unpacking of the two 64-bit operands into sign (1 bit), exponent field (8 bits) and significand field (50 bits), followed by the sign logic]
Step 1: Unpack the floating-point numbers; check for zeros and infinity (if F = 0, stop).
Step 2: Sign processing: $S_q = S_1 \oplus S_2$
Exponent Subtraction
[Figure: exponent subtraction of the 11-bit exponents E1 and E2, followed by bias addition producing Eb]
Step 3: Exponent subtraction: $E_b = E_1 - E_2 + \text{bias}$
Mantissa Division
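[Figure: mantissa division of M1 and M2 (64 bits each) producing the 68-bit quotient M12]
Which algorithm should be chosen here?
1. Restoring division
2. Non-restoring division
3. High-radix division
4. Convergence division
Step 4: Mantissa division, with $0.1 \le M_1 < 1$ and $0.1 \le M_2 < 1$:
$M_{\min} = 0.1, \quad M_{\max} = 1 - 10^{-p}$
$0.1 < \frac{M_{\min}}{M_{\max}} \le \frac{M_1}{M_2} \le \frac{M_{\max}}{M_{\min}} < 10$
so the quotient needs at most a one-digit normalization shift.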

Normalization
[Figure: normalization of the 68-bit quotient M12 into Mn, with exponent adjustment from Eb to Ea (flag Fa)]
Step 5: If the quotient is less than 1, shift it left by one digit to normalize it; also detect overflow and underflow. For example, 09342140819564 shifted left by one digit gives 93421408195640, and the exponent is adjusted to $E_a = E_b - 1$.
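Steps 2 to 5 can be sketched in Python using radix-10 restoring division, one of the algorithm options listed above. This is a sketch under stated assumptions, not the divider of [1] or [2]: significands are p-digit integers read as fractions 0.m in [0.1, 1), exponents are biased, and the function returns p + 1 quotient digits (a guard digit for the rounding step) plus a sticky flag. `restoring_divide` and `dfp_divide` are hypothetical names.

```python
def restoring_divide(m1: int, m2: int, n_digits: int) -> tuple[list[int], bool]:
    """Radix-10 restoring division: n_digits quotient digits of m1 / m2.

    digits[0] has weight 10^0, digits[1] weight 10^-1, and so on.
    Requires m1 < 10 * m2 so that every quotient digit lies in 0..9.
    Returns the digits and a sticky flag (True if the final remainder is nonzero).
    """
    rem, digits = m1, []
    for _ in range(n_digits):
        q = 0
        while True:
            rem -= m2            # trial subtraction
            if rem < 0:
                rem += m2        # restore the partial remainder
                break
            q += 1
        digits.append(q)
        rem *= 10                # shift the partial remainder by one decimal digit
    return digits, rem != 0


def dfp_divide(s1: int, e1: int, m1: int, s2: int, e2: int, m2: int,
               p: int = 16, bias: int = 398):
    """Steps 2-5 for operands (-1)^s * 0.m * 10^(e - bias), with p-digit integers m.

    Returns (sign, exponent, digits, sticky): the quotient is
    d0.d1...dp * 10^(exponent - bias), where digits holds p + 1 digits.
    """
    sign = s1 ^ s2                                    # Step 2: sign logic
    exponent = e1 - e2 + bias                         # Step 3: Eb = E1 - E2 + bias
    digits, sticky = restoring_divide(m1, m2, p + 2)  # Step 4: 0.1 < quotient < 10
    if digits[0] == 0:                                # Step 5: quotient < 1, shift left
        digits = digits[1:]
        exponent -= 1                                 # Ea = Eb - 1
    else:
        sticky = sticky or digits[-1] != 0            # fold the spare digit into sticky
        digits = digits[:-1]
    return sign, exponent, digits, sticky


print(dfp_divide(0, 398, 1000000, 0, 398, 3000000, p=7))
# (0, 397, [3, 3, 3, 3, 3, 3, 3, 3], True)
```

Restoring division retires one quotient digit per step but may need up to ten trial subtractions per digit, which is why the listed high-radix and convergence methods are attractive in hardware.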
Rounding and Packing

[Figure: rounding control and rounding of Mn into the 64-bit Mq, with exponent adjustment from Ea to Eq (flag Fr)]
Step 6: Rounding: truncate, round up, or round to nearest. Truncation and rounding up are biased; according to the IEEE rounding standard, round to nearest with ties to even is the better choice.
[Figure: packing of the sign (1 bit), exponent field (8 bits) and significand field (50 bits) into the 64-bit result]
Step 7: Pack the sign bit, exponent bits and significand bits together, and detect NaN and infinity.
[1] L.-K. Wang and M. J. Schulte, "Decimal Floating-Point Division Using Newton-Raphson Iteration," in Proc. IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 84-95, Sep. 2004.
[2] T. Lang and A. Nannarelli, "A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture," IEEE Transactions on Computers, pp. 727-739, June 2007.
Evaluation Results and Comparison

Design             Precision (digits)   Cycle time (ns)   # of cycles   Latency (ns)
DFP Divider [1]    16 (decimal64)       0.57              150           85.5
DFP Divider [2]    16 (decimal64)       1                 20            20
1: Synthesized with an STM 90-nm standard-cell library
DFP Transcendental Arithmetic
Contents
Introduction
Decimal Logarithmic Converter
Decimal Antilogarithmic Converter
Conclusions
Future Work
32-bit DFP Logarithm

$X = (-1)^s \times 10^e \times \text{coefficient}$
$R = \log_{10}(X) = \log_{10}(10^e) + \log_{10}(\text{coefficient})$ (for $X > 0$)
The coefficient is a non-normalized decimal integer.
Example: $R = \log_{10}((-1)^0 \times 10^8 \times 0024589) = 8 + 5 + \log_{10}(0.2458900)$
To guarantee a 32-bit DFP result, a 14-digit fixed-point logarithm computation must be kept.
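The decomposition in this example can be sketched with Python's `decimal` module: count the significant digits of the coefficient, fold them into the integer part, and evaluate the logarithm of the remaining fraction with 14 digits as stated above. `log10_decompose` and `log10_dfp` are hypothetical helper names, and `Decimal.log10` stands in for the fixed-point log unit.

```python
from decimal import Decimal, localcontext


def log10_decompose(exponent: int, coefficient: int, p: int = 7) -> tuple[int, Decimal]:
    """Split log10(10^exponent * coefficient) into an integer part and a p-digit fraction.

    Returns (integer_part, fraction) with 0.1 <= fraction < 1, so that
    log10(X) = integer_part + log10(fraction).
    """
    if not 0 < coefficient < 10**p:
        raise ValueError("coefficient must be a positive p-digit integer")
    n = len(str(coefficient))                     # p minus the number of leading zeros
    fraction = Decimal(coefficient).scaleb(-n)    # coefficient / 10^n, in [0.1, 1)
    return exponent + n, fraction.quantize(Decimal(1).scaleb(-p))


def log10_dfp(exponent: int, coefficient: int, digits: int = 14) -> Decimal:
    """log10 of a positive DFP number, with the fraction's log computed to `digits` digits."""
    integer_part, fraction = log10_decompose(exponent, coefficient)
    with localcontext() as ctx:
        ctx.prec = digits
        return integer_part + fraction.log10()


print(log10_decompose(8, 24589))   # (13, Decimal('0.2458900'))
print(log10_dfp(8, 24589))
```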
32-bit DFP Antilogarithm

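Here: $\log_{10}(X_{\min}) \le X \le \log_{10}(X_{\max})$ and $P = \text{Antilog}_{10}(X) = 10^X$.
For 32-bit DFP: $X \in [-101, 96.99999]$ and
$\text{Antilog}_{10}(X) = 10^{X_{\text{Int}} + X_{\text{Frac}}} = 10^{X_{\text{Int}}} \times 10^{X_{\text{Frac}}}$
Example: $\text{Antilog}_{10}((-1)^0 \times 1940467 \times 10^{-5}) = \text{Antilog}_{10}(19.40467) = 10^{19} \times 10^{0.4046700}$
To guarantee a 32-bit DFP result, an 8-digit fixed-point antilogarithm computation must be kept.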
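A sketch of the split into integer and fractional parts, using Python's `decimal` module with 8 digits for $10^{X_{\text{Frac}}}$ as stated above. `antilog10` is a hypothetical name, `Decimal` exponentiation stands in for the fixed-point antilog unit, and taking the floor for X_Int (so that 0 ≤ X_Frac < 1 also for negative X) is an assumption of this sketch.

```python
from decimal import Decimal, ROUND_FLOOR, localcontext


def antilog10(x: Decimal, digits: int = 8) -> Decimal:
    """Compute 10**x as 10**X_Int * 10**X_Frac, with 10**X_Frac evaluated to `digits` digits."""
    x_int = int(x.to_integral_value(rounding=ROUND_FLOOR))   # X_Int = floor(x)
    x_frac = x - x_int                                       # 0 <= X_Frac < 1
    with localcontext() as ctx:
        ctx.prec = digits
        mantissa = Decimal(10) ** x_frac                     # 10**X_Frac, in [1, 10)
    return mantissa.scaleb(x_int)                            # times 10**X_Int, exactly


print(antilog10(Decimal("19.40467")))   # 8-digit mantissa times 10^19
print(antilog10(Decimal("-0.5")))       # 0.31622777
```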
Digit-Recurrence Algorithm (Log)

The corresponding recurrences:
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$
$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$
Here: $E[1] = m$, $L[1] = 0$, $e_j \in \{-9, -8, \dots, -1, 0, 1, \dots, 8, 9\}$
$e_j$ is selected so that $E[j+1]$ converges to 1; then $L[j+1]$ converges to $\log_{10}(m)$.

Digit-Recurrence Algorithm (Antilog)

For any 7-digit fixed-point decimal input m:
$10^{m} = e^{m \ln(10)} = e^{m'}$
The corresponding recurrences:
$L[j+1] = L[j] - \ln(1 + e_j 10^{-j})$
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$
with $f_j = 1 + e_j 10^{-j}$. Here: $E[1] = 1$, $L[1] = m'$
$e_j$ is selected so that $L[j+1]$ converges to 0; then $E[j+1]$ converges to $e^{m'}$.
$e_j \in \{-9, -8, \dots, -1, 0, 1, \dots, 8, 9\}$
Selection By Rounding (cont.)

A scaled remainder is defined as:
Log: $W[j] = 10^j\,(1 - E[j])$
Antilog: $W[j] = 10^j\,L[j]$
$e_j$ is obtained by rounding $W[j]$: $e_j = \text{round}(W[j])$
$e_1$ is obtained from a look-up table; $e_2, \dots, e_j$ are obtained with selection by rounding.
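The logarithm recurrence with selection by rounding can be sketched in Python. Several parts are assumptions rather than the converter's exact design: the operand m in [0.1, 1) is first prescaled by k in {1, 2, 3, 4, 5} into [0.5, 1) (the architecture's 2m, 3m, 5m multiples suggest a similar step, but the details here are assumed); e_1 is found by exhaustive search instead of the look-up table; and log10(1 + e_j 10^-j) is computed with `Decimal.log10` instead of being read from a table. A short error analysis of this variant gives |W[j]| ≤ 7.5, so every e_j stays in {-9, ..., 9}. `log10_digit_recurrence` is a hypothetical name.

```python
from decimal import Decimal, localcontext


def log10_digit_recurrence(m: Decimal, iterations: int = 14) -> Decimal:
    """Approximate log10(m), 0.1 <= m < 1, with the E/L recurrence and selection by rounding.

    Assumptions of this sketch: m is prescaled by k in {1, ..., 5} into [0.5, 1);
    e_1 is chosen by exhaustive search; log10(1 + e_j 10^-j) is computed directly.
    """
    if not Decimal("0.1") <= m < 1:
        raise ValueError("m must lie in [0.1, 1)")
    with localcontext() as ctx:
        ctx.prec = 34                               # generous working precision
        k = next(k for k in (1, 2, 3, 4, 5) if k * m >= Decimal("0.5"))
        E = k * m                                   # E[1], prescaled into [0.5, 1)
        L = -Decimal(k).log10()                     # log10(m) = log10(k m) - log10(k)
        for j in range(1, iterations + 1):
            scale = Decimal(10) ** -j               # 10^-j
            if j == 1:                              # stand-in for the e_1 look-up table
                e = min(range(-9, 10), key=lambda d: abs(1 - E * (1 + d * scale)))
            else:                                   # selection by rounding
                W = (1 - E) / scale                 # W[j] = 10^j (1 - E[j])
                e = int(W.to_integral_value())      # e_j = round(W[j])
            factor = 1 + e * scale
            E *= factor                             # E[j+1] = E[j] (1 + e_j 10^-j)
            L -= factor.log10()                     # L[j+1] = L[j] - log10(1 + e_j 10^-j)
        return L


print(log10_digit_recurrence(Decimal("0.2458900")))
```

After 14 iterations E differs from 1 by at most about 5 × 10^-15, so L agrees with log10(m) to roughly 14 decimal places.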
Architecture: Decimal Log Converter

[Figure: two-stage datapath of the decimal log converter (Stage 1 and Stage 2): detector, table I and multiplier Mult1 forming m, 2m, 3m, 5m and e1; multipliers Mult2 and Mult3, table II, the 1/ln(10) and adjusted constants (0 and log10 of 5, 2, 3), ×10^-j, ×10 and ×100 shifters, 9's complementers, 14-digit and 16-digit decimal CLA adders, rounding logic producing e_j from W[j], registers Reg 1 to Reg 6 and multiplexers Mux 1 to Mux 9; the critical path is marked]
Implementation Results
Logic utilization        Used       Available*   Utilization
# of occupied slices     2842       13696        21%
Maximum frequency        47.7 MHz
# of clock cycles        17
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical path detail (ns):
Reg2 1.188; Mux2 1.564; Mult2 9.347; Shifter 1.438; Mux5 1.350; CLA 5.519; Round 0.566; Total 20.97
Architecture: Dec. Antilog Converter

[Figure: two-stage datapath of the decimal antilog converter (Stage 1 and Stage 2): constant multiplier by ln(10) forming m' from X_frac, table I for e1, address generators and table II, multiplier Mult, 9's complementers, ×10^-j and ×10^(j+1) shifters, 10-digit decimal CLA adders for W[j] and L(j), rounding logic producing e_j, final rounding, registers Reg 1 to Reg 6 and multiplexers Mux 1 to Mux 5; the critical path is marked]
Implementation Results
Logic utilization        Used       Available*   Utilization
# of occupied slices     2315       13696        17%
Maximum frequency        51.5 MHz
# of clock cycles        11
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical path detail (ns):
Reg6 1.599; Mult 7.839; Mux4 1.539; Shifter 1.100; CLA 6.794; Round 0.545; Total 19.42
Comparison
(with binary FXP log and exponential converters)
Similar dynamic range for the normalized coefficients: $10^7$ lies between $2^{23}$ and $2^{24}$, and $10^{16}$ is close to $2^{53}$.
A binary reference is available that uses the same digit-recurrence algorithm with selection by rounding.
Radix 10 is close to radix 8.
Comparison (cont.)
(with binary FXP log and exponential converters)
Converter                  Precision   Area (fa)   Cycle time (T)   # of cycles   Latency (T)
Radix-10 decimal (1) log   7 digits    647         7                8             56
Radix-10 decimal (1) log   16 digits   1829        8                18            144
Radix-10 decimal (1) exp   7 digits    627         7                11            77
Radix-10 decimal (1) exp   16 digits   1777        8                21            168
Radix-8 binary [1] log     24 bits     1630        17               8             136
Radix-8 binary [1] log     53 bits     2640        19               17            323
Radix-8 binary [1] exp     24 bits     1370        16               8             128
Radix-8 binary [1] exp     53 bits     2260        18               17            306
(1) Synthesized with a TSMC 0.18-µm standard-cell library
fa: area of a 1-bit full adder; T: delay of a 1-bit full adder
Conclusions
Achieved 32-bit DFP accuracy for the decimal log and antilog results.
Implemented the converters on FPGA and as ASIC (standard-cell) designs.
Compared them with binary converters.
Future Work
Design 64-bit and 128-bit DFP logarithm and antilogarithm converters.
Optimize the presented architecture for higher speed or smaller area.
EE990, April 2009
Decimal Log and Antilog Converters
Summary
IEEE 754-2008 defines a DFP standard that specifies:
number representation in several precisions
correctly rounded DFP arithmetic operations
rounding modes
Implementation of the DFP adder, multiplier, divider, and logarithmic and antilogarithmic converters
Implementing and programming DFP are both difficult.
