
COMPUTER ARITHMETIC

Algorithms and Hardware Designs

Behrooz Parhami
Department of Electrical and Computer Engineering
University of California, Santa Barbara

New York

Oxford

OXFORD UNIVERSITY PRESS

2000

Oxford University Press


Oxford New York Athens Auckland Bangkok Bogota Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris Sao Paulo Singapore Taipei Tokyo Toronto Warsaw
and associated companies in Berlin Ibadan

Copyright 2000 by Oxford University Press, Inc.


Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
http://www.oup-usa.org

Oxford is a registered trademark of Oxford University Press.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data

Parhami, Behrooz. Computer arithmetic : algorithms and hardware designs / Behrooz Parhami. p. cm. Includes bibliographical references and index. ISBN 0-19-512583-5 (cloth) 1. Computer arithmetic. 2. Computer algorithms. I. Title. QA76.9.C62P37 1999 98-44899 004'.01'513-dc21 CIP

Printing (last digit): 9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper

To the memory of my father, Salem Parhami (1922-1992), and to all others on whom I can count for added inspiration, multiplied joy, and divided anguish.

CONTENTS

Preface

PART I NUMBER REPRESENTATION

1 NUMBERS AND ARITHMETIC
  1.1 What Is Computer Arithmetic?
  1.2 A Motivating Example
  1.3 Numbers and Their Encodings
  1.4 Fixed-Radix Positional Number Systems
  1.5 Number Radix Conversion
  1.6 Classes of Number Representations
  Problems
  References

2 REPRESENTING SIGNED NUMBERS
  2.1 Signed-Magnitude Representation
  2.2 Biased Representations
  2.3 Complement Representations
  2.4 Two's- and 1's-Complement Numbers
  2.5 Direct and Indirect Signed Arithmetic
  2.6 Using Signed Positions or Signed Digits
  Problems
  References

3 REDUNDANT NUMBER SYSTEMS
  3.1 Coping with the Carry Problem
  3.2 Redundancy in Computer Arithmetic
  3.3 Digit Sets and Digit-Set Conversions
  3.4 Generalized Signed-Digit Numbers
  3.5 Carry-Free Addition Algorithms
  3.6 Conversions and Support Functions
  Problems
  References

4 RESIDUE NUMBER SYSTEMS
  4.1 RNS Representation and Arithmetic
  4.2 Choosing the RNS Moduli
  4.3 Encoding and Decoding of Numbers
  4.4 Difficult RNS Arithmetic Operations
  4.5 Redundant RNS Representations
  4.6 Limits of Fast Arithmetic in RNS
  Problems
  References

PART II ADDITION/SUBTRACTION

5 BASIC ADDITION AND COUNTING
  5.1 Bit-Serial and Ripple-Carry Adders
  5.2 Conditions and Exceptions
  5.3 Analysis of Carry Propagation
  5.4 Carry Completion Detection
  5.5 Addition of a Constant: Counters
  5.6 Manchester Carry Chains and Adders
  Problems
  References

6 CARRY-LOOKAHEAD ADDERS
  6.1 Unrolling the Carry Recurrence
  6.2 Carry-Lookahead Adder Design
  6.3 Ling Adder and Related Designs
  6.4 Carry Determination as Prefix Computation
  6.5 Alternative Parallel Prefix Networks
  6.6 VLSI Implementation Aspects
  Problems
  References

7 VARIATIONS IN FAST ADDERS
  7.1 Simple Carry-Skip Adders
  7.2 Multilevel Carry-Skip Adders
  7.3 Carry-Select Adders
  7.4 Conditional-Sum Adder
  7.5 Hybrid Adder Designs
  7.6 Optimizations in Fast Adders
  Problems
  References

8 MULTIOPERAND ADDITION
  8.1 Using Two-Operand Adders
  8.2 Carry-Save Adders
  8.3 Wallace and Dadda Trees
  8.4 Parallel Counters
  8.5 Generalized Parallel Counters
  8.6 Adding Multiple Signed Numbers
  Problems
  References

PART III MULTIPLICATION

9 BASIC MULTIPLICATION SCHEMES
  9.1 Shift/Add Multiplication Algorithms
  9.2 Programmed Multiplication
  9.3 Basic Hardware Multipliers
  9.4 Multiplication of Signed Numbers
  9.5 Multiplication by Constants
  9.6 Preview of Fast Multipliers
  Problems
  References

10 HIGH-RADIX MULTIPLIERS
  10.1 Radix-4 Multiplication
  10.2 Modified Booth's Recoding
  10.3 Using Carry-Save Adders
  10.4 Radix-8 and Radix-16 Multipliers
  10.5 Multibeat Multipliers
  10.6 VLSI Complexity Issues
  Problems
  References

11 TREE AND ARRAY MULTIPLIERS
  11.1 Full-Tree Multipliers
  11.2 Alternative Reduction Trees
  11.3 Tree Multipliers for Signed Numbers
  11.4 Partial-Tree Multipliers
  11.5 Array Multipliers
  11.6 Pipelined Tree and Array Multipliers
  Problems
  References

12 VARIATIONS IN MULTIPLIERS
  12.1 Divide-and-Conquer Designs
  12.2 Additive Multiply Modules
  12.3 Bit-Serial Multipliers
  12.4 Modular Multipliers
  12.5 The Special Case of Squaring
  12.6 Combined Multiply-Add Units
  Problems
  References

PART IV DIVISION

13 BASIC DIVISION SCHEMES
  13.1 Shift/Subtract Division Algorithms
  13.2 Programmed Division
  13.3 Restoring Hardware Dividers
  13.4 Nonrestoring and Signed Division
  13.5 Division by Constants
  13.6 Preview of Fast Dividers
  Problems
  References

14 HIGH-RADIX DIVIDERS
  14.1 Basics of High-Radix Division
  14.2 Radix-2 SRT Division
  14.3 Using Carry-Save Adders
  14.4 Choosing the Quotient Digits
  14.5 Radix-4 SRT Division
  14.6 General High-Radix Dividers
  Problems
  References

15 VARIATIONS IN DIVIDERS
  15.1 Quotient Digit Selection Revisited
  15.2 Using p-d Plots in Practice
  15.3 Division with Prescaling
  15.4 Modular Dividers and Reducers
  15.5 Array Dividers
  15.6 Combined Multiply/Divide Units
  Problems
  References

16 DIVISION BY CONVERGENCE
  16.1 General Convergence Methods
  16.2 Division by Repeated Multiplications
  16.3 Division by Reciprocation
  16.4 Speedup of Convergence Division
  16.5 Hardware Implementation
  16.6 Analysis of Lookup Table Size
  Problems
  References

PART V REAL ARITHMETIC

17 FLOATING-POINT REPRESENTATIONS
  17.1 Floating-Point Numbers
  17.2 The ANSI/IEEE Floating-Point Standard
  17.3 Basic Floating-Point Algorithms
  17.4 Conversions and Exceptions
  17.5 Rounding Schemes
  17.6 Logarithmic Number Systems
  Problems
  References

18 FLOATING-POINT OPERATIONS
  18.1 Floating-Point Adders/Subtractors
  18.2 Pre- and Postshifting
  18.3 Rounding and Exceptions
  18.4 Floating-Point Multipliers
  18.5 Floating-Point Dividers
  18.6 Logarithmic Arithmetic Unit
  Problems
  References

19 ERRORS AND ERROR CONTROL
  19.1 Sources of Computational Errors
  19.2 Invalidated Laws of Algebra
  19.3 Worst-Case Error Accumulation
  19.4 Error Distribution and Expected Errors
  19.5 Forward Error Analysis
  19.6 Backward Error Analysis
  Problems
  References

20 PRECISE AND CERTIFIABLE ARITHMETIC
  20.1 High Precision and Certifiability
  20.2 Exact Arithmetic
  20.3 Multiprecision Arithmetic
  20.4 Variable-Precision Arithmetic
  20.5 Error Bounding via Interval Arithmetic
  20.6 Adaptive and Lazy Arithmetic
  Problems
  References

PART VI FUNCTION EVALUATION

21 SQUARE-ROOTING METHODS
  21.1 The Pencil-and-Paper Algorithm
  21.2 Restoring Shift/Subtract Algorithm
  21.3 Binary Nonrestoring Algorithm
  21.4 High-Radix Square-Rooting
  21.5 Square-Rooting by Convergence
  21.6 Parallel Hardware Square-Rooters
  Problems
  References

22 THE CORDIC ALGORITHMS
  22.1 Rotations and Pseudorotations
  22.2 Basic CORDIC Iterations
  22.3 CORDIC Hardware
  22.4 Generalized CORDIC
  22.5 Using the CORDIC Method
  22.6 An Algebraic Formulation
  Problems
  References

23 VARIATIONS IN FUNCTION EVALUATION
  23.1 Additive/Multiplicative Normalization
  23.2 Computing Logarithms
  23.3 Exponentiation
  23.4 Division and Square-Rooting, Again
  23.5 Use of Approximating Functions
  23.6 Merged Arithmetic
  Problems
  References

24 ARITHMETIC BY TABLE LOOKUP
  24.1 Direct and Indirect Table Lookup
  24.2 Binary-to-Unary Reduction
  24.3 Tables in Bit-Serial Arithmetic
  24.4 Interpolating Memory
  24.5 Trade-Offs in Cost, Speed, and Accuracy
  24.6 Piecewise Lookup Tables
  Problems
  References

PART VII IMPLEMENTATION TOPICS

25 HIGH-THROUGHPUT ARITHMETIC
  25.1 Pipelining of Arithmetic Functions
  25.2 Clock Rate and Throughput
  25.3 The Earle Latch
  25.4 Parallel and Digit-Serial Pipelines
  25.5 On-Line or Digit-Pipelined Arithmetic
  25.6 Systolic Arithmetic Units
  Problems
  References

26 LOW-POWER ARITHMETIC
  26.1 The Need for Low-Power Design
  26.2 Sources of Power Consumption
  26.3 Reduction of Power Waste
  26.4 Reduction of Activity
  26.5 Transformations and Trade-Offs
  26.6 Some Emerging Methods
  Problems
  References

27 FAULT-TOLERANT ARITHMETIC
  27.1 Faults, Errors, and Error Codes
  27.2 Arithmetic Error-Detecting Codes
  27.3 Arithmetic Error-Correcting Codes
  27.4 Self-Checking Function Units
  27.5 Algorithm-Based Fault Tolerance
  27.6 Fault-Tolerant RNS Arithmetic
  Problems
  References

28 PAST, PRESENT, AND FUTURE
  28.1 Historical Perspective
  28.2 An Early High-Performance Machine
  28.3 A Modern Vector Supercomputer
  28.4 Digital Signal Processors
  28.5 A Widely Used Microprocessor
  28.6 Trends and Future Outlook
  Problems
  References

Index

60

Residue Number Systems

4.3 ENCODING AND DECODING OF NUMBERS


Since input numbers provided from the outside (machine or human interface) are in standard binary or decimal and outputs must be presented in the same way, conversions between binary/decimal and RNS representations are required.

Conversion from binary/decimal to RNS


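The binary-to-RNS conversion problem is stated as follows: Given a number y, find its residues with respect to the moduli $m_i$, $0 \le i \le k - 1$. Let us assume that y is an unsigned binary number. Conversion of signed-magnitude or 2's-complement numbers can be accomplished by converting the magnitude and then complementing the RNS representation if needed. To avoid time-consuming divisions, we take advantage of the following equality:

$$\langle y \rangle_{m_i} = \langle (y_{k-1} \cdots y_1 y_0)_{\text{two}} \rangle_{m_i} = \left\langle \langle 2^{k-1} y_{k-1} \rangle_{m_i} + \cdots + \langle 2 y_1 \rangle_{m_i} + \langle y_0 \rangle_{m_i} \right\rangle_{m_i}$$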

If we precompute and store $\langle 2^j \rangle_{m_i}$ for each i and j, then the residue $x_i$ of y (mod $m_i$) can be computed by modulo-$m_i$ addition of some of these constants. Table 4.1 shows the required lookup table for converting 10-bit binary numbers in the range [0, 839] to RNS(8 | 7 | 5 | 3). Only residues mod 7, mod 5, and mod 3 are given in the table, since the residue mod 8 is directly available as the 3 least significant bits of the binary number y.
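The table-driven scheme can be sketched in Python as follows. This is a minimal software illustration under our own assumptions, not the book's design: the names `build_pow2_table`, `binary_to_rns`, and `TABLES` are hypothetical, and the entries $\langle 2^j \rangle_{m_i}$ are recomputed with Python's three-argument `pow` because Table 4.1 itself is not reproduced in this excerpt. Each residue is formed by modulo-$m_i$ addition of the entries selected by the 1 bits of y, while the residue mod 8 is taken directly from the 3 low-order bits.

```python
MODULI = (8, 7, 5, 3)  # RNS(8 | 7 | 5 | 3), ordered as (m3, m2, m1, m0)
NUM_BITS = 10          # 10-bit inputs in the range [0, 839]


def build_pow2_table(m, num_bits=NUM_BITS):
    """Return [<2^j>_m for j = 0, ..., num_bits - 1], one column of a Table 4.1-style table."""
    return [pow(2, j, m) for j in range(num_bits)]


# Precomputed columns for the moduli 7, 5, and 3; mod 8 needs no table.
TABLES = {m: build_pow2_table(m) for m in MODULI if m != 8}


def binary_to_rns(y):
    """Convert an unsigned integer y in [0, 839] to its residues (x3, x2, x1, x0)."""
    if not 0 <= y < 8 * 7 * 5 * 3:
        raise ValueError("y is outside the dynamic range [0, 839]")
    residues = []
    for m in MODULI:
        if m == 8:
            residues.append(y & 0b111)  # residue mod 8 = 3 least significant bits
            continue
        acc = 0
        for j in range(NUM_BITS):
            if (y >> j) & 1:                    # bit y_j selects entry <2^j>_m
                acc = (acc + TABLES[m][j]) % m  # modulo-m addition
        residues.append(acc)
    return tuple(residues)


print(binary_to_rns(164))  # (4, 3, 4, 2)
```

Applied to the input of Example 4.1, only the entries for j = 7, 5, and 2 are added, which is exactly the hand computation carried out there.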

Example 4.1 Represent $y = (1010\ 0100)_{\text{two}} = (164)_{\text{ten}}$ in RNS(8 | 7 | 5 | 3). The residue of y mod 8 is $x_3 = (y_2 y_1 y_0)_{\text{two}} = (100)_{\text{two}} = 4$. Since $y = 2^7 + 2^5 + 2^2$, the required residues mod 7, mod 5, and mod 3 are obtained by simply adding the values stored in the three rows corresponding to j = 7, 5, 2 in Table 4.1:

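$$x_2 = \langle y \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = 3$$

$$x_1 = \langle y \rangle_5 = \langle 3 + 2 + 4 \rangle_5 = 4$$

$$x_0 = \langle y \rangle_3 = \langle 2 + 2 + 1 \rangle_3 = 2$$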

Therefore, the RNS(8 | 7 | 5 | 3) representation of $(164)_{\text{ten}}$ is $(4 \mid 3 \mid 4 \mid 2)_{\text{RNS}}$.

In the worst case, k modular additions are required for computing each residue of a k-bit number. To reduce the number of operations, one can view the given input number as a number in a higher radix. For example, if we use radix 4, then storing the residues of $4^i$, $2 \times 4^i$, and $3 \times 4^i$ in a table would allow us to compute each of the required residues using only k/2 modular additions. The conversion for each modulus can be done by repeatedly using a single lookup table and modular adder or by several copies of each arranged into a pipeline. For a low-cost modulus $m = 2^a - 1$, the residue can be determined by dividing up y into a-bit segments and adding them modulo $2^a - 1$.
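Both speedups described in this paragraph can be sketched in software, as a rough illustration under our own assumptions rather than the book's circuits. In the hypothetical `residue_radix4`, each pair of bits is treated as a radix-4 digit d that selects the precomputed entry $\langle d \times 4^i \rangle_m$, so a k-bit input needs about k/2 lookups and modular additions instead of k. The hypothetical `residue_low_cost` handles $m = 2^a - 1$ by adding the a-bit segments of y and folding any carry-out back into the low end (end-around carry), which is valid because $2^a \equiv 1 \pmod{m}$.

```python
def build_radix4_table(m, num_digits):
    """Hypothetical helper: table[i][d] = <d * 4^i>_m for radix-4 digits d = 0..3."""
    return [[(d * pow(4, i, m)) % m for d in range(4)] for i in range(num_digits)]


def residue_radix4(y, m, num_bits):
    """Residue of an unsigned num_bits-bit y mod m, one lookup and addition per radix-4 digit."""
    num_digits = (num_bits + 1) // 2            # about k/2 digits for a k-bit input
    table = build_radix4_table(m, num_digits)  # a real design would precompute this once
    acc = 0
    for i in range(num_digits):
        d = (y >> (2 * i)) & 0b11               # radix-4 digit i of y
        acc = (acc + table[i][d]) % m           # modulo-m addition
    return acc


def residue_low_cost(y, a):
    """Residue of an unsigned y mod 2^a - 1 by adding its a-bit segments."""
    m = (1 << a) - 1                            # the modulus is also a mask of a one-bits
    acc = 0
    while y:
        acc += y & m                            # add the next a-bit segment
        y >>= a
        acc = (acc & m) + (acc >> a)            # end-around carry, since 2^a mod m = 1
    return 0 if acc == m else acc              # all ones is a second code for zero


print(residue_radix4(164, 5, 10))  # 4
print(residue_low_cost(164, 3))    # 3, since 164 mod 7 = 3
```

For $m = 7$ (a = 3), the input 164 splits into the segments 4, 4, and 2, and the folded running sum ends at 3, matching $x_2$ in Example 4.1.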



TABLE 4.2 Values needed in applying the Chinese remainder theorem to RNS(8 | 7 | 5 | 3)

i   m_i   x_i   $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$ for x_i = 0, 1, 2, ...
3   8     0-7   0, 105, 210, 315, 420, 525, 630, 735
2   7     0-6   0, 120, 240, 360, 480, 600, 720
1   5     0-4   0, 336, 672, 168, 504
0   3     0-2   0, 280, 560

To avoid multiplications in the conversion process, we can store the values of $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$ for all possible i and $x_i$ in tables of total size $\sum_{i=0}^{k-1} m_i$ words. Table 4.2 shows the required values for RNS(8 | 7 | 5 | 3). Conversion is then performed exclusively by table lookups and modulo-M additions.
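A minimal sketch of this lookup-only decoding follows, assuming the usual CRT definitions $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ (Theorem 4.1 is not reproduced in this excerpt). The names `build_crt_tables`, `rns_to_binary`, and `CRT_TABLES` are hypothetical. The tables are built once, using `pow(M_i, -1, m)` for the modular inverse, which requires Python 3.8 or later. Decoding then uses only lookups and modulo-M additions.

```python
from math import prod

MODULI = (8, 7, 5, 3)         # (m3, m2, m1, m0) of RNS(8 | 7 | 5 | 3)
DYNAMIC_RANGE = prod(MODULI)  # M = 840


def build_crt_tables(moduli):
    """Return one table per modulus; entry x_i holds <M_i <alpha_i x_i>_{m_i}>_M."""
    M = prod(moduli)
    tables = []
    for m in moduli:
        M_i = M // m               # M_i = M / m_i
        alpha_i = pow(M_i, -1, m)  # alpha_i = <M_i^{-1}>_{m_i} (Python 3.8+)
        tables.append([(M_i * ((alpha_i * x) % m)) % M for x in range(m)])
    return tables


CRT_TABLES = build_crt_tables(MODULI)  # 8 + 7 + 5 + 3 = 23 words in total


def rns_to_binary(residues):
    """Decode (x3, x2, x1, x0) using only table lookups and modulo-M additions."""
    acc = 0
    for table, x_i in zip(CRT_TABLES, residues):
        acc = (acc + table[x_i]) % DYNAMIC_RANGE
    return acc


print(CRT_TABLES[2])                # [0, 336, 672, 168, 504], the m = 5 row
print(rns_to_binary((4, 3, 4, 2)))  # 164
```

For $(4 \mid 3 \mid 4 \mid 2)_{\text{RNS}}$ the selected entries are 420, 360, 504, and 560; their sum, 1844, reduces modulo 840 to 164.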

4.4 DIFFICULT RNS ARITHMETIC OPERATIONS


In this section, we discuss algorithms and hardware designs for sign test, magnitude comparison, overflow detection, and general division in RNS. The first three of these operations are essentially equivalent in that if an RNS with dynamic range M is used for representing signed numbers in the range $[-N, P]$, with $M = N + P + 1$, then sign test is the same as comparison with P, and overflow detection can be performed based on the signs of the operands and that of the result. Thus, it suffices to discuss magnitude comparison and general division.

To compare the magnitudes of two RNS numbers, we can convert both to binary or mixed-radix form. However, this would involve a great deal of overhead. A more efficient approach is through approximate CRT decoding. Dividing the equality in the statement of Theorem 4.1 by M, we obtain the following expression for the scaled value of x in [0, 1):

$$\frac{x}{M} = \left\langle \sum_{i=0}^{k-1} \frac{\langle \alpha_i x_i \rangle_{m_i}}{m_i} \right\rangle_1$$

where $\langle \cdot \rangle_1$ denotes reduction modulo 1, that is, taking the fractional part.
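A minimal Python sketch of the scaled value follows, assuming the usual CRT constants $\alpha_i = \langle (M/m_i)^{-1} \rangle_{m_i}$; all function names are hypothetical. `scaled_value` computes x/M exactly with `fractions.Fraction` as the fractional part of the sum of the terms $\langle \alpha_i x_i \rangle_{m_i}/m_i$, and `compare_rns` orders two RNS numbers by comparing these values, which works because x/M grows with x. `scaled_value_approx` mimics what a table-based design might do: it truncates each term to t fractional bits and adds modulo $2^t$, so its estimate falls short of x/M (modulo 1) by less than $k \cdot 2^{-t}$. Numbers whose scaled values are closer than that bound, or whose x/M lies below it (where the estimate can wrap around to near 1), cannot be ordered reliably from the estimates alone.

```python
from fractions import Fraction
from math import prod

MODULI = (8, 7, 5, 3)  # (m3, m2, m1, m0) of RNS(8 | 7 | 5 | 3); M = 840


def crt_terms(residues, moduli=MODULI):
    """Return the terms <alpha_i x_i>_{m_i} / m_i, assuming alpha_i = <(M/m_i)^{-1}>_{m_i}."""
    M = prod(moduli)
    terms = []
    for m, x in zip(moduli, residues):
        alpha = pow(M // m, -1, m)  # modular inverse (Python 3.8+)
        terms.append(Fraction((alpha * x) % m, m))
    return terms


def scaled_value(residues):
    """Exact x / M in [0, 1): the fractional part of the sum of the CRT terms."""
    return sum(crt_terms(residues), Fraction(0)) % 1


def scaled_value_approx(residues, t=8):
    """t-bit estimate of x / M, returned as an integer count of 2^-t units.

    Each term is truncated to t fractional bits, as a lookup table would store it,
    and the sum is taken modulo 2^t (that is, modulo 1).
    """
    acc = 0
    for term in crt_terms(residues):
        acc = (acc + (term.numerator << t) // term.denominator) % (1 << t)
    return acc


def compare_rns(a, b):
    """Return -1, 0, or 1 by comparing exact scaled values (x / M is monotonic in x)."""
    sa, sb = scaled_value(a), scaled_value(b)
    return (sa > sb) - (sa < sb)


x164 = (4, 3, 4, 2)  # 164 in RNS(8 | 7 | 5 | 3), from Example 4.1
x200 = (0, 4, 0, 2)  # 200 in RNS(8 | 7 | 5 | 3)
print(scaled_value(x164))       # 41/210, i.e., 164/840
print(compare_rns(x164, x200))  # -1
```

Exact rational arithmetic is used in `compare_rns` only for clarity; it carries the same cost as full decoding, and the savings the text refers to come from working with short, truncated values such as those produced by `scaled_value_approx`.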
