
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Double Logarithmic Arithmetic Technique for Low-Power 3-D Graphics Applications
Dina M. Ellaithy, Magdy A. El-Moursy, Ghada H. Ibrahim, Amal Zaki, and Abdelhalim Zekry

Abstract: An energy-efficient double logarithmic arithmetic (DLA) technique is proposed for 3-D graphics applications. DLA manipulates the logarithmic arithmetic and improves the architecture for the realization of the transcendental functions and the advanced lighting model using energy-efficient techniques. DLA features complete elimination of multipliers in the logarithmic domain by using successive logarithmic converters. DLA demonstrates up to 56% reduction in power consumption as compared to the existing techniques. The main advantage of this approach is the ability to perform the complex functions using a power-efficient, area-efficient, and high-frequency design. The proposed technique performs transcendental functions using a multiplier-free hardware architecture. Moreover, based on nonuniform subdivisions and piecewise linear approximation, novel logarithmic and antilogarithmic converters are also proposed. These converters achieve optimal power consumption as compared to several recent approaches. The proposed converters provide low relative error with fewer nonuniform subdivisions. Up to 19%, 12%, and 20% reduction in relative error, area, and power consumption are achieved, respectively.

Index Terms: Antilogarithmic converter, double logarithmic arithmetic (DLA) unit, graphical processing unit (GPU), logarithmic arithmetic unit, logarithmic converter, low power.

Manuscript received September 22, 2016; revised December 1, 2016 and January 17, 2017; accepted January 26, 2017.
D. M. Ellaithy, G. H. Ibrahim, and A. Zaki are with the Microelectronics Department, Electronics Research Institute, Cairo, Egypt (e-mail: dina_elessy@eri.sci.eg; ghadahamdy@yahoo.com; amalzaki@gmail.com).
M. A. El-Moursy is with the Design Creation Division, Mentor Graphics, Cairo, Egypt and also with Microelectronics Department, Electronics Research Institute, 12622, Cairo, Egypt (e-mail: magdy_el-moursy@mentor.com).
A. Zekry is with Electronics and Communications Department, Ain Shams University, Cairo, Egypt (e-mail: aaazekry@hotmail.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2017.2667714
1063-8210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The conventional implementation of an arithmetic unit focuses primarily on maximizing the speed of realizing computationally intensive real-time functions, such as gaming, graphics, GPS navigation, and video compression. However, the emerging need to perform the computation with low power consumption is barely taken into account. The fast growth of the mobile electronics market and the transition from text-based applications to versatile multimedia applications have increased the popularity of mobile communication devices. Real-time 3-D graphics are becoming one of the most interesting applications in mobile workstations due to their benefits for gaming, advertising, marketing, and avatars. Also, for bandwidth-constrained wireless applications such as complex 3-D scenes, 3-D graphics are very helpful. Complex 3-D scenes can be represented using lists of vertices, texture images, and equivalent camera movements, producing high data-compression ratios [1].

General-purpose processors are commonly used in graphics applications. In contrast to general processors, typical 3-D graphical processing units (GPUs) handle complex arithmetic calculations, such as division, reciprocating, squaring, square-rooting, and exponentiation operations. These functions use most of the clock cycles in real-time 3-D graphics systems. As a result, they consume most of the computing power [2]. Handheld 3-D graphics processors adopt programmable functionality. Those processors are mainly composed of vertex and pixel shaders to support various advanced graphics effects. These shaders perform complex vector, elementary, and transcendental functions, such as vector multiply-and-add, division, exponentiation, logarithm to any base, and trigonometric functions (TRG) [3].

Power consumption is one of the most important bottlenecks for wireless applications because of the limited battery lifetime and complex cooling techniques. Cooling has become a real bottleneck with the increased sophistication of mobile devices. Digital electronic circuits tend to become much less reliable at high operating temperatures. Accordingly, reducing the power consumption of the arithmetic unit inside the programmable shaders is the main target for the optimization effort. Since the logarithmic number system (LNS) affects the signal activity and the strength of the operators, using logarithmic numbers can lead to large savings in power consumption [4]. In certain cases, the mean bit assertion probability is reduced by more than 50% if LNS is employed, which leads to a significant reduction in power dissipation [5]. Therefore, using a logarithmic arithmetic unit for the GPU reduces the power dissipation and simplifies the arithmetic calculations [2]–[12]. The overall architecture of logarithmic arithmetic units still includes programmable multipliers, which are usually implemented using Booth multipliers. Booth multipliers are still needed to perform complex functions [6]–[10].

Double logarithmic arithmetic (DLA) with reduced hardware complexity for energy-efficient GPU arithmetic is proposed. The architecture of the vertex shader with DLA is shown in Fig. 1. Logarithmic and antilogarithmic converters are the main blocks of DLA. Computation error and power consumption are ongoing challenges for researchers involved in 3-D graphics applications. For this reason, a new set of converters is proposed. Compared to several recent approaches, the proposed converters have lower relative error with a simpler hardware implementation, which reduces the overall area and

power consumption. The proposed DLA hardware occupies a smaller area, consumes less power, and is suited for mobile applications. The rest of this paper is arranged as follows. The previous research in this area is presented in Section II. The DLA scheme is explained in Section III. In Section IV, the proposed logarithmic and antilogarithmic converters are presented. Error analysis and comparison with other techniques are demonstrated in Section V. Hardware implementation and synthesis results of the proposed technique, along with the comparison with the previous research, are presented in Section VI. Finally, Section VII contains some conclusions.

Fig. 1. Architecture of the vertex shader with DLA.

II. BACKGROUND

The 3-D graphics rendering is the main workload of the GPU. GPU rendering is responsible for projecting a 3-D scene onto a 2-D screen. Basically, the 3-D graphics rendering pipeline is composed of a vertex shader and a pixel shader. Transformation and lighting functions of geometric vertices are handled by the vertex shader. The pixel shader takes the generated per-pixel data and uses linear equations and linear interpolation to determine the actual color of the pixels. Thereafter, skinning and mapping are carried out [12]. Realization of these various kinds of complex operations with limited power resources is a challenge. Power and area overheads for the programmable 3-D graphics pipeline can be reduced in a logarithmic arithmetic scheme. In a logarithmic arithmetic unit, multiplication, division, squaring, and square rooting can be obtained by simple addition, subtraction, and shift operations. Previously, complicated functions such as exponentiation, logarithm to any base, and TRG were converted into multiplication, which is usually implemented using a Booth multiplier, as reported in [4]–[12]. At the first stage, the input operands are converted into the logarithmic number format. Then, linear addition, subtraction, shift, and multiplication are performed. At the end, the results which are computed in the logarithmic domain are transformed to fixed point by the antilogarithmic converter, as shown in Fig. 2. Especially for large operands, multiplication has always been an area-, time-, and power-consuming mathematical process [4]. In Section III, the proposed approach to reduce power consumption by completely eliminating multiplication is presented.

Fig. 2. Original architecture of the logarithmic arithmetic unit.

On the other hand, logarithmic and antilogarithmic converters are the core of the logarithmic arithmetic. The logarithmic converter is responsible for converting the input binary number to a logarithmic number. A straight-line approximation is defined by Mitchell to determine the binary logarithm for power-of-two segments [13]. The converter by Mitchell has a simple architecture, which results in a small hardware implementation with a large conversion error. A simple piecewise linear approximation by Abed and Siferd [14] is used for logarithmic conversion. Abed and Siferd [14] use combinational logic in the implementation to achieve low power and high speed while reducing the maximum percentage error of the conversion. Several approaches were adopted to optimize the accuracy, power consumption, and area of logarithmic converters.

Antilogarithmic converters are responsible for transforming the computing results in the logarithmic domain to the final fixed-point domain. Several studies have been done on antilogarithmic converters based on the straight-line and piecewise linear approximation representations. Based on linear approximations, Mitchell [13] presented an efficient conversion method. However, the provided scheme does not have sufficient accuracy. Many converter algorithms are proposed to increase the conversion accuracy at the price of complexity and hardware implementation difficulty. For low power, Abed et al. [15] have developed and implemented an antilogarithm correction algorithm which slightly decreases

the accuracy. The proposed DLA in this paper might be useless without improving the accuracy of the logarithmic and antilogarithmic converters. A wide range of studies in this area have been done so far [14]–[24]. Digital recurrence, ROM or lookup table, and piecewise approximation are among the primary techniques used for logarithmic and antilogarithmic converter approximations. Piecewise linear approximation techniques give a good tradeoff between the computation error, the complexity of hardware, and the power consumption. They are widely used in many applications [15], [18], [20], [21], [23]. A new piecewise linear approximation technique is proposed in Section IV in order to achieve low conversion error using simple hardware for logarithmic and antilogarithmic converters. The proposed converters make a tradeoff between accuracy and hardware complexity by using nonuniform segmentation and fewer conversion terms in each segment compared to prior research [2], [3], [18], [20]–[23]. To be accurate and power efficient, the conversion coefficients can be realized by a few shift and add operations, as shown in this paper.

III. PROPOSED DLA APPROACH

A second logarithm stage is added in DLA to eliminate the need for any multiplier in the logarithmic domain. The second stage replaces multiplications by additions in a second logarithmic domain. Thus, a large amount of power reduction is achieved. The 3-D graphics programmable shaders require complex transcendental and lighting model functions to provide various graphics effects [7]. The implementation of these functions in the proposed technique is presented in Sections III-A–III-C.

A. DLA Architecture

In DLA, the input operands are transformed into the logarithmic domain. Some operands go through a single logarithmic stage and others go through double logarithmic stages. Then, linear addition, subtraction, and shift are performed in the logarithmic domain. The results in the logarithmic domain are converted into the fixed-point domain by the antilogarithmic converter.

The transformation between the fixed-point and logarithmic numbers has conversion error. It is shown in this paper that the error could be manipulated to make DLA efficient with low power and limited error. A fixed-point DLA system can be effective for handheld 3-D graphics systems that have a small screen accuracy. This makes DLA efficient in trading off accuracy with power. As shown in Fig. 3, a fixed-point number x has the configuration of Qm.n, where m and n correspond to the number of integer and fraction bits. The five most significant bits represent the integer part and the remaining twenty-seven bits represent the fraction part.

Fig. 3. Fixed-point format Qm.n of the x number.

The DLA for the exponentiation function is shown in Fig. 4. The two inputs, x and y, are converted into the LNS format, which requires one logarithmic converter for the input y to get $\log_2 y$. A pair of cascaded logarithmic converters for the input x is used to get $\log_2 \log_2 x$. Linear addition is carried out by adding $\log_2 y$ and $\log_2 \log_2 x$. Finally, the result of the addition in LNS format is converted to a fixed-point number through two successive stages of antilogarithmic converters. DLA unifies the implementation of transcendental functions in a definite arithmetic unit using simple hardware to maintain the small size and to achieve a low-power design, as described in the following section.

Fig. 4. Architecture of the DLA for exponentiation operation.

B. Transcendental Functions

The transcendental functions, such as exponentiation with an arbitrary exponent, logarithm to any base, trigonometric, inverse trigonometric, hyperbolic, or inverse hyperbolic functions, can be converted into addition in LNS as follows.

1) The exponentiation function can be computed using addition in LNS according to

$$x^{y} = 2^{2^{[(\log_2 y) + (\log_2 \log_2 x)]}} \tag{1}$$

where x and y are the input operands. The results of the addition are transformed into the fixed-point domain by two stages of antilogarithm converters.

2) The logarithm to base b can be converted into addition according to

$$\log_b x = 2^{[(\log_2 k) + (\log_2 \log_2 x)]} \tag{2}$$

$$k = \frac{1}{\log_2 b}. \tag{3}$$

3) The TRG can be described by definite Taylor series expansions. Concerning the initial five terms of the Taylor series calculation, a general process of the TRG is defined as

$$\mathrm{TRG} = c_0 x^{k_0} \pm c_1 x^{k_1} \pm c_2 x^{k_2} \pm c_3 x^{k_3} \pm c_4 x^{k_4} \tag{4}$$

where $\pm \in \{+, -\}$, $c_i$ is real, $k_i$ are integer constants, and $i \in \{0{:}4\}$. Using this generic equation representation (for example, Taylor series approximations such as $\sin x = x - x^3/3! + x^5/5! - x^7/7! + x^9/9!$ and $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!$), the calculation of the exponentiations which is required for this process can be transformed to addition in LNS by converting (4) to (5)

$$\mathrm{TRG} = 2^{\log_2 c_0 + A_0} \pm 2^{\log_2 c_1 + A_1} \pm 2^{\log_2 c_2 + A_2} \pm 2^{\log_2 c_3 + A_3} \pm 2^{\log_2 c_4 + A_4} \tag{5}$$

$$A_i = 2^{[(\log_2 k_i) + (\log_2 \log_2 x)]}. \tag{6}$$

Each term in (5) requires only addition in LNS with the constant $\log_2 c_i$, avoiding the use of multipliers.

Fig. 5. Proposed algorithm for logarithmic/antilogarithmic converters.
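The identities in (1) to (3) can be checked numerically with a short sketch. The Python below is illustrative only and is not the hardware datapath: it uses exact floating-point `math.log2` in place of the piecewise linear converters, the function names `dla_pow` and `dla_log_base` are hypothetical, and it assumes x > 1, y > 0, and b > 1 so that every inner logarithm is defined (handling of operands outside that range is not specified in this section).

```python
import math


def dla_pow(x: float, y: float) -> float:
    """Compute x**y following (1): one addition in the double-log domain."""
    # Assumption of this sketch: log2(log2(x)) and log2(y) must both exist.
    if x <= 1.0 or y <= 0.0:
        raise ValueError("sketch assumes x > 1 and y > 0")
    log_y = math.log2(y)                 # single logarithmic stage for y
    loglog_x = math.log2(math.log2(x))   # two cascaded logarithmic stages for x
    s = log_y + loglog_x                 # the only arithmetic operation
    return 2.0 ** (2.0 ** s)             # two successive antilogarithmic stages


def dla_log_base(x: float, b: float) -> float:
    """Compute log_b(x) following (2) and (3)."""
    # Assumption of this sketch: b > 1 keeps k positive, x > 1 keeps log2(x) positive.
    if x <= 1.0 or b <= 1.0:
        raise ValueError("sketch assumes x > 1 and b > 1")
    k = 1.0 / math.log2(b)               # constant from (3)
    s = math.log2(k) + math.log2(math.log2(x))
    return 2.0 ** s                      # a single antilogarithmic stage


if __name__ == "__main__":
    print(dla_pow(3.0, 2.5), 3.0 ** 2.5)
    print(dla_log_base(1000.0, 10.0), math.log10(1000.0))
```

With exact logarithms the two printed values on each line agree to floating-point precision; in the proposed hardware the agreement is limited by the conversion error of the converters discussed in Section IV.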
C. Lighting Model Equations

Inside 3-D graphics, the lighting process determines the final pixel color from the direction of the light source, 3-D object, and camera [11]. In most rendering, illumination in the vertex shader imposes an intensive workload [12], [25], [26]. The Phong illumination model is used in general to achieve per-vertex lighting [12]. For the Phong model, the color C of a vertex is resolved by

$$C = C_{\mathrm{amb}} + \{(N \cdot L)\, C_{\mathrm{diff}}\} + \{(N \cdot H)^{C_{\mathrm{pow}}}\, C_{\mathrm{spec}}\} \tag{7}$$

where $C_{\mathrm{amb}}$ is the ambient light, N is the normal vector of the vertex, L is the direction vector of the light, $C_{\mathrm{diff}}$ is the diffuse light, and H is the half vector between the direction vector of the eye and the normal vector of the vertex. $C_{\mathrm{pow}}$ is the shininess factor, which is determined by texture. The last term $C_{\mathrm{spec}}$ is computed for specular lighting by a complex power operation.

The lighting equation involves a very large amount of calculations and very complicated arithmetic processes. They consist of multiplication and exponentiation that consume a large amount of power [11]. In the presented approach, the multiplication and the exponentiation are converted into addition by transforming (7) into

$$C = C_{\mathrm{amb}} + 2^{[(\log_2 A_1) + (\log_2 C_{\mathrm{diff}})]} + 2^{[(\log_2 A_3) + (\log_2 C_{\mathrm{spec}})]} \tag{8}$$

$$A_1 = 2^{[(\log_2 N) + (\log_2 L)]} \tag{9}$$

$$A_2 = 2^{[(\log_2 N) + (\log_2 H)]} \tag{10}$$

$$A_3 = 2^{2^{[(\log_2 C_{\mathrm{pow}}) + (\log_2 \log_2 A_2)]}}. \tag{11}$$

In Section IV, the proposed logarithmic and antilogarithmic converters are presented.

IV. LOGARITHMIC AND ANTILOGARITHMIC CONVERTERS

DLA increases the error in the processed data. The primary building blocks of the DLA are the logarithmic and antilogarithmic converters. Since logarithmic and antilogarithmic curves are nonlinear, a large number of uniform segments is needed to provide good precision. To overcome the straight-line limitations on accuracy, nonuniform segments are required [12], [21]. The selection of the conversion coefficients has a great impact on the hardware complexity. The nonuniform segments and the conversion coefficients for each segment are selected by using linear programming to make a compromise between accuracy and hardware complexity. For the logarithmic conversion, the error increases as the input value approaches 1, so fine segments are needed close to 1. Starting approximately at the middle of the antilogarithmic conversion, coarser steps make it faster to find the best tradeoff between accuracy and hardware complexity. The segments and the conversion coefficients of each segment are obtained using MATLAB software by matching the approximation coefficients to the actual values. The obtained segments and coefficients simplify the hardware implementation with the best accuracy and the minimum number of segments. In Fig. 5, the proposed algorithm that is used to obtain the sub-segments and the conversion coefficients is shown. In the next subsections, the proposed logarithmic/antilogarithmic converters are presented.

A. Proposed Logarithmic Converters

A novel set of nonuniform piecewise linear approximation logarithmic converters is presented in this section. The input x can be represented by (12) and its logarithm by (13)

$$x = 2^{k}(1 + f) \tag{12}$$

$$\log_2 x = k + \log_2(1 + f) \tag{13}$$

where k is the characteristic or integer portion, which represents the place of the most significant bit of x, and f is the fraction portion. Based on piecewise linear approximation

$$\log_2(1 + f) \approx f + \alpha f + \beta \tag{14}$$

where $\alpha$ and $\beta$ are the conversion coefficients, which are distinct for each segment. In this paper, two lower error logarithmic converters are proposed having either nine or six regions. The nonuniform regions and the coefficient values for each region are listed in Tables I and II for nine and six regions, respectively. Column 1 in Tables I and II contains the nonuniform regions which are used to obtain the approximate value of $\log_2 x$. The proposed conversion coefficient values $\alpha$ and $\beta$ for each region are listed in columns 2 and 3, respectively. The proposed antilogarithmic converters are presented in Section IV-B.

TABLE I
PROPOSED NINE NONUNIFORM REGIONS LOGARITHMIC CONVERTER WITH ITS CONVERSION COEFFICIENT

TABLE II
PROPOSED SIX NONUNIFORM REGIONS LOGARITHMIC CONVERTER WITH ITS CONVERSION COEFFICIENT

B. Proposed Antilogarithmic Converters

A novel set of nonuniform piecewise linear approximation antilogarithmic converters is presented in this section. The logarithmic number x can be represented by (15) and its antilogarithm by (16)

$$x = k + f \tag{15}$$

$$2^{x} = 2^{k} \cdot 2^{f} \tag{16}$$

where k is the integer portion and f is the fraction portion. Based on piecewise linear approximation

$$2^{f} \approx \alpha f + \beta \tag{17}$$

where $\alpha$ and $\beta$ are the conversion coefficients, which are distinct for each segment. In this paper, two low error antilogarithmic converters are proposed with either eight or four regions. The nonuniform regions and the coefficient values for each region are listed in Tables III and IV. The nonuniform regions which are used to obtain the approximate value of $2^{x}$ are listed in column 1 of Tables III and IV. Columns 2 and 3 contain the proposed conversion coefficient values $\alpha$ and $\beta$ for each region. In the next section, error analysis and comparisons with the recent techniques are presented.

TABLE III
PROPOSED EIGHT NONUNIFORM REGIONS ANTILOGARITHMIC CONVERTER WITH ITS CONVERSION COEFFICIENT

TABLE IV
PROPOSED FOUR NONUNIFORM REGIONS ANTILOGARITHMIC CONVERTER WITH ITS CONVERSION COEFFICIENT

V. ERROR ANALYSIS AND COMPARISONS

For high precision, several previous converters were proposed [2], [3], [14], [15], [18], [20]–[23]. MATLAB software is used to obtain the error for the previous work and the proposed one using different conversion coefficients. In Fig. 6, the error graphs for [2], [3], [18], [21], and [22], and the proposed nine-region logarithmic converter are shown. The error graphs for [14], [18], and [21], and the proposed six-region logarithmic converter are shown in Fig. 7. The error graphs of previous work for antilogarithmic converters and the proposed one are shown in Figs. 8 and 9. The work in [2], [3], [15], [20], [22], and [23], and the proposed eight- and four-region antilogarithmic converters are compared.

Fig. 6. Error of logarithmic converters with proposed nine-region.

It is shown in the error graphs for logarithmic converters that most of the previous techniques suffer from a large percent error in the first region (in Fig. 6 the first region was not taken into consideration). Selina [23] uses a lot of terms for the conversion coefficients to achieve higher precision, which leads to more power consumption, larger area, and hardware complexity. From Figs. 6–9, it is shown that the proposed logarithmic and antilogarithmic converters achieve higher accuracy as compared to prior work. The error is

defined by

$$\mathrm{error}\% = \frac{(\text{approximated value} - \text{exact value})}{\text{exact value}} \times 100. \tag{18}$$

Fig. 7. Error of logarithmic converters with proposed six-region.

Fig. 8. Error of antilogarithmic converters with proposed eight-region.

Fig. 9. Error of antilogarithmic converters with proposed four-region.

The results are summarized in Tables V and VI. The first column contains the technique used. The number of segments is listed in the second column. The maximum positive error and the minimum negative error are listed in the third and fourth columns, respectively. The last column contains the difference between the values of the third and fourth columns, which is defined as the error range.

TABLE V
ERROR COMPARISON BETWEEN THE PROPOSED LOGARITHMIC CONVERTERS AND PREVIOUS WORK

TABLE VI
ERROR COMPARISON BETWEEN THE PROPOSED ANTILOGARITHMIC CONVERTERS AND PREVIOUS WORK

As compared to many previous techniques, the proposed converters achieve higher precision. The accuracy is increased by up to 19% for both the proposed nine-region and six-region logarithmic converters, and by up to 11% for both the proposed eight-region and four-region antilogarithmic converters. Since the conversion coefficient values are combinations of powers of two, the logarithmic/antilogarithmic converters can be implemented using a shift-and-add approach.

Fig. 10. (a) Architecture of the proposed logarithmic converters. (b) Architecture of the proposed antilogarithmic converters.

TABLE VII
COMPARISON BETWEEN THE PROPOSED LOGARITHMIC CONVERTERS AND PREVIOUS WORK IN TERMS OF POWER, AREA, AND DELAY

TABLE VIII
COMPARISON BETWEEN THE PROPOSED ANTILOGARITHMIC CONVERTERS AND PREVIOUS WORK IN TERMS OF POWER, AREA, AND DELAY

TABLE IX
COMPARISON BETWEEN THE DLA AND PREVIOUS WORK IN TERMS OF POWER, AREA, DELAY, AND AVERAGE ERROR

The hardware structure of the proposed logarithmic converters is shown in Fig. 10(a). The logarithmic converter scheme is based on a leading-one detector that determines the characteristic value k. The bits following the leading one are then shifted and added by the shifter/adder block to obtain the required approximate logarithmic value. The adder is implemented using 3:2 or 4:2 carry-save adders and a carry-propagate adder (CPA). In Fig. 10(b), the hardware structure of the proposed antilogarithmic converters is shown. Similarly, the structure of the antilogarithmic converters is based on the sum-of-shifts approach. The integer part contains the value of the characteristic k, which is used for the final right shifter to obtain the approximate antilogarithmic result.
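The leading-one detection step and the error definition in (18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposed converter: the per-region coefficients of Tables I and II are not reproduced in the text, so Mitchell's straight-line approximation $\log_2(1+f) \approx f$ [13] is used as a stand-in, the error is evaluated on $\log_2 x$ for integer inputs, and the names `mitchell_log2`, `error_percent`, and `worst_error_percent` are hypothetical.

```python
import math


def mitchell_log2(x: int) -> float:
    """Approximate log2(x) for a positive integer as k + f (Mitchell [13])."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1             # leading-one detector: index of the MSB
    f = (x - (1 << k)) / (1 << k)      # bits after the leading one, as a fraction in [0, 1)
    return k + f                       # stand-in for the piecewise linear correction in (14)


def error_percent(approx: float, exact: float) -> float:
    """Percent error as defined in (18)."""
    return (approx - exact) / exact * 100.0


def worst_error_percent(lo: int, hi: int) -> float:
    """Most negative percent error of mitchell_log2 over the integers lo..hi-1."""
    # lo must be at least 2 so that the exact value log2(x) is nonzero.
    return min(error_percent(mitchell_log2(x), math.log2(x)) for x in range(lo, hi))


if __name__ == "__main__":
    print(mitchell_log2(12))           # 12 = 2**3 * 1.5, so k = 3 and f = 0.5
    print(worst_error_percent(2, 1 << 12))
```

Because $f \le \log_2(1+f)$ on $[0, 1)$, this stand-in never overestimates, so its error is zero or negative; the nonuniform regions and coefficients of the proposed converters exist to shrink that error range.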

VI. HARDWARE IMPLEMENTATION AND SYNTHESIS RESULTS

The proposed converters and all previous work are implemented at a structural level in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Different logarithmic and antilogarithmic techniques are synthesized using a 90 nm CMOS technology standard cell library with the Synopsys Design Compiler. The power consumption, area, and delay for the proposed techniques are compared with prior work and are summarized in Tables VII–IX. It can be noted that for logarithmic converters in Table VII, the proposed nine-region logarithmic converter achieves a saving of at least 20% in power consumption as compared to the Lastras and Parhami [22] technique with 51% higher precision. As compared with the Nam et al. [3] technique, at least 54%, 43%, and 29% saving in power, area, and delay are achieved, respectively, with 19% higher precision. The proposed six-region logarithmic converter achieves more than 3% reduction in power consumption as compared with the De Caro et al. [18] technique with 19% higher precision.

For antilogarithmic converters in Table VIII, the proposed eight-region converter achieves more than 35% and 26% saving in power and area, respectively, as compared to the Selina [23] technique with 9% lower precision. As compared to the Kim et al. [2] technique, more than 23%, 8%, and 18% reduction in area, delay, and precision are achieved, respectively, with approximately the same power consumption. The proposed

four-region antilogarithmic converter achieves more than 12% by up to 56%, 40%, and 48%, respectively as compared to
reduction in power consumption and area as compared previous best results with less than 0.6% increase in error.
to Kuo and Parhami [20] technique with 11% higher
precision. As compared to Selina [23] technique, more R EFERENCES
than 42% and 43% saving in power and area are achieved [1] J.-H. Sohn, Y.-H. Park, C.-W. Yoon, R. Woo, S.-J. Park, and
with 23% lower precision. On average, the proposed nine- H.-J. Yoo, Low-power 3D graphics processors for mobile terminals,
region logarithmic converter achieves saving up to 35%, 9%, IEEE Commun. Mag., vol. 43, no. 12, pp. 9099, Dec. 2005.
[2] H. Kim, B.-G. Nam, J.-H. Sohn, J.-H. Woo, and H.-J. Yoo, A 231-MHz,
and 8% in power, area, and delay, respectively. On average, 2.18-mW 32-bit logarithmic arithmetic unit for fixed-point 3-D graphics
saving of the proposed eight-region antilogarithmic converter system, IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 23732381,
is up to 8%, and 17% in power and area. For the proposed Nov. 2006.
[3] B.-G. Nam, H. Kim, and H.-J. Yoo, Power and area-efficient uni-
four-region antilogarithmic converter, the average saving is up fied computation of vector and elementary functions for handheld 3D
to 43%, and 27% in power and area. graphics systems, IEEE Trans. Comput., vol. 57, no. 4, pp. 490504,
32-bit fixed-point DLA unit is also implemented at a struc- Apr. 2008.
[4] P. Bulic, Fixed-point multiplication and division in the logarith-
tural level by VHDL. It is also synthesized by the Synopsys mic number system: A way to low-power design, J. Microelectron.,
Design Compiler. A CPA is used for the addition that is Electron. Compon. Mater., vol. 43, no. 4, pp. 203211, Dec. 2013.
shown in Fig. 4 to obtain the final results in the logarithm [5] V. Paliouras and T. Stouraitis, Low-power properties of the logarithmic
number system, in Proc. IEEE Comput. Arithmetic Conf., Jun. 2001,
domain in DLA. A booth multiplier is used for comparison pp. 229236.
with the previous work that is shown in Fig. 2. The proposed [6] B.-G. Nam, H. Kim, and H.-J. Yoo, A low-power unified arithmetic
technique is implemented using 90-nm CMOS technology with a
1-V supply voltage standard cell library, and runs
at 100 MHz. The proposed technique reduces both power
dissipation and area. A comparison between DLA and prior
designs for the exponentiation operation, using the same
technology, is summarized in Table IX. The first column contains
the technique name. Columns 2 to 5 contain power consumption,
area, delay, and average error, respectively. In the first
comparison, rows 2 and 3, the proposed six-region logarithmic
converter is used with the proposed four-region antilogarithmic
converter because of the simplicity of the hardware implementation.
In the second comparison, row 5, the proposed nine-region
logarithmic converter and the proposed eight-region
antilogarithmic converter are used, which gives a good tradeoff
between conversion error and hardware complexity.
Using DLA, the average error increases by less than 0.6%,
while power is reduced by up to 51%, and area and delay are
reduced by more than 40% and 48%, respectively.
The DLA technique provides savings in power, area, and delay
of at least 43%, 7%, and 45%, respectively, as compared to
conventional techniques.
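The exponentiation path compared above can be illustrated with a minimal software sketch. It assumes that DLA evaluates x^y by taking the logarithm twice, so that the product y·log2(x) in the logarithmic domain becomes a sum, and then applies two antilogarithmic stages. All function names are hypothetical. The converters shown are Mitchell's classic single-segment approximation, used only as a stand-in; the six- and nine-region logarithmic converters and the four- and eight-region antilogarithmic converters of the proposed design are not reproduced here.

```python
import math


def mitchell_log2(x: float) -> float:
    """Mitchell's approximation: for x = 2**k * (1 + m), log2(x) ~ k + m."""
    if x <= 0.0:
        raise ValueError("log2 is undefined for non-positive inputs")
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    k = exponent - 1                    # integer part (characteristic)
    m = 2.0 * mantissa - 1.0            # fractional part in [0, 1)
    return k + m


def mitchell_antilog2(v: float) -> float:
    """Inverse approximation: 2**(k + f) ~ 2**k * (1 + f) for 0 <= f < 1."""
    k = math.floor(v)
    f = v - k
    return math.ldexp(1.0 + f, k)


def pow_double_log(x, y, log2=math.log2, antilog2=lambda v: 2.0 ** v):
    """Compute x**y using only an addition between the converter stages.

    Assumption of this sketch: x > 1 and y > 0, so the second logarithm
    is defined. The converters are pluggable (exact by default).
    """
    if x <= 1.0 or y <= 0.0:
        raise ValueError("this sketch assumes x > 1 and y > 0")
    first = log2(x)                    # logarithmic domain: log2(x)
    second = log2(first) + log2(y)     # double-log domain: the product becomes a sum
    return antilog2(antilog2(second))  # two antilog stages back to the linear domain


exact = pow_double_log(3.0, 2.5)
approx = pow_double_log(3.0, 2.5, mitchell_log2, mitchell_antilog2)
print(exact, approx)
```

With exact converters the result matches 3^2.5 (about 15.59) to floating-point precision. With the single-segment Mitchell converters the same call returns 12.0, which shows how conversion error is amplified by the second antilogarithm and why more accurate multi-region converters matter for this operation.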
VII. CONCLUSIONS
DLA is proposed to perform complex functions and lighting
model equations without the need for multiplications in
GPUs. The proposed technique can perform the transcendental
operations using a simple hardware architecture. Logarithmic
and antilogarithmic converters based on nonuniform
segments and piecewise linear approximations are enhanced
in this paper. The proposed converters achieve lower error
ranges than recently proposed techniques.
Reductions in error of up to 19% and 11% are achieved with the
proposed logarithmic converter and antilogarithmic converter,
respectively. The proposed logarithmic converters provide a
saving of at least 20% in power. Savings of at least 13% and 12%
in power and area, respectively, are achieved for the proposed
antilogarithmic converters. DLA is implemented using
90-nm CMOS technology. Avoiding the multiplication in the
logarithmic domain leads to a reduction in power, area, and delay

Dina M. Ellaithy received the B.S. degree in
electronics and communications engineering and the
M.S. degree in electrical and electronics engineering
from Ain Shams University, Cairo, Egypt, in 2005
and 2013, respectively, where she is currently pursuing
the Ph.D. degree in electrical and electronics
engineering.
From 2007 to 2013, she was a Teaching Assistant
with the Electronics and Communications Department,
Modern Academy for Engineering and Technology,
Cairo. Since 2014, she has been a Research
Assistant with the Electronics Research Institute, Cairo. Her current research
interests include low-power circuit design, low-power arithmetic, low-power
design and implementation of 3-D graphics processors, analog integrated
circuit design, and phase-locked loop design.

Magdy A. El-Moursy was born in Cairo, Egypt,
in 1974. He received the B.S. degree (Hons.) in
electronics and communications engineering and the
master's degree in computer networks from Cairo
University, Cairo, in 1996 and 2000, respectively,
and the master's and Ph.D. degrees in electrical
engineering in the area of high-performance VLSI/IC
design from the University of Rochester, Rochester,
NY, USA, in 2002 and 2004, respectively.
In 2003, he joined STMicroelectronics, Advanced
System Technology, San Diego, CA, USA. From
2004 to 2006, he was a Senior Design Engineer with Portland Technology
Development, Intel Corporation, Hillsboro, OR, USA. From 2006 to 2008, he
was an Assistant Professor with the Information Engineering and Technology
Department, German University in Cairo, Cairo. From 2008 to 2010, he was
a Technical Lead with the Mentor Hardware Emulation Division, Mentor
Graphics Corporation, Cairo, where he is currently a Staff Engineer with the
Design Creation and Synthesis Division. He is an Associate Professor with the
Microelectronics Department, Electronics Research Institute, Cairo. He has
authored over 70 papers, five book chapters, and three books in the fields
of high-speed and low-power CMOS design techniques and network-on-chip
(NoC)/system-on-chip (SoC). His current research interests include NoC/SoC,
interconnect design and related circuit level issues in high-performance
VLSI circuits, clock distribution network design, digital ASIC circuit design,
VLSI/SoC/NoC design and validation/verification, circuit verification and
testing, embedded systems, and low-power design.
Dr. El-Moursy is an Associate Editor of the Editorial Board of the
Microelectronics Journal (Elsevier), the International Journal of Circuits and
Architecture Design, and the Journal of Circuits, Systems, and Computers,
and a member of the Technical Program Committee of many IEEE conferences,
such as ISCAS, ICAINA, PacRim CCCSP, ISESD, SIECPC, and IDT.

Ghada H. Ibrahim received the B.Sc., M.Sc., and
Ph.D. degrees from the Electronics and Communications
Department, Faculty of Engineering, Cairo
University, Cairo, Egypt, in 2000, 2005, and 2013,
respectively.
She was with the Bahgat Group for MPEG
Decoder Chip Design, Research and Development
Department, and then was with the Microelectronics
Department, Electronics Research Institute, Cairo,
El-Tahrir Str., Dokki, Giza, Egypt, where her
research interests focused on RF circuit design for
wireless sensor network applications. She was with MIMOS BHD, Malaysia,
from 2006 to 2008, where she contributed to the establishment of an RF
team for the design of a WiMAX RF transceiver chip. She has authored or
co-authored over 11 published papers and one pending patent. Her current
research interests include low-power digital circuits, RFIC design, printed
electronics and Micro-Electro-Mechanical Systems design, and microfabrication.

Amal Zaki is a Full Professor with the Electronics
Research Institute (ERI), Cairo, Egypt, and has been Chair
of the VLSI Department, ERI, since 2011. She was the Head of
the OBC and data handling subsystem of the satellite division
in the Egyptian Space Program at the National Authority for
Remote Sensing and Space Sciences. Her research interests include
digital circuit design, computer architecture, mixed-signal
VLSI, and MEMS.

Abdelhalim Zekry was a Staff Member at several
rewarded universities. He has also supervised over
70 master's theses and 25 doctoral dissertations. He is currently
a Professor of Electronics with the Faculty of Engineering,
Ain Shams University, Cairo, Egypt. He has
authored over 200 papers. His current research interests
include the field of microelectronics and electronic
applications, including communications and
photovoltaics.
Dr. Zekry was a recipient of several prizes for his
outstanding research and teaching performance.
