
IEEE Instrumentation and Measurement Technology Conference
Budapest, Hungary, May 21-23, 2001

Low power implementation of a Sigma Delta decimation filter for cardiac applications
L. Ascari, A. Pierazzi, C. Morandi
Dip. Ingegneria dell'Informazione
University of Parma, Viale delle Scienze 181/A, 43100 Parma, ITALY
Ph. +39-0521-905814, Fax. +39-0521-905822, E-mail: ascari@ieee.org

0-7803-6646-8/01/$10.00 ©2001 IEEE

Abstract - This paper reports on two main issues: low power architectures for decimation filters, and simulation tools for their RTL-level power estimation.

The final aim is the design of a decimation filter for a ΣΔ ADC used in a pacemaker: in this field of application, the main requirement is the lowest possible power consumption, in order to preserve the batteries.

First, state-of-the-art designs are presented, in order to choose the most suitable architectures for this purpose.

Matlab was used to achieve the desired performance, in terms of ΣΔ order and sampling frequency, and to find the decimation ratio. Up to now three architectures have been selected and implemented (at RTL level): the classic Hogenauer's CIC (Cascade of Integrators Comb) structure and two polyphase Multi Stage Multi Rate structures. All the topologies were implemented in the AHDL language and simulated in the CADENCE environment, to obtain a comparison in terms of power dissipation, by adding a power estimation feature to every block. The results of these simulations are reported.

KEYWORDS

Power estimation, ΣΔ decimation, low power design.

I. INTRODUCTION

Power dissipation is an issue of great concern in body implantable devices, such as pacemakers. The lower the current absorption, the longer the mean time between battery changes: while this is not so important for external devices, like hearing aids, it is essential for internal circuits, like pacemakers.

This paper deals with the acquisition and digitization of cardiac signals. Thanks to the narrow bandwidth of cardiac signals (50-150 Hz), oversampling techniques are advantageous in terms of Signal to Quantization Noise ratio; many other features also make these techniques interesting, like the possibility of using common and cheap digital components instead of more expensive and accurate analog ones.

The chosen ADC is a ΣΔ converter, formed by two main blocks: the modulator and the decimator. The modulator has a third order CRFBD (Cascade of Resonators, FeedBack form, Delaying quantizer) architecture [?]; it oversamples the input signal at $f_s$ = 8192 Hz, producing a two level digital output.

The high frequency signal coming from the modulator is converted into a multilevel signal at lower frequency by the decimator: $f_s$ decreases by 16 times, reaching the value of 512 Hz, and the amplitude resolution increases from 1 bit to 10 bits.

In this work the attention is focused on the decimator circuit, since the modulator has already been designed at Padua University (its power consumption is 1.4 µW) [?]. The decimator is the most area and power consuming subsystem in the converter. Designers have the choice among many different architectures implementing the same decimator, with very different performance in terms of circuit complexity, area occupation and power dissipation. So the choice of the architecture is a delicate task.

A method allowing a first rough quantitative estimation of the power dissipation of a given architecture would be of great help in this phase. The designer could compare different architectures and implement at circuit level only the best one.

This work presents such a method, applied to three architectures: the classic Hogenauer's CIC (Cascade of Integrators Comb) architecture and two Polyphase (PP) ones.

II. PRELIMINARIES

A. Architectures

The transfer function of a typical $\mathrm{sinc}^k$ decimator is:

$$H(z) = \left(\frac{1 - z^{-D}}{1 - z^{-1}}\right)^k$$

This function can be implemented by means of single rate, single stage filters, but this solution is suitable only for low decimation factors D and low filter orders k; for higher D and k, circuit complexity and power consumption increase fast, because all the parts of the filter operate at the high frequency $f_s$.

A better solution is the use of multi-stage multi-rate architectures. They are based on the commutative rule: a generic
$H(z^D)$ is equivalent to a system consisting of a downsampler, whose decimation factor is D, followed by the transfer function $H(z)$. In this way, $H(z) = \left(\frac{1 - z^{-D}}{1 - z^{-1}}\right)^k$ can be split in two sections: a cascade of k integrators implementing $H_I(z) = \left(\frac{1}{1 - z^{-1}}\right)^k$ and a cascade of k combs implementing $H_C(z) = (1 - z^{-1})^k$. The integrator and comb sections are separated by a downsampler, with a decimation factor D. This is Hogenauer's CIC architecture [?]. Its main positive characteristics are the high regularity and the absence of multipliers and storage elements; but when the filter order or the decimation factor becomes high, these drawbacks appear: the internal word-length of the integrator section is large, and power consumption increases because this section operates at the high input oversampling frequency and with a large word-length. Another effect is that the highest sampling rate is limited by the large word-length, but this is not a concern in this case, since the frequencies involved are very low.

Hogenauer's CIC decimation filter structure is represented in fig. 1.

Figure 1. Realization of Hogenauer's CIC decimator with integrator and comb sections.

Better results in terms of power consumption are obtained using multirate architectures. Starting from the same Transfer Function, but performing different manipulations, we arrived at the architectures described below.

A.1 First Polyphase structure

First of all, recursive paths are eliminated, expressing $H(z)$ in a "non recursive" form:

$$H(z) = \left(\sum_{n=0}^{D-1} z^{-n}\right)^k = (1 + z^{-1})^k (1 + z^{-2})^k \cdots (1 + z^{-2^{M-1}})^k$$

where $D = 2^M$.

The decimation factor must be 16 and was implemented by the cascade of two $\mathrm{sinc}^4$ stages, each having a decimation factor of 4: as a consequence, D = 4. The scheme of the decimator is reported in fig. 2.

[Figure 2: two cascaded sinc4 stages, with output rates of 2048 Hz and 512 Hz]

In this way, considering only one of the two decimation stages:

$$H(z) = (1 + z^{-1})^4 (1 + z^{-2})^4$$

which can be implemented using the commutative rule, leading to the configuration of fig. 3:

Figure 3. Principle of implementation of a $\mathrm{sinc}^4$ block

By using polyphase decomposition, the block $(1 + z^{-1})^4$ can be expressed as:

$$(1 + z^{-1})^4 = (1 + 6z^{-2} + z^{-4}) + z^{-1}(4 + 4z^{-2}) = E_1(z^2) + z^{-1} E_2(z^2).$$

In this way $E_1(z)$ and $E_2(z)$ can be implemented, at a lower frequency, instead of $E_1(z^2)$ and $E_2(z^2)$. To limit power consumption in the realization of $E_1(z)$ and $E_2(z)$, multipliers must be avoided and replaced by shift-and-add operations. For instance, the coefficient 6 has been implemented as $4 + 2 = 2^2 + 2$, by means of one "add" and two "shift" operations. In fig. 4 the architecture of the proposed decimator is reported.

Figure 4. Architecture of our low power decimator.

A.2 Second Polyphase structure

Starting from $H(z) = \left(\frac{1 - z^{-4}}{1 - z^{-1}}\right)^4 = (1 + z^{-1} + z^{-2} + z^{-3})^4$, after some steps we arrive at:

$$H(z) = E_1(z^4) + z^{-1} E_2(z^4) + z^{-2} E_3(z^4) + z^{-3} E_4(z^4)$$
where

$$E_1(z) = 1 + 31z^{-1} + 31z^{-2} + z^{-3}, \qquad E_2(z) = 4 + 40z^{-1} + 20z^{-2},$$
$$E_3(z) = 10 + 44z^{-1} + 10z^{-2}, \qquad E_4(z) = 20 + 40z^{-1} + 4z^{-2}.$$

The corresponding architecture is reported in fig. 5.

[Figure 5: In @ 8192 Hz, two cascaded sinc4 stages, Out @ 512 Hz]

Each of the FIR filters $E_1(z)$, $E_2(z)$, $E_3(z)$, $E_4(z)$ was implemented using the rules presented in the previous paragraph to reduce power consumption: no multipliers were used, and coefficients were implemented as sums of power-of-two numbers. In addition, a technique known as Substructure Sharing was introduced in this architecture: simply speaking, it consists of the re-use of part of the circuitry involved in the realization of a multiplicative coefficient, to implement another coefficient. As an example, the structure of $E_3(z)$ is reported in fig. 6. In order to obtain the coefficient 44 we have multiplied by 4 the output of the structure used for the coefficient 10, and added 4 times the input to it (44 = 4·10 + 4). This is the so-called "Substructure Sharing".

Figure 6. Architecture of the FIR filter $E_3(z)$.

In this second Polyphase architecture, registers and adders work at a rate 2 times slower than in the first one, because of the downsampling by 4 at the input. On the other hand, the number of filters to be implemented ($E_i(z)$) has increased. It is difficult to predict which of the last two architectures consumes less. That is why a system to compare their performance in advance is important.

B. Low Level Implementation

Substantial power savings can also be made in the low-level implementation of the previous architectures. First, the logic chosen to implement the adders must be presented.

Since glitches are responsible for wasteful power consumption, we looked for logic families that do not allow glitch propagation. We focused our attention on dynamic precharge-evaluation CMOS logic, using a 4-phase clock. Glitches indeed do not propagate, but the slow clock rates at which the decimator must work cause high-level voltage droop; the refresh rate is not high enough.

So we decided to use static CMOS logic, which is not affected by the problems of dynamic logic at low rates.

B.1 Glitches

The filter being designed has a clocked architecture, which means that glitches do not cause wrong outputs; their effect is nevertheless a great increase in power dissipation. The solution adopted for this problem is the use of clocked adders: if an adder performs the sum on a rising clock edge, the following adder is synchronized with the subsequent falling clock edge, when its inputs have stabilized. An example is reported in fig. 7, in which a possible implementation of $E_4(z)$ is represented.

Figure 7. Architecture of the FIR filter $E_4(z)$.

A consequence of this choice is that delay elements of 0.5 and 1.5 clock periods must be designed, in order to preserve the correct synchronization.

Another effect is that simpler adder architectures can be used: in fact we can adopt a minimal covering of the Karnaugh map, using a smaller number of transistors. Glitches cannot propagate across the structure, thanks to the timing of adders and delay elements.

The FIR filters $E_1(z)$, $E_2(z)$, $E_3(z)$, $E_4(z)$ have different latencies, so attention must be paid to the synchronization of their outputs before the final adders (see fig. 5): the final architecture is reported in fig. 8.
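The shift-and-add and Substructure Sharing ideas used for the FIR filters above can be illustrated with a short Python sketch. It is only a behavioural illustration, not the authors' AHDL code. The function names are hypothetical, the split 10 = 8 + 2 is an assumption (the paper only states 6 = 4 + 2 and 44 = 4·10 + 4), and the tap values 10, 44, 10 for $E_3(z)$ are taken from expanding $(1 + z^{-1} + z^{-2} + z^{-3})^4$.

```python
def times6(x: int) -> int:
    """6*x = 4*x + 2*x: two shifts and one add, as described for the coefficient 6."""
    return (x << 2) + (x << 1)


def times10(x: int) -> int:
    """10*x = 8*x + 2*x: two shifts and one add (this split is an assumption)."""
    return (x << 3) + (x << 1)


def times10_and_44(x: int) -> tuple[int, int]:
    """Substructure sharing: 44*x = 4*(10*x) + 4*x reuses the 10*x result."""
    p10 = times10(x)
    p44 = (p10 << 2) + (x << 2)
    return p10, p44


def e3_filter(samples: list[int]) -> list[int]:
    """y[n] = 10*x[n] + 44*x[n-1] + 10*x[n-2], zero initial state, no multiplier."""
    out = []
    p44_d1 = 0  # 44 * x[n-1]
    p10_d1 = 0  # 10 * x[n-1]
    p10_d2 = 0  # 10 * x[n-2]
    for x in samples:
        p10, p44 = times10_and_44(x)
        out.append(p10 + p44_d1 + p10_d2)
        # Shift the delay line: the products are stored, not recomputed.
        p10_d2, p10_d1, p44_d1 = p10_d1, p10, p44
    return out


if __name__ == "__main__":
    print(e3_filter([1, 0, 0, 0]))  # impulse response: [10, 44, 10, 0]
```

In this sketch the product by 10 is computed once per sample and reused both for the two taps of value 10 and as the starting point for the tap of value 44, which is the sharing the text describes.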

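Returning to the architectures of Section II-A, the claim that the CIC and the two Polyphase structures implement the same decimator can be checked numerically. The following Python sketch is a behavioural model under stated assumptions: zero initial state, exact integer arithmetic, and one $\mathrm{sinc}^4$ stage with D = 4. All function names are hypothetical and do not come from the paper.

```python
def convolve(a, b):
    """Full linear convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fir(h, x):
    """Causal FIR filter with zero initial state; output has the length of x."""
    return convolve(h, x)[:len(x)]


def sinc_taps(D, k):
    """Coefficients of (1 + z^-1 + ... + z^-(D-1))^k."""
    h = [1]
    for _ in range(k):
        h = convolve(h, [1] * D)
    return h


def direct(x, D=4, k=4):
    """Reference: filter at the input rate, then keep every D-th sample."""
    return fir(sinc_taps(D, k), x)[::D]


def cic(x, D=4, k=4):
    """Hogenauer form: k integrators, downsampling by D, k combs."""
    for _ in range(k):
        acc, y = 0, []
        for v in x:
            acc += v
            y.append(acc)
        x = y
    x = x[::D]
    for _ in range(k):
        x = [v - p for v, p in zip(x, [0] + x[:-1])]
    return x


def polyphase(x, D=4, k=4):
    """Polyphase form: branch i filters the samples x[D*m - i] with taps h[i::D]."""
    h = sinc_taps(D, k)
    n_out = len(x[::D])
    y = [0] * n_out
    for i in range(D):
        u = [x[D * m - i] if D * m - i >= 0 else 0 for m in range(n_out)]
        y = [a + b for a, b in zip(y, fir(h[i::D], u))]
    return y


def first_polyphase(x):
    """First structure: two (1 + z^-1)^4 blocks, each decimating by 2."""
    return polyphase(polyphase(x, D=2, k=4), D=2, k=4)


if __name__ == "__main__":
    bits = [1 if (n * n + n // 3) % 3 else -1 for n in range(64)]
    ref = direct(bits)
    print(cic(bits) == ref, first_polyphase(bits) == ref, polyphase(bits) == ref)
    print(sinc_taps(4, 4)[2::4])  # taps of E3(z): [10, 44, 10]
```

The sketch only confirms functional equivalence; it says nothing about power, which is why the activity measure introduced later is needed to rank the structures.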
Figure 8. Second Polyphase architecture with added delay elements for synchronization.

The total latency of the architecture implementing the $\mathrm{sinc}^4$ block is $3T_{ck}$, where $T_{ck}$ is the clock period.

B.2 Adders

Referring to the last two architectures, the first block has a characteristic that allows saving much power: its input is a one-bit word. This means that many adders can be eliminated and replaced by ad-hoc wiring solutions. As an example, let's consider the block implementing $E_3(z)$ in the last architecture (see fig. 9). As can be seen from the figure, only 2 of the four adders represented are necessary (S3 and S4): moreover, in the case of S3, even if the words to be added are 6 bits long, a 1-bit adder with carry out is sufficient; in the case of S4, because of the particular configuration of the input words, a 5-bit adder with carry out is required.

Figure 9. To realize $E_3(z)$, only a one-bit adder and one 5-bit adder (both with carry-out) are necessary.

These optimizations can be applied only in the first block: in fact the input word-length of the second block is higher (see fig. 4 and 5).

C. Power Consumption Estimate

A number of different approaches to estimating power consumption have been developed by others ([?] [?]). Here several of them are presented, focusing on those referring to the Register Transfer abstraction level. At this level the primitives are functional blocks, such as adders, multipliers and registers. The difficulty in estimating power in this case stems from the fact that the gate, circuit and layout level details of the design may not have been specified yet. The strategies proposed for RT-level power estimation can be divided into two classes: analytical and empirical methods [?].

C.1 Analytical Methods

Analytical methods attempt to relate the power consumption of a particular RTL description to fundamental quantities that describe the physical capacitance and activity of a design. Concepts belonging to these procedures are gate equivalents (GE), used in the technique called CES (Chip Estimation System); Entropy, used by Najm [?] as an estimate of the activity of a cell; and Area, as an estimate of the capacitance.

These methods have the advantage of requiring little information as input; on the other hand, no timing information enters the calculation and, therefore, glitching power is not accounted for. In CMOS, power dissipation is highly dependent on switching and glitching activity. Therefore, power predictions based on these techniques may not have a strong relation to real hardware. To solve these problems empirical methods can be used.

C.2 Empirical Methods

These methods measure the power consumption of existing implementations and produce a model based on those measurements. This approach is best suited to library-based design, but it is not necessary if a ballpark estimate of the power is acceptable. This category of models can be divided into those assuming fixed signal activity and those trying to take into account the dependence of the circuit activity on the input signals. The Power Factor Approximation (PFA) method belongs to the first group, while the ESP method belongs to the second one [?]. The latter divides total power into a fixed component and a component proportional to the number of bit transitions of the input vector.

III. RESULTS

A very simple tool to compare power dissipation for different architectures was implemented. It was applied to all the structures mentioned above.

In fact the high efficiency in terms of power consumption of the Polyphase realization is clear and needs no proof, because of the elimination of recursive loops, the reduction of the circuitry operating at high frequency, and the only slight increase of the internal word-length. In such a case the power estimate
is useless; nevertheless its importance grows fast when comparing slightly different architectures, as in the case of the second and third structures.

In this case it would be interesting to know in advance which architecture consumes less, in order to concentrate design efforts on it: optimizations and trade-offs could be introduced very early in the design flow. The "power estimator" will help in such a choice. Another advantage of architecture-level tools over lower level ones is the simulation speed, due to the lower model complexity of the former [?].

A. Power Dissipation Estimate

The approach to power simulation is simpler than those listed in section II-C; it takes ideas from several of them, as it attempts to take into account the switching and glitching activity, the word-length in the different blocks, and the activation rate of the blocks. The main interest is in relative comparisons between architectures, not in the absolute value of power consumption. For this reason, capacitances are eliminated from the calculation, preferring a simple "activity" measure of the system under test. This is a measure of the computational complexity of the architecture.

Let's examine a simple block: it has inputs and an output, and the output is a function of the history of the input values. Every transition of the inputs and the output is considered, to obtain the estimated power demand. This method was used for adders, shift registers and so on, but not for multipliers. Power consumption of multiplication blocks was neglected because every coefficient is decomposed into a sum of power-of-two coefficients, which are implemented by bit-shifts. Furthermore the shifts are only ideal, because in practice it is sufficient to connect the input bit-lines to the following adder in the correct position. Moreover, power consumption of adders in the second and third architectures is computed according to the real word-lengths: for example, referring to fig. 9, S1 and S2 do not absorb any current, whereas S3 and S4 absorb a current proportional to 1 and 5 bits respectively.

An approximate estimate of the "activity" of the designed architecture is obtained; the rate at which the different blocks operate and the glitch activity are taken into account.

The formula used to compute activity until time T is:

$$A(T) = \sum_{t \le T} \sum_{n=1}^{N} A_n(t)$$

where A stands for "activity", as defined above, $A_n(t)$ is the number of bit transitions at the inputs and output of block n at time step t, and N is the total number of blocks in the architecture. To clarify this concept, let's take a 4-bit adder, as reported in fig. 10. The inputs at t = i and t = i+1 are reported, as well as the corresponding outputs. The arrows indicate the positions of changing bits. During this clock period 4 bits change, and this is the quantity we have considered as an activity indication in this time step.

A counter stores all these changes in all the adders and registers of the structure under test, during the whole simulation period. The structure that drives the counter has been embedded in the AHDL code describing adders and registers: the incoming words (whose length is passed to the block as a parameter) are logically XORed with the corresponding previous ones (stored in local variables), as is the output word, and then the ones are counted and stored. This is repeated at each clock edge by which the block is activated. The language in which the blocks were implemented was SpectreHDL and simulations took place in CADENCE SpectreS.

[Figure 10 data: at t = i the inputs are 1001 and 0100 and the output is 1101; at t = i+1 the inputs are 0011 and 0110 and the output is 1001; Power = 2 + 1 + 1 = 4]

Figure 10. Example of the implemented Activity estimation tool, applied to a 4-bit adder.

One problem of this test procedure is that results are highly pattern-dependent. Nevertheless this is not a real problem because of the physical characteristics of a general ΣΔ modulated signal, consisting of a high-rate sequence of 1s and 0s.

In fig. 11 the results for all the architectures mentioned above are reported. As expected, the complexity levels of the CIC architecture and the Polyphase architectures are very different; between the two Polyphase structures, the last one consumes less.

IV. NOVELTIES

This work compares similar low-power architectures for decimation filters at Register Transfer Level, from a power-consumption point of view. Keeping the decimation factor and the high-level structure (fig. 2) of the filter fixed, several tradeoffs between lowering the filter operating frequency and filter complexity can be rapidly compared.

A fast but accurate evaluation of computational power consumption for decimation filters at RTL was presented, especially for similar low-power multi-rate solutions.

An approximate estimation of the activity, according to the rate at which the different blocks operate and including the glitch activity, is obtained.
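The activity counter of Section III-A (XOR each word with its previous value, then count the ones) can be sketched in Python as follows. This is a minimal behavioural illustration, not the SpectreHDL code used in the paper: the class and function names are hypothetical, words are assumed to be non-negative integers, and the adder output is not truncated.

```python
def bit_changes(prev: int, curr: int) -> int:
    """Count the bit positions that differ between two words (XOR, then count ones)."""
    return bin(prev ^ curr).count("1")


class CountingAdder:
    """Behavioural adder that logs its own input and output bit transitions."""

    def __init__(self) -> None:
        self.prev_a = 0
        self.prev_b = 0
        self.prev_sum = 0
        self.activity = 0  # running total of bit transitions

    def clock(self, a: int, b: int) -> int:
        """Activate the block once: add, count changed bits, store the new words."""
        total = a + b
        self.activity += (bit_changes(self.prev_a, a)
                          + bit_changes(self.prev_b, b)
                          + bit_changes(self.prev_sum, total))
        self.prev_a, self.prev_b, self.prev_sum = a, b, total
        return total


def total_activity(blocks) -> int:
    """Sum the counters of all blocks: the activity of the whole structure."""
    return sum(block.activity for block in blocks)


if __name__ == "__main__":
    adder = CountingAdder()
    adder.clock(0b1001, 0b0100)      # state at t = i (values of fig. 10)
    before = adder.activity
    adder.clock(0b0011, 0b0110)      # state at t = i + 1
    print(adder.activity - before)   # 2 + 1 + 1 = 4 bit changes
```

Because each block is only clocked when it is activated, blocks running at a lower rate contribute fewer counts over the same simulation time, which is how the activation rate enters the comparison.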
[Figure 11 plot: cumulative activity (0 to 60K) versus time (0 to 90 ms) for Activity-CIC, Activity-Polyphase1 and Activity-Polyphase2]

Figure 11. Comparison between the computational complexity of the three architectures considered: Hogenauer's CIC ("CIC"), Polyphase1 and Polyphase2; the input signal rate is 100 µs.

References

[1] R. Schreier, The Delta-Sigma Toolbox 5.1. Analog Devices Inc, April 1998.
[2] A. Gerosa, A. Novo, and A. Neviani, "Low-power sensing and digitization of cardiac signals based on sigma-delta conversion," Proceedings of the 2000 International Symposium on Low Power Electronics and Design, July 2000.
[3] Y. Gao, L. Jia, and H. Tenhunen, "Low-complexity high-speed decimation filters," Proceedings of the 8th International Symposium on Integrated Circuits, Devices and Systems (ISIC'99), September 1999.
[4] F. N. Najm, "A survey of power estimation techniques in VLSI circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, December 1994.
[5] P. Landman, "High-level power estimation," Proceedings of the International Symposium on Low Power Electronics and Design, August 1996.
[6] P. E. Landman, Low-Power Architectural Design Methodologies. PhD thesis, UC Berkeley, 1994.
