
Improve your root-mean calculations

With Newton's help and a few improvements, real-time digital systems can find root-means efficiently.

BY BRIAN NEUNABER

Real-time digital systems often require the calculation of a root-mean, such as a root-mean square (RMS) level or the average magnitude of a complex signal. While averaging can be efficiently implemented by most microprocessors, the square root may not be, especially with low-cost hardware. If the processor doesn't implement a fast square-root function, it must be implemented in software; although this yields accurate results, it may not be efficient.

One common method for computing the square root is Newton's method, which iteratively converges on a solution from an initial estimate. Since we're computing the square root of a slowly varying average value, the previous root-mean value makes a good estimate. Furthermore, we can combine the iterative Newton's method with a first-order recursive averager, resulting in a super-efficient method for computing the root-mean of a signal.

In this article, I'll develop and present three efficient recursive algorithms for computing the root-mean, illustrating each method with signal flow diagrams and example code. To some degree, each of these methods trades hardware complexity for error. I'll compare the computational performance and error of each method and suggest suitable hardware for each implementation.

ROOT-MEAN
The root-mean is computed as the square root of the average over time of its input. This average may be recursive or non-recursive, and I'll briefly review the general case for both.

Non-recursive average
The non-recursive average, or moving average, is the weighted sum of N inputs: the current input and N-1 previous inputs.

In digital filtering terminology, this is called a finite impulse response, or FIR, filter:

y(n) = \sum_{i=0}^{N-1} a_i \, x(n-i)   (1)

The most common use of the moving average sets the weights such that a_i = 1/N. If we were to plot these weights versus time, we would see the window of the input signal that is averaged at a given point in time. This 1/N window is called a rectangular window because its shape is an N-by-1/N rectangle.

There is a trick for computing the 1/N average so that all N samples need not be weighted and summed with each output calculation. Since the weights don't change, you can simply add the newest weighted input and subtract the Nth weighted input from the previous sum:

y(n) = y(n-1) + \frac{1}{N}\left[ x(n) - x(n-N) \right]   (2)

While this technique is computationally efficient, it requires storage and circular-buffer management of N samples, as the sketch below illustrates.
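As a minimal sketch of the running-sum form in Equation 2 (the class and its names are ours, not from the article), the moving average costs one add, one subtract, and one divide per sample, at the price of N samples of state:

#include <vector>

class MovingAverage
{
public:
    MovingAverage(unsigned int n) : Buf(n, 0.0), Sum(0.0), Pos(0), N(n) {}

    double Process(double x)      // O(1) work per sample, O(N) storage
    {
        Sum += x - Buf[Pos];      // add the newest input, subtract the Nth-oldest
        Buf[Pos] = x;
        Pos = (Pos + 1) % N;      // circular-buffer management
        return Sum / N;           // the 1/N weighting applied once at the output
    }

private:
    std::vector<double> Buf;
    double Sum;
    unsigned int Pos, N;
};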

Of course, many other window shapes are commonly used. Typically, these window shapes resemble, or are a variation of, a raised cosine between -\pi/2 and \pi/2. These windows weight the samples in the center more than the samples near the edges. Generally speaking, you should only use one of these windows when there is a specific need to, such as applying a specific filter to the signal. The disadvantage of these windows is that computational complexity and storage requirements increase with N.

Recursive average
The recursive average is the weighted sum of the input, N previous inputs, and M previous outputs:

y(n) = \sum_{i=0}^{N} a_i \, x(n-i) + \sum_{m=1}^{M} b_m \, y(n-m)   (3)

The simplest of these in terms of computational complexity and storage (while still being useful) is the first-order recursive average. In this case, the average is computed as the weighted sum of the current input and the previous output. The first-order recursive average also lends itself to an optimization when combined with the computation of the square root, which we'll discuss shortly.

In contrast to the non-recursive average, the first-order recursive average's window is a decaying exponential (Figure 1). Technically, the recursive average has an infinite window, since it never decays all the way to zero. In digital filtering terminology, this is known as an infinite impulse response, or IIR, filter. From Figure 1, we see that earlier samples are weighted more than later samples, allowing us to somewhat arbitrarily define an averaging time for the recursive average. For the first-order case, we define the averaging time as the time at which the impulse response has decayed to a factor of 1/e, or approximately 37%, of its initial value.

Figure 1. Comparison of recursive and non-recursive averaging windows: the first-order recursive window is a decaying exponential that falls to 1/e of its initial value at the averaging time t_a, while the non-recursive 1/N window is rectangular.

Figure 2. Mean-square computation using Newton's method for the reciprocal square root: the input x(n) is averaged to form m(n), which feeds the iterative reciprocal-square-root update (a 1/2 gain and unit delays z^-1) to produce y(n).

An equivalent definition is the time at which the step response reaches 1 - (1/e), or approximately 63%, of its final value. Other definitions are possible but will not be covered here.

The weighting of the sum determines this averaging time; to ensure unity gain, the sum of the weights must equal one. As a consequence, only one coefficient needs to be specified to describe the averaging time. Therefore, for first-order recursive averaging, we compute the mean level as:

m(n) = (1 - a) \, x(n) + a \, m(n-1)   (4)

where x(n) is the input, m(n) is the mean value, and a is the averaging coefficient. The averaging coefficient is defined as:

a = e^{-1/(f_S t)}   (5)

where t is the averaging time and f_S is the sampling frequency. For example, with f_S = 48kHz and t = 100ms, a = e^{-1/4800} \approx 0.999792. The root-mean may then be calculated by taking the square root of Equation 4:

y(n) = \sqrt{m(n)} = \sqrt{(1-a) \, x(n) + a \, y^2(n-1)}   (6)

where y(n) is the root-mean.

EFFICIENT COMPUTATION METHODS
Googling "fast square root" will get you a plethora of information and code snippets on implementing fast square-root algorithms. While these methods may work just fine, they don't take into account the application in which the square root is required. Oftentimes, you may not need exact precision to the last bit, or the algorithm itself can be manipulated to optimize the computation of the square root. I present a few basic approaches here.

Listing 1. C++ class that computes the root-mean using Newton's method for the reciprocal square root

#include <cmath>

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time

class RecipRootMean
{
public:
    double Mean;
    double RecipRoot;    // reciprocal square root of the mean
    double AvgCoeff;

    RecipRootMean()
    {
        AvgCoeff = 1.0 - exp( -1.0 / (Fs * AvgTime) );
        Mean = 0.0;
        RecipRoot = 1.0e-10;    // 1 > initial value > 0
    }

    ~RecipRootMean() {}

    double CalcRootMean(double x)
    {
        Mean += AvgCoeff * (x - Mean);
        RecipRoot *= 0.5 * ( 3.0 - (RecipRoot * RecipRoot * Mean) );
        return RecipRoot * Mean;    // y(n) = y_r(n) * m(n), per Equation 13
    }
};
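As a usage sketch for Listing 1 (the driver function and its parameters are ours, not from the article), you can feed one squared sample per call and take the running RMS from the return value:

double MeasureRms(const double* input, int numSamples)
{
    RecipRootMean meter;    // class from Listing 1
    double rms = 0.0;
    for (int n = 0; n < numSamples; ++n)
        rms = meter.CalcRootMean( input[n] * input[n] );   // pass x(n)^2
    return rms;
}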

Listing 2. C++ class that implements the floating-point version of Figure 3

#include <cmath>

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time

class NewtonRootMean
{
public:
    double RootMean;
    double AvgCoeff;

    NewtonRootMean()
    {
        RootMean = 1.0;    // > 0 or divide will fail
        AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
    }

    ~NewtonRootMean() {}

    double CalcRootMean(double x)
    {
        RootMean += AvgCoeff * ( ( x / RootMean ) - RootMean );
        return RootMean;
    }
};

Figure 3. Mean-square computation that combines averaging and the iterative square root (signal flow: the divide x(n)/y(n-1), two (1-a)/2 coefficients, a unit delay z^-1, and the output y(n)).

Only calculate it when you need it
Probably the simplest optimization is to calculate the square root only when you absolutely need it. Although this may seem obvious, it can be easily overlooked when computing the root-mean on every input sample. When you don't need an output value for every input sample, it makes more sense to compute the square root only when you read the output value.

One example of an application where this technique can be used is RMS metering of a signal. A meter value that is displayed visually may only require an update every 50 to 100ms, which may be far less often than the input signal is sampled. Keep in mind, however, that the recursive averaging should still be computed at the full sample rate.

Logarithms
Recall that:

\log\sqrt{x} = \tfrac{1}{2}\log x   (7)

If you'll be computing the logarithm of a square root, it's far less computationally expensive to simply halve the result instead. A common example of this optimization is the calculation of an RMS level in dB, which may be simplified as follows:

\mathrm{RMS\ of\ } x \mathrm{\ in\ dB} = 20\log_{10}\sqrt{\mathrm{mean}(x^2)} = 10\log_{10}\,\mathrm{mean}(x^2)   (8)
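A sketch combining both optimizations above (the class and the epsilon guard are ours, not from the article): the mean is updated every sample, but the logarithm, which absorbs the square root per Equation 8, runs only when the meter display is refreshed:

#include <cmath>

class RmsMeter
{
public:
    double Mean;
    double AvgCoeff;

    RmsMeter(double Fs, double AvgTime)
    {
        Mean = 0.0;
        AvgCoeff = 1.0 - exp( -1.0 / (Fs * AvgTime) );
    }

    void Update(double x)    // call at the sample rate
    {
        Mean += AvgCoeff * ( (x * x) - Mean );   // mean of x(n)^2; no sqrt here
    }

    double ReadDb() const    // call only when the display refreshes
    {
        // 20*log10(sqrt(Mean)) equals 10*log10(Mean): the sqrt never runs
        return 10.0 * log10( Mean + 1.0e-30 );   // epsilon guards log10(0)
    }
};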

Newton's Method
Newton's Method (also called the Newton-Raphson Method) is a well-known iterative method for estimating a root of an equation.1 Newton's Method can be quite efficient when you have a reasonable estimate of the result. Furthermore, if accuracy to the last bit is not required, the number of iterations can be fixed to keep the algorithm deterministic. We may approximate the root of f(y) by iteratively calculating:

y(n) = y(n-1) - \frac{f(y(n-1))}{f'(y(n-1))}   (9)

If we wish to find y = \sqrt{m}, then we need to find the root of the equation f(y) = y^2 - m. Substituting f(y) into Equation 9, we get:

y(n) = y(n-1) - \frac{y^2(n-1) - m}{2\,y(n-1)}   (10)

Rearranging Equation 10, we get:

y(n) = \frac{1}{2}\left( y(n-1) + \frac{m(n)}{y(n-1)} \right)   (11)

where y(n) is the approximation of the square root of m(n). Equation 11 requires a divide operation, which may be inconvenient for some processors. As an alternative, we can calculate 1/\sqrt{m} and multiply the result by m to get \sqrt{m}.

Again using Newton's Method, we find that we may iteratively calculate the reciprocal square root as:

y_r(n) = \tfrac{1}{2}\,y_r(n-1)\left[ 3 - y_r^2(n-1)\,m(n) \right]   (12)

and calculate the square root as:

y(n) = y_r(n)\,m(n)   (13)
Although Newton's Method for the reciprocal square root eliminates the divide operation, it can be problematic for fixed-point processors. Assuming that m(n) is a positive integer greater than 1, y_r(n) will be a positive number less than one, beyond the range of representation for integer numbers. Implementation must be accomplished using floating-point or mixed integer/fractional number representation.

ROOT-MEAN USING NEWTON'S METHOD
A subtle difference between Equations 10 and 11 is that m becomes m(n), meaning that we're attempting to find the square root of a moving target. However, since m(n) is a mean value, or slowly varying, it can be viewed as nearly constant between iterations. Since y(n) will also be slowly varying, y(n-1) will be a good approximation to y(n) and will require fewer iterations (one, we hope) to achieve a good estimate.

To calculate the root-mean, one may simply apply Newton's Method for calculating the square root to the mean value. As long as the averaging time is long compared to the sample period (t >> 1/f_S), one iteration of the square-root calculation should suffice for reasonable accuracy. This seems simple enough, but we can actually improve the computational efficiency, as we'll see shortly.

Using reciprocal square root
Unlike the iterative square-root method, the iterative reciprocal square root requires no divide. This implementation is best suited for floating-point processing, which can efficiently handle numbers both greater and less than one. We present this implementation as a signal flow diagram in Figure 2. The averaging coefficient, a, is defined by Equation 5, and z^-1 represents a unit sample delay. A code listing for a C++ class that implements the computation in Figure 2 is presented in Listing 1. In this example class, initialization is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging and square-root computation.

Using direct square root
Let's go back and take a closer look at Equation 11. Newton's method converges on the solution as quickly as possible without oscillating around it, but if we slow this rate of convergence, the iterative equation will converge on the square root of the average of its inputs.

Listing 3. C++ class that implements the fixed-point version of Figure 3

#include <cmath>

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time
static const unsigned int sknNumIntBits = 32;   // # bits in int
static const unsigned int sknPrecisionBits = sknNumIntBits / 2;
static const double skScaleFactor = pow(2.0, (double)sknPrecisionBits);
static const unsigned int sknRoundOffset = (unsigned int)floor( 0.5 * skScaleFactor );

class IntNewtonRootMean
{
public:
    unsigned int nRootMean;
    unsigned int nScaledRootMean;
    unsigned int nAvgCoeff;
    unsigned int nMaxVal;

    IntNewtonRootMean()
    {
        nRootMean = 1;    // > 0 or divide will fail
        nScaledRootMean = 0;
        double AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
        nAvgCoeff = (unsigned int)floor( ( skScaleFactor * AvgCoeff ) + 0.5 );
        nMaxVal = (unsigned int)floor( ( skScaleFactor / AvgCoeff ) + 0.5 );
    }

    ~IntNewtonRootMean() {}

    unsigned int CalcRootMean(unsigned int x)
    {
        if ( x < nMaxVal ) {
            // scale x by nAvgCoeff before the divide to preserve precision
            nScaledRootMean += ( ( nAvgCoeff * x ) / nRootMean ) - ( nAvgCoeff * nRootMean );
        } else {
            // x too large to pre-scale without overflow; divide first
            nScaledRootMean += nAvgCoeff * ( ( x / nRootMean ) - nRootMean );
        }
        nRootMean = ( nScaledRootMean + sknRoundOffset ) >> sknPrecisionBits;
        return nRootMean;
    }
};
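A hypothetical driver for Listing 3 (the function, names, and 16-bit PCM assumption are ours): square each integer sample before the call, and read the integer RMS from the return value:

unsigned int MeasureIntRms(const short* pcm, int numSamples)
{
    IntNewtonRootMean meter;    // class from Listing 3
    unsigned int rms = 0;
    for (int n = 0; n < numSamples; ++n) {
        unsigned int sq = (unsigned int)( (int)pcm[n] * (int)pcm[n] );  // x(n)^2
        rms = meter.CalcRootMean( sq );
    }
    return rms;    // integer RMS in the same units as the input samples
}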

Adding the averaging coefficient to Equation 11 results in the following root-mean equation:

y(n) = \frac{1-a}{2}\,\frac{x(n)}{y(n-1)} + \frac{1+a}{2}\,y(n-1)   (14)

where a is defined by Equation 5. Now y(n) converges to the square root of the average of x(n). An equivalent signal-flow representation of Equation 14 is presented in Figure 3. Here, an additional y(n-1) term is summed so that only one averaging coefficient is required. Note that x(n) and y(n-1) must be greater than zero.

A code listing for a C++ class that implements the computation shown in Figure 3 is presented in Listing 2. As in the previous example, initialization is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging/square-root computation. With some care, Figure 3 may also be implemented in fixed-point arithmetic, as shown in Listing 3. In this example, scaling is implemented to ensure valid results. When sufficient word size is present, x is scaled by nAvgCoeff prior to division to maximize the precision of the result.

Divide-free RMS using normalization
Now we'll look at the special case of computing an RMS value on fixed-point hardware that does not have a fast divide operation, which is typical of low-cost embedded processors. Although many of these processors can perform division, they do so one bit at a time, requiring at least one cycle for each bit of word length. Furthermore, care must be taken to ensure that the RMS calculation is implemented with sufficient numerical precision. With fixed-point hardware, the square of a value requires twice the number of bits to retain the original data's precision. With this in mind, we manipulate Equation 14 into the following:

y(n) = y(n-1) + \frac{a}{2\,y(n-1)}\left[ x^2(n) - y^2(n-1) \right]   (15)
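For reference before the divide is removed, Equation 15 is only a few operations in floating point. This sketch is ours, not one of the article's listings; the function and argument names are assumed, with a the coefficient as written in Equation 15:

double Rms15Update(double y_prev, double x, double a)
{
    // One step of Equation 15: y(n) = y(n-1) + a/(2 y(n-1)) * (x^2(n) - y^2(n-1)).
    // y_prev must be greater than zero, as noted for Figures 3 and 4.
    return y_prev + ( a / (2.0 * y_prev) ) * ( x * x - y_prev * y_prev );
}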

Figure 4. RMS computation optimized to eliminate the divide operation: the divide by 2y(n-1) in Equation 15 is replaced by a shift, \times\,2^{-\mathrm{int}[\log_2 y(n-1)]}, the averaging coefficient becomes 3a/4, and a unit delay z^-1 feeds back the output y(n).

Although the expression x^2(n) - y^2(n-1) must be calculated with double precision, this implementation lends itself to a significant optimization. Note that a/2y(n-1) acts like a level-dependent averaging coefficient. If a slight time-dependent variance in the averaging time can be tolerated, which is often the case, 1/y(n-1) can be grossly approximated.

Listing 4. Freescale DSP563xx assembly implementation of divide-free RMS using normalization

FS        equ 48000.0                     ;sampling rate in Hz
AVG_TIME  equ 0.1                         ;averaging time in seconds
AVG_COEFF equ @XPN(-1.0/(FS*AVG_TIME))    ;calculate avg_coeff

; r4:   address of output bits 24-47 [y_msw(n)]
; r4+1: address of output bits 0-23  [y_lsw(n)]
; x0:   input [x(n)]
RMS
    move    #>AVG_COEFF,x1       ;load avg_coeff
    move    y:(r4)+,a            ;get y_msw(n-1)
    move    y:(r4),a0            ;get y_lsw(n-1)
    clb     a,b                  ;b = number of leading bits in y(n-1)
    mpy     x0,x0,a    a,x0      ;a = x(n)^2, x0 = y_msw(n-1)
    mac     -x0,x0,a   x0,y1     ;a = x(n)^2 - y_msw(n-1)^2, y1 = y_msw(n-1)
    normf   b1,a                 ;normalize x(n)^2 - y_msw(n-1)^2 by y_msw(n-1)
    move    a,x0                 ;x0 = normalized difference
    mpy     x1,x0,a    y:(r4),y0 ;a = AVG_COEFF * normalized difference, y0 = y_lsw(n-1)
    add     y,a                  ;a = y(n-1) + AVG_COEFF * normalized difference
    move    a0,y:(r4)-           ;save y_lsw(n)
    move    a,y:(r4)             ;save y_msw(n)
    rts

Figure 5. Comparison of RMS calculation methods over one second: the true RMS level, the Newton's-method RMS approximation, the no-divide RMS approximation, and the reciprocal-square-root method (level versus seconds).
On a floating-point processor, shifting the averaging coefficient to the left by the negative of the exponent approximates the divide operation. This process is commonly referred to as normalization. Some fixed-point DSPs can perform normalization by counting the leading bits of the accumulator and shifting the accumulator by that number of bits.2 In both cases, the averaging coefficient is effectively truncated to the nearest power of two, so the coefficient must be multiplied by 3/2 to round the result. This implementation is shown in Equation 16.

y(n) = y(n-1) + \frac{3a}{4}\,2^{-\mathrm{int}[\log_2 y(n-1)]}\left[ x^2(n) - y^2(n-1) \right]   (16)

where multiplication by 2^{-\mathrm{int}[\log_2 x]} is implemented as a shift by the number of leading bits in x.

Figure 4 is the signal-flow diagram that represents Equation 16. Just as in Figure 3, x(n) and y(n-1) must be greater than zero. A sample code listing that implements Figure 4 is shown in Listing 4. This assembly-language implementation is for the Freescale (formerly Motorola) DSP563xx 24-bit fixed-point processor.

Of course, this method can be implemented even without fast normalization. You can implement a loop that shifts x^2(n) - y^2(n-1) once for each leading bit in y(n-1). This will be slower but can be implemented with even the simplest of processors, as the sketch below shows.
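A sketch of that loop-based fallback (all names are ours, and 32-bit unsigned values are assumed). Because these are plain integers rather than the DSP's fractional accumulator, the normalization appears here as a right shift by int[log2 y(n-1)] computed with a one-bit-per-pass loop:

static unsigned int ilog2(unsigned int v)    // floor(log2(v)), v > 0
{
    unsigned int r = 0;
    while (v >>= 1)
        ++r;     // one pass per bit, as described above
    return r;
}

unsigned int DivFreeRmsUpdate(unsigned int y, unsigned int x, unsigned int coeff_q16)
{
    // coeff_q16 holds 3a/4 in Q16; the difference is kept in double precision
    long long d = (long long)x * x - (long long)y * y;      // x(n)^2 - y(n-1)^2
    long long norm = d >> ilog2(y);                         // * 2^-int[log2 y(n-1)]
    long long next = (long long)y + ( ( norm * (long long)coeff_q16 ) >> 16 );
    return ( next > 0 ) ? (unsigned int)next : 1u;          // keep y(n) > 0
}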

HIGHER ORDER AVERAGING
Higher-order recursive averaging may be accomplished by inserting additional averaging filters before the iterative square root. These filters may simply be one or more cascaded first-order recursive sections. First-order sections have the advantage of producing no overshoot in the step response. In addition, there is only one coefficient to adjust, and quantization effects (primarily of concern for fixed-point implementation) are far less than those of higher-order filters.

The implementer should be aware that cascading first-order sections changes the definition of averaging time. A simple but gross approximation that maintains the earlier definition of step response is to simply divide the averaging time of each first-order section by the total number of sections, as in the sketch below. However, it is the implementer's responsibility to verify that this approximation is suitable for the application.
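A sketch of that cascade (our own class, not from the article, assuming no more than kMaxSections stages): each section runs the first-order recursive average of Equation 4, with the composite averaging time divided evenly among the sections:

#include <cmath>

class CascadedAverager
{
public:
    CascadedAverager(double Fs, double AvgTime, int nSections)
        : NumSections(nSections)
    {
        // per-section averaging time = AvgTime / nSections (gross approximation)
        Coeff = 1.0 - exp( -1.0 / ( Fs * ( AvgTime / nSections ) ) );
        for (int i = 0; i < kMaxSections; ++i)
            State[i] = 0.0;
    }

    double Process(double x)    // feed x(n); returns the smoothed mean
    {
        for (int i = 0; i < NumSections; ++i) {
            State[i] += Coeff * ( x - State[i] );   // first-order section
            x = State[i];                           // feeds the next section
        }
        return x;    // pass this to the iterative square root
    }

private:
    static const int kMaxSections = 8;
    double State[kMaxSections];
    double Coeff;
    int NumSections;
};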
Second-order sections may also be used if you want, for example, a Bessel-Thomson filter response. If second-order sections are used, it's best to choose an odd-order composite response, since the averaging square-root filter approximates the final first-order filter with Q = 0.5. Care must be taken to minimize the overshoot of this averaging filter. Adjusting the averaging time of such a filter in real time will also prove more difficult, since there are a number of coefficients that must be adjusted in unison to ensure stability.

RESULTS
Three methods of calculating the RMS level are compared in Figure 5. The averaging time is set to 100ms, and the input is one second of 1/f noise with a 48kHz sampling frequency. The first trace is the true RMS value calculated using Equation 6. The second trace is the RMS calculation using Equation 14. The third trace is the no-divide calculation of Equation 16. The fourth trace is the RMS value using the reciprocal square-root method of Equation 13.

For the most part, the four traces line up nicely. All four approximations appear to converge at the same rate as the true RMS value. As expected, the largest deviation from the true RMS value comes from the approximation of Equation 16. This approximation shows the greatest error during large changes in the level of the input signal, although the error is temporary: the optimized approximation converges on the true RMS value when the level of the input signal is constant.

The errors between the three approximations and the true RMS value are shown in Figure 6. The error of the RMS approximation using Equation 14 slowly decreases until it is below 1e-7, which is sufficient for 24-bit accuracy. The optimized approximation of Equation 16 is substantially worse, at about 1e-4, but still good enough for many applications. The approximation that uses the reciprocal square root is in the noise, at less than 1e-9. For highly critical floating-point applications, this is the efficient method of choice.

As you would expect, the errors discussed above will be worse with shorter averaging times and better with longer averaging times. Table 1 summarizes the approximate error versus averaging time of these three methods, along with suitable hardware architecture requirements.

SUITABLE FOR AVERAGE READER
By combining recursive averaging with Newton's method for calculating the square root, you gain a very efficient method for computing the root-mean. Although the three methods I presented here were developed for different hardware and each, to some degree, trades off hardware capabilities for error, most of you should find one of these methods suitable for your application.
Brian Neunaber is currently digital systems architect of software and firmware at QSC Audio Products. He has designed real-time digital audio algorithms and systems for QSC and St. Louis Music and has an MSEE from Southern Illinois University. You may contact him at brian_neunaber@qscaudio.com.

Figure 6. Error comparison of RMS calculation methods: error versus time for Newton's method (Equation 14), the no-divide approximation (Equation 16), and the reciprocal-square-root method (Equation 13), plotted on a log scale from 1e-10 to 1.

REFERENCES:
1. Zill, D. G. Calculus with Analytic Geometry, 2nd ed. Boston: PWS-Kent, 1988, pp. 170-176.
2. Motorola. DSP56300 Family Manual, Rev. 3. Denver: Motorola Literature Distribution, 2000.

Table 1. Approximate error vs. averaging time of RMS calculation methods

Calculation method                 Preferred hardware   t=1ms   t=10ms   t=100ms   t=1s
Reciprocal root (Figure 2)         floating-point       1e-6    1e-8     1e-9      1e-10
Combined root-mean (Figure 3)      hardware divide      3e-6    3e-7     1e-7      1e-7
Divide-free root-mean (Figure 4)   any                  1e-4    1e-4     1e-4      3e-5
