
Real-Time Fundamental Frequency Estimation by Least-Square Fitting

Andrew Choi Department of Computer Science University of Hong Kong Pokfulam Road, Hong Kong Tel: +852 2859 7068 Fax: +852 2559 8447 E-mail: choi@cs.hku.hk June 30, 1995

Abstract
The real-time performance of a fundamental frequency estimation algorithm depends not only on its computational efficiency but also on its ability to obtain accurate estimates from short signal segments. Previous frequency-domain algorithms make use of spectral analysis algorithms that require the application of a window function, which causes them to fail when signal segments are short and their fundamental frequencies are low. A new spectral analysis algorithm based on least-square fitting, which does not require the application of a window function, is introduced. This algorithm operates by minimizing the square error of fitting a sinusoid to the signal segment. Special properties of the shape of the error function allow the spectrum of the signal segment to be deduced from it and the algorithm to be implemented efficiently. The proofs of these properties are given. A fundamental frequency estimation algorithm based on this spectral analysis algorithm is then described. Its computation time is analyzed and we demonstrate its real-time performance by a set of experiments using actual sound data.

EDICS number: SA 2.6.1

I Introduction
A pseudo-periodic sound produced by a musical instrument is composed chiefly of harmonic components whose frequencies are integer multiples of a fundamental frequency. The problem of fundamental frequency estimation (FFE) is central to the automatic analysis of musical signals. Pitch-to-MIDI converters are devices through which conventional musical instruments can be attached to digital synthesizers as controllers. The sound from the musical instrument is analyzed to determine the notes played on it, and MIDI messages (MIDI is a standard protocol for communication among synthesizers) are generated and sent to cause these notes to be played on the synthesizer. In the application of FFE algorithms to pitch-to-MIDI converters, the fundamental frequencies must be estimated in real time. These devices are an essential part of the front ends of interactive music systems [1]. This paper will focus on FFE for monophonic signals. Such signals are of course generated by a monophonic musical instrument, such as a trumpet. The assumption of a monophonic input signal is also met when separate transducers are used to collect sounds from the independent sound-producing elements of an instrument. Various kinds of individual transducers have been manufactured for collecting signals from the different strings of guitars and violins.

A number of performance measures affect the suitability of an FFE algorithm for use in real time. The response time is the delay between the instant a note is played on the instrument and the instant the synthesizer begins to generate the corresponding sound. Assuming that the delay due to the synthesizer is negligible, the response time is equal to the sum of the length of the initial signal segment analyzed and the computation time of the FFE algorithm. Ideally the response time should be so short that the player of the instrument does not perceive the delay. To illustrate the difficulty of this problem and the scale of the quantities involved, consider the note E2, the lowest note that can be played on the guitar, which has a fundamental frequency of 82.4 Hz. A 15-millisecond segment of this note contains only a little more than a complete cycle of the waveform. A computation time of 15 milliseconds for the FFE algorithm will result in a response time of 30 milliseconds. An FFE algorithm for this application must therefore correctly determine the fundamental frequency from such a short initial signal segment and have a small computational requirement.
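For concreteness, the arithmetic behind the E2 example is as follows (the 22255 Hz sampling rate is the one used for the experiments in section VI; its use here is our assumption):

\[
\frac{1}{82.4\ \text{Hz}} \approx 12.1\ \text{ms}, \qquad
\frac{15\ \text{ms}}{12.1\ \text{ms}} \approx 1.24\ \text{cycles}, \qquad
15\ \text{ms} \times 22255\ \text{Hz} \approx 334\ \text{samples}.
\]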

Economic and engineering constraints have resulted in commercial pitch-to-MIDI converters whose response times are well over 30 milliseconds for notes with low pitches [2]. Delays on these converters are very noticeable to the player of the instrument. Another important performance measure of an FFE algorithm for the type of applications being considered is the resolution at which it can distinguish neighboring frequencies. A high resolution is necessary when small and continuous changes of pitch in time, such as those resulting from the execution of vibrato, glissando, and pitch bends, need to be detected. Some FFE algorithms inherently have lower recognition resolution because they subdivide the frequency range into "bins." A high resolution is necessary in these musical contexts because an error of just a few percent in frequency is perceivable in many musical sounds. Certainly, a good FFE algorithm must also estimate the fundamental frequency of a signal segment with high accuracy. A more subtle, but related, performance measure is how accuracy deteriorates when the length of the signal segment falls below the algorithm's normal operating range. In many cases a "graceful degradation" is preferred because MIDI pitch-bend messages can be sent to a synthesizer to adjust the pitch of a note after it has started to sound. The initial estimates, if incorrect, should be as accurate as possible so that large (and noticeable) subsequent adjustments in pitch are unnecessary.

II Previous Work
The problem of FFE in speech signals has been studied for quite some time (see, for example, [3, 4]). Unlike their counterparts for musical signals, FFE algorithms for speech signals generally do not assume that strong harmonic components are present in the signal. These algorithms also operate within a narrower frequency range. Many existing FFE algorithms for musical signals generate estimates reliably when the signal segment analyzed is sufficiently long. Frequency-domain FFE algorithms operate by performing spectral analysis on the signal segment by segment and applying a pattern matching technique to the spectrum to determine each segment's fundamental frequency. Amuedo [5], for example, identifies sinusoidal components in a signal by the peaks in the power spectrum and examines how the hypothesis that each component is the fundamental is reinforced by the other components. Pearson and Wilson [6]

consider a multiresolution approach for the spectral analysis step. Doval and Rodet [7, 8] apply a maximum likelihood analysis to determine the fundamental frequency, also using peaks in the power spectrum. Brown [9] computes the cross-correlation of the constant-Q transform of a segment of the signal with a fixed comb pattern. The calculation of the constant-Q transform and a fast algorithm for approximating it are considered in [10] and [11], respectively. The accuracy of a frequency-domain FFE algorithm depends on the quality of its spectral analysis step. Since the constant-Q transform and the Fourier transform actually compute the spectrum of the periodic extension of a signal segment, a window function must first be applied to minimize spectral leakage due to discontinuity at the ends of the observation interval (see, for example, [12]). Applying a window function to a signal segment with a low fundamental frequency, which already contains very few samples, removes much information that is useful in determining its spectrum. Parametric spectral analysis techniques such as autoregressive (AR) methods and autoregressive moving-average (ARMA) methods [13, 14] do not require the application of a window function and are capable of generating power spectra accurately from short signal segments. Their use in real-time FFE is considered elsewhere [15]. The FFE algorithm introduced in this paper is similar to these methods in that its spectral analysis step does not require the application of a window function.

An alternative approach for designing FFE algorithms is based on computing an autocorrelation between the waveform and a delayed version of itself and determining the fundamental frequency by maximizing the degree of their similarity. Ney [16] uses time-warping to account for small variations in the signal waveform; the estimated period is the amount of shift that results in the best match of a segment of the signal with a future segment. Lane [17] adapts the center frequency of a bandpass filter to match the fundamental frequency of the signal using a convergence algorithm. Cook, Morrill, and Smith [2] use a least mean square adaptive algorithm to determine the coefficients of a filter that predicts a segment of a signal from an earlier segment; the phase of the filter is computed from these coefficients and is then used to estimate the period. A common weakness among autocorrelation-based FFE algorithms is their need for an initial estimate of the fundamental frequency, since they are adaptive algorithms by nature. They become dependent on good initial estimates as the lengths of the

signal segments analyzed decrease. Under such conditions, these algorithms may fail to converge or may converge to an incorrect estimate. In this regard, the performance of this category of FFE algorithms does not degrade gracefully. A hybrid technique, described in [18], first determines a coarse estimate of the fundamental frequency using a frequency-domain algorithm. The phase change of the component closest in frequency to the coarse estimate, measured between two segments of the signal separated by one sample, is then used to estimate the fundamental frequency accurately. Kuhn [19] describes a simpler hybrid technique. The effect of signal segment length on the accuracy of FFE algorithms is studied in [20], where a dynamic programming algorithm is described for matching harmonics to peaks in the constant-Q transform of the signal. The result is an FFE algorithm capable of correctly handling short signal segments. However, since it uses the constant-Q transform, it has the same disadvantage as the frequency-domain algorithms described above, namely that much information about the spectrum is discarded by the application of the window function. A modification to the autocorrelation-based algorithm in [2] that enhances its real-time performance is also considered in [20].

Central to the FFE algorithm introduced in this paper is a new spectral analysis algorithm based on least-square fitting, which will be described in the next section. Two properties of the least-square fitting are crucial for the spectral analysis algorithm: one that allows the sinusoidal components of the input signal to be identified and one that allows the algorithm to be implemented efficiently. These properties are proved in section IV. Section V describes the FFE algorithm and analyzes its computation time. The experimental results of its application to real signals are presented in section VI.

III A Least-Square Spectral Analysis Algorithm


Let the discrete signal segment to be analyzed be denoted by w_k, k = 1, ..., N. In general, this signal can be composed of a number of harmonic components. Suppose a sinusoidal function of the form

\[ \hat{w}_k = a \sin(fk) + b \cos(fk) \]

is to be matched to w_k. The frequency of the sinusoid is f, given in a unit such that f = 2πf'/s, where f' and s are the frequency of the sinusoid and the sampling frequency of w_k in hertz, respectively. The values of a and b determine the amplitude and phase of ŵ_k. The square error of the match, a function of f, a, and b, is given by

\[ e = \sum_{k=1}^{N} (\hat{w}_k - w_k)^2. \]

For a given value of f, a pair of values can be chosen for a and b to minimize the square error of the match by setting ∂e/∂a = 0 and ∂e/∂b = 0, that is,

\[ a \sum_{k=1}^{N} \sin(fk)\sin(fk) + b \sum_{k=1}^{N} \cos(fk)\sin(fk) - \sum_{k=1}^{N} w_k \sin(fk) = 0, \]
\[ a \sum_{k=1}^{N} \sin(fk)\cos(fk) + b \sum_{k=1}^{N} \cos(fk)\cos(fk) - \sum_{k=1}^{N} w_k \cos(fk) = 0. \]

The solution to this pair of equations is given by

\[ a^* = \frac{QX - RW}{PR - Q^2}, \qquad b^* = \frac{QW - PX}{PR - Q^2}, \]

where

\[ P = \sum_{k=1}^{N} \sin(fk)\sin(fk), \quad Q = \sum_{k=1}^{N} \cos(fk)\sin(fk), \quad R = \sum_{k=1}^{N} \cos(fk)\cos(fk), \]
\[ W = -\sum_{k=1}^{N} w_k \sin(fk), \quad X = -\sum_{k=1}^{N} w_k \cos(fk). \]
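For illustration, the following is a minimal Python/NumPy sketch of this fit for a single candidate frequency f; the function and variable names are ours, and this is not the paper's DSP implementation.

```python
import numpy as np

def lsq_fit(w, f):
    """Fit a*sin(f*k) + b*cos(f*k) to the segment w (k = 1, ..., N) and
    return (a_star, b_star, err), where err is the minimized square error."""
    N = len(w)
    k = np.arange(1, N + 1)
    s, c = np.sin(f * k), np.cos(f * k)
    P = np.dot(s, s)        # sum of sin(fk)*sin(fk)
    Q = np.dot(c, s)        # sum of cos(fk)*sin(fk)
    R = np.dot(c, c)        # sum of cos(fk)*cos(fk)
    W = -np.dot(w, s)       # -sum of w_k*sin(fk)
    X = -np.dot(w, c)       # -sum of w_k*cos(fk)
    denom = P * R - Q * Q
    a_star = (Q * X - R * W) / denom
    b_star = (Q * W - P * X) / denom
    err = np.sum((a_star * s + b_star * c - w) ** 2)
    return a_star, b_star, err

# Example with a synthetic C4 sinusoid; f is in the unit f = 2*pi*f_hz/s.
fs = 22255.0
k = np.arange(1, 301)
w = np.sin(2 * np.pi * 261.63 / fs * k)
print(lsq_fit(w, 2 * np.pi * 261.63 / fs)[2])   # near-zero error at the true frequency
```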

Let e*(f) be the value of e when a and b are set to a* and b*, respectively, computed from that given value of f. In other words, e*(f) is the minimum square error of matching a sinusoid of frequency f to the signal segment w_k over all possible amplitudes and phases for such sinusoids. Figure 1 shows a plot of the function e*(f) for a typical 300-sample signal segment of a C4 note sampled from an electric guitar. This figure illustrates two properties of this function: one that allows the spectrum of the signal segment w_k to be deduced from the values of the function and one that allows the spectrum to be computed efficiently. These properties will be stated informally and without proof in this section since they are needed in the description of the spectral analysis algorithm. Their proofs will be given in section IV.
Property 1. Each "significant" trough in the function e*(f) corresponds to a sinusoidal component of the signal segment w_k. The value of f at the minimum point at the bottom of a trough is approximately equal to the frequency of the corresponding sinusoidal component.

In figure 1, the three deepest troughs reach their minimum values at f = 0.074, f = 0.147, and f = 0.224, respectively. Since the sampling frequency is 22255 Hz, these correspond to sinusoidal components with frequencies 262.10 Hz, 520.67 Hz, and 793.41 Hz, respectively. The frequency of C4 is 261.63 Hz; these are therefore approximately the fundamental, second harmonic, and third harmonic of the C4 note, respectively.

Property 2. The "width" of each significant trough in the function e*(f) is at least 2π/N, and this width is independent of the frequencies of the sinusoidal components of the input signal, provided that these are located sufficiently far apart from each other.

In the example, since the value of N is 300, the width of each significant trough will be greater than 0.0209. Since musical signals contain harmonic components whose frequencies are integer multiples of the fundamental frequency, the troughs corresponding to them are regularly spaced in f. Troughs will be separated from each other if the fundamental frequency is somewhat larger than 2π/N, which corresponds to 74.03 Hz for N = 300. Since the fundamental frequency can occur at any value of f, it is impractical to evaluate the function e*(f) at a large number of values of f in order to identify the troughs. However, property 2 makes it necessary to compute e*(f) only at values of f that are evenly spaced at a distance of 2π/(3N) apart. Doing so guarantees that at least three consecutive values of f will "fall into" each trough, with the function value in the middle smaller than those of its two neighbors. An additional test can be used to eliminate troughs that are too shallow. In the example, since N = 300, e*(f) must be evaluated at 0.5/(2π/(3N)) ≈ 72 points. Since the value of e*(f) within each trough is unimodal, once a trough is detected, the golden ratio search or successive parabolic interpolation [21] can be used to obtain the value of f at which the minimum occurs. Either technique converges on the minimum by successively reducing the interval in which it is known to lie. The resulting spectral analysis algorithm, based on least-square fitting, will be referred to as algorithm 1 below and is summarized in figure 2. In the following section, we will prove properties 1 and 2 of the function e*(f).
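Before turning to the proofs, here is a compact sketch of this scan-and-refine procedure, reusing the lsq_fit function sketched above. The bounded scalar minimizer from SciPy stands in for the golden ratio search, and the 0.9 shallowness factor anticipates the test given in figure 2; all names are ours.

```python
from scipy.optimize import minimize_scalar

def find_troughs(w, shallow=0.9):
    """Evaluate e*(f) on a grid spaced 2*pi/(3N) apart, keep significant troughs,
    and refine each minimum with a bounded one-dimensional search."""
    N = len(w)
    step = 2 * np.pi / (3 * N)
    f_grid = np.arange(step, 0.5 + step, step)        # start above f = 0
    e = np.array([lsq_fit(w, f)[2] for f in f_grid])
    found = []
    for i in range(1, len(e) - 1):
        if e[i] < shallow * e[i - 1] and e[i] < shallow * e[i + 1]:
            res = minimize_scalar(lambda f: lsq_fit(w, f)[2],
                                  bounds=(f_grid[i - 1], f_grid[i + 1]),
                                  method='bounded')
            found.append((res.x, res.fun))
    return sorted(found, key=lambda t: t[1])          # deepest troughs first
```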

IV Proofs of Properties 1 and 2 of e*(f)


The shape of the function e*(f) will first be analyzed with the assumption that w_k is sampled from a single sinusoid. Properties 1 and 2 will be shown to hold under this assumption. We will then show how the proof can be generalized to cases where w_k is sampled from a signal consisting of two or more sinusoidal components. Let the given signal segment be sampled from a single sinusoid, i.e., let

\[ w_k = c \sin(gk) + d \cos(gk), \]

where g is its frequency, and c and d together determine its amplitude and phase. As before, let

\[ \hat{w}_k = a \sin(fk) + b \cos(fk) \qquad \text{and} \qquad e = \sum_{k=1}^{N} (\hat{w}_k - w_k)^2. \]

Using continuous approximations for w_k and ŵ_k, we have

\[ e \approx \tilde{e} = \int_0^N (\hat{w}(x) - w(x))^2 \, dx, \]

where

\[ w(x) = c \sin(gx) + d \cos(gx), \qquad \hat{w}(x) = a \sin(fx) + b \cos(fx). \]


Expanding terms and distributing the integral, we get

\[ \tilde{e} = \int_0^N (\hat{w}(x))^2 \, dx - 2 \int_0^N \hat{w}(x) w(x) \, dx + \int_0^N (w(x))^2 \, dx. \tag{1} \]

Let the three definite integrals in equation 1 be denoted by ẽ₁, ẽ₂, and ẽ₃, respectively. Since w(x) is fixed, ẽ₃ = ∫_0^N (w(x))² dx is constant. For a given value of f, the values of a and b that minimize the error ẽ can be obtained by setting ∂ẽ/∂a = 0 and ∂ẽ/∂b = 0. Computing the definite integral ẽ₁ = ∫_0^N (ŵ(x))² dx, we get

\[ \tilde{e}_1 = \frac{2ab + 2a^2 fN + 2b^2 fN - 2ab\cos(2fN) - a^2\sin(2fN) + b^2\sin(2fN)}{4f}. \]

Thus,

\[ \frac{\partial \tilde{e}_1}{\partial a} = \frac{b}{2f} + aN - \frac{b\cos(2fN)}{2f} - \frac{a\sin(2fN)}{2f}, \qquad
   \frac{\partial \tilde{e}_1}{\partial b} = \frac{a}{2f} + bN - \frac{a\cos(2fN)}{2f} + \frac{b\sin(2fN)}{2f}. \]

Letting

\[ P = N - \sin(2fN)/(2f), \qquad Q = (1 - \cos(2fN))/(2f), \qquad R = N + \sin(2fN)/(2f), \]

we can write

\[ \partial \tilde{e}_1/\partial a = aP + bQ, \qquad \partial \tilde{e}_1/\partial b = aQ + bR. \]
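As a quick check on the closed form of ẽ₁ above (our verification aid, assuming SymPy is available; it is not part of the paper), the integral can be evaluated symbolically and compared with the stated expression at an arbitrary numerical point:

```python
import sympy as sp

a, b, f, N, x = sp.symbols('a b f N x', positive=True)
e1_integral = sp.integrate((a * sp.sin(f * x) + b * sp.cos(f * x)) ** 2, (x, 0, N))
e1_closed = (2*a*b + 2*a**2*f*N + 2*b**2*f*N - 2*a*b*sp.cos(2*f*N)
             - a**2*sp.sin(2*f*N) + b**2*sp.sin(2*f*N)) / (4*f)
point = {a: 0.7, b: -1.3, f: 0.11, N: 300}
print(float(e1_integral.subs(point)), float(e1_closed.subs(point)))  # the two values agree
```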

The second integral in equation 1 can be expanded to give

\[ \tilde{e}_2 = -2 \int_0^N \big[ ac\,\sin(fx)\sin(gx) + ad\,\sin(fx)\cos(gx) + bc\,\cos(fx)\sin(gx) + bd\,\cos(fx)\cos(gx) \big] \, dx. \]

Computing the definite integrals gives

\[ \tilde{e}_2 = ac\left[\frac{\sin((f+g)N)}{f+g} - \frac{\sin((f-g)N)}{f-g}\right]
+ ad\left[\frac{\cos((f+g)N)}{f+g} + \frac{\cos((f-g)N)}{f-g} - \frac{2f}{f^2-g^2}\right]
+ bc\left[\frac{\cos((f+g)N)}{f+g} + \frac{\cos((g-f)N)}{g-f} - \frac{2g}{g^2-f^2}\right]
+ bd\left[-\frac{\sin((f+g)N)}{f+g} - \frac{\sin((f-g)N)}{f-g}\right]. \]

Letting

\[ S = \frac{\sin((f+g)N)}{f+g} - \frac{\sin((f-g)N)}{f-g}, \tag{2} \]
\[ T = \frac{\cos((f+g)N)}{f+g} + \frac{\cos((f-g)N)}{f-g} - \frac{2f}{f^2-g^2}, \tag{3} \]
\[ U = \frac{\cos((f+g)N)}{f+g} + \frac{\cos((g-f)N)}{g-f} - \frac{2g}{g^2-f^2}, \tag{4} \]
\[ V = -\frac{\sin((f+g)N)}{f+g} - \frac{\sin((f-g)N)}{f-g}, \tag{5} \]

we can write

\[ \partial \tilde{e}_2/\partial a = cS + dT, \qquad \partial \tilde{e}_2/\partial b = cU + dV. \]

Thus, to minimize ẽ, set ∂ẽ/∂a = 0 and ∂ẽ/∂b = 0 to get the pair of equations

\[ aP + bQ + W = 0, \qquad aQ + bR + X = 0, \]

where W = cS + dT and X = cU + dV. The solution of this pair of equations is

\[ a^* = \frac{QX - RW}{PR - Q^2}, \qquad b^* = \frac{QW - PX}{PR - Q^2}. \]

Let ẽ*(f) be the value of ẽ when a and b are set to their optimal values a* and b*, respectively. Also, let ẽ₁* and ẽ₂* be the values of ẽ₁ and ẽ₂, respectively, when a and b are set to their optimal values. Then

\[ \tilde{e}^*(f) = \tilde{e}_1^* + \tilde{e}_2^* + \tilde{e}_3. \tag{6} \]

Substituting P, Q, and R back into the expression for ẽ₁ gives

\[ \tilde{e}_1^* = P a^{*2}/2 + R b^{*2}/2 + Q a^* b^*. \]

Substituting S, T, U, and V back into the expression for ẽ₂ results in

\[ \tilde{e}_2^* = a^*(cS + dT) + b^*(cU + dV). \]

The third term of equation 6 has a constant value that depends only on the given signal segment, and is given by

\[ \tilde{e}_3 = \frac{2cd + 2c^2 gN + 2d^2 gN - 2cd\cos(2gN) - c^2\sin(2gN) + d^2\sin(2gN)}{4g}. \]

Thus, given the values of c, d, g, and N, we can compute the function ẽ*(f). The function values of ẽ*(f) for c = 1, d = 1, g = 0.1, and N = 300 are plotted in figure 3. Note that the function satisfies properties 1 and 2, i.e., the single trough in the graph corresponds to the only sinusoidal component of the input signal segment, whose frequency is 0.1, and the widest part of the trough is wider than 0.0209.

Because of its complexity, the expression for ẽ*(f) cannot be simplified much beyond its present form. Fortunately, properties 1 and 2 can still be shown to hold by a careful analysis of the shapes of the functions of which the expression for ẽ*(f) is composed. The analysis will be carried out with the assumption that the values of f and g lie within the range [4π/N, 0.5] (e.g., when N = 300, this range is [0.0418, 0.5]). The lower frequency limit is necessary because below it, the shape of ẽ*(f) deviates from that in figure 3.

The functions P, Q, and R are plotted in figure 4 for N = 300. Note that these functions depend only on N. The expressions for P and R have the same second term, sin(2fN)/(2f), whose value approaches zero as f increases. Its value lies within the

interval [−1/(2f), 1/(2f)], and for the range of values of f considered, this term contributes no more than N/(8π), or approximately 4% of the value of N, to the function values of P or R. Thus, the values of P and R are within 4% of N when f is in [4π/N, 0.5]. The value of Q is bounded in the same way as the second term of P and R; therefore, the value of Q is also no more than 4% of N when f is in [4π/N, 0.5].

The values of the functions S, T, U, and V for g = 0.1 and N = 300 are plotted in figure 5. The value of the first term in the expression for S (defined in equation 2) lies within the interval [−1/g, 1/g] for any value of f. Within the range of values of g considered, the largest absolute value for this term occurs when g = 4π/N, when it can contribute as much as N/(4π), or 8% of the value of N. The second term dominates the value of the function in the neighborhood of f = g, since its absolute value there is on the order of N; this term has the same period as the sinusoid sin(fN) and reaches a minimum value at f = g. The function V (defined in equation 5) is similar to S except that its first term is the negative of that of S. In the neighborhood of f = g, the second term again dominates and the function has a shape similar to that of S. The first term in the expression for T (equation 3) is identical to the first term of S except for its phase; therefore it also remains small relative to N. The sum of the second and third terms in the expression for T is a function with the same period as sin(fN) and evaluates to zero at f = g. This sum dominates the value of T in the neighborhood of f = g. The second and third terms must be considered together because individually they tend to infinity as f approaches g. By a similar analysis, the function U (equation 4) can be found to have a shape that is approximately the negative of that of T.

Consider the expression a* = (QX − RW)/(PR − Q²). Since P ≈ N, Q ≈ 0, and R ≈ N for f in [4π/N, 0.5], a* can be approximated by −W/N in this range of values of f. A similar analysis establishes that b* can be approximated by −X/N. Since ẽ₂* = a*W + b*X, it can be approximated by

\[ \tilde{e}_2^* \approx (-W/N)\,W + (-X/N)\,X = -(1/N)(W^2 + X^2). \]

Substituting back the expressions for W and X and expanding terms gives

\[ W^2 + X^2 = (cS + dT)^2 + (cU + dV)^2 = c^2 S^2 + d^2 T^2 + c^2 U^2 + d^2 V^2 + 2cd\,ST + 2cd\,UV. \]

The first four terms of this expression can be approximated by (c² + d²)(S² + T²) because S² ≈ V² and T² ≈ U². The quantity c² + d² is the squared amplitude of the single sinusoidal component in w_k. The sum of the last two terms is 2cd(ST + UV), which has small values relative to the first four terms throughout the entire range of f, since ST and UV almost cancel each other out. The functions S² + T² and ST + UV with g = 0.1 and N = 300 are plotted in figure 6. The function values of S² + T² are small except in the neighborhood of a main lobe whose maximum value occurs at f = g and whose width is that of a full cycle of the sinusoid sin(fN), as a result of the summation of S² and T². This width is therefore 2π/N. As a result, ẽ₂* is approximately −(1/N)(c² + d²)(S² + T²), which has the same shape as the negative of S² + T².

To analyze the shape of ẽ₁*, consider its definition

\[ \tilde{e}_1^* = \int_0^N (a^* \sin(fx) + b^* \cos(fx))^2 \, dx. \]

The value of this expression is the definite integral, from 0 to N, of the square of a sinusoid whose squared amplitude is a*² + b*². As f varies, this squared amplitude changes in a manner given by a*² + b*² ≈ (1/N²)(W² + X²). Thus ẽ₁* can be approximated by (1/(2N))(c² + d²)(S² + T²). Since the height of the main lobe of this function is smaller in magnitude than the depth of the trough of ẽ₂*, the resulting shape of ẽ*(f) is that shown in figure 3. Property 1 is satisfied by ẽ*(f) because the main lobe in ẽ₁* and the trough in ẽ₂* are centered about f = g. Property 2 is satisfied by it because the widest part of the main lobe of S² + T² has width 2π/N.

This discussion of properties 1 and 2 can be extended to cases in which the given signal segment w_k consists of a number of sinusoidal components, as long as these components are separated in frequency from one another by at least 2π/N. We will illustrate how to do so when w_k consists of two sinusoidal components; the argument can be generalized to more sinusoidal components as necessary. Let the continuous approximation of the given signal segment be

\[ w(x) = c \sin(gx) + d \cos(gx) + c' \sin(g'x) + d' \cos(g'x). \]

Since e(f) is defined to minimize the square error when a single sinusoid is matched to the given signal segment, ŵ(x) remains the same, i.e.,

\[ \hat{w}(x) = a \sin(fx) + b \cos(fx). \]

As before,

\[ \tilde{e}(f) = \int_0^N (\hat{w}(x))^2 \, dx - 2 \int_0^N \hat{w}(x) w(x) \, dx + \int_0^N (w(x))^2 \, dx, \]

and as before denote the three integrals in this expression by ẽ₁, ẽ₂, and ẽ₃, respectively. The expression for ẽ₁ remains the same, and ẽ₃ remains a constant that does not depend on f. However,

\[ \tilde{e}_2 = a(cS + dT) + b(cU + dV) + a(c'S' + d'T') + b(c'U' + d'V'), \]

where S', T', U', and V' are obtained from S, T, U, and V, respectively, by replacing the occurrences of g in them by g'. Let

\[ W = cS + dT, \quad W' = c'S' + d'T', \quad X = cU + dV, \quad X' = c'U' + d'V'. \]


Then, rewrite

\[ \tilde{e}_2 = a(W + W') + b(X + X'), \]

and

\[ \partial \tilde{e}_2/\partial a = W + W', \qquad \partial \tilde{e}_2/\partial b = X + X'. \]

Since ẽ₁ is the same as before,

\[ \partial \tilde{e}_1/\partial a = aP + bQ, \qquad \partial \tilde{e}_1/\partial b = aQ + bR, \]

where P, Q, and R are also given by the same expressions as before. To minimize ẽ(f), set

\[ aP + bQ + W + W' = 0, \qquad aQ + bR + X + X' = 0. \]

The solution of this pair of equations is given by

\[ a^* = \frac{Q(X + X') - R(W + W')}{PR - Q^2}, \qquad b^* = \frac{Q(W + W') - P(X + X')}{PR - Q^2}. \]

By an analysis similar to the single-component case, it can be concluded that

\[ a^* \approx -(W + W')/N, \qquad b^* \approx -(X + X')/N. \]

Therefore,

\[ \tilde{e}_2^* \approx -(1/N)\big[(W + W')^2 + (X + X')^2\big] = -(1/N)\big[W^2 + X^2 + W'^2 + X'^2 + 2WW' + 2XX'\big]. \]

The sums W² + X² and W'² + X'² can be analyzed as before. These two functions have the same shapes as S² + T² and S'² + T'², respectively. Therefore the main lobe in W² + X² is centered about f = g and that in W'² + X'² is centered about f = g'. When they are added, the result is a function with two lobes, one at f = g and one at f = g'. The function 2WW' has small values throughout most of the range of f because W and W' have values close to zero except in the neighborhoods of f = g and f = g', respectively. The function 2XX' behaves in a manner similar to 2WW'. Since ẽ₁* is again given by ẽ₁* = ∫_0^N (a* sin(fx) + b* cos(fx))² dx, it is again the definite integral of the square of a sinusoid with squared amplitude a*² + b*². The shape of this function is thus, up to a positive scale factor, that of the negative of ẽ₂*. Note that this analysis can be generalized to cases in which w_k is sampled from a signal composed of three or more sinusoidal components. This completes the proofs of properties 1 and 2.
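The single-sinusoid expressions above can also be checked numerically. The sketch below (an illustration with our own names, assuming NumPy) evaluates ẽ*(f) from the closed forms for c = d = 1, g = 0.1, and N = 300 and confirms that the minimum of the trough lies near f = g, consistent with figure 3.

```python
import numpy as np

def e_tilde_star(f, c=1.0, d=1.0, g=0.1, N=300):
    """Closed-form continuous approximation e~*(f) for a single-sinusoid segment."""
    P = N - np.sin(2*f*N) / (2*f)
    Q = (1 - np.cos(2*f*N)) / (2*f)
    R = N + np.sin(2*f*N) / (2*f)
    S = np.sin((f+g)*N)/(f+g) - np.sin((f-g)*N)/(f-g)
    T = np.cos((f+g)*N)/(f+g) + np.cos((f-g)*N)/(f-g) - 2*f/(f**2 - g**2)
    U = np.cos((f+g)*N)/(f+g) + np.cos((g-f)*N)/(g-f) - 2*g/(g**2 - f**2)
    V = -np.sin((f+g)*N)/(f+g) - np.sin((f-g)*N)/(f-g)
    W, X = c*S + d*T, c*U + d*V
    a = (Q*X - R*W) / (P*R - Q*Q)          # a*
    b = (Q*W - P*X) / (P*R - Q*Q)          # b*
    e1 = P*a*a/2 + R*b*b/2 + Q*a*b         # e1*
    e2 = a*W + b*X                         # e2*
    e3 = (2*c*d + 2*c*c*g*N + 2*d*d*g*N - 2*c*d*np.cos(2*g*N)
          - c*c*np.sin(2*g*N) + d*d*np.sin(2*g*N)) / (4*g)
    return e1 + e2 + e3

f_values = np.linspace(0.042, 0.5, 2000)   # stay inside [4*pi/N, 0.5]
values = np.array([e_tilde_star(f) for f in f_values])
print(f_values[np.argmin(values)])         # close to g = 0.1
```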

V Computation Time Analysis


After the frequencies of the minimum points of the deepest troughs in the error function e*(f) have been obtained, the fundamental frequency is estimated as follows. Let f1, f2, ..., fk be these frequencies, where f1 < f2 < ... < fk. In our experiments, only the three deepest troughs are used (these are identified by the three smallest values of e*(f)); therefore, k ≤ 3. When the fundamental frequency of the signal is low, the trough in e*(f) that corresponds to the fundamental may be distorted near f = 0 and algorithm 1 may fail to detect it. Assuming that the component corresponding to the fundamental may or may not have been detected, two cases must be distinguished: f1, f2, ..., fk correspond to (i) the fundamental, second, third, ..., and k-th harmonics of the signal, respectively, or (ii) the second, third, ..., k-th, and (k + 1)-th harmonics of the signal, respectively. In the first case, the estimate of the fundamental frequency is the average of fi/i over i = 1, 2, ..., k, and in the second case, it is the average of fi/(i + 1) over i = 1, 2, ..., k. The estimate corresponding to the set of values with the smaller standard deviation is reported by the FFE algorithm, since the standard deviation measures the consistency among the troughs in estimating the fundamental frequency.
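A sketch of this decision rule in Python (illustrative only; the trough frequencies are assumed to come from a scan such as the find_troughs function above, and NumPy is assumed to be imported as np):

```python
def estimate_fundamental(trough_freqs):
    """trough_freqs: frequencies of the deepest troughs, at most three of them.
    Try both harmonic labelings and keep the more consistent one."""
    f = np.asarray(sorted(trough_freqs))
    k = len(f)
    cand1 = f / np.arange(1, k + 1)   # case (i): f1 is the fundamental
    cand2 = f / np.arange(2, k + 2)   # case (ii): f1 is the second harmonic
    best = cand1 if np.std(cand1) <= np.std(cand2) else cand2
    return best.mean()
```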

The response time of the FFE algorithm is the sum of the length of the initial signal segment analyzed and its computation time. Since the FFE algorithm uses the simple procedure described above for estimating the fundamental frequency, it spends the majority of its computation time on spectral analysis. The spectral analysis algorithm (algorithm 1 in figure 2) in turn spends most of its time computing e*(f) for the different values of f in steps 2 and 4. Instructions for implementing the loops and tests in algorithm 1 and for generating the fundamental frequency estimates require a negligible amount of time relative to that for evaluating e*(f) and will be ignored in the following analysis. For a given value of f, the steps for computing e*(f) require the numbers of instruction cycles tabulated in table 1. In the table, t denotes the number of instruction cycles needed for computing the sine or cosine function and d denotes the number needed for a division. The formulas given in the table assume that the target DSP processor is capable of single-cycle multiply-and-accumulate operations. An example of such a processor is the Analog Devices ADSP-2100, for which t = 25 and d = 33. Since the values of sin(fk) and cos(fk), for k = 1, 2, ..., N, and P, Q, and R depend only on f and not on w_k, step 2 of algorithm 1 evaluates e*(f) at the same M values of f for any signal segment analyzed. These values can thus be precomputed and stored in a table. The evaluation of e*(f) for each value of f then requires only the computation of (iii), (iv), and (v) in table 1, which takes 8N + 2d + 9 instruction cycles. Therefore the entire step 2 of algorithm 1 requires M(8N + 2d + 9) instruction cycles. In step 4 of algorithm 1, e*(f) is evaluated at values of f that depend on the fundamental frequency of the given w_k. Therefore, the evaluation of e*(f) for a single value of f requires all five steps in table 1, or 2Nt + 11N + 2d + 9 instruction cycles. An iterative algorithm such as the golden ratio search or successive parabolic interpolation is used to find the minima of the troughs. Since the troughs of the function e*(f) can be closely approximated by quadratic curves, the successive parabolic interpolation technique can locate the minimum of a trough with high accuracy in only a few iterations. The computation of e*(f) in step 4 for all the different values of f requires a total of L(2Nt + 11N + 2d + 9) instruction cycles, where L is the product of the number of troughs examined and the number of iterations for each trough. A typical value for L is 12. The total number of instruction cycles required for the evaluation of e*(f) for all the different values of f in algorithm 1 is therefore M(8N + 2d + 9) + L(2Nt + 11N + 2d + 9).

When N = 300, M = 72, d = 33, t = 25, and L = 12, a total of 398,700 instruction cycles are needed. A 30-MIPS DSP processor will require 13.29 milliseconds to process such a signal segment. If N is reduced to 200, the same processor requires 7.59 milliseconds to process each signal segment.
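The quoted figures can be reproduced directly from these formulas (a check in Python; the value M = 48 for N = 200 is our back-of-the-envelope assumption, obtained from 0.5/(2π/(3N)) rounded up):

```python
def cycles(N, M, L=12, t=25, d=33):
    """Total instruction cycles for algorithm 1: step 2 (table-driven) plus step 4."""
    return M * (8 * N + 2 * d + 9) + L * (2 * N * t + 11 * N + 2 * d + 9)

mips = 30e6
print(cycles(300, 72), cycles(300, 72) / mips * 1e3)   # 398700 cycles, about 13.29 ms
print(cycles(200, 48), cycles(200, 48) / mips * 1e3)   # about 7.59 ms for N = 200
```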

VI Experiments
A set of experiments was performed to study the real-time performance of the new FFE algorithm. The test inputs were sampled from an electric guitar at a sampling rate of 22255 Hz, and the notes G2 (98.1 Hz), G3 (196.0 Hz), G4 (392.0 Hz), and G5 (784.0 Hz) were used. These notes range over most of the playing range of the guitar and have lower frequencies than the test inputs used by previous papers on FFE. The results shown are representative of the results of repeated runs of the experiment and for notes with different fundamental frequencies. Three FFE algorithms were tested: algorithm CQCC, which is based on the constant-Q transform and cross-correlation [9]; algorithm CQDP, which is based on the constant-Q transform and dynamic programming [20]; and algorithm LS, the FFE algorithm introduced in this paper, which performs spectral analysis by algorithm 1 in figure 2 and FFE by the simple decision algorithm described in the previous section. These algorithms were implemented in C++ under Unix and did not run in real time. The beginning of the signal was identified automatically by a simple threshold algorithm on the power in an initial segment of the signal. The three FFE algorithms were applied to initial segments of the sampled notes with lengths between 5 and 30 milliseconds, in increments of 1.25 milliseconds. Figures 7, 8, 9, and 10 plot the fundamental frequency estimates reported by these algorithms as functions of the lengths of the initial segments analyzed, for notes G2, G3, G4, and G5, respectively.

For notes with low fundamental frequencies, represented by G2, algorithm LS performs better than algorithm CQDP by requiring a slightly shorter initial segment to achieve an accurate estimate. Its accuracy for segments that are even shorter is also higher, and it is more stable than algorithm CQDP (i.e., the estimates do not fluctuate as much). Both algorithm LS and algorithm CQDP require a much shorter initial segment (by 15 milliseconds) than algorithm CQCC to obtain accurate estimates.

For notes with intermediate fundamental frequencies, represented by G3, both algorithm LS and algorithm CQDP converge to correct estimates of the fundamental frequency when the length of the initial signal segment is greater than 7.5 milliseconds. In fact, algorithm CQDP produces an estimate with a slightly smaller absolute error than that produced by algorithm LS for the 6.25-millisecond initial signal segment. However, the absolute errors of algorithm LS are much smaller than those of algorithm CQDP for initial signal segment lengths of 5 milliseconds or smaller. Also, a fluctuation of the estimate occurs at 16.25 milliseconds for algorithm CQDP, due to its less stable nature. Both algorithms require initial signal segments that are about 5 milliseconds shorter than those required by algorithm CQCC to obtain accurate estimates. The plots for G4 and G5, representative of notes with higher fundamental frequencies, show that algorithm LS converges somewhat more quickly than algorithm CQDP as the length of the signal segment increases. All three FFE algorithms perform well when the initial signal segments are longer than 10 milliseconds for these notes with high fundamental frequencies.

VII Summary
A new spectral analysis algorithm based on least-square fitting was described. It performs its function without requiring the application of a window function, so it is capable of correctly analyzing relatively short signal segments. The algorithm minimizes the square error of fitting a sinusoid to the signal segment. Two properties of the error function of the least-square fitting were proved: the positions of the troughs in it correspond to the frequencies of the sinusoidal components of the signal segment, and their guaranteed widths allow them to be detected by evaluating the error function at a small, evenly spaced set of frequencies. An FFE algorithm based on this spectral analysis algorithm was also described. Since it has higher accuracy than previous FFE algorithms for short signal segments, it is more suitable for use in real time. The computation time of this FFE algorithm was analyzed. Experimental results that demonstrate the real-time performance of the algorithm were also described.


References
[1] R. Rowe. Interactive Music Systems: Machine Listening and Composing. MIT Press, 1993.

[2] P.R. Cook, D. Morrill, and J.O. Smith. A MIDI control and performance system for brass instruments. In Proc. of ICMC 1993, pages 130-133, Tokyo, Japan, 1993.

[3] M.J. Ross, H.L. Shaffer, A. Cohen, R. Freudberg, and H.J. Manley. Average magnitude difference function pitch extractor. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-22(5):353-362, October 1974.

[4] L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, and C.A. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-24(5):399-418, October 1976.

[5] J. Amuedo. Periodicity estimation by hypothesis-directed search. In Proc. of ICASSP '85, pages 395-398, Tampa, Florida, May 1985.

[6] E.R.S. Pearson and R.G. Wilson. Musical event detection from audio signals within a multiresolution framework. In Proc. of ICMC 1990, pages 156-158, Glasgow, 1990.

[7] B. Doval and X. Rodet. Fundamental frequency estimation using a new harmonic matching method. In Proc. of ICMC 1991, pages 555-558, Montreal, Canada, 1991.

[8] B. Doval and X. Rodet. Estimation of fundamental frequency of musical sound signals. In Proc. of ICASSP 1991, pages 3657-3660, Toronto, Canada, May 1991.

[9] J.C. Brown. Musical fundamental frequency tracking using a pattern recognition method. J. Acoust. Soc. Am., 92(3):1394-1402, September 1992.

[10] J.C. Brown. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am., 89(1):425-434, January 1991.

[11] J.C. Brown and M.S. Puckette. An efficient algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Am., 92(5):2698-2701, November 1992.

[12] F.J. Harris. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1):51-83, January 1978.

[13] S.L. Marple, Jr. Digital Spectral Analysis with Applications. Prentice-Hall, 1987.

[14] S.M. Kay and S.L. Marple, Jr. Spectral analysis: a modern perspective. Proceedings of the IEEE, 69(11):1380-1419, November 1981.

[15] A. Choi. Real-time fundamental frequency estimation by autoregressive spectral analysis. In preparation.

[16] H. Ney. A time warping approach to fundamental period estimation. IEEE Trans. on Systems, Man, and Cybernetics, SMC-12(3):383-388, May/June 1982.

[17] J.E. Lane. Pitch detection using a tunable IIR filter. Computer Music Journal, 14(3):46-59, Fall 1990.

[18] J.C. Brown and M.S. Puckette. A high resolution fundamental frequency determination based on phase changes of the Fourier transform. J. Acoust. Soc. Am., 94(2, Pt. 1):662-667, August 1993.

[19] W.B. Kuhn. A real-time pitch recognition algorithm for music applications. Computer Music Journal, 14(3):60-71, Fall 1990.

[20] A. Choi. On the improvement of the real-time performance of two fundamental frequency recognition algorithms. In Proc. of First Brasilian Symp. on Computer Music, pages 27-32, Caxambu, MG, Brasil, August 1-5 1994.

[21] G.E. Forsythe, M.A. Malcolm, and C.B. Moler. Computer Methods for Mathematical Computations. Prentice-Hall, 1977.


    Values computed                                   Instruction cycles
    (i)   sin(fk) and cos(fk) for k = 1, 2, ..., N    2Nt
    (ii)  P, Q, and R                                 3N
    (iii) W and X                                     2N
    (iv)  a* and b*                                   2d + 9
    (v)   e*(f)                                       6N

Table 1: Instruction cycles required by Algorithm 1 for each value of f.
Figure 1: A plot of e*(f) for a 300-sample segment of a C4 note (horizontal axis: f, from 0 to 0.5; vertical axis: e*(f)).

Algorithm 1

1. Divide [0, 0.5] into M = 0.5/(2π/(3N)) intervals by choosing f_0, f_1, ..., f_M so that f_0 = 0, f_M = 0.5, and f_i − f_{i−1} = 2π/(3N) for i = 1, 2, ..., M.

2. Compute e*(f_0), e*(f_1), ..., e*(f_M). Identify those values of f_i for which e*(f_{i−1}) > e*(f_i) < e*(f_{i+1}).

3. Eliminate troughs that are too shallow by selecting only those intervals in step 2 for which the function value at the middle point is significantly smaller than those of its two neighbors, e.g., 0.9 e*(f_{i−1}) > e*(f_i) < 0.9 e*(f_{i+1}).

4. For each interval selected in step 3, apply the golden ratio search (or successive parabolic interpolation) to locate the value of f of the minimum in the trough by searching the interval [f_{i−1}, f_{i+1}].

Figure 2: The least-square spectral analysis algorithm.

Figure 3: A plot of ẽ*(f) with g = 0.1 and N = 300 (horizontal axis: f; vertical axis: ẽ*(f)).

Figure 4: Plots of P, Q, and R with N = 300 (horizontal axis: f).


Figure 5: Plots of S, T, U, and V with g = 0.1 and N = 300 (horizontal axis: f).

Figure 6: Plots of S² + T² and ST + UV with g = 0.1 and N = 300 (horizontal axis: f).


Figure 7: Fundamental frequency estimates for a G2 note (horizontal axis: segment length in ms; vertical axis: Hz; curves: CQCC, CQDP, and LS).

Figure 8: Fundamental frequency estimates for a G3 note (axes and curves as in figure 7).


Figure 9: Fundamental frequency estimates for a G4 note (axes and curves as in figure 7).

Figure 10: Fundamental frequency estimates for a G5 note (axes and curves as in figure 7).

