0 Introduction
Time-segment processing plays an important role in the field of digital audio effects. In this paper we explore various methods of time stretching and pitch shifting and implement each of them in MATLAB. The results of each effect are illustrated in the attached graphs and, most importantly, in the accompanying sound files.
Fig. 1: Variable speed replay leading to time and spectral envelope compression / expansion. [DAFX]
The method described above was implemented in MATLAB using the following code.
MATLAB Code: VariableSpeed.m
The following audio files demonstrate the results of variable speed replay.
Original: Bass_skit.wav
Faster: VSR_Bass_skit_faster.wav
Slower: VSR_Bass_skit_slower.wav
Original: VSR_fretless1.wav
Slower: VSR_fretless1_slower.wav
Faster: VSR_fretless1_faster.wav
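As a minimal sketch of variable speed replay (not the attached VariableSpeed.m), the signal can be re-read at a different rate by linear interpolation. The replay factor `v`, the test tone, and all variable names below are assumptions made for illustration.

```matlab
% Variable speed replay by re-reading the signal at a different rate
% (illustrative sketch; v, Fs and the test tone are assumptions).
Fs = 8000;                            % assumed sampling rate in Hz
t  = (0:Fs-1)/Fs;                     % one second of time stamps
x  = sin(2*pi*220*t);                 % assumed test tone at 220 Hz
v  = 2;                               % hypothetical replay factor (>1 = faster)

n_out = floor((length(x)-1)/v) + 1;   % number of output samples
pos   = 1 + (0:n_out-1)*v;            % fractional read positions in x
y     = interp1(1:length(x), x, pos, 'linear');

% Played back at Fs, y lasts 1/v of the original duration and the
% 220 Hz tone appears at v*220 Hz: time and pitch change together.
```

With `v` below 1 the same code slows the signal down and lowers its pitch, which is why the methods in the following sections separate the two effects.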
3. The overlap intervals are searched for the discrete-time lag of maximum similarity. At the point of maximum similarity, the overlapping segments are weighted by a fade-in or fade-out function to eliminate abrupt changes. The segments are then added together to produce an audio signal of changed duration.
Fig. 2: SOLA time stretching [DAFX]
The algorithm described above was implemented in MATLAB using the TimeScaleSOLA.m file (DAFX, p. 209-211).
MATLAB Code: TimeScaleSOLA.m
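The similarity search and crossfade of step 3 can be sketched for a single join as follows. This is an illustration under assumptions, not the TimeScaleSOLA.m listing: the similarity is computed with an explicit normalized correlation loop instead of `xcorr`, and the test signal, `maxLag`, and all variable names are hypothetical.

```matlab
% One SOLA join (illustrative sketch): align a new grain with the tail
% of the existing output at the lag of maximum similarity, then crossfade.
Fs    = 8000;                        % assumed sampling rate
n     = 0:799;
x     = sin(2*pi*100*n/Fs);          % assumed periodic test signal (period 80)
out   = x(1:400);                    % output synthesized so far
grain = x(331:730);                  % next grain, nominally overlapping the tail
L     = 100;                         % nominal overlap length in samples
tail  = out(end-L+1:end);            % region of out available for overlap

maxLag = 40;                         % hypothetical search range
best = -Inf; km = 0;
for k = 0:maxLag                     % k = shift of the grain start into the tail
    seg = tail(k+1:end);             % part of the tail that would overlap
    g   = grain(1:length(seg));      % matching start of the grain
    c   = sum(seg.*g) / sqrt(sum(seg.^2)*sum(g.^2));  % normalized correlation
    if c > best, best = c; km = k; end
end

ov      = L - km;                    % length of the crossfade region
fadeout = linspace(1, 0, ov);        % weight for the old output
fadein  = 1 - fadeout;               % weight for the new grain
start   = length(out) - ov;          % samples of out kept untouched
joined  = [out(1:start), ...
           out(start+1:end).*fadeout + grain(1:ov).*fadein, ...
           grain(ov+1:end)];
```

For this test signal the best lag aligns the grain exactly with the tail, so the join is seamless. On real audio the match is only approximate and the crossfade hides the residual mismatch.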
The following audio files demonstrate time stretching by the SOLA technique.
Original: brass_jazz.wav
Shrink: SOLA_brass_jazz_shrink.wav
Stretch: SOLA_brass_jazz_stretch.wav

A. Analysis:
1. Determination of the pitch period. The signal is divided into small blocks over which the pitch is considered constant, and pitch detection is performed for each block.
2. Extraction of a segment (block) centered over each pitch mark using a Hanning window [BJ95-a] with a length of two pitch periods to allow for a smooth

B. Synthesis:
1. Choice of the corresponding analysis segment, identified by the time mark.
2. Overlap and add the selected segment. At this point it is decided whether the signal is going to be shrunk or stretched, based on the scaling factor. If the scaling factor is less than 1, some segments are discarded (time compression); if the factor is greater than 1, some segments are repeated (time expansion).
3. Determination of the time instant where the next synthesis segment will be centered in order to preserve pitch.

Fig. 4: PSOLA Synthesis [DAFX]
The pitch detection itself was a fairly difficult task to accomplish. As demonstrated in Figure 5 below, our pitch detector worked well on some of the sound files. We ran into problems while processing files with a broad range of vocals and instruments. Most likely this was caused by the numerous fundamental frequencies present in those sound files.
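The synthesis rule described above, where segments are discarded for a scaling factor below 1 and repeated for a factor above 1, can be made concrete with a short sketch. It follows the selection logic of psola.m but omits the windowing and overlap-add. The constant pitch period, the pitch marks, and the input length are assumptions for illustration.

```matlab
% PSOLA segment selection (illustrative sketch): which analysis segments
% are used for a given time scaling factor alpha, with constant pitch.
P0   = 100;                      % assumed constant pitch period in samples
m    = P0*(1:20);                % assumed pitch marks, one per period
Lin  = 2100;                     % assumed input length in samples
beta = 1;                        % no pitch shift

alphas = [0.5 2];                % hypothetical compression and expansion factors
used   = cell(1, numel(alphas));
for a = 1:numel(alphas)
    alpha = alphas(a);
    Lout  = ceil(Lin*alpha);     % output length
    tk    = P0 + 1;              % first synthesis pitch mark
    idx   = [];
    while round(tk) < Lout
        [~, i] = min(abs(alpha*m - tk));  % nearest time-scaled analysis mark
        idx(end+1) = i;                   % record the chosen analysis segment
        tk = tk + P0/beta;                % next synthesis mark, one period later
    end
    used{a} = idx;
end
% used{1} (alpha = 0.5) skips every other analysis segment;
% used{2} (alpha = 2) uses each analysis segment at least twice.
```

Because the synthesis marks always advance by one pitch period, the pitch is unchanged while the duration follows `alpha`.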
of time stretching needs to be applied in addition to variable speed replay. Pitch shifting followed by time stretching is illustrated in the following figure.
Fig. 5: Output of the pitch marker program
There are also limitations on the stretching factor, which has a limited range (0.25 to 2) for speech, and the sound exhibits some buzziness due to the regular repetition of identical input segments. The PSOLA algorithm was implemented in MATLAB as per the psola.m file (DAFX, p. 213-214). It requires pitch marks, which were obtained using a pitch marker function written in MATLAB. The following code was used:
Pitch Marker: PitchMarker.m
Psola: psola.m
TimeStretchPsola: TimeStretchPSOLA.m
Fig. 7: Pitch shifting by time scaling and resampling [DAFX]
Pitch shifting by time stretching and resampling was implemented using the psola.m function (DAFX, p. 213-214) and the MATLAB code included below.
Psola: psola.m
Pitch Shifting by Time Stretch and Resample: PitchShiftingByTimeStretchingResampling.m
The following audio files demonstrate the pitch shifting by time stretching and resampling technique.
Original: brass_jazz.wav
Higher: pitch_brass_jazz_higher.wav
Lower: pitch_brass_jazz_lower.wav
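The relationship between the stretching factor and the resulting pitch can be checked with a small sketch. An ideally time-stretched tone is emulated here by simply generating a longer tone of the same frequency, which stands in for the output of psola.m; resampling is done with `interp1` rather than `resample`. All names and values are assumptions for illustration.

```matlab
% Pitch shifting = time stretching by alpha, then resampling back to
% the original length (illustrative sketch on a synthetic tone).
Fs = 8000; f0 = 200;                 % assumed sampling rate and tone frequency
N  = 4000;                           % original length in samples
alpha = 1.5;                         % hypothetical stretching factor

Ns = round(alpha*N);                 % length after time stretching
ys = sin(2*pi*f0*(0:Ns-1)/Fs);       % stand-in for an ideal stretch: same pitch, longer

pos = 1 + (0:N-1)*(Ns-1)/(N-1);      % read positions that map Ns samples onto N
y   = interp1(1:Ns, ys, pos, 'linear');

Y = abs(fft(y));
[~, k] = max(Y(1:N/2));              % strongest bin in the positive half-spectrum
f_est  = (k-1)*Fs/N;                 % estimated frequency of the result in Hz
% f_est is close to alpha*f0, while the duration is back to N samples.
```

A factor above 1 therefore raises the pitch and a factor below 1 lowers it, matching the two cases in the accompanying script.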
The following audio files demonstrate time stretching by the PSOLA technique.
Original: brass_jazz.wav
Shrink: PSOLA_brass_jazz-shrink.wav
Stretch: PSOLA_brass_jazz-stretch.wav
are read faster or slower to produce higher or lower pitches. Blocks are read simultaneously in sets of two with a time delay of 1.5 of the block length in order to produce a continuous usable output.
Fig. 10: Plot of original signal (blue) and the pitch shifted signal (red)
As can be seen from the following figure, the frequency content of the higher-pitched signal is raised compared to the original signal and its harmonics are spaced further apart. Similarly, the lower-pitched signal shows lower frequency content with the harmonics closer together.
Fig. 8: Pitch shifting by delay line [DAFX]
A sawtooth-type function is used to control the delay line modulation. A similar approach is proposed in [Dat87]. A more advanced method is presented in [DZ99], where an overlap-add scheme was proposed. This method does not need any fundamental frequency estimation. Instead of overlapping two segments, it uses three parallel time-varying delay lines, all overlapping each other. Blocks overlap by 2/3 of the block length. The following figure illustrates this method.
Fig. 11: Spectrogram of original signal (top), higher pitch (middle), and lower pitch (bottom)
The algorithm described above was implemented using the vibrato.m file (DAFX, p. 68-69) as the starting point. The file was modified to use a sawtooth function instead of a sine wave.
Delayline: Delayline.m
PitchShiftDelayLineModulation: PitchShiftDelayLineModulation.m
The following audio files demonstrate the delay line modulation technique.
Original: x1.wav
Higher: x1DelayLine_HIGH40.wav
Lower: x1DelayLine_LOW40.wav
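The sawtooth-controlled delay line can be sketched with two crossfaded lines as below. This is a simplified illustration, not the attached Delayline.m or PitchShiftDelayLineModulation.m, and it uses two lines rather than the three of [DZ99]. The pitch ratio, block length, crossfade shape, and variable names are assumptions.

```matlab
% Pitch shifting with two sawtooth-modulated delay lines (illustrative sketch).
Fs    = 8000;                        % assumed sampling rate
x     = sin(2*pi*200*(0:Fs-1)/Fs);   % assumed test tone
ratio = 1.25;                        % hypothetical pitch ratio
B     = 400;                         % sawtooth period (block length) in samples
D     = abs(1-ratio)*B;              % sawtooth depth in samples

n   = 1:length(x);
ph1 = mod(n-1, B)/B;                 % sawtooth phase of line 1, in [0,1)
ph2 = mod(ph1 + 0.5, 1);             % line 2 runs half a block later
if ratio >= 1
    d1 = D*(1-ph1); d2 = D*(1-ph2);  % shrinking delay reads faster: higher pitch
else
    d1 = D*ph1;     d2 = D*ph2;      % growing delay reads slower: lower pitch
end

g1 = sin(pi*ph1).^2;                 % gain of line 1, zero at its sawtooth jump
g2 = 1 - g1;                         % gain of line 2, zero at its own jump
r1 = max(n - d1, 1);                 % read positions, clamped at the signal start
r2 = max(n - d2, 1);
y  = g1.*interp1(n, x, r1) + g2.*interp1(n, x, r2);
```

Within each block the read position advances by `ratio` samples per output sample, which is what shifts the pitch. The crossfade mutes each line while its delay jumps back, so the output stays continuous.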
Fig. 9: Pitch shifting by overlap-add scheme [DAFX]
The following figure shows the original signal versus the pitch-shifted signal.
Fig. 12: Pitch shifting by PSOLA method: frequency resampling the spectral envelope [DAFX]
The harmonics are scaled according to the scaling factor, but their amplitudes are determined by sampling the spectral envelope. The PSOLA algorithm can therefore be used to pitch shift a voice signal while preserving its formants. By preserving the formants of the signal we effectively preserve the voice identity [ML95, BJ95]. As can be seen from the following figures, PSOLA analysis for pitch shifting is identical to the analysis for time stretching. The difference is in the synthesis part: instead of simply adding or removing segments, which stretches the time, we now add or remove segments by overlapping the windows, thereby preserving the duration of the signal while changing its pitch.
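The difference between sampling the spectral envelope and simply moving the harmonics can be illustrated numerically. The single-formant Gaussian envelope, the fundamental, and the shift factor below are assumptions chosen for illustration; they are not taken from psolaF.m.

```matlab
% Formant preservation (illustrative sketch): harmonic amplitudes are
% re-sampled from a fixed spectral envelope after the pitch shift.
f0   = 100;                          % assumed fundamental in Hz
beta = 1.5;                          % hypothetical pitch shifting factor
K    = 20;                           % number of harmonics considered
env  = @(f) exp(-((f-700)/300).^2);  % assumed envelope with one formant at 700 Hz

k     = 1:K;
f_in  = k*f0;                        % original harmonic frequencies
a_in  = env(f_in);                   % original harmonic amplitudes
f_out = k*beta*f0;                   % harmonic frequencies after shifting

a_keep = env(f_out);                 % formant preserving: sample the envelope again
a_move = a_in;                       % plain resampling: amplitudes travel with harmonics

[~, i1] = max(a_keep);
[~, i2] = max(a_move);
peak_keep = f_out(i1);               % strongest harmonic stays near the formant
peak_move = f_out(i2);               % strongest harmonic moves to beta times the formant
```

In the first case the spectral peak stays near 700 Hz, in the second it is scaled by `beta` along with the harmonics, which is what changes the perceived voice identity.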
Fig. 14: PSOLA Synthesis for pitch shifting [DAFX]
The algorithm described above was implemented using the psolaF.m function (DAFX, p. 225).
Psola_Format: psola_format.m
PsolaF: psolaF.m
The following audio files demonstrate the results of
4 Conclusions
The time stretching and pitch shifting effects described in this report are all based on small segments of the signal, which are processed using methods such as time scaling by resampling or amplitude multiplication by an envelope. The main point is that the waveform of each segment is not changed, which is the key to preserving the characteristics of the source signal. By applying these effects to actual audio samples, we were able to confirm the theory behind all of these algorithms. The methods described in this report offer a basic tool for time and pitch manipulation and, due to their low computational complexity, are efficient tools for real-time signal processing. However, the quality of the results produced by these algorithms limits their scope of application. More advanced methods for time stretching and pitch shifting are available where a higher quality of the final product is desired.
Fig. 13: PSOLA Analysis for pitch shifting (same as for time stretching)
5 References
[BB89] K. Bogdanowicz and R. Blecher. Using multiple processors for real-time audio effects. In AES 7th International Conference, pp. 337-342, 1989. BB89.pdf
[BJ95] R. Bristow-Johnson. A detailed analysis of a time-domain formant-corrected pitch shifting algorithm. J. Audio Eng. Soc., 43(5):340-352, 1995. BJ95.pdf
[BJ95-a] R. Bristow-Johnson. A detailed analysis of a time-domain formant-corrected pitch shifting algorithm. J. Audio Eng. Soc., 43(5):347, 1995. BJ95.pdf
[DAFX] U. Zölzer. Digital Audio Effects. John Wiley and Sons, pp. 202-225, 2005. http://www.dafx.de/
[Dat87] J. Dattorro. Using digital signal processor chips in a stereo audio time compressor / expander. In Proc. 83rd AES Convention, Preprint 2500, 1987. dat87.pdf
[DZ99] S. Disch and U. Zölzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, pp. 5-8, Trondheim, December 1999.
[HMC89] C. Hamon, E. Moulines and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In Proc. ICASSP, pp. 238-241, 1989.
[MC90] E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9(5/6):453-467, 1990. mc90.doc
[MEJ86] J. Makhoul and A. El-Jaroudi. Time-scale modification in medium to low rate speech coding. In Proc. ICASSP, pp. 1705-1708, 1986.
[ML95] E. Moulines and J. Laroche. Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication, 16:175-205, 1995.
[RW85] S. Roucos and A.M. Wilgus. High quality time-scale modification for speech. In Proc. ICASSP, pp. 493-496, 1985.
% TimeScaleSOLA.m
%
% Parameters:
%   analysis hop size      Sa =
%   block length           N  =
%   time scaling factor    0.25
%   overlap interval       L  =
%
% DAFx_in (row vector with the input signal) and Fs (sampling rate)
% are assumed to be loaded before this point, e.g. with wavread.

Sa    = input('Analysis hop size Sa in samples = ');
N     = input('Analysis block size N in samples = ');
if Sa > N
  disp('Sa must be less than N !!!')
end
M     = ceil(length(DAFx_in)/Sa);
% Segmentation into blocks of length N every Sa samples
% leads to M segments
alpha = input('Time stretching factor alpha = ');
Ss    = round(Sa*alpha);
L     = input('Overlap in samples (even) = ');
if Ss >= N
  disp('alpha is not correct, Ss is >= N')
elseif Ss > N-L
  disp('alpha is not correct, Ss is > N-L')
end
DAFx_in(M*Sa+N)=0;            % zero-pad so that the last grain is complete
Overlap = DAFx_in(1:N);       % output starts with the first block
% **** Main TimeScaleSOLA loop ****
for ni=1:M-1
  grain=DAFx_in(ni*Sa+1:N+ni*Sa);
  % cross-correlation between the grain start and the overlap region
  XCORRsegment=xcorr(grain(1:L),Overlap(1,ni*Ss:ni*Ss+(L-1)));
  [xmax(1,ni),index(1,ni)]=max(XCORRsegment);
  % linear fades over the overlapping part
  fadeout=1:(-1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):0;
  fadein=0:(1/(length(Overlap)-(ni*Ss-(L-1)+index(1,ni)-1))):1;
  Tail=Overlap(1,(ni*Ss-(L-1))+ ...
       index(1,ni)-1:length(Overlap)).*fadeout;
  Begin=grain(1:length(fadein)).*fadein;
  Add=Tail+Begin;
  Overlap=[Overlap(1,1:ni*Ss-L+index(1,ni)-1) ...
           Add grain(length(fadein)+1:N)];
end;
% **** end TimeScaleSOLA loop ****
% Output in WAV file
sound(Overlap,44100);
wavwrite(Overlap,Fs,'7_3_2_SOLA_brass_jazz_shrink.wav');
% psola.m (excerpt). The start of the file is missing here; in (input
% signal), m (pitch marks), alpha (time stretching factor), beta (pitch
% shifting factor) and P (pitch periods) are assumed to be defined above.
if m(1)<=P(1), %remove first pitch mark
  m=m(2:length(m));
  P=P(2:length(P));
end
if m(length(m))+P(length(P))>length(in) %remove last pitch mark
  m=m(1:length(m)-1);
else
  P=[P P(length(P))];
end
Lout=ceil(length(in)*alpha);
out=zeros(1,Lout); %output signal
tk = P(1)+1; %output pitch mark

while round(tk)<Lout
  [minimum i] = min( abs(alpha*m - tk) ); %find analysis segment
  pit=P(i);
  st=m(i)-pit;
  en=m(i)+pit;
  gr = in(st:en) .* hanning(2*pit+1); %windowed segment, two pitch periods long
  iniGr=round(tk)-pit;
  endGr=round(tk)+pit;
  if endGr>Lout, break; end
  out(iniGr:endGr) = out(iniGr:endGr)+gr'; %overlap new segment
  tk=tk+pit/beta; %next synthesis pitch mark
end %while
% Time Segment Processing
% Pitch Shifting by Time Stretching and Resampling (7.4.2)

[x,f_s,nbits]=wavread('brass_jazz.wav');
y=zeros(1,length(x));
m=pitch_detector(x);                 % pitch marks of the input

% Higher pitch: stretch in time, then resample back to the original length
alpha=1.5; beta=1;
y=psola(x,m,alpha,beta);
y=resample(y,length(x),length(y));
wavwrite(y, f_s, '7_4_2_brass_jazz-high.wav');

% Lower pitch: compress in time, then resample back to the original length
alpha=0.75; beta=1;
y=psola(x,m,alpha,beta);
y=resample(y,length(x),length(y));
wavwrite(y, f_s, '7_4_2_brass_jazz-low.wav');
figure(2)
plot(x,'b')
hold on
plot(y,'r')
hold off
% figure
% plot(y_a)
% figure
% Time Segment Processing
% Pitch Shifting by PSOLA and Formant Preservation (7.4.4)

clear all
close all
[x,f_s,nbits]=wavread('brass_jazz.wav');
y=zeros(1,length(x));

% gamma = 2 with alpha = beta = 1
alpha=1; beta=1; gamma=2;
m=PitchMarker(x);                    % pitch marks of the input
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant.wav');

% Higher pitch (beta = 1.5)
alpha=1; beta=1.5; gamma=1;
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant_high.wav');

% Lower pitch (beta = 0.75)
alpha=1; beta=0.75; gamma=1;
y=psolaF(x,m,alpha,beta,gamma);
wavwrite(y, f_s, '7_4_4_brass_jazz_psola_formant_low.wav');