
Project Report on Speech Compression and Decompression

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology in Electronics & Communication Engineering

Under the Guidance of Ms. Lovleen Kaur

Submitted by Pankaj Singh Negi (10807965) and Saurabh Lohani (10808634)

Department of Electronics & Communication Engineering


Lovely Professional University, Phagwara 140 401, Punjab (India)

Ref:__________

Dated: 27/04/2012

Certificate
Certified that this project, entitled "Speech Compression and Decompression", submitted by Saurabh Lohani (10808634) and Pankaj Singh Negi (10807965), students of the Electronics & Communication Engineering Department, Lovely Professional University, Phagwara, Punjab, in partial fulfillment of the requirements for the award of the Bachelor of Technology (Electronics & Communication Engineering) degree of LPU, is a record of the students' own study carried out under my supervision & guidance.

This report has not been submitted to any other university or institution for the award of any degree.

Name of Mentor: Ms. Lovleen Kaur

Acknowledgement

We would like to express our deep sense of gratitude and indebtedness to Ms. Lovleen Kaur, who guided us at all stages in the preparation of this dissertation. This project would not have been possible without her valuable suggestions and encouragement. It would not be out of place to mention here that our revered parents have always been a great source of inspiration to us; our heads bow in obeisance to them. We are highly appreciative of all others who directly or indirectly contributed to its completion. Last but not least, all that we are capable of doing we owe to THE ALMIGHTY.

Saurabh Lohani (10808634), Pankaj Singh Negi (10807965)

Abstract
The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. The compression technique represents each digitized sample by an equivalent 4- to 8-bit sample; in decompression the compressed samples are expanded back to the original sample size and converted back to analog signals. While the main focus of any speech recognition system (SRS) is to facilitate and improve direct audible man-machine communication and provide an alternative means of access to machines, a speech compression system (SCS) focuses on reducing the amount of redundant data while preserving the integrity of the signal. The compression of speech signals has many practical applications. One example is digital cellular technology, where many users share the same frequency bandwidth; compression allows more users to share the system than would otherwise be possible. Another example is digital voice storage (e.g., answering machines), where, for a given memory size, compression allows longer messages to be stored.

TABLE OF CONTENTS
1. Introduction
2. Speech representation
3. Compression and decompression algorithms
4. Differential pulse code modulation (DPCM)
5. Adaptive differential pulse code modulation (ADPCM)
6. Hardware requirements
7. Functioning
8. Applications
9. Software implementation
10. MATLAB source code
11. Sampled speech signals
12. Calculating thresholds
13. Voiced, unvoiced and mixed speech frames
14. Performance measures
    1. Signal to noise ratio
    2. Peak signal to noise ratio
    3. Normalized root mean square error
    4. Retained signal energy
    5. Compression ratios
15. Future work
    1. Enhancing quality
    2. Improving compression ratios
16. Conclusion

Introduction
Speech is a very basic way for humans to convey information to one another. With a bandwidth of only about 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world, as if the person were in the same room. As a result, a greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today the applications of speech coding and compression are numerous. Many involve the real-time coding of speech signals, for use in mobile satellite communications, cellular telephony, and audio for videophones or video teleconferencing systems. Other applications include the storage of speech for speech synthesis and playback, or for the transmission of voice at a later time. Examples include voice mail systems, voice memo wristwatches, voice logging recorders and interactive PC software.

Traditionally, speech coders are classified into two categories: waveform coders and analysis/synthesis vocoders (from 'voice coders'). Waveform coders attempt to copy the actual shape of the signal produced by the microphone and its associated analogue circuits [9]. A popular waveform coding technique is pulse code modulation (PCM), which is used in telephony today. Vocoders use an entirely different approach to speech coding, known as parameter coding or analysis/synthesis coding, in which no attempt is made to reproduce the exact speech waveform at the receiver, only a signal perceptually equivalent to it. These systems provide much lower data rates by using a functional model of the human speaking mechanism at the receiver. One of the most popular techniques for analysis/synthesis coding of speech is Linear Predictive Coding (LPC). Higher quality vocoders include RELP (Residual Excited Linear Prediction) and CELP (Code Excited Linear Prediction).

This project also looks at a newer technique for analysing and compressing speech signals using wavelets. Very simply, wavelets are mathematical functions of finite duration with an average value of zero that are useful in representing data or other functions. Any signal can be represented by a set of scaled and translated versions of a basic function called the 'mother wavelet'. This set of wavelet functions forms the wavelet coefficients at different scales and positions and results from taking the wavelet transform of the original signal. The coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients.

Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterised by numerous transitory drifts, trends and abrupt changes. The localisation feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals.
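The whole wavelet-based scheme can be summarised in a few lines of MATLAB. The following is a minimal sketch only, assuming the Wavelet Toolbox and an already loaded 8 kHz speech vector x; the 5% threshold is an illustrative choice, not a value from the project code.

% Decompose, zero small coefficients, reconstruct
[C, L] = wavedec(x, 5, 'db10');   % 5-level DWT with the Daubechies 10 wavelet
thr = 0.05*max(abs(C));           % illustrative global threshold (an assumption)
C(abs(C) < thr) = 0;              % truncation: this is where compression happens
xr = waverec(C, L, 'db10');       % reconstructed (perceptually similar) signal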

SPEECH REPRESENTATIONS

Extracting information from a speech signal, whether for a recognition engine or for compression purposes, usually relies on transforming the signal into a domain other than its original one. Although processing a signal in the time domain can be useful for obtaining measures such as the zero-crossing rate, most of the important properties of the signal reside in the time-frequency and time-scale domains. This section reviews and compares the different methods and techniques that allow such extraction. In this report, x(t) represents the continuous speech signal to be analyzed. In order to process x(t) digitally, it has to be sampled at a certain rate; 20000 Hz is a standard sampling frequency for digits and the English alphabet. To distinguish the digitized signal in notation, it is referred to as x(m).

Most speech processing schemes assume that the properties of speech change slowly with time, typically every 10-30 milliseconds. This assumption led to short-time processing, in which speech is processed in short but periodic segments called analysis frames, or just frames. Each frame is then represented by one number or a set of numbers, giving the speech signal a new time-dependent representation. Many speech recognition systems use frames of 200 samples at a sampling rate of 8000 Hz (i.e., 200 × 1000/8000 = 25 milliseconds). This segmentation is not error free, since it creates blocking effects that make a rough transition between the representations (or measurements) of two consecutive frames. To remedy this, a window is usually applied to data of twice the frame size, overlapping the consecutive analysis window by 50%. Multiplying the frame data by a window favors the samples near the center of the window over those at the ends, resulting in a smooth representation. If the window is not too long, the signal properties inside it remain constant.

Taking the Fourier transform of the data samples in the window, after adjusting their length to a power of 2 so that the Fast Fourier Transform can be applied, results in a time-dependent Fourier transform which reveals the frequency-domain properties of the signal. The spectrogram is a plot estimate of the short-term frequency content of the signal: a three-dimensional representation of the speech intensity in different frequency bands over time. The vertical dimension corresponds to frequency and the horizontal dimension to time; the darkness of the pattern is proportional to the energy of the signal, and the resonance frequencies of the vocal tract appear as dark bands in the spectrogram. Mathematically, the spectrogram of a speech signal is the magnitude squared of the Short Time Fourier Transform of that signal. The literature offers many different windows that can be applied to the frames of speech signals for short-term frequency analysis; three of them are depicted in the accompanying figure.
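The framing and windowing procedure described above can be sketched in MATLAB as follows. This is an illustrative fragment, not part of the project code: it assumes a speech column vector x, uses 200-sample (25 ms at 8000 Hz) analysis windows with 50% overlap, and the hamming function from the Signal Processing Toolbox.

fs = 8000; flen = 200; hop = flen/2;      % 25 ms frames, 50% overlap
w = hamming(flen);                        % taper favouring the frame centre
nfft = 256;                               % next power of 2 above 200 for the FFT
nframes = floor((length(x) - flen)/hop) + 1;
S = zeros(nfft/2 + 1, nframes);           % spectrogram columns
for k = 1:nframes
    seg = x((k-1)*hop + (1:flen)).*w;     % windowed analysis frame
    X = fft(seg, nfft);                   % zero-padded FFT
    S(:,k) = abs(X(1:nfft/2 + 1)).^2;     % short-term power spectrum
end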

Compression and Decompression Algorithms


The simplest way to realize, for example, a voice recorder is to store the A/D conversion results (e.g., 12-bit samples) directly in flash memory. Most of the time, the audio data does not use the complete A/D converter range, which means that redundant information is stored in the flash memory. Compression algorithms remove this redundant information, thereby reducing the amount of data that must be stored. Adaptive differential pulse code modulation (ADPCM) is one such compression algorithm. Various ADPCM algorithms exist; differential coding and adaptation of the quantizer step size are common to all of them. Before taking a closer look at the IMA ADPCM algorithm, which is used in the associated code, a short description of differential PCM coding is given.

Differential Pulse Code Modulation (DPCM)


DPCM encodes the analog audio input signal using the difference between the current and the previous sample. Figure 1 shows a DPCM encoder and decoder block diagram. In this example, the signal difference, d(n), is determined using a signal estimate, Se(n), rather than the previous input. This ensures that the encoder uses the same information that is available to the decoder. If the true previous input sample were used by the encoder, quantization errors could accumulate, leading to a drift of the reconstructed signal away from the original input signal. By using a signal estimate as shown in Figure 1, the reconstructed signal, Sr(n), is prevented from drifting from the original input. The reconstructed signal, Sr(n), is the input to the predictor, which determines the next signal estimate, Se(n+1).

Figure 2 shows a small part of a recorded audio stream. Analog audio input samples (PCM values) and the differences between successive samples (DPCM values) are compared in the two diagrams in Figure 2. The range of the PCM values is between 26 and 203, a span of 177 steps. The encoded DPCM values lie between −44 and 46, a span of 90 steps. Even with a quantizer step size of one, this DPCM encoding already compresses the input data. The range of the encoded DPCM values could be decreased further by selecting a larger quantizer step size.
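The following minimal sketch illustrates the DPCM loop of Figure 1 in MATLAB. It is a simplified illustration, not the project code: s is an input sample vector, the quantizer step size is fixed at 1, and the predictor is the trivial one (the estimate is simply the previous reconstructed sample).

step = 1; se = 0;                    % fixed step size, initial signal estimate
code = zeros(size(s)); sr = zeros(size(s));
for n = 1:length(s)
    d = s(n) - se;                   % difference from the estimate, d(n)
    code(n) = round(d/step);         % quantized difference = the DPCM code
    sr(n) = se + code(n)*step;       % reconstructed sample, Sr(n)
    se = sr(n);                      % next estimate Se(n+1): trivial predictor
end

Because the encoder builds se from the reconstructed samples, exactly as the decoder will, quantization errors do not accumulate.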

Adaptive Differential Pulse Code Modulation (ADPCM)


ADPCM is a variant of DPCM that varies the quantization step size. Amplitude variations in speech input signals occur between different speakers and between voiced and unvoiced segments of the speech signal. The adaptation of the quantizer step size takes place every sample and ensures equal encoding efficiency for both low and high input signal amplitudes. Figure 3 shows the modified DPCM block diagram including the step-size adaptation.
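Continuing the previous sketch, the step-size adaptation can be illustrated by letting the quantizer step grow after large codes and shrink after small ones. The multipliers and limits below are illustrative stand-ins, not the actual IMA ADPCM step-size table.

step = 1; se = 0;
code = zeros(size(s));
for n = 1:length(s)
    d = s(n) - se;
    code(n) = max(min(round(d/step), 7), -8);  % clamp to a 4-bit code range
    se = se + code(n)*step;                    % decoder-tracking estimate
    if abs(code(n)) >= 4
        step = min(step*1.5, 1024);            % large code: increase step
    else
        step = max(step*0.75, 1);              % small code: decrease step
    end
end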

The ADPCM encoder calculates the signal estimate, Se, by decoding the ADPCM code. This means that the decoder is part of the ADPCM encoder, and hence the encoded audio data stream can only be replayed using the decoder; the decoder must track the encoder. The initial encoder and decoder signal estimate levels, as well as the step-size adaptation level, must be defined before encoding or decoding starts. Otherwise, the encoded or decoded value could exceed the scale.

HARDWARE

The objective of the project is to develop a speech compression and decompression system using the ADSP-2105/2115 processor. It is proposed to employ ADPCM (Adaptive Differential Pulse Code Modulation) for compression and decompression. The analog speech signal is digitized by sampling. To maintain voice quality, each sample has to be represented by 13 or 16 bits. The compression technique represents each digitized sample by an equivalent 4- to 8-bit sample; in decompression the compressed samples are expanded back to the original sample size and converted back to analog signals. The hardware consists of the DSP processor ADSP-2105/2115 as CPU, a CODEC, EPROM, RAM, amplifier sections, a microphone and a speaker. The CODEC is interfaced to the ADSP processor through its serial port. The optional hardware includes a PC serial port interface consisting of a serial I/O port and an RS-232 level converter. The TTL logic levels of the serial port are converted to RS-232 levels using the level converter, so that the system can communicate directly with the standard serial port (COM1/COM2) of a personal computer.

[Block diagram: the microphone and amplifier feed the CODEC, which connects over the system bus to the ADSP-2105/2115, EPROM, RAM and latch, with reset, interrupt and clock inputs; an optional serial-port buffer and RS-232 level converter link the system to the PC bus.]

BLOCK DIAGRAM OF SPEECH COMPRESSION AND DECOMPRESSION USING ADSP-2105/2115

FUNCTIONING: The system is operated through the reset and interrupt switches. Once the system is reset, it is ready to accept speech signals through the microphone and CODEC. The analog speech signals are amplified by the pre-amplifier and fed to the CODEC for analog-to-digital conversion. The CODEC transmits the digitized signal to the ADSP-2105/2115 processor, which compresses the speech data using the ADPCM technique and stores it in RAM. When the processor is interrupted, it reads the compressed data from RAM, expands the data and sends it to the CODEC. The CODEC converts the digital data to an analog signal, which is amplified and output through the speaker.

APPLICATION: Speech compression and decompression techniques are implemented in applications such as:
1. Cellular phones
2. Voice mail transmission
3. Speech recognition systems
4. Voice storage
5. IVRS (Interactive Voice Response Systems)

Software Implementation
In this project we make use of MATLAB software to implement the functions and working of the project. MATLAB stands for Matrix Laboratory. According to The MathWorks, its producer, it is a "technical computing environment"; we take the more mundane view that it is a programming language. This section covers much of the language, but by no means all. We aspire, at the least, to promote a reasonable proficiency in reading the procedures we write in the language, and address this material to those who wish to use our procedures and write their own programs.

MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and "knows" how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for this project involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

MATLAB is a programming environment for algorithm development, data analysis, visualization, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages such as C, C++, and Fortran. It is used in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. For a million engineers and scientists in industry and academia, MATLAB is the language of technical computing.
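A short session illustrates the point about matrix-aware operators:

A = [1 2; 3 4];
A*A        % matrix product
A.*A       % element-by-element product
size(A)    % the variable 'knows' its own dimensions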

Matlab source code:

close all; clear all;
disp('load speech data');
load speech.dat;
lg = length(speech);
t = [0:1:lg-1]/8000;
disp('loading finished');
disp('mu-law companding');
nspeech = speech/(2^15);                       % normalize 15-bit data
mu = input('input mu =>');
for x = 1:lg
    munspeech(x) = mulaw(nspeech(x), 1, mu);   % mu-law compression
end
disp('finished mu-law companding');
disp('start quantization');
bits = input('input bits=>');
for x = 1:lg
    [pq uindx(x)]  = midtread(bits, 1, nspeech(x));
    [pq muindx(x)] = midtread(bits, 1, munspeech(x));
end
%% transmission
disp('expander');
for x = 1:lg
    qunspeech(x)  = mtrdec(bits, 1, uindx(x));
    qmunspeech(x) = mtrdec(bits, 1, muindx(x));
end
for x = 1:lg
    expnspeech(x) = muexpand(qmunspeech(x), 1, mu);
end
quspeech = qunspeech.*2^15;
qspeech  = expnspeech.*2^15;
disp('finished')

qerr = speech - qspeech;
subplot(2,1,1), plot(t, speech, 'w', t, qspeech, 'c', t, qspeech-speech, 'r'); grid
subplot(2,1,2), plot(t, speech, 'w', t, quspeech, 'b', t, quspeech-speech, 'r'); grid
disp('speech: original data, 15 bits');
disp('quspeech: quantized PCM');
disp('qspeech: mu-law decoded');
disp('SNR between speech and qspeech');
calcsnr(speech, qspeech);
disp('SNR between speech and quspeech');
calcsnr(speech, quspeech);
function qvalue = mulaw(vin, vmax, mu)
% MULAW: mu-law compressor
vin = vin/vmax;
qvalue = vmax*sign(vin)*log(1 + mu*abs(vin))/log(1 + mu);

function rvalue = muexpand(y, vmax, mu)
% MUEXPAND: mu-law expander
y = y/vmax;
rvalue = sign(y)*(vmax/mu)*((1 + mu)^abs(y) - 1);

function [pq, indx] = midtread(NoBits, Xmax, value)
% function [pq indx] = midtread(NoBits, Xmax, value)
% this routine simulates a uniform quantizer.
% NoBits: number of bits used in quantization
% Xmax:   overload value
% value:  input to be quantized
% pq:     quantized output value
% indx:   codeword integer
% Note: the midtread method is used in this quantizer.

if NoBits == 0
    pq = 0; indx = 0;
else
    delta = 2*abs(Xmax)/(2^NoBits - 1);
    Xrmax = delta*(2^NoBits/2 - 1);
    if abs(value) >= Xrmax
        tmp = Xrmax;
    else
        tmp = abs(value);
    end
    indx = round(tmp/delta);
    pq = round(tmp/delta)*delta;
    if value < 0
        pq = -pq; indx = -indx;
    end
end

function pq = mtrdec(NoBits, Xmax, indx)
% function pq = mtrdec(NoBits, Xmax, indx)
% this routine simulates a uniform quantizer (decoder).
% NoBits: number of bits used in quantization
% Xmax:   overload value
% pq:     quantized output value
% indx:   codeword integer
% Note: the midtread method is used in this quantizer.

if NoBits == 0
    pq = 0;
else
    delta = 2*abs(Xmax)/(2^NoBits - 1);
    pq = indx*delta;
end

function snr = calcsnr(speech, qspeech)
% function snr = calcsnr(speech, qspeech)
% this routine calculates the SNR
% speech: original speech waveform.

% qspeech: quantized speech
% snr: output SNR in dB
qerr = speech - qspeech;
snr = 10*log10(sum(speech.*speech)/sum(qerr.*qerr))

% Waveform coding using DCT and MDCT for a block size of 16 samples
% main program
close all; clear all;
load speech.dat
% create scale factors
N = 16;   % block size
scalef4bits = sqrt(2*N)*[1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768]; % provided by the instructor

scalef3bits = sqrt(2*N)*[256 512 1024 2048 4096 8192 16384 32768];
scalef2bits = sqrt(2*N)*[4096 8192 16384 32768];
scalef1bit  = sqrt(2*N)*[16384 32768];
scalef = scalef2bits;
nbits = 3;
% ensure the block size to be 16 samples
x = [speech zeros(1, 16 - mod(length(speech), 16))];
Nblock = length(x)/16;
DCT_code = []; scale_code = [];
% encoder
for i = 1:Nblock
    xblock_DCT = dct(x((i-1)*16+1 : i*16));
    diff = scalef - max(abs(xblock_DCT));
    iscale(i) = min(find(diff == min(diff(find(diff >= 0)))));  % find a scale factor
    xblock_DCT = xblock_DCT/scalef(iscale(i));                  % scale the input vector
    for j = 1:16
        [DCT_coeff(j) pp] = biquant(nbits, -1, 1, xblock_DCT(j));
    end
    DCT_code = [DCT_code DCT_coeff];
end
% decoder
Nblock = length(DCT_code)/16;
xx = [];
for i = 1:Nblock
    DCT_coefR = DCT_code((i-1)*16+1 : i*16);
    for j = 1:16

        xrblock_DCT(j) = biqtdec(nbits, -1, 1, DCT_coefR(j));
    end
    xrblock = idct(xrblock_DCT.*scalef(iscale(i)));
    xx = [xx xrblock];
end

% Transform coding using MDCT
xm = [zeros(1,8) speech zeros(1, 8 - mod(length(speech), 8)) zeros(1,8)];
Nsubblock = length(x)/8;
MDCT_code = [];
% encoder
for i = 1:Nsubblock
    xsubblock_DCT = wmdct(xm((i-1)*8+1 : (i+1)*8));
    diff = scalef - max(abs(xsubblock_DCT));
    iscale(i) = min(find(diff == min(diff(find(diff >= 0)))));  % find a scale factor
    xsubblock_DCT = xsubblock_DCT/scalef(iscale(i));            % scale the input vector
    for j = 1:8
        [MDCT_coeff(j) pp] = biquant(nbits, -1, 1, xsubblock_DCT(j));
    end
    MDCT_code = [MDCT_code MDCT_coeff];
end
% decoder
% recover the first subblock
Nsubblock = length(MDCT_code)/8;
xxm = [];
MDCT_coeffR = MDCT_code(1:8);
for j = 1:8

    xmrblock_DCT(j) = biqtdec(nbits, -1, 1, MDCT_coeffR(j));
end
xmrblock = wimdct(xmrblock_DCT*scalef(iscale(1)));
xxr_pre = xmrblock(9:16);   % recovered first block for overlap and add
for i = 2:Nsubblock
    MDCT_coeffR = MDCT_code((i-1)*8+1 : i*8);
    for j = 1:8
        xmrblock_DCT(j) = biqtdec(nbits, -1, 1, MDCT_coeffR(j));
    end
    xmrblock = wimdct(xmrblock_DCT*scalef(iscale(i)));
    xxr_cur = xxr_pre + xmrblock(1:8);   % overlap and add
    xxm = [xxm xxr_cur];
    xxr_pre = xmrblock(9:16);            % set for the next overlap
end

subplot(3,1,1); plot(x,'k'); grid; axis([0 length(x) -10000 10000]);
ylabel('Original signal');
subplot(3,1,2); plot(xx,'k'); grid; axis([0 length(xx) -10000 10000]);
ylabel('DCT coding');
subplot(3,1,3); plot(xxm,'k'); grid; axis([0 length(xxm) -10000 10000]);
ylabel('W-MDCT coding'); xlabel('Sample number');

function [ tdac_coef ] = wmdct(ipsig)
%
% This function transforms the signal vector using the W-MDCT
% usage:

% ipsig: input signal block of N samples (N = even number)
% tdac_coef: W-MDCT coefficients (N/2 coefficients)
%
N = length(ipsig);
NN = N;
for i = 1:NN
    h(i) = sin((pi/NN)*(i-1+0.5));      % sine analysis window
end
for k = 1:N/2
    tdac_coef(k) = 0.0;
    for n = 1:N
        tdac_coef(k) = tdac_coef(k) + ...
            h(n)*ipsig(n)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));
    end
end
tdac_coef = 2*tdac_coef;

function [ opsig ] = wimdct(tdac_coef)
%
% This function transforms the W-MDCT coefficients back to the signal
% usage:
% tdac_coef: N/2 W-MDCT coefficients
% opsig: output signal block with N samples
%
N = length(tdac_coef);
tmp_coef = ((-1)^(N+1))*tdac_coef(N:-1:1);
tdac_coef = [tdac_coef tmp_coef];

N = length(tdac_coef);
NN = N;
for i = 1:NN
    f(i) = sin((pi/NN)*(i-1+0.5));      % sine synthesis window
end
for n = 1:N
    opsig(n) = 0.0;
    for k = 1:N
        opsig(n) = opsig(n) + ...
            tdac_coef(k)*cos((2*pi/N)*(k-1+0.5)*(n-1+0.5+N/4));
    end
    opsig(n) = opsig(n)*f(n)/N;
end

Openfile.m

function sdata = openfile(fName);
% openfile : function to read a speech file with a .od extension
% call syntax: sdata = openfile(fName);
% -------------------------------
% Read sound file data into a column vector
sdata = dlmread(fName);

Play.m

function play(M);
% PLAY: plays a sound file which is stored as a vector
% call syntax: play(M);
% -----------------------
% Play sound file
soundsc(M, 8000, 8);

Main.m

% Speech Compression Simulation Program
% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
% Compress speech

[tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Decompress speech
rS = decompress(tC, tL, wavelet);
% Performance calculations
[SNR, PSNR, NRMSE] = pefcal(fileName, rS);

Compress.m

function [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% Compress : compresses a speech signal's wavelet coefficients
% Inputs: speech signal file name, wavelet
% Outputs: compressed coefficients, length vector, compression score
%          and retained energy
% Call syntax: [tC, tL, PZEROS, PNORMEN] = compress(fileName, wavelet);
% -------------------------------------
% Initialise other variables
N = 5;          % level of decomposition
ALPHA = 1.5;    % compression parameter
SORH = 'h';     % hard thresholding
% Read speech file
sdata = openfile(fileName);
% Compute the DWT to level N
[C,L] = wavedec(sdata, N, wavelet);
% Calculate level dependent thresholds
[THR, NKEEP] = lvlThr(C, L, ALPHA);
% Compress signal using hard thresholding
[XC, CXC, LXC, PERF0, PERFL2] = Trunc('lvd', C, L, wavelet, N, THR, SORH);
% Encode coefficients
cC = encode(CXC);
% Transmitted coefficients
tC = cC;
% Transmitted coefficients vector length
tL = L;
% Percentage of zeros
PZEROS = PERF0;
% Retained energy
PNORMEN = PERFL2;
% Compression ratio with encoding
CompRatio = length(sdata)/length(tC)

% Decompress : uncompresses DWT coefficients and reconstructs the signal
% Inputs: encoded wavelet coefficients, coeff vector length
% Output: reconstructed signal
% Call syntax: rSignal = decompress(tC, tL, wavelet);
% ----------------------------------
% Decode coefficients
rC = decode(tC);
% Reconstruct signal from coefficients
rSignal = waverec(rC, tL, wavelet);

Encode.m

function cC = encode(C);
% Encode: run-length encodes consecutive zero valued coefficients
% Call syntax: cC = encode(C);
% ----------------------------
% Initialise variables
zeroseq = false;    % True if previous array entry was a zero
zerocount = 0;      % Count of no of zeros in sequence
j = 1;              % Start index value for compressed coefficients
compC = [];         % compressed coefficients vector
% Start iterating thru array
for m = 1:length(C)
    if (C(m) == 0) & (zeroseq == false)
        % First zero
        compC = [compC C(m)];
        j = j + 1;
        zeroseq = true;
        zerocount = 1;
        % Reached end of array and last value is zero
        if m == length(C)
            compC = [compC zerocount];
        end
    elseif (C(m) == 0) & (zeroseq == true)
        % Sequence of zeros
        zerocount = zerocount + 1;
        % Reached end of array and last value is zero
        if m == length(C)
            compC = [compC zerocount];
        end
    elseif (C(m) ~= 0) & (zeroseq == true)
        % End of zeros
        compC = [compC zerocount C(m)];
        j = j + 2;
        zeroseq = false;
        zerocount = 0;
    else
        % Non-zero entry
        compC = [compC C(m)];

        j = j + 1;
    end
end
cC = compC;

Decode.m

function rC = decode(cC);
% Decode: function to decode consecutive zero valued coefficients
% Call syntax: rC = decode(cC);
% ---------------------------
% Initialise variables
dcompC = [];    % Empty reconstructed coefficients array
i = 1;          % Initial index of loop
% Start iterating thru array
while i <= length(cC)
    if cC(i) ~= 0
        % Non-zero entry
        dcompC = [dcompC cC(i)];
        i = i + 1;
    else
        % Zero entry
        count = cC(i+1);
        for m = 1:count
            % Add zeros
            dcompC = [dcompC 0];
        end
        i = i + 2;
    end
end
rC = dcompC;

Pefcal.m

function [SNR, PSNR, NRMSE] = pefcal(fileName, rS);
% Pefcal: Performance Calculations function file
% Calculates Signal to Noise Ratio, Peak Signal to Noise Ratio
% and Normalized Root Mean Square Error
% Get original speech signal
origdata = openfile(fileName);
% Resize reconstructed signal for the mathematics to work
rS = rS(1:length(origdata));
% Signal to Noise Ratio
sqdata = origdata.^2;       % Square of original speech signal
sqrS = rS.^2;               % Square of reconstructed signal
msqdata = mean(sqdata);     % Mean square of speech signal
sqdiff = (sqdata - sqrS);   % Square difference

msqdiff = mean(sqdiff);                 % Mean square difference
SNR = 10*log10(msqdata/msqdiff);        % Signal to noise ratio
% Peak Signal to Noise Ratio
N = length(rS);                         % Length of reconstructed signal
X = max(abs(sqdata));                   % Maximum absolute square of orig signal
diff = origdata - rS;                   % Difference signal
endiff = (norm(diff))^2;                % Energy of the difference between the
                                        % original and reconstructed signals
PSNR = 10*log10((N*(X^2))/endiff);      % Peak signal to noise ratio
% Normalised Root Mean Square Error
diffsq = diff.^2;                       % Difference squared
mdiffsq = mean(diffsq);                 % Mean of difference squared
mdata = mean(origdata);                 % Mean of original speech signal
scaledsqS = (origdata - mdata).^2;      % Squared scaled data
mscaledsqS = mean(scaledsqS);           % Mean of squared scaled data
NRMSE = sqrt(mdiffsq/mscaledsqS);       % Normalized Root Mean Square Error

Comp.m

function [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Comp: function simulates real time compression of speech signals
% Inputs: speech signal file name, wavelet, decomposition level and frame size
%         If frame size is 0 no frames are used
% Outputs: compressed coefficients and compression ratio
% Call syntax: [tC, tL, PZEROS, PNORMEN, cScore, nFrames] = comp(fileName, wavelet, N, frameSize)
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Initialise other variables
tXC = [];       % uncompressed coefficients vector
PERF0V = [];    % vector of % truncation for each frame
PERFL2V = [];   % vector of % retained energy for each frame
for i = 1:numFrames
    % Read a frame from the speech file
    sdata = FrameSelect(i, frameSize, fileName, fileSize);

    % Compute the DWT to level N
    [C,L] = wavedec(sdata, N, wavelet);
    % Calculate default thresholds
    [THR, SORH, KEEPAPP] = gblThr('cmp', 'wv', sdata);
    SORH = 'h';
    KEEPAPP = 0;    % Can threshold approximation coefficients also
    % Compress signal using hard thresholding
    [XC, CXC, LXC, PERF0, PERFL2] = Trunc('gbl', C, L, wavelet, N, THR, SORH, KEEPAPP);
    % Encode coefficients
    cC = encode(CXC);
    % Transmitted coefficients
    tXC = [tXC cC];
    % Truncation % vector
    PERF0V = [PERF0V PERF0];
    % Retained energy vector
    PERFL2V = [PERFL2V PERFL2];
end
% Return values
tC = tXC;
tL = L;
PZEROS = mean(PERF0V);
PNORMEN = mean(PERFL2V);
cScore = fileSize/length(tC);
nFrames = numFrames;

Decomp.m

function rSignal = decomp(tC, tL, wavelet, numFrames, frameSize);
% Decomp: function simulates real time decoding of signals
% Inputs: encoded wavelet coefficients, coeff vector length,
%         wavelet, number of frames and frame size
% Outputs: reconstructed signal
% Call syntax: rSignal = decomp(tC, tL, wavelet, numFrames, frameSize);
% Initialise other variables
rS = [];                            % reconstructed signal
frameSize = sum(tL) - frameSize;    % frame size of DWT coefficients
% Decode coefficients
rC = decode(tC);
for i = 1:numFrames
    % Range of frame
    R1 = (i-1)*frameSize + 1;
    R2 = i*frameSize;

    % Read coefficients in frame
    fC = rC(R1:R2);
    % Reconstruct frame signal
    X = waverec(fC, tL, wavelet);
    % Total reconstructed signal
    rS = [rS; X];
end
% Return output
rSignal = rS;

Filesize.m

function fSize = FileSize(fName);
% FileSize: counts no of samples in a speech file
% Call syntax: fSize = FileSize(fName);
% ---------------------
data = openfile(fName);
fSize = length(data);

Frameselect.m

function v = FrameSelect(fNum, fSize, fileName, fileSize);
% FrameSelect : reads a frame of data from a speech file into a column vector
% call syntax: v = FrameSelect(fNum, fSize, fileName, fileSize);
% -------------------------------
% Read the corresponding frame from the sound file into a column vector
% range = [R1 C1 R2 C2]; C1 = C2 = 0 since there is only one column
% R1 = First Value, R2 = Last Value
R1 = fSize*(fNum-1);
R2 = (fSize*fNum - 1);
R3 = R2;
% Adjust range value for last frame
if R2 >= fileSize
    R2 = fileSize - 1;
end
range = [R1 0 R2 0];
v = dlmread(fileName, '', range);
% If data for last frame is smaller than frame size
% zero pad the frame
if R3 ~= R2
    N = (R3 - R2);
    for i = 1:N
        v = [v; 0];
    end

end

Optimal.m

% Optimal Wavelet For Speech Compression
% This script file determines the percentage of speech frame energy
% concentrated by wavelets in the first N/2 coefficients
% ------------------------------------------------------------------
% Inputs: speech signal file name, wavelet and frame size
% Outputs: retained energy per frame
% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';
wavelet = 'db10';
frameSize = 160;
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Vector to store Retained Energy of each frame
PREV = [];
% Step thru each frame and calculate Retained Energy
%for i = 1:numFrames
    % Read a frame from the speech file
    sdata = FrameSelect(8, frameSize, fileName, fileSize);
    % Compute the DWT to level 5
    [C,L] = wavedec(sdata, 5, wavelet);
    % Calculate Energy Retained in first N/2 Coefficients
    xC = C(1:(length(C)/2));
    RE = 100*(norm(xC))^2/(norm(C))^2;
    PREV = [PREV; RE];
%end
PREV

Voiced.m

% Voiced, Unvoiced and Mixed Frames
% This script file plots frames and their DWT coefficients
% ------------------------------------------------------------------
% Inputs: speech signal file name, wavelet and frame size
% User Inputs
fileName = 'c:\program files\matlab\work\s180.od';

wavelet = 'db10';
frameSize = 1024;
% Calculate no of frames
fileSize = FileSize(fileName);
if frameSize == 0
    frameSize = fileSize;
end
numFrames = ceil(fileSize/frameSize);
% Read frame i from the speech file
i = 9;
sdata = FrameSelect(i, frameSize, fileName, fileSize);
% Compute the DWT to level 5
[C,L] = wavedec(sdata, 5, wavelet);
% Calculate Energy Retained in first N/2 Coefficients
xC = C(1:(length(C)/2));
RE = 100*(norm(xC))^2/(norm(C))^2;
% Plot frame and wavelet transform coefficients
subplot(2,1,1); plot(sdata, 'r');
title('Mixed Speech Segment');
subplot(2,1,2); plot(C);
title('DWT Coefficients Using Db10 Wavelet');

Result

Sampled Speech Signals


The sample speech files used for compression are .OD files. These files contain discrete signal values, which can easily be read in and played by Matlab at a sampling frequency of 8 kHz. Alternatively, WAV files could be used and processed. Matlab 6 uses an interpreter to run code without actually compiling it; because of this it is far too slow to design and implement a real-time speech coder in Matlab alone. Real-time speech coding could therefore only be simulated, by dividing the input sample speech into frames of 20 ms (160 samples) and then decomposing and compressing each frame. For processing recorded speech, however, larger frame sizes can be used. Since the speech files used in this design are of very short duration (a few seconds), the entire speech vector could be decomposed without dividing it into frames.
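The frame-based simulation described above amounts to the following sketch (illustrative only; the project's own framing is done by FrameSelect in the code listed earlier). It assumes an 8 kHz speech column vector sdata.

flen = 160;                                   % 20 ms at 8 kHz
pad = mod(-length(sdata), flen);              % zeros needed to complete last frame
frames = reshape([sdata(:); zeros(pad,1)], flen, []);
for k = 1:size(frames,2)
    [C, L] = wavedec(frames(:,k), 5, 'db10'); % per-frame DWT
    % ... threshold and encode C for frame k ...
end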

Calculating Thresholds
For the truncation of small-valued transform coefficients, two different thresholding techniques are used: global thresholding and by-level thresholding. The aim of global thresholding is to retain the largest absolute-value coefficients, regardless of the scale in the wavelet decomposition tree. Global thresholds are calculated by setting the percentage of coefficients to be truncated. Level-dependent thresholds are calculated using the Birge-Massart strategy [15]. This thresholding scheme is based on an approximation result from Birge and Massart and is well suited to signal compression. The strategy keeps all of the approximation coefficients at the decomposition level J. The number of detail coefficients to be kept at level i, for i from 1 to J, is given by the formula:

    n_i = M / (J + 2 - i)^ALPHA

ALPHA is a compression parameter whose value is typically 1.5. The value of M reflects how sparsely the wavelet coefficients are distributed in the transform vector. If L denotes the length of the coarsest approximation coefficients, then M takes on the values in Table 4.1, depending on the signal being analysed.

Thus this approach to thresholding selects the highest absolute valued coefficients at each level.
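A sketch of this keep-count rule, assuming a decomposition [C, L] from wavedec at level J = 5 and taking M = L(1) purely for illustration (Table 4.1 gives the actual choices):

J = 5; ALPHA = 1.5; M = L(1);          % illustrative M; see Table 4.1
start = cumsum(L) + 1;                 % start index of each detail band in C
start = start(1:end-2);                % bands cD_J, cD_{J-1}, ..., cD_1
for i = J:-1:1                         % detail band of level i
    b = start(J-i+1) : start(J-i+1) + L(J-i+2) - 1;
    ni = min(round(M/(J + 2 - i)^ALPHA), length(b));   % n_i coeffs to keep
    d = C(b);
    [srt, idx] = sort(abs(d), 'descend');
    d(idx(ni+1:end)) = 0;              % zero all but the ni largest magnitudes
    C(b) = d;
end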

Encoding Zero-Valued Coefficients


After zeroing wavelet coefficients with negligible values, based either on calculated threshold values or simply on a selected truncation percentage, the transform vector needs to be compressed. In this implementation, consecutive zero-valued coefficients are encoded with two bytes: one byte specifies the start of a string of zeros and the second keeps count of the number of successive zeros. Due to the sparsity of the wavelet representation of the speech signal, this encoding method leads to higher compression ratios than storing the non-zero coefficients along with their respective positions in the wavelet transform vector, as suggested in

the literature review. This encoding scheme is the primary means of achieving signal compression. In Matlab, however, coding this compression algorithm using vectors results in relatively slow performance, with unacceptable delays for real-time speech coding. The encoding process could be sped up significantly by programming it in another language such as C++.

Voiced, Unvoiced and Mixed Speech Frames


From the speech signal analysed previously, three different types of speech segments can be identified, based on the amount of energy the wavelet concentrates in the first N/2 coefficients. The figures below show each speech frame with its wavelet decomposition at level 5, using the Daubechies 10 wavelet. The structure of the plotted wavelet transform vector is [cA5, cD5, cD4, cD3, cD2, cD1], and the lengths of the respective coefficient bands are (50, 50, 81, 144, 270, 521). The total number of coefficients is 1116, which is greater than the frame size of 1024.
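The quoted lengths can be checked directly in MATLAB; for a 1024-sample frame and a level 5 'db10' decomposition, the bookkeeping vector returned by wavedec reproduces them:

[C, L] = wavedec(zeros(1024,1), 5, 'db10');
L'               % [50 50 81 144 270 521 1024] = [cA5 cD5 ... cD1, signal length]
sum(L(1:end-1))  % 1116 coefficients in total, more than the 1024 input samples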

Performance Measures:
A number of quantitative parameters can be used to evaluate the performance of the wavelet-based speech coder, in terms of both reconstructed-signal quality after decoding and compression scores. The following parameters are compared:
1. Signal to Noise Ratio (SNR)
2. Peak Signal to Noise Ratio (PSNR)
3. Normalised Root Mean Square Error (NRMSE)
4. Retained Signal Energy
5. Compression Ratios

The results for the above quantities are calculated using the following formulas.

1. Signal to Noise Ratio

    SNR = 10 log10( sigma_x^2 / sigma_e^2 )

where sigma_x^2 is the mean square of the speech signal and sigma_e^2 is the mean square difference between the original and reconstructed signals.

2. Peak Signal to Noise Ratio

    PSNR = 10 log10( N X^2 / ||x - r||^2 )

where N is the length of the reconstructed signal, X is the maximum absolute square value of the signal x and ||x - r||^2 is the energy of the difference between the original and reconstructed signals.

3. Normalised Root Mean Square Error

    NRMSE = sqrt( mean( (x(n) - r(n))^2 ) / mean( (x(n) - mu_x)^2 ) )

where x(n) is the speech signal, r(n) is the reconstructed signal, and mu_x is the mean of the speech signal.

4. Retained Signal Energy

    RSE = 100 * ||r(n)||^2 / ||x(n)||^2

where ||x(n)|| is the norm of the original signal and ||r(n)|| is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets the retained energy is equal to the L2-norm recovery performance.

5. Compression Ratio

    CR = length(x(n)) / length(cWC)

where cWC is the length of the compressed wavelet transform vector.

Future Work

Enhancing Quality


Listening tests conducted on a male-spoken sentence, 'Cats and dogs each hate the other', using different wavelets revealed that the /s/ sound in the word 'dogs' tends to be slightly distorted, and if not heard carefully can be mistaken for the singular term. The /s/ sound is an unvoiced excitation. The transforms for three different frames from this speech signal were analysed in the previous section. The figure there shows that for an unvoiced speech frame the wavelet coefficients are spread across all frequency bands; such a frame will therefore not undergo significant compression when a threshold is applied, and if a relatively low value is used for the truncation threshold, the reconstructed frame will be severely distorted. Thus wavelets are inefficient at coding unvoiced speech frames. By detecting unvoiced speech frames and directly encoding them using some form of bit encoding, like entropy coding, no unvoiced data is lost and there is only a marginal increase in the bit rate. A cheap wavelet with few vanishing moments can be used to detect voiced or unvoiced speech frames, using the technique suggested in the literature, as sketched below.
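A minimal sketch of such a detector (illustrative only; the 'db2' wavelet and the 90% cut-off are assumed values, not taken from the project):

[C, L] = wavedec(frame, 5, 'db2');              % cheap wavelet, few vanishing moments
re = 100*norm(C(1:floor(end/2)))^2/norm(C)^2;   % energy in first N/2 coefficients
if re < 90                                      % energy spread across bands: unvoiced
    % bypass wavelet thresholding; entropy-code this frame directly
end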

Improving Compression Ratios


Further data compaction is possible by exploiting the redundancy in the encoded transform coefficients. A bit-encoding scheme could be used to represent the data more efficiently. A common lossless coding technique is entropy coding; two common entropy coding schemes are prefix coding and tree-structured Huffman coding. Both forms of entropy coding require prior knowledge of the nature of the source data, such as the probability distribution of the source output. In practice, however, probabilistic models are usually not known a priori, so a model of the data must be constructed from the data set itself. An example of a data compaction code that encodes directly from the data stream, without constructing an explicit model, is the Ziv-Lempel code. Ziv-Lempel coding is a universal variable-to-fixed-length data compaction code that is practical, has good performance and does not require an externally constructed source model. The use of such a scheme in the last stage of the wavelet transform speech coder would enable the transmission of voice at low bit rates.
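As an illustration of such a back end, the sketch below Huffman-codes the encoded coefficient stream tC, building the probability model from the data itself. It assumes the huffmandict/huffmanenco/huffmandeco routines of the MATLAB Communications Toolbox.

symbols = unique(tC);                    % source alphabet observed in the data
counts = histc(tC, symbols);             % occurrences of each symbol
[dict, avglen] = huffmandict(symbols, counts/sum(counts));
bits = huffmanenco(tC, dict);            % variable-length bit stream
tCback = huffmandeco(bits, dict);        % lossless: tCback equals tC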

Conclusion:
Speech coding is currently an active research topic in the areas of Very Large Scale Integration (VLSI) and Digital Signal Processing (DSP). The Discrete Wavelet Transform performs very well in the compression of recorded speech signals; for real-time speech processing, however, its performance is not as good. For real-time speech coding it is therefore recommended to use a wavelet with a small number of vanishing moments at a decomposition level of 5 or less. The wavelet-based compression software designed here reaches a signal-to-noise ratio of 17.45 dB at a compression ratio of 3.88 using the Daubechies 10 wavelet. The performance of the wavelet scheme, in terms of compression scores and signal quality, is comparable with other good techniques such as code excited linear prediction (CELP) for speech, with much less computational burden. In addition, using wavelets the compression ratio can be varied easily, while most other compression techniques have fixed compression ratios.
