
CHAPTER 1

INTRODUCTION

1.1 SPEECH PROCESSING:

1.1.1 Speech Signals


A speech signal consists of three classes of sounds: voiced, fricative and
plosive sounds. Voiced sounds are caused by excitation of the vocal tract with quasi-periodic
pulses of airflow. Fricative sounds are formed by constricting the vocal tract and passing air
through it, causing turbulence that results in a noise-like sound. Plosive sounds are created by
closing up the vocal tract, building up air behind the closure and then suddenly releasing it.
Figure 1.1 shows a discrete-time representation of a speech signal.

Looking at the signal as a whole, we can tell that it is non-stationary; that is, its mean value
varies with time and cannot be predicted using standard mathematical models for stationary
random processes. However, a speech signal can be considered a linear composite of the above three
classes of sound, and each of these sounds is stationary and remains fairly constant over intervals
of the order of 30 to 40 ms. The theory behind the derivation of many adaptive filtering
algorithms usually requires the input signal to be stationary. Although speech is non-stationary
over long periods, it is assumed that the short-term stationary behavior outlined above is
adequate for the adaptive filters to function as desired.

FIG 1.1 Representation of Speech Signal
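
To make the short-term stationarity assumption concrete, a speech vector sampled at, for example, 8 kHz can be split into frames of roughly 30 ms before any adaptive processing. The following MATLAB sketch illustrates this; the sampling rate, placeholder signal and frame length are illustrative assumptions and not taken from the thesis itself.

% Split a speech vector s (sampled at fs Hz) into 30 ms frames.
fs = 8000;                          % assumed sampling rate
s  = randn(8000,1);                 % placeholder for a real speech vector
frameLen  = round(0.030*fs);        % 30 ms -> 240 samples
numFrames = floor(length(s)/frameLen);
frames = reshape(s(1:numFrames*frameLen), frameLen, numFrames);
% Each column of 'frames' is treated as approximately stationary.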

1.1.2 Speech generation

Speech generation and recognition are used to communicate between humans and
machines. Rather than using your hands and eyes, you use your mouth and ears. This is very
convenient when your hands and eyes should be doing something else, such as driving a car,
performing surgery, or (unfortunately) firing your weapons at the enemy. Two approaches are
used for computer-generated speech: digital recording and vocal tract simulation. In digital
recording, the voice of a human speaker is digitized and stored, usually in a compressed form.

During playback, the stored data are uncompressed and converted back into an analog
signal. An entire hour of recorded speech requires only about three megabytes of storage, well
within the capabilities of even small computer systems. This is the most common method of
digital speech generation used today. Vocal tract simulators are more complicated, trying to
mimic the physical mechanisms by which humans create speech. The human vocal tract is an
acoustic cavity with resonant frequencies determined by the size and shape of the chambers.
Sound originates in the vocal tract in one of two basic ways, called voiced and fricative sounds.
With voiced sounds, vocal cord vibration produces nearly periodic pulses of air into the vocal
cavities. In comparison, fricative sounds originate from the noisy air turbulence at narrow
constrictions, such as the teeth and lips. Vocal tract simulators operate by generating digital
signals that resemble these two types of excitation. The characteristics of the resonant chamber
are simulated by passing the excitation signal through a digital filter with similar resonances.
This approach was used in one of the very early DSP success stories, the Speak & Spell, a
widely sold electronic learning aid for children.

1.1.3 Speech Production:


Speech is produced when air is forced from the lungs through the vocal
cords and along the vocal tract. The vocal tract extends from the opening in the vocal cords
(called the glottis) to the mouth, and in an average man is about 17 cm long. It introduces
short-term correlations (of the order of 1 ms) into the speech signal, and can be thought of as a
filter with broad resonances called formants. The frequencies of these formants are controlled
by varying the shape of the tract, for example by moving the position of the tongue. An
important part of many speech codecs is the modeling of the vocal tract as a short-term filter.
As the shape of the vocal tract varies relatively slowly, the transfer function of its modeling
filter needs to be updated only relatively infrequently (typically every 20 ms or so).
The vocal tract filter is excited by air forced into it through the vocal cords. Speech sounds
can be broken into three classes depending on their mode of excitation.
• Voiced sounds are produced when the vocal cords vibrate open and closed, thus
interrupting the flow of air from the lungs to the vocal tract and producing quasi-
periodic pulses of air as the excitation. The rate of the opening and closing gives the
pitch of the sound. Varying the shape of, and the tension in, the vocal cords, and the
pressure of the air behind them can adjust this. Voiced sounds show a high degree of
periodicity at the pitch period, which is typically between 2 and 20 ms. This long-term
periodicity can be seen in Figure 1 which shows a segment of voiced speech sampled at
8 kHz. Here the pitch period is about 8 ms or 64 samples.
• Unvoiced sounds result when the excitation is a noise-like turbulence produced
by forcing air at high velocities through a constriction in the vocal tract while the
glottis is held open. Such sounds show little long-term periodicity as can be seen from
Figures 3 and 4 although short-term correlations due to the vocal tract are still present.
• Plosive sounds result when a complete closure is made in the vocal tract, and air
pressure is built up behind this closure and released suddenly.
Some sounds cannot be considered to fall into any one of the three classes
above, but are a mixture. For example voiced fricatives result when both vocal cord vibration
and a constriction in the vocal tract are present.

Although there are many possible speech sounds which can be produced, the shape of the
vocal tract and its mode of excitation change relatively slowly, and so speech can be
considered to be quasi-stationary over short periods of time (of the order of 20 ms). Speech
signals show a high degree of predictability, due partly to the quasi-periodic vibrations of
the vocal cords and partly to the resonances of the vocal tract. Speech coders attempt to
exploit this predictability in order to reduce the data rate necessary for good quality voice
transmission.
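
As an illustration of this long-term periodicity, the pitch period of a single voiced frame can be estimated from the first strong peak of its short-time autocorrelation. The sketch below is only illustrative: the frame contents are a synthetic placeholder and the 2-20 ms search range simply follows the figures quoted above.

% Estimate the pitch period of one voiced frame x (sampled at fs = 8000 Hz).
fs = 8000;
x  = sin(2*pi*125*(0:239)'/fs);               % placeholder frame: a 125 Hz tone (8 ms period)
r  = xcorr(x,'coeff');                        % normalized autocorrelation (Signal Processing Toolbox)
r  = r(length(x):end);                        % keep non-negative lags only
lagRange = round(0.002*fs):round(0.020*fs);   % search lags from 2 ms to 20 ms
[peakVal, idx] = max(r(lagRange+1));
pitchPeriodSamples = lagRange(idx);           % about 64 samples for an 8 ms pitch period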

From the technical, signal-oriented point of view, the production of speech is widely
described as a two-stage process: in the first stage the sound is initiated, and in the second stage
it is filtered. This distinction between the two stages has its origin in the source-
filter model of speech production.

FIG 1.2 Source Filter Model of Speech Production

The basic assumption of the model is that the source signal produced at the glottal level is
linearly filtered through the vocal tract. The resulting sound is emitted to the surrounding air
through radiation loading (lips). The model assumes that source and filter are independent of
each other. Although recent findings show some interaction between the vocal tract and a
glottal source (Rothenberg 1981; Fant 1986), Fant's theory of speech production is still used as
a framework for the description of the human voice, especially as far as the articulation of
vowels is concerned.
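
A rough numerical illustration of the source-filter idea is to pass a quasi-periodic glottal-like excitation through an all-pole vocal-tract filter. In the sketch below the pitch, formant frequencies and bandwidths are arbitrary illustrative values, not measured data.

% Crude source-filter synthesis of a vowel-like sound.
fs = 8000;
pitchHz = 100;
excitation = zeros(fs,1);                        % one second of excitation
excitation(1:round(fs/pitchHz):end) = 1;         % quasi-periodic glottal pulses (voiced source)
formants = [500 1500 2500];                      % assumed formant frequencies in Hz
bw       = [60 90 120];                          % assumed formant bandwidths in Hz
a = 1;
for k = 1:3                                      % build the all-pole vocal-tract filter
    r = exp(-pi*bw(k)/fs);
    a = conv(a, [1 -2*r*cos(2*pi*formants(k)/fs) r^2]);
end
speechLike = filter(1, a, excitation);           % filter the source through the "vocal tract"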

1.2 What is Speech Processing?


The term speech processing basically refers to the scientific discipline
concerning the analysis and processing of speech signals in order to achieve the best benefit in
various practical scenarios. The field of speech processing is, at present, undergoing rapid
growth in terms of both performance and applications, stimulated by advances in the fields of
microelectronics, computation and algorithm design. Nevertheless, speech
processing still covers an extremely broad area, which relates to the following three
engineering applications:

• Speech coding and transmission, mainly concerned with man-to-man voice
communication;
• Speech synthesis, which deals with machine-to-man communication;
• Speech recognition, relating to man-to-machine communication.

1.3 Speech Coding:

Speech coding or compression is the field concerned with compact digital


representations of speech signals for the purpose of efficient transmission or storage. The
central objective is to represent a signal with a minimum number of bits while maintaining
perceptual quality. Current applications for speech and audio coding algorithms include
cellular and personal communications networks (PCNs), teleconferencing, desktop multimedia
systems, and secure communications.

1.4 Speech Synthesis:

The process that involves the conversion of a command sequence or input text (words
or sentences) into speech waveform using algorithms and previously coded speech data is
known as speech synthesis. The input text can be entered through a keyboard, obtained by
optical character recognition, or taken from a previously stored database. A speech synthesizer
can be characterized by the size of the speech units it concatenates to yield the output speech, as
well as by the method used to code, store and synthesize the speech. If large speech units are
involved, such as phrases and sentences, high-quality output speech (with large memory
requirements) can be achieved. Conversely, efficient coding methods can be used to reduce
memory needs, but these usually degrade speech quality.

1.5 Noise Sources

Sources of noise exist throughout the environment. One type of noise is due to
turbulence and is therefore totally random and impossible to predict. Engineers like to look at
signals, noise included, in the frequency domain. That is, "How is the noise energy distributed
as a function of frequency?"
These turbulent noises tend to distribute their energy evenly across the frequency bands
and are therefore referred to as "broadband noise". A commonly encountered example is
white noise, which falls under the category of broadband noise: white noise is noise having a
frequency spectrum that is continuous and uniform over a specified frequency band.
Note: white noise has equal power per hertz over the specified frequency band (synonym:
additive white Gaussian noise). Examples of broadband noise are the low-frequency noise
from jet planes and the impulse noise of an explosion.
A large number of environmental noises are different. These "Narrow Band Noises"
concentrate most of their noise energy at specific frequencies. When the source of the noise is
a rotating or repetitive machine, the noise frequencies are all multiples of a basic "Noise
Cycle" and the noise is approximately periodic. This "Tonal Noise" is common in the
environment as man made machinery tends to generate it (along with a smaller amount of
broadband noise) at increasingly high levels.
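
For illustration, broadband (white) noise and narrow-band (tonal) noise of the kinds described above can be generated as follows; the fundamental frequency and levels are arbitrary example values.

% Generate examples of the two noise classes.
fs = 8000;  N = fs;                        % one second of noise
broadband = randn(N,1);                    % white noise: flat power per hertz
f0 = 50;                                   % assumed "noise cycle" fundamental, e.g. a rotating machine
t  = (0:N-1)'/fs;
tonal = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t);   % harmonics of f0
% pwelch(broadband,[],[],[],fs) and pwelch(tonal,[],[],[],fs) would show the
% flat versus line-like spectra (Signal Processing Toolbox).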

CHAPTER 2

ACOUSTIC ECHO CANCELLATION

2.1 ECHO
Echo is a phenomenon where a delayed and distorted version of an original sound or
electrical signal is reflected back to the source. With rare exceptions, conversations take
place in the presence of echoes. Echoes of our speech are heard as they are reflected from
the floor, walls and other neighboring objects. If a reflected wave arrives within a very short
time of the direct sound, it is perceived as spectral distortion or reverberation. However, when
the leading edge of the reflected wave arrives a few tens of milliseconds after the direct
sound, it is heard as a distinct echo [1].
Since the advent of telephony, echoes have been a problem in communication networks. In
particular, echoes can be generated electrically due to impedance mismatches at various
points along the transmission medium. The most important factor relating to echoes is the
end-to-end delay, also known as latency. Latency is the time between the generation of
the sound at one end of the call and its reception at the other end. Round-trip delay, which
is the time taken for an echo to return, is approximately twice the end-to-end delay.
Echoes become annoying when the round-trip delay exceeds 30 ms. Such an echo is
typically heard as a hollow sound. Echoes must also be loud enough to be heard: those weaker
than about 30 dB are unlikely to be noticed. However, when the round-trip delay exceeds 30 ms
and the echo strength exceeds 30 dB, echoes become steadily more disruptive. Not all echoes
reduce voice quality, however. In order for telephone conversations to sound natural,
callers must be able to hear themselves speaking. For this reason, a short instantaneous
echo, termed side tone, is deliberately inserted. The side tone couples the caller's
speech from the telephone mouthpiece to the earpiece so that the line sounds connected.
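
The effect of round-trip delay and echo strength can be reproduced by adding a delayed, attenuated copy of a signal to itself. In the sketch below the 32 ms delay and the 20 dB attenuation are arbitrary illustrative choices.

% Add a single delayed, attenuated echo to a signal x sampled at fs Hz.
fs = 8000;
x  = randn(2*fs,1);                         % placeholder for a speech vector
delaySamples = round(0.032*fs);             % 32 ms round-trip delay (> 30 ms, so heard as a distinct echo)
gain = 10^(-20/20);                         % echo 20 dB below the direct signal
echoPath = [1; zeros(delaySamples-1,1); gain];
withEcho = filter(echoPath, 1, x);          % direct sound plus its echo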

2.2 Need for Echo Cancellation


In this new age of global communications, wireless phones are regarded as essential
communications tools and have a direct impact on people’s day-to-day personal and business
communications. As new network infrastructures are implemented and competition between
wireless carriers increases, digital wireless subscribers are
becoming ever more critical of the service and voice quality they receive from network
providers. Subscriber demand for enhanced voice quality over wireless networks has
driven a new and key technology termed echo cancellation, which can provide near wire
line voice quality across a wireless network.
Today’s subscribers use speech quality as a standard for assessing the overall quality of a
network. Even though the subscribers’ opinion is subjective, it is the key to
maintaining subscriber loyalty. For this reason, the effective removal of hybrid and acoustic
echoes, which are inherent within the telecommunications network infrastructure, is the key
to maintaining and improving the perceived voice quality of a call. Ultimately, the search
for improved voice quality has led to intensive research into the area of echo cancellation.
Such research is conducted with the aim of providing solutions that can reduce background
noise and remove hybrid and acoustic echoes before any transcoder processing occurs. By
employing echo cancellation technology, the quality of speech can be improved
significantly. This chapter discusses the overall echo problem. A definition of echo
precedes the discussion of the fundamentals of echo cancellation and the voice quality
challenges encountered in today’s networks.

2.3 Echoes in Telecommunications

Telecommunications is about transferring information from one location to another.


This includes many forms of information: telephone conversations, television signals,
computer files, and other types of data. To transfer the information, you need a channel
between the two locations. This may be a wire pair, radio signal, optical fiber, etc.
Telecommunications companies receive payment for transferring their customers' information,
while they must pay to establish and maintain the channel. The financial bottom line is simple:
the more information they can pass through a single channel, the more money they make. DSP
has revolutionized the telecommunications industry in many areas: signaling tone generation
and detection, frequency band shifting, filtering to remove power line hum, and so on. Three
specific examples from the telephone network will be discussed here: multiplexing, compression,
and echo control.
2.4 The Process of Echo Cancellation
An echo canceller is basically a device that detects and removes the echo of the signal
from the far end after it has echoed on the local end’s equipment. In the case of circuit
switched long distance networks, echo cancellers reside in the metropolitan
Central Offices that connect to the long distance network. These echo cancellers remove
electrical echoes made noticeable by delay in the long distance network.
An echo canceller consists of three main functional components:
• an adaptive filter;
• a doubletalk detector;
• a non-linear processor.

A brief overview of these components is presented in this chapter.

FIG 2.1 Block diagram of a generic echo canceller
2.4.1 Echo control

Echoes are a serious problem in long distance telephone connections. When you
speak into a telephone, a signal representing your voice travels to the connecting receiver,
where a portion of it returns as an echo. If the connection is within a few hundred miles, the
elapsed time for receiving the echo is only a few milliseconds. The human ear is
accustomed to hearing echoes with these small time delays, and the connection sounds
quite normal. As the distance becomes larger, the echo becomes increasingly noticeable
and irritating. The delay can be several hundred milliseconds for intercontinental
communications, and is particularity objectionable. Digital Signal Processing attacks this
type of problem by measuring the returned signal and generating an appropriate anti signal
to cancel the offending echo.
This same technique allows speakerphone users to hear and speak at the same time
without fighting audio feedback (squealing). It can also be used to reduce environmental
noise by canceling it with digitally generated anti noise.

2.4.2 Echo Location

A common method of obtaining information about a remote object is to bounce a


wave off of it.

For example, radar operates by transmitting pulses of radio waves, and examining
the received signal for echoes from aircraft. In sonar, sound waves are transmitted through
the water to detect submarines and other submerged objects. Geophysicists have long
probed the earth by setting off explosions and listening for the echoes from deeply buried
layers of rock. While these applications share a common thread, each has its own specific
problems and needs. Digital signal processing has produced revolutionary changes in all
three areas.

Radar is an acronym for Radio Detection And Ranging. In the simplest radar system, a
radio transmitter produces a pulse of radio frequency energy a few microseconds long. This
pulse is fed into a highly directional antenna, where the resulting radio wave propagates
away at the speed of light. Aircraft in the path of this wave will reflect a small portion of
the energy back toward a receiving antenna, situated near the transmission site. The
distance to the object is calculated from the elapsed time between the transmitted pulse and
the received echo. The direction to the object is found more simply; you know where you
pointed the directional antenna when the echo was received. The operating range of a radar
system is determined by two parameters: how much energy is in the initial pulse, and the
noise level of the radio receiver. Unfortunately, increasing the energy in the pulse usually
requires making the pulse longer. In turn, the longer pulse reduces the accuracy and
precision of the elapsed-time measurement. This results in a conflict between two important
parameters: the ability to detect objects at long range, and the ability to accurately
determine an object's distance.
DSP has revolutionized radar in three areas, all of which relate to this basic problem.
First, DSP can compress the pulse after it is received, providing better distance
determination without reducing the operating range. Second, DSP can filter the received
signal to decrease the noise. This increases the range, without degrading the distance
determination. Third, DSP enables the rapid selection and generation of different pulse
shapes and lengths. Among other things, this allows the pulse to be optimized for a
particular detection problem. Now the impressive part: much of this is done at a sampling
rate comparable to the radio frequency used, as high as several hundred megahertz! When it
comes to radar, DSP is as much about high-speed hardware design as it is about algorithms.

2.5 Types of Echoes

In telephone communication, there are two main types of echo:

1. Hybrid (or line) echoes
2. Acoustic echoes

2.5.1 Hybrid Echo (or line)

The telephone network echo results from the impedance mismatch at the hybrids of
a public switched telephone network (PSTN) exchange, where the subscriber two-wire
lines are connected to four-wire lines. If a communication is just between two fixed
handset telephones, then only the network echo will occur.

Hybrid echo is created in the hybrid itself: part of the
signal sent to the hybrid on the four-wire side is returned as an echo superimposed
on the signal received from the hybrid on the four-wire side. Line echo path delays are
very short and each hybrid has a single echo path. Echo paths do not change, or change very
slowly, over time, because the electrical circuitry and wire-line parameters in the network
change very slowly. Since each hybrid circuit is slightly different, each echo tail
is different as well. Many factors determine the echo path. It is even possible for an echo
tail to change while a circuit is active. This could happen when a second telephone
extension is taken off-hook in parallel with the first one.

FIG 2.2 Long-distance phone network with 2-to-4-line conversion

Due to these variations in echo tails, it is necessary for an echo canceller to adapt to
the tail continuously. Adaptive Filtering is employed within echo cancellers to this end.
The adaptive filters should converge quickly, but not so quickly that they might diverge
under some conditions. This is especially important when a circuit is first established. The
amount of time it takes the echo canceller to adapt to an echo path is referred to as the
"convergence time".

2.5.2 Acoustic echoes

When the communication involves one or more hands-free telephones (or speaker-
phones), acoustic feedback paths are set up between the telephone's loudspeaker and
microphone at each end. In the case of acoustic echo, if the time delay is not long, then the
echo can be perceived as soft reverberation, which adds artistic quality, for example in a
concert hall. However, a strong echo that arrives a few tens of milliseconds or more after
the initial direct sound will be highly undesirable and irritating. This acoustic coupling is
due to the reflection of the loudspeaker's sound waves from walls, ceiling, windows and
other objects back to the microphone. The coupling can also be due to the direct path from
the loudspeaker to the microphone, see Figure 2.3.

Adaptive cancellation of such acoustic echo has become very important in hands-free
communication systems, e.g. tele-conference, video-conference and PC telephony systems.
The effects of an echo depend mostly on the time delay between the initial and reflected
sound waves (or sound signals), and the strength of the reflected sounds.

In the case of network echo, a short delayed echo cannot be distinguished from the normal
side-tone of the telephone, which is intentionally inserted to make the communication channel
sound "alive". However, an echo with a round-trip delay of more than 40 ms will cause
significant disturbance to the talker. Such a long delay is caused by the propagation time over
long distances and/or the digital encoding of the transmitted signals.

FIG 2.3 Sources of acoustic echo in a room when using a hands-free
telephone
Acoustic echo is a common occurrence in today's telecommunication
systems. It occurs when an audio source and sink operate in full-duplex mode; an example
of this is a hands-free loudspeaker telephone. In this situation the received signal is output
through the telephone loudspeaker (audio source); this audio signal is then reverberated
through the physical environment and picked up by the system's microphone (audio sink).
The effect is the return to the distant user of time-delayed and attenuated images of their
original speech signal. The signal interference caused by acoustic echo is distracting to
both users and causes a reduction in the quality of the communication.

Adaptive filtering techniques are used to reduce this unwanted echo, thus increasing

communication quality. These echoes can be very annoying to callers. A widely used
technique to suppress echoes is to employ adaptive echo cancellers.

2.6 Adaptive Echo Cancellers

A technique to remove or cancel echoes is shown in Figure 2.4. The echo canceller
mimics the transfer function of the echo path (or room acoustics) to synthesize a replica of
the echo, and then subtracts that replica from the combined echo and near-end speech (or
disturbance) signal to obtain the near-end signal alone. However, the transfer function is
unknown in practice, and so it must be identified. The solution to this problem is to use an
adaptive filter; the method used to cancel the echo signal is known as adaptive filtering.
Adaptive filters are dynamic filters which iteratively alter their characteristics in order
to achieve an optimal desired output. An adaptive filter algorithmically alters its parameters
in order to minimize a function of the difference between the desired output d(n) and its
actual output y(n). This function is known as the cost function of the adaptive algorithm.
Figure 2.4 shows a block diagram of the adaptive echo cancellation system implemented
throughout this thesis. Here the filter H(n) represents the impulse response of the acoustic
environment, and W(n) represents the adaptive filter used to cancel the echo signal. The
adaptive filter aims to equate its output y(n) to the desired output d(n) (the signal
reverberated within the acoustic environment). At each iteration the error signal, e(n) = d(n) −
y(n), is fed back into the filter, where the filter characteristics are altered accordingly.
FIG 2.4 Block diagram of Adaptive Echo Canceller
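
A minimal sketch of the structure in Figure 2.4, written with the same adaptfilt.lms object that the simulation code of Chapter 5 uses; the echo-path impulse response, filter length and step size below are illustrative assumptions, not the values used in the thesis.

% Minimal adaptive echo canceller: W(n) learns the echo path H(n).
fs  = 8000;
x   = randn(10*fs,1);                           % far-end signal (placeholder for speech)
H   = [zeros(63,1); 0.5; zeros(127,1); -0.3];   % assumed room/echo-path impulse response
d   = filter(H, 1, x);                          % echoed signal picked up by the microphone, d(n)
mu  = 0.002;                                    % step size chosen below the stability bound
aec = adaptfilt.lms(256, mu);                   % adaptive filter W(n) with 256 taps
[y, e] = filter(aec, x, d);                     % y(n): echo estimate, e(n) = d(n) - y(n): echo-free output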

2.6.1 Choice of Algorithm


A wide variety of recursive algorithms have been developed in the literature for the
operation of linear adaptive filters. In the final analysis, the choice of one algorithm over
another is determined by one or more of the following factors.

2.6.2 Rate of Convergence

This is defined as the number of iterations required for the algorithm, in response to
stationary inputs, to converge "close enough" to the optimum Wiener solution in the mean-
square error sense. A fast rate of convergence allows the algorithm to adapt rapidly to a
stationary environment of unknown statistics.

2.6.3 Misadjustment

For an algorithm of interest, this parameter provides a quantitative measure of the
amount by which the final value of the mean-square error, averaged over an ensemble of
adaptive filters, deviates from the minimum mean-square error produced by the Wiener
filter.

2.6.4 Tracking

When an adaptive filtering algorithm operates in a non-stationary environment, the
algorithm is required to track statistical variations in the environment. Two contradictory
features, however, influence the tracking performance of the algorithm:
(1) the rate of convergence, and
(2) the steady-state fluctuation due to algorithm noise.

2.6.5 Robustness

For an adaptive filter to be robust, small disturbances (i.e., disturbances with small
energy) must result only in small estimation errors. The disturbances may arise from a
variety of factors, internal or external to the filter.

2.6.6 Computational requirements:

Here the issues of concern include:

(a) the number of operations (i.e., multiplications, divisions, and additions/subtractions)
required to make one complete iteration of the algorithm;
(b) the size of the memory locations required to store the data and the program; and
(c) the investment required to program the algorithm on a computer.

2.7 Approach to Develop Linear Adaptive Filter


Stochastic Gradient Approach
The stochastic gradient approach uses a tapped-delay line, or transversal filter, as
the structural basis for implementing the linear adaptive filter. For the case of stationary
inputs, the cost function, also referred to as the index of performance, is defined as the
mean square error (i.e., the mean square value of the difference between the desired
response and the transversal filter output). This cost function is precisely a second order
function of the tap weights in the transversal filter.

To develop a recursive algorithm for updating the tap weights of the adaptive
transversal filter, we proceed in two stages. First, we use an iterative procedure to solve the
Wiener-Hopf equations (i.e., the matrix equation defining the optimum Wiener solution);
the iterative procedure is based on the method of steepest descent, which is a well-known
technique in optimization theory. This method requires the use of a gradient vector, the
value of which depends on two quantities: the correlation matrix of the tap inputs in the
transversal filter and the cross-correlation vector between the desired response and the
same tap inputs. Next, we use instantaneous values for these correlations, so as to derive an
estimate of the gradient vector, making it assume a stochastic character in general.

The resulting algorithm is widely known as the least mean square (LMS) algorithm,
the essence of which, for the case of a transversal filter operating on real-valued data, may
be described as

w(n + 1) = w(n) + μ x(n) e(n),

where the error signal e(n) is defined as the difference between some desired response
and the actual response of the transversal filter produced by the tap-input vector.

CHAPTER 3

ADAPTIVE FILTERS

3.1 Introduction

Adaptive digital filters have been used for several decades to model systems whose
properties are a priori unknown. Pole-zero modeling using an output error criterion
involves finding an optimum point on a (potentially) multimodal error surface, a problem
for which there is no entirely satisfactory solution. In this chapter we discuss previous work
on the application of genetic-type algorithms to this task and describe our own work
developing an evolutionary algorithm suited to the particular problem.

Discrete-time (or digital) filters are ubiquitous in today's signal processing
applications. Filters are used to achieve desired spectral characteristics of a signal, to reject
unwanted signals, like noise or interferers, to reduce the bit rate in signal transmission, etc.

3.1.1 What is an adaptive filter?

The notion of making filters adaptive, i.e., altering the parameters (coefficients) of a
filter according to some algorithm, tackles the problem that we might not know in advance,
e.g., the characteristics of the signal, of the unwanted signal, or of a system's
influence on the signal that we would like to compensate for. Adaptive filters can adjust to an
unknown environment, and can even track signal or system characteristics that vary over time.

3.1.2 Signal Processing and Adaptive Filters

Digital Signal Processing (DSP) is used to transform and analyze data and
signals that are either inherently discrete or have been sampled from analogue sources.
With the availability of cheap but powerful general-purpose computers and custom-
designed DSP chips, digital signal processing has come to have a great impact on many
different disciplines from electronic and mechanical engineering to economics and

meteorology. In the field of biomedical engineering, for example, digital filters are used to
remove unwanted 'noise' from electrocardiograms (ECG) while in the area of consumer
electronics DSP techniques have revolutionized the recording and playback of audio
material with the introduction of compact disk and digital audio tape technology. The
design of a conventional digital signal processor, or filter, requires a priori knowledge of
the statistics of the data to be processed. When this information is inadequate or when the
statistical characteristics of the input data are known to change with time, adaptive fillters
are employed.

Adaptive filters are employed in a great many areas of telecommunications
for such purposes as adaptive equalization, echo cancellation, speech and image encoding,
and noise and interference reduction. Adaptive filters have the property of self-
optimization. They consist, primarily, of a time-varying filter, characterized by a set of
adjustable coefficients, and a recursive algorithm which updates these coefficients as further
information concerning the statistics of the relevant signals is acquired.
A desired response d(n), related in some way to the input signal, is made available to
the adaptive filter. The characteristics of the adaptive filter are then modified so that its
output ŷ(n) resembles d(n) as closely as possible. The difference between the desired and
adaptive filter responses is termed the error and is defined as
e(n) = d(n) − ŷ(n)                                                            (1)

Ideally, the adaptive process becomes one of driving the error, e(n) towards zero. In
practice, however, this may not always be possible and so an optimization criterion, such as
the mean square error or some other measure of fitness, is employed. Adaptive filters may
be divided into recursive and non-recursive categories depending on their inclusion of a
feedback path. The response of non-recursive, or finite impulse-response (FIR) filters is
dependent upon only a finite number of previous values of the input signal. Recursive, or
infinite impulse-response (IIR) filters, however, have a response which depends upon all
previous input values, the output being calculated using not only a finite number of
previous input values directly, but also one or more previous output values. Many real-
world transfer functions require much more verbose descriptions in FIR than in recursive
form. The potentially greater computational efficiency of recursive filters over their non-
recursive counterparts is, however, tempered by several shortcomings, the most important
of which are that the filter is potentially unstable and that there are no wholly satisfactory
adaptation algorithms.
There are two main types of adaptive IIR filtering algorithms, which differ in the
formulation of the prediction error used to assess the appropriateness of the current
coefficient set during adaptation. In the equation-error approach the error is a linear
function of the coefficients. Consequently the mean square error is a quadratic function of
the coefficients and has a single global minimum and no local minima. This means that
simple gradient-based algorithms can be used for adaptation. However, in the presence of
noise (which is present in all real problems) equation-error-based algorithms converge to
biased estimates of the filter coefficients.
The second approach, the output-error formulation, adjusts the coefficients of the time-
varying digital filter directly in recursive form. The response of an output-error IIR filter is
characterized by the recursive difference equation

y[n] = a1 y[n − 1] + . . . + aN y[n − N] + b0 x[n] + b1 x[n − 1] + . . . + bM x[n − M].

3.2 Adaptive Transversal Filters


In a transversal filter of length N, as depicted in Fig. 3.1, at each time n the output sample y[n] is
computed as a weighted sum of the current and delayed input samples x[n], x[n − 1], . . . , x[n − N + 1]:

y[n] = c0*[n] x[n] + c1*[n] x[n − 1] + . . . + cN−1*[n] x[n − N + 1]                    (2)
Here, the ck[n] are time-dependent filter coefficients (we use the complex-conjugated
coefficients ck*[n] so that the derivation of the adaptation algorithm is valid for complex signals,
too). Re-written in vector form, using x[n] = [x[n], x[n − 1], . . . , x[n − N + 1]]T, the
tap-input vector at time n, and c[n] = [c0[n], c1[n], . . . , cN−1[n]]T, the coefficient vector at
time n, this equation becomes y[n] = cH[n] x[n].

Both x[n] and c[n] are column vectors of length N, and cH[n] = (c*[n])T is the Hermitian transpose of the
vector c[n] (each element is conjugated, and the column vector is transposed into a row vector).

FIG 3.1 Transversal filter with time dependent coefficients

In the special case of the coefficients c[n] not depending on time n, c[n] = c, the transversal
filter structure is an ordinary FIR filter of length N. Here, we will, however, focus on the case where the
filter coefficients are variable, and are adapted by an adaptation algorithm.

3.3 The LMS Adaptation Algorithm

The LMS (least mean squares) algorithm is an approximation of the steepest descent algorithm,
which uses an instantaneous estimate of the gradient vector of a cost function. The estimate of the
gradient is based on sample values of the tap-input vector and an error signal. The algorithm
iterates over each coefficient in the filter, moving it in the direction opposite to the approximated gradient.

For the LMS algorithm it is necessary to have a reference signal d[n] representing the
desired filter output. The difference between the reference signal and the actual output of the
transversal filter (eq. 2) is the error signal

e[n] = d[n] − cH[n]x[n]. (3)

A schematic of the learning setup is depicted in Fig. 3.2.

FIG 3.2 Adaptive transversal filter learning

The task of the LMS algorithm is to find a set of filter coefficients c that minimize the expected
value of the quadratic error signal, i.e., to achieve the least mean squared error (thus the name). The
squared error and its expected value are (for simplicity of notation we drop the
dependence of all variables on time n in eqs. 4 to 7)

e2 = (d − cH x)2,                                                             (4)

E(e2) = E(d2) − 2 cH E(d x) + cH E(x xH) c.                                   (5)
Note that the squared error e2 is a quadratic function of the coefficient vector c, and thus has
only one (global) minimum (and no other (local) minima), which theoretically could be found if the
correct expected values in eq. 5 were known.

The gradient descent approach demands that the position on the error surface given by the
current coefficients should be moved in the direction of the 'steepest descent', i.e., in the
direction of the negative gradient of the cost function J = E(e2) with respect to the coefficient
vector:

∇J = −2 (E(d x) − E(x xH) c).                                                 (6)

The expected values in this equation, E(d x) = p, the cross-correlation vector between the
desired output signal and the tap-input vector, and E(x xH) = R, the auto-correlation matrix of the
tap-input vector, would usually be estimated using a large number of samples of d and x. In the
LMS algorithm, however, a very short-term estimate is used, taking into account only the current
samples, p ≈ d[n] x[n] and R ≈ x[n] xH[n], leading to an update equation for the filter
coefficients

c(new) = c + μ e* x.                                                          (7)

Here, we have introduced the ‘step-size’ parameter μ, which controls the distance we move along
the error surface. In the LMS algorithm the update of the coefficients, eq. 7, is performed at every
time instant n:

c[n + 1] = c[n] + μ e*[n] x[n].                                               (8)

3.3.1 Choice of step-size:

The ‘step-size’ parameter μ introduced in eqs. 7 and 8 controls how far we move along the error
surface at each update step. μ certainly has to be chosen positive, μ > 0 (otherwise we would move
the coefficient vector in a direction towards larger squared error). Also, μ should not be too large,
since in the LMS algorithm we use a local approximation of p and R in the computation of the
gradient of the cost function, and thus the cost function at each time instant may differ from the
accurate global cost function.

Furthermore, too large a step-size causes the LMS algorithm to become unstable, i.e., the coefficients
do not converge to fixed values but oscillate. Closer analysis [1] reveals that the upper bound on μ
for stable behavior of the LMS algorithm depends on the largest eigenvalue λmax of the tap-input
auto-correlation matrix R, and thus on the input signal. For stable adaptation behavior the step-size
has to satisfy

0 < μ < 2 / λmax.
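
In practice the largest eigenvalue is often bounded by the trace of R, which equals N times the input power, giving the commonly used conservative condition μ < 2/(N · input power). The sketch below compares both estimates for a placeholder input signal; the filter length and signal are assumptions for illustration only.

% Estimate the LMS step-size bound for an input signal x and filter length N.
N  = 128;
x  = randn(16000,1);                       % placeholder input signal
Px = mean(x.^2);                           % estimate of the input power E{x^2}
muMaxTrace = 2/(N*Px);                     % conservative bound, since lambda_max <= trace(R) = N*Px
[r, lags] = xcorr(x, N-1, 'biased');       % autocorrelation estimate up to lag N-1
R = toeplitz(r(N:end));                    % N-by-N autocorrelation matrix estimate
muMaxEig = 2/max(eig(R));                  % bound from the largest eigenvalue of R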

Summary of the LMS algorithm

1. Filter operation: y[n] = cH[n] x[n]

2. Error calculation: e[n] = d[n] − y[n],

where d[n] is the desired output

3. Coefficient adaptation: c[n + 1] = c[n] + μ e*[n] x[n]
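
The three steps above translate almost directly into code. The following sample-by-sample implementation for real-valued signals is only a sketch: the input, the assumed unknown system and the step size are illustrative, and the thesis simulation in Chapter 5 instead uses the adaptfilt.lms object.

% Sample-by-sample LMS filter: c[n+1] = c[n] + mu*e[n]*x[n] (real-valued data).
N  = 32;                           % filter length
mu = 0.005;                        % step size (well below 2/(N*input power) for this input)
x  = randn(10000,1);               % input signal (placeholder)
d  = filter([zeros(7,1); 0.8; zeros(10,1); -0.4], 1, x);   % desired signal from an assumed unknown system
c  = zeros(N,1);                   % coefficient vector c[n]
y  = zeros(size(x));  e = zeros(size(x));
for n = N:length(x)
    xvec = x(n:-1:n-N+1);          % tap-input vector [x[n], ..., x[n-N+1]]'
    y(n) = c.'*xvec;               % 1. filter operation
    e(n) = d(n) - y(n);            % 2. error calculation
    c    = c + mu*e(n)*xvec;       % 3. coefficient adaptation
end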

3.4 Applications of Adaptive Filters

Two possible application scenarios of adaptive filters are shown in Fig. 3.3: system
identification and inverse filtering. For system identification the adaptive filter is used to
approximate an unknown system. Both the unknown system and the adaptive filter are
driven by the same input signal, and the adaptive filter coefficients are adjusted in such a way
that its output signal resembles the output of the unknown system, i.e., the adaptive filter is
used to approximate the unknown system.

For inverse modeling or equalization the adaptive filter is used in series with the
unknown system, and the learning algorithm tries to compensate for the influence of the
unknown system on the test signal u[n] by minimizing the (squared) difference between the
adaptive filter's output and the delayed test signal.

FIG 3.3: Two applications of adaptive filters: System identification (left) and inverse
modeling/ equalization (right)

Applications of adaptive filters further include the adaptive prediction of a signal, used for
example in ADPCM audio coding, adaptive noise or echo cancellation, and adaptive beam-
forming (shaping of the acoustic/radio 'beam' transmitted/received by an array of
loudspeakers/microphones/antennas).

CHAPTER 4

SOFTWARE DESCRIPTION

4.1 MATLAB Introduction:

MATLAB is a high-performance language for technical computing. It integrates
computation, visualization and programming in an easy-to-use environment. MATLAB stands
for matrix laboratory. It was written originally to provide easy access to matrix software
developed by the LINPACK (linear system package) and EISPACK (eigensystem package)
projects. MATLAB is therefore built on a foundation of sophisticated matrix software in
which the basic element is a matrix that does not require pre-dimensioning.

4.1.1 Typical uses of MATLAB

1. Math and computation


2. Algorithm development
3. Data acquisition
4. Data analysis, exploration and visualization
5. Scientific and engineering graphics

4.1.2 The main features of MATLAB

1. Advanced algorithms for high-performance numerical computation, especially
in the field of matrix algebra
2. A large collection of predefined mathematical functions and the ability to
define one's own functions
3. Two- and three-dimensional graphics for plotting and displaying data
4. A complete online help system
5. A powerful, matrix- or vector-oriented high-level programming language for
individual applications

6. Toolboxes available for solving advanced problems in several application
areas

Fig.4.1 Features and capabilities of MATLAB

4.2 The MATLAB System:

The MATLAB system consists of five main parts:

4.2.1 Development Environment.

This is the set of tools and facilities that help you use MATLAB functions
and files. Many of these tools are graphical user interfaces. It includes the MATLAB
desktop and Command Window, a command history, an editor and debugger, and browsers
for viewing help, the workspace, files, and the search path.

4.2.2 The MATLAB Mathematical Function Library:

This is a vast collection of computational algorithms ranging from elementary


functions, like sum, sine, cosine, and complex arithmetic, to more sophisticated functions
like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

4.2.3 The MATLAB Language:

This is a high-level matrix/array language with control flow statements,


functions, data structures, input/output, and object-oriented programming
features. It allows both "programming in the small" to rapidly create quick-
and-dirty throwaway programs, and "programming in the large" to create large
and complex application programs.

4.2.4 Graphics:

MATLAB has extensive facilities for displaying vectors and matrices as


graphs, as well as annotating and printing these graphs. It includes high-level
functions for two-dimensional and three-dimensional data visualization, video
processing, animation, and presentation graphics. It also includes low-level
functions that allow you to fully customize the appearance of graphics as well
as to build complete graphical user interfaces on your MATLAB applications.

4.2.5 The MATLAB Application Program Interface (API)

This is a library that allows you to write C and Fortran programs that interact
with MATLAB. It includes facilities for calling routines from MATLAB (dynamic
linking), calling MATLAB as a computational engine, and for reading and writing
MAT-files.

4.3 Starting MATLAB:

On Windows platforms, start MATLAB by double-clicking the MATLAB shortcut icon


on your Windows desktop. On UNIX platforms, start MATLAB by typing matlab at the
operating system prompt. You can customize MATLAB startup. For example, you can
change the directory in which MATLAB starts or automatically execute MATLAB
statements in a script file named startup.m.

4.4 MATLAB Desktop:

When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user
interfaces) for managing files, variables, and applications associated with MATLAB. The
following illustration shows the default desktop. You can customize the arrangement of
tools and documents to suit your needs; for more information, see the documentation on the desktop tools.

4.5 MATLAB Working Environment:

4.5.1 MATLAB Desktop:

The MATLAB desktop is the main MATLAB application window. The desktop contains five
sub-windows: the Command Window, the Workspace Browser, the Current Directory
window, the Command History window, and one or more Figure windows, which are shown
only when the user displays a graphic.

The command window is where the user types MATLAB commands and expressions
at the prompt (>>) and where the output of those commands is displayed. MATLAB
defines the workspace as the set of variables that the user creates in a work session. The
Workspace Browser shows these variables and some information about them. Double-
clicking on a variable in the Workspace Browser launches the Array Editor, which can be
used to obtain information and in some instances edit certain properties of the variable.

The Current Directory tab above the Workspace tab shows the contents of the current
directory, whose path is shown in the Current Directory window. For example, in the
Windows operating system the path might be as follows: C:\MATLAB\Work, indicating
that directory "work" is a subdirectory of the main directory "MATLAB", which is
installed in drive C. Clicking on the arrow in the Current Directory window shows a
list of recently used paths. Clicking on the button to the right of the window allows the user
to change the current directory.

MATLAB uses a search path to find M-files and other MATLAB-related files, which
are organized in directories in the computer file system. Any file run in MATLAB must
reside in the current directory or in a directory that is on the search path. By default, the files
supplied with MATLAB and MathWorks toolboxes are included in the search path. The
easiest way to see which directories are on the search path, or to add or modify the search
path, is to select Set Path from the File menu in the desktop, and then use the Set Path dialog
box. It is good practice to add any commonly used directories to the search path to avoid
repeatedly having to change the current directory.
The Command History window contains a record of the commands a user has entered
in the Command Window, including both current and previous MATLAB sessions.
Previously entered MATLAB commands can be selected and re-executed from the
Command History window by right-clicking on a command or sequence of commands. This
action launches a menu from which to select various options in addition to executing the
commands. This is a useful feature when experimenting with various commands in a work
session.

4.5.2 Using the MATLAB Editor to create M-Files:

The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a
sub-window in the desktop. M-files are denoted by the extension .m, as in pixelup.m. The
MATLAB editor window has numerous pull-down menus for tasks such as saving,
viewing, and debugging files. Because it performs some simple checks and also uses color
to differentiate between various elements of code, this text editor is recommended as the
tool of choice for writing and editing M-functions. To open the editor, type edit at the
prompt; typing edit followed by a file name opens that M-file in an editor window, ready for
editing. As noted earlier, the file must be in the current directory, or in a directory in the search path.

4.5.3 Getting Help:

The principal way to get help online is to use the MATLAB Help Browser, opened as a
separate window either by clicking on the question mark symbol (?) on the desktop toolbar,
or by typing helpbrowser at the prompt in the Command Window. The Help Browser is a
web browser integrated into the MATLAB desktop that displays Hypertext Markup
Language (HTML) documents. The Help Browser consists of two panes: the help navigator
pane, used to find information, and the display pane, used to view the information. Self-
explanatory tabs on the help navigator pane are used to perform a search.

For example, help on a specific function is obtained by selecting the search


tab, selecting Function Name as the Search Type, and then typing in the function name in
the Search for field. It is good practice to open the Help Browser at the beginning of a
MATLAB session to have help readily available during code development or other
MATLAB tasks.

Another way to obtain help for a specific function is by typing doc followed by the function
name at the command prompt. For example, typing doc format displays documentation for
the function called format in the display pane of the Help Browser. This command opens
the browser if it is not already open.

M-functions have two types of information that can be displayed by the
user. The first is called the H1 line, which contains the function name and a one-line
description. The second is a block of explanation called the Help text block. Typing help at
the prompt followed by a function name displays both the H1 line and the Help text for
that function in the Command Window.

Occasionally, this information can be more up to date than the documentation of the M-
function in question.

Typing lookfor followed by a keyword displays all the H1 lines that
contain that keyword. This function is useful when looking for a particular topic without
knowing the names of applicable functions. For example, typing lookfor edge at the
prompt displays the H1 lines containing that keyword. Because the H1 line contains the
function name, it then becomes possible to look at specific functions using the other help
methods. Typing lookfor edge -all at the prompt displays the H1 line of all functions that
contain the word edge in either the H1 line or the Help text block. Words that contain the
characters edge also are detected. For example, the H1 line of a function containing the
word polyedge in the H1 line or Help text would also be displayed.
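
For example, the following commands, typed at the prompt, exercise the help facilities described above (the function name format and the keyword edge are only examples):

help format        % print the H1 line and Help text of format in the Command Window
doc format         % open the documentation for format in the Help Browser
lookfor edge       % list the H1 lines of all functions whose H1 line contains "edge"
lookfor edge -all  % also search the full Help text blocks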

4.5.4 Saving and Retrieving a Work Session:

There are several ways to save and load an entire work session or selected workspace
variables in MATLAB. The simplest is as follows.

To save the entire workspace, simply right-click on any blank space in the workspace
Browser window and select Save Workspace As from the menu that appears. This opens a
directory window that allows naming the file and selecting any folder in the system in
which to save it. Then simply click Save. To save a selected variable from the workspace,
select the variable with a left click and then right-click on the highlighted area. Then select

Save Selection As from the menu that appears. This again opens a window from which a
folder can be selected to save the variable.

To select multiple variables, use shift-click or control-click in the familiar manner, and
then use the procedure just described for a single variable. All files are saved in the double-
precision binary format with the extension .mat. These saved files commonly are referred
to as MAT-files. For example, a session named, say, mywork_2003_02_10 would
appear as the MAT-file mywork_2003_02_10.mat when saved. Similarly, a saved video
called final_video will appear when saved as final_video.mat.

To load saved workspaces and/or variables, left-click on the folder icon on the toolbar of
the Workspace Browser window. This causes a window to open from which a folder
containing the MAT-files can be selected; selecting Open causes the contents of the file to be
restored in the Workspace Browser window. It is possible to achieve the same results described
in the preceding paragraphs by typing save and load at the prompt, with the appropriate file
names and path information. This approach is not as convenient, but it is used when
formats other than those available in the menu method are required.
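
Typed at the prompt, the equivalent commands might look like the following (the file and variable names are placeholders):

save mywork_2003_02_10             % save the entire workspace to mywork_2003_02_10.mat
save myvars.mat x d e              % save only the variables x, d and e
clear all                          % empty the workspace
load mywork_2003_02_10             % restore every variable from the MAT-file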

CHAPTER 5

SIMULATION AND RESULT
The previous chapters provide a detailed sketch of an acoustic echo canceller (AEC).
In this chapter the flowchart for the software simulation and the results of the simulation of
the AEC algorithm, which was performed in MATLAB, are discussed. The idea that
drove the simulation was to show that convincing results could be achieved in the
software environment.
5.1 Why MATLAB?
MATLAB is a powerful, general-purpose, mathematical software package. MATLAB
possesses excellent graphics and matrix handling capabilities. It integrates mathematical
computing in a powerful language to provide a flexible environment for technical
computing. The salient features of MATLAB are its in-built mathematical toolboxes
and graphic functions. Additionally, external routines that are written in other languages
such as C, C++, Fortran and Java, can be integrated with MATLAB applications.
MATLAB also supports importing data from files and other external devices. Most of
the functions in MATLAB are matrix-oriented and can act on arrays of any appropriate
dimension. MATLAB also has a separate toolbox for signal processing applications,
which provided simpler solutions for many of the problems encountered in this research.
The MATLAB software environment suited the needs of this research for the
following reasons:
• The input signals (far-end and near-end talker signals) were voices. These voices
were stored as wav files, and the wav files were easily imported into the code (see the
sketch after this list).
• The intermediate signals (echo signals) and output signals (error signal and signals
obtained after echo cancellation) were obtained as wav files. Thus the audio of the
voice signals could literally be heard, which aided judgments with respect to the
results obtained immensely.
• The signal processing toolbox has in-built functions for almost all signal
processing applications. The toolbox helped the efficiency of the code since these
functions could be called wherever necessary instead of writing separate sub-
routines.
• Since MATLAB supports graphics, the results of a simulation could be presented
in a graphical format with ease.
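
As an example of the first point, the far-end and near-end wav files could be read, and the processed output written back, roughly as follows. The file names are placeholders; older MATLAB releases use wavread/wavwrite, while newer ones use audioread/audiowrite.

[farEnd,  fs] = wavread('farspeech.wav');    % far-end talker signal (audioread in newer releases)
[nearEnd, fs] = wavread('nearspeech.wav');   % near-end talker signal
% ... run the echo canceller to obtain the output/error signal e ...
e = nearEnd;                                 % placeholder for the processed output
wavwrite(e, fs, 'aec_output.wav');           % write the result so it can be listened to (audiowrite in newer releases)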
5.2 Simulation Flowchart

The flowchart for the simulation of the echo canceller algorithm is presented in
Figure 5.1. Its main steps are the following:

1. Get the far-end signal, x(n).
2. Create the echo signal, r(n), from the far-end signal.
3. Get the near-end signal, v(n).
4. Combine r(n) and v(n) to obtain the desired signal, d(n).
5. If doubletalk exists, the filter coefficients are frozen; otherwise, the NLMS loop updates
the filter coefficients.
6. Subtract the estimated echo from d(n) to produce the residual error signal, e(n).
7. Do non-linear processing to remove the residual echo.

Figure 5.1: Flowchart of the MATLAB Simulation

5.3 Description of the Simulation Setup

This section describes the simulation environment, its requirements and the

procedures adopted.

1. The input signals, both far-end and near-end signals, were simulated and given
to the AEC, which executed on a PC with the MATLAB environment.

2. The input signals were seven seconds in duration.

3. A sampling rate of 8000 Hz was used for all the signals in the simulation.

4. The graphs plotted have x-axes denoting the time and y-axes denoting the
amplitude or magnitude of the signal.

ALGORITHM
Acoustic echo cancellation algorithm tolerable for double talk
% Ieee-2008 by kensaku fujji
warning off;
close all;
clear all;
clc;
fs = 8000;      % sampling rate (8 kHz, as stated in the simulation setup)
load nearspeech %%% load the near-end speech (male voice) signal, v, from the filter design toolbox demo data
n = 1:length(v);
t = n/fs;
figure,plot(t,v);
axis([0 10 -1 1]);
xlabel('Time [sec]');
ylabel('Amplitude');
title('Original Speech Signal');
%******************************************************
M = 512; %%%% length of the simulated room impulse response (number of taps)
H=(log(0.99*rand(1,M)+0.01).*sign(randn(1,M)).*exp(-0.002*(1:M)));
H=H/norm(H)*4; % exponentially decaying random impulse response, normalized
%********************************************************************
load farspeech %%% load the far-end speech (male voice) signal, x, from the filter design toolbox demo data
x = x(1:length(x));
dhat = filter(H,1,x); %%% pass the far-end signal through the echo path H, eqn (3)
figure,plot(t,dhat);
axis([0 10 -1 1]);
xlabel('Time [sec]');
ylabel('Amplitude');
title(' Echoed Speech Signal');
%****************************************************************
d = dhat + v + 0.0001*randn(length(v),1); %%%% desired (microphone) signal: echo + near-end speech + low-level noise, fed to the DTD
figure,plot(t,d); %%%% plottng the signal
axis([0 10 -1 1]);
xlabel('Time [sec]');
ylabel('Amplitude');
title('Desired Signal');
% %************************************************************
mu = 0.016;        % adaptation step size
W0 = zeros(1,512); % 512-tap reference used below to truncate the signals to a whole number of blocks
x = x(1:length(W0)*floor(length(x)/length(W0)));
d = d(1:length(W0)*floor(length(d)/length(W0)));
% Construct the Adaptive Filter
hFDAF = adaptfilt.lms(512,mu); %%% LMS adaptive filter object from the adaptive filtering section of the filter design toolbox
[y,e] = filter(hFDAF,x,d); %%% y(n) is the filter output (echo estimate) and e(n) is the error, eqn (1)
figure,plot(e,'r');
xlabel('Time [sec]');
ylabel('Amplitude');
title('Residue Error of Acoustic Echo Canceller');

figure,plot(y,'k');
xlabel('Time [sec]');
ylabel('Amplitude');
title('Output of Acoustic Echo Canceller');
%*********************************************************************
% estimate the echo return loss enhancement (ERLE) using a 512-tap moving-average smoother
Hd2 = dfilt.dffir(ones(1,512));
erle = filter(Hd2,(e-v(1:length(e))).^2)./(filter(Hd2,dhat(1:length(e)).^2));
erledB1 = -20*log10(erle);
figure,semilogy(flipud(erledB1(1:8000)));hold on;
xlabel('Iterations');
ylabel('Estimated Error [dB]');
title('convergence parameter');
%************************************************************************
%%% sub ADF %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
et=[];
d1 = dhat+v+0.0001*randn(length(v),1); %%%% creating the echo (microphone) signal which is fed to the DTD
W1 = zeros(1,128);
x1 = x(1:length(W0)*floor(length(x)/length(W0)));
d1 = d(1:length(W0)*floor(length(d)/length(W0)));
shFDAF = adaptfilt.lms(128,mu); %%% 128-tap LMS sub adaptive filter from the adaptive filtering section of the filter design toolbox
[y1,e1] = filter(shFDAF,x1,d1); %%% y1(n) is the sub filter output and e1(n) is its error, eqn (1)
for j=1:10000
et(j)=d1(j)-(y(j)+y1(j));  % residual error after subtracting both the main and sub filter outputs
Qn(j)=et(j)^2/10000;       % scaled squared error of the proposed (main + sub filter) method
end
for j=1:10000
Q(j)=e(j)^2/10000;         % scaled squared error of the conventional method
end
hu=fliplr(sort(et.^2));    % squared errors sorted in descending order
hu(1501:end)=-0.0001;

erle = filter(Hd2,(et'-v(1:length(et))).^2)./(filter(Hd2,dhat(1:length(et)).^2));
erledB = -20*log10(erle);
semilogy(flipud(erledB(1:8000)),'k');
legend('Conventional','Proposed');
figure,
semilogy(fliplr(sort(Qn))); title('Convergence Property: Proposed Method'); xlim([1 11000]);

xlabel('Iterations');ylabel('Estimated error (dB)');


for j=1:10000
P(j)=(y1(j))^2;
pp(j)=(2*0.0001*P(j))/Qn(j);
end
figure,semilogy(flipud(sort(e(1:20000))));hold on;
semilogy(hu,'r');title('Comparison of Convergence Properties');
legend('Conventional','Proposed');
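For reference, the ERLE curves plotted above measure how much the echo is attenuated: the ratio of echoed-signal power to residual-error power, expressed in dB. A minimal sketch of the same measure computed directly (assuming column vectors d and e of equal length; this is not the exact expression used above):

win    = ones(512,1)/512;            % 512-sample moving-average window
pd     = filter(win,1,d.^2);         % short-term power of the microphone (echoed) signal
pe     = filter(win,1,e.^2);         % short-term power of the residual error
erledB = 10*log10(pd./(pe+eps));     % ERLE in dB; larger values mean better echo cancellation
figure, plot(erledB); xlabel('Samples'); ylabel('ERLE [dB]');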

5.4 Results

This section presents a graphical representation of the results obtained by simulating the
algorithm in MATLAB.

Figure 5.2 Original speech signal

The figure above shows the original speech signal that is passed through the double-talk detector.

Figure 5.3 Desired signal

The above figure shows the desired signal, i.e., the speech signal along with the echo signal.

Figure 5.4 Echoed speech signal

The figure above shows the echo signal produced during double talk.

Figure 5.5 Output of acoustic echo canceller

The above figure shows the output of the acoustic echo canceller.

Figure 5.6 Residue error of acoustic echo canceller

Figure 5.7 Convergence parameter

Figure 5.8 Convergence property of the proposed method

Figure 5.9 Comparison of convergence properties

APPLICATIONS
• Today people are more interested in hands-free communication. In such a situation, the
use of a regular loudspeaker and a high-gain microphone, in place of a telephone receiver,
might seem more appropriate. This would allow more than one person to participate in a
conversation at the same time, such as in a teleconference environment.
• Another advantage is that it would allow the person to have both hands free and to move
freely about the room. However, the presence of a large acoustic coupling between the
loudspeaker and microphone would produce a loud echo that would make conversation
difficult. Furthermore, the acoustic system could become unstable, causing a loud
howling noise.
• The solution to these problems is the elimination of the echo with an echo suppression or
echo cancellation algorithm. The echo suppressor offers a simple but effective method to
counter the echo problem. However, the echo suppressor has a major disadvantage: it
supports only half-duplex communication, which permits only one speaker to talk at a
time. This drawback led to the invention of echo cancellers. An important aspect of echo
cancellers is that full-duplex communication can be maintained, which allows both
speakers to talk at the same time.
• The objective of this research was to produce an improved echo cancellation algorithm
capable of providing convincing results. The three basic components of an echo
canceller are an adaptive filter, a doubletalk detector and a nonlinear processor. The
adaptive filter creates a replica of the echo and subtracts it from the combination of the
actual echo and the near-end signal. The doubletalk detector senses doubletalk, which
occurs when both ends are talking, and stops the adaptive filter in order to
avoid divergence. Finally, the nonlinear processor removes the residual echo from the
error signal (a rough sketch of a noise-gate style nonlinear processor follows this list).
• Since there has been a revolution in the field of personal computers in recent years, this
research attempted to implement the acoustic echo canceller algorithm natively on a
PC with the help of the MATLAB software.
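As a rough, hypothetical illustration of such a noise-gate nonlinear processor (the threshold, frame length and attenuation below are assumptions, not the exact method proposed in this thesis):

thresholdLevel = 0.01;                        % gate threshold on the frame RMS (assumed)
frameLen = 256;                               % frame length in samples (assumed)
eOut = e;                                     % e is the residual error signal from the adaptive filter
for k = 1:frameLen:length(e)-frameLen+1
    idx = k:k+frameLen-1;
    if sqrt(mean(e(idx).^2)) < thresholdLevel % quiet frame: assumed to contain only residual echo
        eOut(idx) = 0.1*e(idx);               % attenuate the residual echo by 20 dB
    end
end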

ADVANTAGES

Acoustic echo cancellation (AEC) provides one of the best solutions to the control of
acoustic echoes generated by hands-free audio terminals.

• In this type of application, an adaptive filter identifies the acoustic echo path between
the terminal’s loudspeaker and microphone, i.e., the room impulse response. The
filter output, which provides an electronic replica of the acoustic echo, is subtracted
from the microphone signal to cancel the echo. Nevertheless, there are several
specific and challenging problems associated with AEC applications.
• First, the echo path is extremely long (on the order of hundreds of milliseconds) and it
may rapidly change at any time during the connection. For example, at the 8 kHz sampling
rate used in this simulation, a 250 ms echo path corresponds to 2000 filter taps. The excessive
length of the acoustic echo path in time is mainly due to the slow speed of sound through air;
moreover, multiple reflections from walls and objects in the room increase this length.
• In addition, the impulse response of the room is not static over time, since it varies
with the ambient temperature, pressure, and humidity; also, movement of objects and
human bodies can rapidly modify the acoustic impulse response.
• As a consequence of these characteristics of the acoustic echo path, the
adaptive filter most likely works in an under-modeling situation, i.e., its length is
smaller than the length of the acoustic impulse response.
• Hence, the residual echo caused by the part of the system that cannot be modeled
acts like additional noise and disturbs the overall performance. Second, the
background noise that corrupts the microphone signal can be strong and highly
nonstationary.

DISADVANTAGES

Besides these specific problems associated with the acoustic environment, there are some
classical issues that have to be addressed in the general framework of echo cancellation.

• The first one concerns the nonstationary character of the speech signal, since it is well
known that the performance of an adaptive filter depends on the properties of the
input signal. In addition, a speech signal is highly correlated. Therefore, this type of
signal represents a challenge for any adaptive filter.
• Another major aspect that has to be considered in echo cancellation concerns the
behavior during double-talk, i.e., when the talkers on both sides speak simultaneously. In
this case, besides the echo plus background noise, the microphone of the hands-free
terminal captures a speech signal that acts like a large level of uncorrelated
disturbance to the adaptive filter and may cause it to diverge. For this reason, the
echo canceller is usually equipped with a double-talk detector (DTD) to control
the behavior of the adaptive filter during these periods (a sketch of one common
detection rule is given after this list).
• Each of the previously addressed problems implies some special requirements for the
adaptive algorithms used for AEC. Summarizing, the “ideal” algorithm should have a
high convergence rate and good tracking capabilities (in order to deal with the great
length and time-varying nature of the acoustic impulse response) while achieving low
misadjustment. These properties should be achieved despite the nonstationary character of
the input signal (i.e., speech). Also, the algorithm should be robust against
microphone signal variations (i.e., background noise variations and double-talk) and
against under-modeling. Finally, its computational complexity should be moderate,
allowing an efficient and low-cost real-time implementation.
• Even though the adaptive filtering literature contains many interesting and useful
algorithms, no single adaptive algorithm satisfies all the previous requirements.
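For illustration, one widely used double-talk detection rule (not necessarily the one used in this thesis) is the Geigel algorithm, which declares doubletalk whenever the magnitude of the microphone sample exceeds a fraction of the recent far-end peak. A minimal sketch, with an assumed threshold and window length:

L = 512;                                    % look-back window over the far-end signal (assumed)
T = 0.5;                                    % detection threshold, a typical Geigel value
dtFlag = false(size(d));                    % d: microphone signal, x: far-end signal (same length)
for n = L:length(d)
    if abs(d(n)) > T*max(abs(x(n-L+1:n)))   % near-end level dominates the recent far-end peak
        dtFlag(n) = true;                   % doubletalk declared: adaptation would be frozen here
    end
end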

FUTURE SCOPE


The testing of the algorithm was performed entirely off-line. The test speech was
recorded beforehand as input to the algorithm, and the output was examined after the
simulation. Therefore, a real-time implementation for testing purposes would be the most
interesting future work.
A high background noise level is annoying to the listener during a conversation
and will affect the performance of the algorithm. However, background noise is a natural part
of a conversation, since it conveys the surrounding environment of the person we talk to.
Hence, there is a need for a noise suppression algorithm that reduces the background noise to a
comfortable level. Moreover, a study of how to handle music-like noise, which is trickier to
suppress, could also be carried out in the future.
In practice, the echo could still be noticeable due to large variations in the echo path
characteristics. Therefore, further research into, and evaluation of, the algorithm's reaction to
echo path changes should be undertaken.
The algorithm proposed in this thesis presents a solution for single-channel acoustic
echoes. However, in real-life situations multichannel sound is often the norm for
telecommunication; for example, when a group of people in a teleconference
environment are all talking, laughing or otherwise communicating with each other,
multichannel sound abounds. Since there is just a single microphone, the other end will hear
only a highly incoherent monophonic sound. To handle such situations better,
the echo cancellation algorithm developed during this research should be extended to the
multichannel case.

CONCLUSION

With the world shrinking into a global village because of superior
communications, telephones, both conventional and hands-free sets, occupy a prominent
position in meeting people's communication needs. One of the major problems in a
telecommunication application over a telephone system is echo. The echo cancellation
algorithm presented in this thesis successfully attempted to find a software solution for
the problem of echoes in the telecommunications environment. The proposed algorithm
was a completely software-based approach that did not utilize any DSP hardware
components; it is capable of running on any PC with the MATLAB software installed.
Additionally, a new method, which utilized a noise gate device for nonlinear processing,
was proposed. This new technique is faster and provides almost perfect results for
canceling residual echoes without clipping the reference speech signals. In addition,
the results obtained were convincing: the audio of the output speech signals was highly
satisfactory and validated the goals of this research.
