
2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE)

The Application of Hilbert-Huang Transform in Speech Enhancement

Liwei Liu
College of Computer Science and Engineering
Changchun University of Technology
Changchun, China
E-mail:liuliwei@mail.ccut.edu.cn

Geheng Chen, Feng Qian
College of Computer Science and Engineering
Changchun University of Technology
Changchun, China
E-mail: chengeheng@mail.ccut.edu.cn

Abstract-The Hilbert-Huang Transform (HHT) is a new and powerful theory for nonlinear and non-stationary signal analysis, and it is efficient for describing the local features of dynamic signals. This paper briefly introduces the HHT method, validates it through the analysis of a simulated example, presents a speech enhancement algorithm based on the HHT, and compares it with wavelet-based speech denoising. Simulation experiments show that HHT-based speech denoising improves the SNR, clarity, and intelligibility of the speech signal. The HHT method is well suited to speech signal processing.

Keywords-Hilbert-Huang transform; EMD; time-space filter; speech enhancement

978-1-4244-7956-6/10/$26.00 ©2010 IEEE

I. INTRODUCTION

Nonlinear and non-stationary data processing is a necessary part of both pure research and practical applications. In 1998, N. E. Huang et al. [1] presented a new and powerful method for the analysis of nonlinear and non-stationary time series data. The method is composed of two parts: Empirical Mode Decomposition (EMD) and the Hilbert Transform (HT). The EMD is an adaptive decomposition with which any complicated signal can be decomposed into its Intrinsic Mode Functions (IMFs). With the HT, the IMFs yield instantaneous frequencies as functions of time. The final result is a three-dimensional energy-frequency-time spectrum, designated the Hilbert spectrum.
Practical applications of the HHT are spread across numerous scientific disciplines. It has been used to study gravity wave characteristics in the middle atmosphere, giving useful physical insight into dispersive-dissipative wave phenomena [2], and to determine the ages of large amplitude coastal seiches on the Caribbean coast [3]. The HHT has also been used in other fields of geophysics, for example to examine earthquake processes, to determine the dispersion curves of seismic surface waves, and to study the effects of seismic motions on the condition of buildings and structures in civil engineering [4]. Moreover, the HHT is used in tsunami research to detect earthquake-generated water waves in data series recorded by bottom pressure transducers in the Northern Pacific and to examine the response of New Zealand coastal waters to the Peru tsunami [5]. Additionally, the EMD is used in automatic human gait analysis, which is becoming increasingly important in human gesture recognition as an individual biometric characteristic [6].
This paper is organized as follows. A brief presentation of the HHT is given in Section II. In Section III, a computer simulation example is analyzed to verify the validity of the HHT in processing nonlinear and non-stationary signals. In Section IV, an approach to speech enhancement based on the HHT is presented and applied to speech signals, and the corresponding experimental results are shown. Conclusions are given in Section V.

II. HILBERT-HUANG TRANSFORM

Compared with other data analysis methods, the innovation of the HHT is the introduction of the IMF, which guarantees a physically meaningful instantaneous frequency. The HHT consists of two processes [1].

A. Empirical Mode Decomposition

The procedure of EMD is to sift the original data series until the signal is adaptively decomposed into a number of IMFs. Every IMF must satisfy two properties: (1) the number of extrema and the number of zero crossings are either equal or differ by one; (2) at every point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. A special sifting process is employed to extract all of the IMFs. This sifting process is described as follows.
First, the upper and lower envelopes of the signal $x(t)$ and their mean value $m_1(t)$ are calculated. The first step of the sifting process is to calculate the difference:
$$h_1(t) = x(t) - m_1(t) \tag{1}$$
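The first sifting step, (1), can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: it assumes piecewise-linear envelopes that are held constant beyond the outermost extrema, whereas EMD implementations commonly fit cubic splines through the extrema. The helper names `local_extrema`, `envelope`, and `sift_once` are hypothetical.

```python
import math

def local_extrema(x):
    """Return the indices of the interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Assumption: linear interpolation between extrema, constant outside them.
    """
    env = []
    j = 0
    for n in range(len(x)):
        if n <= idx[0]:
            env.append(x[idx[0]])
        elif n >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[j + 1] < n:      # advance to the segment containing n
                j += 1
            a, b = idx[j], idx[j + 1]
            w = (n - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env

def sift_once(x):
    """One sifting step: return h1 = x - m1 and the envelope mean m1."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        raise ValueError("signal has no interior extrema to build envelopes")
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    m1 = [(u + l) / 2 for u, l in zip(upper, lower)]
    h1 = [xi - mi for xi, mi in zip(x, m1)]
    return h1, m1

# Toy input: a fast oscillation riding on a slow linear trend.
n = 400
x = [math.sin(2 * math.pi * 20 * k / n) + 0.5 * k / n for k in range(n)]
h1, m1 = sift_once(x)
```

In this toy case the envelope mean `m1` follows the slow trend, so `h1` keeps mainly the fast oscillation.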

However, $h_1(t)$ rarely satisfies the two IMF properties, so it usually cannot be taken directly as the first IMF of the signal. The sifting therefore usually has to be repeated, with the difference obtained in the previous sifting taken as the signal in the present sifting. If, after the $(k+1)$-th sifting, the corresponding difference $h_{1k}(t)$ satisfies the IMF properties,
$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{2}$$
then it can be taken as the first IMF component, denoted by $c_1(t)$:
$$c_1(t) = h_{1k}(t) \tag{3}$$
In practice, to determine whether $h_{1k}(t)$ satisfies the IMF properties well enough, the so-called standard deviation (SD) criterion is usually used, that is, one checks whether the following inequality holds [1]:
$$SD(k) = \sum_{t=0}^{T} \frac{|h_{1(k-1)}(t) - h_{1k}(t)|^2}{h_{1(k-1)}^2(t)} \le 0.2 \sim 0.3 \tag{4}$$
where $T$ is the length of the data. Next, taking the remaining data
$$r_1(t) = x(t) - c_1(t) \tag{5}$$
as a new signal and applying the sifting process to it, we obtain the second IMF $c_2(t)$. This procedure is repeated $n$ times until the last residue $r_n(t)$ becomes a monotonic function. When the decomposition procedure is finished, the signal can be expressed as:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \tag{6}$$
where $c_1(t), c_2(t), \ldots, c_n(t)$ are all of the IMFs contained in the signal, and $r_n(t)$ is a negligible residue.

B. Hilbert Transform

As mentioned above, the main purpose of the EMD is to make it possible to apply the HT and obtain the Hilbert spectrum, which is similar to the wavelet spectrum. Applying the HT to every IMF component $c_i(t)$ gives a new data series $y_i(t)$ in the transform domain:
$$y_i(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau} \, d\tau \tag{7}$$
where $P$ indicates the Cauchy principal value. With this definition, a complex series $z_i(t)$ is formed:
$$z_i(t) = c_i(t) + j y_i(t) = a_i(t) e^{j\theta_i(t)} \tag{8}$$
where
$$a_i(t) = \sqrt{c_i^2(t) + y_i^2(t)} \tag{9}$$
$$\theta_i(t) = \arctan \frac{y_i(t)}{c_i(t)} \tag{10}$$
and the instantaneous frequency (IF) is:
$$\omega_i(t) = \frac{d\theta_i(t)}{dt} \tag{11}$$
Compared with the traditional FFT, $a_i(t)$ and $\omega_i(t)$ derived by the HHT are functions of time $t$ rather than constants, so the HT can present the variation of power with time.

III. ANALYSIS OF THE COMPUTER SIMULATION EXAMPLE

In order to verify the effectiveness of the HHT in dealing with nonlinear and non-stationary signals, this paper analyzes a frequency-modulated signal with a known analytic expression:
$$x(t) = (1 + 0.2\sin(2\pi \cdot 7.5t)) \cos(2\pi \cdot 30t + 0.5\sin(2\pi \cdot 15t)) + \sin(2\pi \cdot 150t) \tag{12}$$
The signal is the superposition of two parts: an FM-AM signal with a 30 Hz fundamental frequency and a 15 Hz modulation frequency, and a 150 Hz sinusoidal signal. Differentiating the phase of the FM-AM part gives its angular frequency $\omega(t)$:
$$\omega(t) = 60\pi + 15\pi\cos(30\pi t) \tag{13}$$
so the frequency $f(t)$ is:
$$f(t) = 30 + 7.5\cos(30\pi t) \tag{14}$$
The frequency fluctuates within [22.5, 37.5] Hz. The amplitude varies within [0.8, 1.2], and its variation frequency is 7.5 Hz.

Figure 1. The result of EMD

Figure 1 shows the IMF components derived from $x(t)$ by EMD. The signal is sampled over eight periods, with 64 points per period. IMF1 corresponds to the 150 Hz sinusoidal part. IMF2 corresponds to the FM-AM part; its waveform changes in both amplitude and spacing. The component labeled res is the residue.

Figure 2. The energy-frequency-time spectrum

Figure 2 shows the energy-frequency-time spectrum based on the obtained IMFs. The horizontal coordinate is sampling time, the vertical coordinate is frequency, and the color indicates amplitude. Two frequencies appear in Figure 2. One is 150 Hz, which does not change with time. The other fluctuates within [22.5, 37.5] Hz around the 30 Hz fundamental frequency; its color varies within [0.8, 1.2] and completes two cycles in eight periods, which shows that the frequency of the amplitude change is 7.5 Hz. The energy-frequency-time spectrum can thus extract how the frequency and amplitude of the signal vary with time.
This example verifies that the HHT is a powerful method for the analysis of nonlinear and non-stationary time series data.
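The analytic results (13)-(14) and the Hilbert-transform relations (8)-(11) can be checked numerically on the two parts of (12). The sketch below is an illustration under stated assumptions, not the authors' code. It assumes a 1920 Hz sampling rate (64 samples per 30 Hz period, 512 samples in total) and an even signal length. It uses a DFT-based discrete analytic signal in place of the integral (7), and applies it to the two known components directly rather than to IMFs produced by EMD. The names `analytic_signal` and `inst_freq` are hypothetical.

```python
import cmath
import math

FS = 1920.0   # assumed sampling rate: 64 samples per 30 Hz period
N = 512       # eight 30 Hz periods
t = [n / FS for n in range(N)]

# Closed-form amplitude and frequency of the FM-AM part, from (12)-(14).
amp = [1 + 0.2 * math.sin(2 * math.pi * 7.5 * ti) for ti in t]
f_true = [30 + 7.5 * math.cos(30 * math.pi * ti) for ti in t]

def analytic_signal(x):
    """Discrete counterpart of (8): suppress the negative-frequency DFT bins.

    Assumes len(x) is even. Uses a plain O(n^2) DFT to stay dependency-free.
    """
    n = len(x)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    spec = [sum(x[m] * w[(k * m) % n] for m in range(n)) for k in range(n)]
    for k in range(n):
        if 0 < k < n // 2:
            spec[k] *= 2      # positive frequencies are doubled
        elif k > n // 2:
            spec[k] = 0       # negative frequencies are removed
    return [sum(spec[k] * w[(k * m) % n].conjugate() for k in range(n)) / n
            for m in range(n)]

def inst_freq(z, fs):
    """Discrete counterpart of (11): phase increment per sample, in Hz."""
    return [cmath.phase(z[m + 1] * z[m].conjugate()) * fs / (2 * math.pi)
            for m in range(len(z) - 1)]

# The two parts of (12), treated here as if they were ideal IMFs.
fm_am = [a * math.cos(2 * math.pi * 30 * ti + 0.5 * math.sin(2 * math.pi * 15 * ti))
         for a, ti in zip(amp, t)]
tone = [math.sin(2 * math.pi * 150 * ti) for ti in t]

f_tone = inst_freq(analytic_signal(tone), FS)
f_fm = inst_freq(analytic_signal(fm_am), FS)

print("analytic range of f(t):", min(f_true), max(f_true))
print("Hilbert estimate, FM-AM part:", min(f_fm), max(f_fm))
print("Hilbert estimate, tone:", min(f_tone), max(f_tone))
```

The estimate for the FM-AM part is only approximate, because part of its spectrum reaches zero and negative frequencies. Its range therefore need not coincide exactly with the analytic range of (14).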


IV. SPEECH ENHANCEMENT METHOD BASED ON HHT

Speech is a typical non-stationary signal, but speech analysis and processing have long been based on the hypothesis of short-term stationarity and have used stationary analysis methods. Although these methods have achieved great success in practical applications, their results still differ significantly from human perception. As the requirements on speech signal processing continue to rise, more and more attention is being paid to analyzing speech with suitable nonlinear and non-stationary signal processing methods. The HHT is an effective new analysis method that meets these requirements. Based on the time-space filtering characteristics of the EMD algorithm, it is applied here to speech enhancement, and simulation experiments demonstrate the effectiveness of this method.
The core of the HHT is the EMD algorithm. The EMD sifts the signal into a series of IMFs ordered from small scale to large scale. In time, each IMF represents a mode of a certain scale. In frequency, the sifting acts as a filtering process that proceeds from high frequency to low frequency. For example, if the signal is decomposed into $n$ IMF components, then the low-pass filter can be expressed as:
$$x_l(t) = \sum_{i=l}^{n} c_i(t) + r_n(t) \tag{15}$$
the high-pass filter can be expressed as:
$$x_h(t) = \sum_{i=1}^{h} c_i(t) \tag{16}$$
and the band-pass filter can be expressed as:
$$x_b(t) = \sum_{i=h}^{l} c_i(t) \tag{17}$$
Based on the above principles, this paper puts forward an HHT-based speech enhancement algorithm for broadband additive noise. In the experiment, the speech signal $s(t)$ is recorded, sampled at 8 kHz, and converted into digital data with a precision of 16 bits; the content is the word "open" spoken by a Chinese female speaker. Gaussian white noise is superimposed on this clean speech signal. The noise variance $\sigma^2$ is varied to produce seven groups of signals with SNRs of -10 dB, -6 dB, -3 dB, 0 dB, 3 dB, 6 dB, and 10 dB.
The original clean speech signal is plotted in Figure 3(a), and the noisy speech signal at 3 dB SNR is plotted in Figure 3(b). The noisy speech signal is processed with the HHT-based denoising method and with the wavelet soft-threshold and hard-threshold methods, and the results are then compared.

Figure 3. (a) Original pure speech signal; (b) Noisy speech signal at 3 dB SNR

First, the noisy speech signal is decomposed into IMFs by the EMD method. The IMF components and residue derived from the noisy speech signal (Figure 3(b)) are shown in Figure 4. It shows that IMF1 to IMF5 contain the high-frequency components of the signal, and most of the noise is contained in them.

Figure 4. The IMF components and residue derived from the noisy speech signal

However, if a low-pass filter were applied directly, useful speech would be lost, because the spectra of the broadband noise and the speech overlap. The high-frequency IMFs are therefore processed with the soft-threshold method used in wavelet denoising; that is, a floating threshold is applied to each IMF component to identify the data that carry little energy. Data whose magnitude is less than or equal to the threshold are set to zero, and only the part above the threshold is kept. The processing follows the formula:
$$IMF'_{i,n} = \begin{cases} \operatorname{sgn}(IMF_{i,n})\,(|IMF_{i,n}| - \delta), & |IMF_{i,n}| > \delta \\ 0, & |IMF_{i,n}| \le \delta \end{cases} \tag{18}$$
where sgn is the sign function and $\delta$ is the threshold, calculated as:
$$\delta = \hat{\sigma}\sqrt{2\log N} \, / \, \ln(i+1) \tag{19}$$
Here $\hat{\sigma}^2$ is the estimated noise variance. As the scale $i$ increases, the threshold decreases.
After thresholding, the high-frequency IMFs and the low-frequency IMFs are superimposed to reconstruct the denoised speech signal. For comparison, speech enhancement is also carried out with the wavelet transform method (using the db5 wavelet basis), with the soft-threshold and hard-threshold methods respectively.
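The thresholding rule (18)-(19), the final superposition, and the SNR bookkeeping used in the comparison can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation. The IMFs and residue are taken as given lists, the logarithm in (19) is assumed to be natural, and the noise standard deviation `sigma` is supplied by the caller because the text does not state how it is estimated. The SNR is assumed to be the usual ratio of signal power to noise (or error) power, which the paper does not spell out. All function names are hypothetical.

```python
import math

def soft_threshold(samples, delta):
    """Rule (18): shrink magnitudes by delta, zero anything at or below it."""
    out = []
    for v in samples:
        mag = abs(v) - delta
        out.append(math.copysign(mag, v) if mag > 0 else 0.0)
    return out

def scale_threshold(sigma, n_samples, i):
    """Threshold (19) for the i-th IMF (i = 1, 2, ...), natural log assumed."""
    return sigma * math.sqrt(2 * math.log(n_samples)) / math.log(i + 1)

def denoise(imfs, residue, sigma, n_high):
    """Threshold the first n_high IMFs, then superimpose all components."""
    n = len(residue)
    total = list(residue)
    for i, imf in enumerate(imfs, start=1):
        if i <= n_high:
            imf = soft_threshold(imf, scale_threshold(sigma, n, i))
        total = [a + b for a, b in zip(total, imf)]
    return total

def noise_variance_for_snr(signal, snr_db_target):
    """Noise variance that gives the target SNR (in dB) for this signal."""
    power = sum(v * v for v in signal) / len(signal)
    return power / 10 ** (snr_db_target / 10)

def snr_db(clean, estimate):
    """SNR in dB of an estimate measured against the clean reference."""
    err = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    sig = sum(c * c for c in clean)
    return 10 * math.log10(sig / err)

# Toy data standing in for EMD output: two "IMFs" and a zero residue.
imfs = [[0.3, -2.0, 0.1, 1.5], [1.0, 1.0, -1.0, -1.0]]
residue = [0.0, 0.0, 0.0, 0.0]
enhanced = denoise(imfs, residue, sigma=0.2, n_high=1)
```

With speech data, `imfs` and `residue` would come from an EMD of the noisy signal. Following the observation above that IMF1 to IMF5 carry most of the noise, `n_high` would be set to 5.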
TABLE I. SNR MEASURE FOR THE ENHANCED SPEECH OBTAINED WITH THREE DIFFERENT METHODS

SNR of Noisy     SNR of Enhanced Speech Signal
Speech Signal    HHT method    Wavelet soft threshold method    Wavelet hard threshold method
-10 dB           24 dB         18 dB                            19 dB
-6 dB            23 dB         21 dB                            22 dB
-3 dB            23 dB         23 dB                            24 dB
0 dB             26 dB         26 dB                            26 dB
3 dB             28 dB         24 dB                            27 dB
6 dB             30 dB         25 dB                            26 dB
10 dB            30 dB         24 dB                            28 dB

For the noisy speech signal at each SNR, Table I lists the SNR of the enhanced speech obtained with each of the three speech enhancement methods. Considering the randomness of the noise samples, each SNR value is the average over 50 different noisy samples. The experimental results show that the HHT-based speech enhancement method matches or outperforms the classical wavelet soft-threshold and hard-threshold methods at every input SNR except -3 dB, where the hard-threshold method is 1 dB higher.

V. CONCLUSION

In this paper, we introduce the basic method of the HHT and the principle of its EMD. Based on the principle of the EMD and the filtering characteristics of its components (IMFs), a novel noise removal method is developed. The proposed algorithm has been tested and compared with conventional speech enhancement methods, namely the wavelet soft-threshold and hard-threshold methods. The results show that our algorithm achieved better performance under most of the tested conditions. It removes more noise and improves the SNR of the speech, and after enhancement the articulation and intelligibility of the speech remain good.
The HHT is a new theory with important theoretical value and wide application prospects. Nevertheless, it is still not perfect and has some problems to be solved, such as the curve fitting problem, the end effect problem, and the mode mixing problem. The performance of the HHT-based speech enhancement algorithm depends strongly on how these problems are dealt with. In this paper, we carried out a pilot study of speech enhancement methods based on the HHT. Much work remains to be done to perfect the HHT and to apply it in the speech processing field.

REFERENCES

[1] N. E. Huang et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis", Proc. R. Soc. London A, Vol. 454, 1998, pp. 903-995.
[2] X. Zhu et al., "Gravity wave characteristics in the middle atmosphere derived from the empirical mode decomposition method", Journal of Geophysical Research, Vol. 102, 1997, pp. 16545-16561.
[3] N. E. Huang, H. H. Shih, Z. Shen, S. Long, "The ages of large amplitude coastal seiches on the Caribbean Coast of Puerto Rico", Journal of Physical Oceanography, Vol. 30, No. 8, August 2000, pp. 2001-2012.
[4] A. D. Veltcheva, "Wave and group transformation by a Hilbert spectrum", Coastal Engineering Journal, Vol. 44, No. 4, April 2002, pp. 283-300.
[5] D. G. Goring, "Response of New Zealand waters to the Peru tsunami of 23 June 2001", The Royal Society of New Zealand, Vol. 36, 2002, pp. 225-232.
[6] W. Huang et al., "Nonlinear indicial response of complex nonstationary oscillations as pulmonary hypertension responding to step hypoxia", Proc. Natl. Acad. Sci., Vol. 96, No. 3, March 1999, pp. 1834-1839.