Speech Processing Research Paper 10

2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE)
The Application of Hilbert-Huang Transform in Speech Enhancement
Liwei Liu
College of Computer Science and Engineering
Changchun University of Technology
Changchun, China
E-mail:liuliwei@mail.ccut.edu.cn
Geheng Chen, Feng Qian
College of Computer Science and Engineering
Changchun University of Technology
Changchun, China
E-mail: chengeheng@mail.ccut.edu.cn

Abstract-The Hilbert-Huang Transform (HHT) is a new and powerful theory for nonlinear and non-stationary signal analysis, and it is efficient at describing the local features of dynamic signals. This paper briefly introduces the HHT method, validates it through the analysis of an example, presents a speech enhancement algorithm based on the HHT, and compares it with wavelet-based speech denoising. Simulation experiments show that HHT-based speech denoising improves the SNR, clarity and intelligibility of the speech signal. The HHT method is well suited to the processing of speech signals.

Keywords-Hilbert-Huang transform; EMD; time-space filter; speech enhancement

978-1-4244-7956-6/10/$26.00 ©2010 IEEE

I. INTRODUCTION

Nonlinear and non-stationary data processing is a necessary part of both pure research and practical applications. In 1998, N. E. Huang et al. [1] presented a new and powerful method for the analysis of nonlinear and non-stationary time series data. The method is composed of two parts, Empirical Mode Decomposition (EMD) and the Hilbert Transform (HT). The EMD is an adaptive decomposition with which any complicated signal can be decomposed into its Intrinsic Mode Functions (IMFs). With the HT, the IMFs yield instantaneous frequencies as a function of time. The final result is a three-dimensional energy-frequency-time spectrum, designated the Hilbert spectrum.

Practical applications of the HHT are spread across numerous scientific disciplines and investigations, e.g. gravity wave characteristics in the middle atmosphere, to derive useful physical insights into dispersive-dissipative wave phenomena [2], and the ages of large amplitude coastal seiches on the Caribbean coast [3]. Further, the HHT has been used in other fields of geophysics, e.g. to examine earthquake processes, to determine the dispersion curves of seismic surface waves, and to study the effects of seismic motions on the condition of buildings and structures in civil engineering [4]. Moreover, the HHT is used in tsunami research to detect earthquake-generated water waves in data series recorded by bottom pressure transducers in the Northern Pacific and to examine the response of New Zealand coastal waters to the Peru tsunami [5]. Additionally, the EMD is used in automatic human gait analysis, which is becoming increasingly important in human gesture recognition as an individual biometric characteristic [6].

This paper is organized as follows. A basic presentation of the HHT is given in Section II. In Section III, a computer simulation example is analyzed to verify the validity of the HHT in processing nonlinear and non-stationary signals. In Section IV, an approach to speech enhancement based on the HHT is presented and then used to analyze some speech signals, and the corresponding experimental results are shown. Conclusions are given in Section V.

II. HILBERT-HUANG TRANSFORM

Compared with other data analysis methods, the innovation of the HHT is the introduction of the IMF, which guarantees a physically meaningful instantaneous frequency (IF). The HHT consists of two processes [1].

A. Empirical Mode Decomposition

The EMD procedure sifts the original data series until the signal is adaptively decomposed into a number of IMFs. Every IMF must satisfy two properties: (1) the number of extrema and the number of zero crossings are either equal or differ by one; (2) the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. A special sifting process is employed to extract all of the IMFs. This sifting process is described as follows.

Firstly, the upper and lower envelopes of the signal x(t), as well as their mean value m_1(t), are calculated. The first step of the sifting process is to calculate the difference:

$$h_1(t) = x(t) - m_1(t) \tag{1}$$
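A single sifting step as in (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cubic-spline envelopes through the local extrema (the interpolation method is not stated in the paper), uses SciPy's `find_peaks` and `CubicSpline`, and ignores end effects. The function name `sift_once` is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import find_peaks


def sift_once(x, t):
    """One sifting step: h1(t) = x(t) - m1(t), as in (1).

    Assumption: the envelopes are cubic splines through the local maxima
    and minima, extrapolated to both ends of the record.
    """
    max_idx, _ = find_peaks(x)       # indices of local maxima
    min_idx, _ = find_peaks(-x)      # indices of local minima
    if len(max_idx) < 2 or len(min_idx) < 2:
        raise ValueError("too few extrema to build envelopes")
    upper = CubicSpline(t[max_idx], x[max_idx])(t)   # upper envelope
    lower = CubicSpline(t[min_idx], x[min_idx])(t)   # lower envelope
    m1 = 0.5 * (upper + lower)                       # envelope mean m1(t)
    return x - m1, m1


# Usage: a 10 Hz sine riding on a constant offset of 2.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 10 * t) + 2.0
h1, m1 = sift_once(x, t)
```

For this toy input the envelope mean is the constant offset, so one step already leaves a pure sine. Real signals generally need the repeated sifting described next.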
However, h_1(t) rarely satisfies the two IMF properties, so it cannot be taken as the first IMF of the signal straightaway. Therefore, the sifting usually has to be repeated several times, where the difference obtained in the previous sifting is taken as the signal in the present sifting. If, after the (k+1)th sifting, the corresponding difference h_{1k}(t) satisfies the IMF properties,

$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{2}$$

then it can be taken as the first IMF component, denoted by c_1(t), that is:

$$c_1(t) = h_{1k}(t) \tag{3}$$

In practice, to determine whether h_{1k}(t) satisfies the IMF properties well enough, we usually use the so-called standard deviation (SD) criterion: sifting stops once SD(k), defined below, falls under a threshold typically set between 0.2 and 0.3 [1]:

$$SD(k) = \sum_{t=0}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^2}{h_{1(k-1)}^2(t)} \tag{4}$$

where T is the length of the data. Next, taking the remaining data

$$r_1(t) = x(t) - c_1(t) \tag{5}$$

as the new signal and applying the sifting process to it, we obtain the second IMF c_2(t). This procedure is repeated n times until the last residue r_n(t) becomes a monotonic function. When the decomposition procedure is finished, the signal can be expressed as:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \tag{6}$$

where c_1(t), c_2(t), ..., c_n(t) are all of the IMFs included in the signal, and r_n(t) is a negligible residue.

B. Hilbert Transform

As mentioned above, the main purpose of the EMD is to make it possible to apply the HT and obtain the Hilbert spectrum, which is similar to a wavelet spectrum. Applying the HT to every IMF component c_i(t) gives a new data series y_i(t) in the transform domain:

$$y_i(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau} \, d\tau \tag{7}$$

where P indicates the Cauchy principal value. With this definition, a complex series z_i(t) is formed:

$$z_i(t) = c_i(t) + j y_i(t) = a_i(t) e^{j\theta_i(t)} \tag{8}$$

where

$$a_i(t) = \sqrt{c_i^2(t) + y_i^2(t)} \tag{9}$$

$$\theta_i(t) = \arctan \frac{y_i(t)}{c_i(t)} \tag{10}$$

and the IF is:

$$\omega_i(t) = \frac{d\theta_i(t)}{dt} \tag{11}$$

Unlike the constant amplitudes and frequencies of the traditional FFT, a_i(t) and ω_i(t) derived by the HHT are functions of time t, so the HT can present the variation of power with time.

III. ANALYSIS OF THE COMPUTER SIMULATION EXAMPLE

In order to verify the effectiveness of the HHT in dealing with nonlinear and non-stationary signals, this paper analyzes a frequency-modulated signal with the following analytic expression:

$$x(t) = \left(1 + 0.2\sin(2\pi \cdot 7.5t)\right) \cos\left(2\pi \cdot 30t + 0.5\sin(2\pi \cdot 15t)\right) + \sin(2\pi \cdot 150t) \tag{12}$$

The signal is the superposition of two parts: an FM-AM signal with a 30 Hz fundamental frequency and a 15 Hz modulation frequency, and a 150 Hz sinusoidal signal. The angular frequency ω(t) of the FM-AM part is obtained by differentiating its phase:

$$\omega(t) = 60\pi + 15\pi\cos(30\pi t) \tag{13}$$

So f(t) is:

$$f(t) = 30 + 7.5\cos(30\pi t) \tag{14}$$

The frequency fluctuates within [22.5, 37.5] Hz. The amplitude varies within [0.8, 1.2], and its variation frequency is 7.5 Hz.

Figure 1 shows the IMF components derived from x(t) by EMD. The signal is sampled over eight periods, with 64 points in each period. IMF1 corresponds to the 150 Hz sinusoidal part. IMF2 corresponds to the FM-AM part; its waveform changes in both amplitude and spacing. The res is the residue.

Figure 1. The result of EMD

Figure 2 shows the energy-frequency-time spectrum based on the obtained IMFs. The horizontal axis is sampling time, the vertical axis is frequency, and the color indicates the amplitude. There are two frequencies in Figure 2. One is 150 Hz, which does not change with time. The other fluctuates within [22.5, 37.5] Hz around the fundamental frequency of 30 Hz; its color varies within [0.8, 1.2] and completes two cycles in eight periods, which shows that the frequency of the amplitude variation is 7.5 Hz. From the above we can see that the energy-frequency-time spectrum can extract the characteristics and parameters of the frequency and amplitude of a signal as they vary with time.

Figure 2. The energy-frequency-time spectrum (plot title: Hilbert-Huang spectrum)

This example verifies that the HHT is a new and powerful method for the analysis of nonlinear and non-stationary time series data.
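The simulation in this section can be reproduced approximately with the sketch below. It builds the test signal of (12), applies the Hilbert transform steps (7) to (11) to the FM-AM part, and compares the estimated instantaneous frequency with (14). Several things here are assumptions rather than statements from the paper: the AM term is taken as a sine, the sampling rate is 1920 Hz (reading "eight periods, 64 points in each period" as eight periods of the 30 Hz fundamental), `scipy.signal.hilbert` is used for the analytic signal, and the known FM-AM component is analyzed directly as a stand-in for IMF2 instead of running an EMD.

```python
import numpy as np
from scipy.signal import hilbert

fs = 64 * 30                 # assumed: 64 samples per 30 Hz period -> 1920 Hz
n = 8 * 64                   # eight periods -> 512 samples
t = np.arange(n) / fs

# Test signal of (12): FM-AM part plus a 150 Hz sinusoid.
fm_am = (1 + 0.2 * np.sin(2 * np.pi * 7.5 * t)) * np.cos(
    2 * np.pi * 30 * t + 0.5 * np.sin(2 * np.pi * 15 * t))
x = fm_am + np.sin(2 * np.pi * 150 * t)   # full signal, the input to an EMD

# Analytic instantaneous frequency of the FM-AM part, as in (14).
f_true = 30 + 7.5 * np.cos(30 * np.pi * t)

# Steps (7)-(11) applied to the FM-AM part (stand-in for IMF2).
z = hilbert(fm_am)                            # z(t) = c(t) + j*y(t), as in (8)
a = np.abs(z)                                 # amplitude a(t), as in (9)
theta = np.unwrap(np.angle(z))                # phase theta(t), as in (10)
f_est = np.gradient(theta, t) / (2 * np.pi)   # IF in Hz, from (11)

print(f"true IF range: {f_true.min():.1f} to {f_true.max():.1f} Hz")
print(f"mean |f_est - f_true|: {np.mean(np.abs(f_est - f_true)):.3f} Hz")
```

Applying the same steps to the full signal `x` would not give a meaningful single instantaneous frequency, because it contains two components. That is the reason the EMD is applied first.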
IV. SPEECH ENHANCEMENT METHOD BASED ON HHT

Speech is a typical non-stationary signal, but speech signal analysis and processing have long been based on the hypothesis of short-term stationarity and have used analysis methods designed for stationary signals. Although these methods have achieved great success in practical applications, their results still differ significantly from human perception. As the requirements of speech signal processing continue to rise, analyzing speech with suitable nonlinear and non-stationary signal processing methods is attracting more and more attention. The HHT is an effective new analysis method that meets these requirements. In this paper, the time-space filtering characteristic of the EMD algorithm is applied to speech enhancement, and simulation experiments demonstrate the effectiveness of the method.

The core of the HHT is the EMD algorithm. The EMD sifts the signal into a series of IMFs ordered from small scale to large scale. In time, each IMF shows a mode of a certain scale. In frequency, the sifting acts as a filtering process that proceeds from high frequency to low frequency. For example, if the signal is decomposed into n IMF components, then the low-pass filter can be expressed as:

$$x_l(t) = \sum_{i=l}^{n} c_i(t) + r_n(t) \tag{15}$$

the high-pass filter can be expressed as:

$$x_h(t) = \sum_{i=1}^{h} c_i(t) \tag{16}$$

the band-pass filter can be expressed as:

$$x_b(t) = \sum_{i=h}^{l} c_i(t) \tag{17}$$

Based on the above principles, this paper puts forward a speech enhancement algorithm for broadband additive noise based on the HHT. In the experiment, the speech signal s(t) is recorded. It is sampled at 8 kHz and converted to digital data with 16-bit precision; the content is the word "open", spoken by a Chinese girl. Gaussian white noise is superimposed on this clean speech signal. The noise variance σ² is varied to form seven groups of signals with SNRs of ±10 dB, ±6 dB, ±3 dB and 0 dB respectively.

The original clean speech signal is plotted in Figure 3(a), and the noisy speech signal at 3 dB SNR is plotted in Figure 3(b). The noisy speech signal is processed with the HHT-based denoising method and with the wavelet soft-threshold and hard-threshold methods, and the results are then compared.

Figure 3. (a) Original pure speech signal (b) Noisy speech signal at 3dB SNR

Firstly, the noisy speech signal is decomposed into IMFs by the EMD method. The IMF components and residue derived from the noisy speech signal (Figure 3(b)) are shown in Figure 4. It shows that IMF1 to IMF5 contain the high-frequency components of the signal, and most of the noise is contained in them. However, if a low-pass filter were applied directly, useful speech would be lost, because the spectra of the broadband noise and of the speech overlap. So the high-frequency IMFs are processed with the soft-threshold method used in wavelet denoising; that is, a floating threshold is adopted for each IMF component to identify the data that carries less energy. In other words, data whose magnitude is less than or equal to the threshold is set to zero, and only the part above the threshold is kept. The processing follows the formula:

$$IMF'_{i,n} = \begin{cases} \operatorname{sgn}(IMF_{i,n}) \left( |IMF_{i,n}| - \delta \right), & |IMF_{i,n}| > \delta \\ 0, & |IMF_{i,n}| \le \delta \end{cases} \tag{18}$$

Figure 4. The IMF components and residue derived from the noisy speech signal

where sgn is the sign function and δ is the threshold, calculated as:

$$\delta = \frac{\sigma \sqrt{2 \log N}}{\ln(i + 1)} \tag{19}$$

where σ² is the estimated noise variance. Obviously, as the scale i increases, the threshold decreases. After thresholding, the high-frequency IMFs and the low-frequency IMFs are superimposed, and the denoised speech signal is reconstructed. For comparison with the above method, speech enhancement is also carried out with the wavelet transform method (the wavelet basis is db5), using the soft-threshold and hard-threshold methods respectively.
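The experimental procedure of this section can be sketched with a few helper functions: adding white Gaussian noise at a target SNR, soft-thresholding the high-frequency IMFs as in (18) with the scale-dependent threshold of (19), and measuring the output SNR. This is an illustrative sketch under assumptions, not the authors' code. All function names are hypothetical, N in (19) is taken to be the number of samples with a natural logarithm, the noise standard deviation `sigma` is passed in rather than estimated, and the IMFs are assumed to come from an EMD routine that is not shown.

```python
import numpy as np


def add_white_noise(s, target_snr_db, rng):
    """Return s plus white Gaussian noise scaled to the requested SNR (dB)."""
    noise_power = np.mean(s ** 2) / 10 ** (target_snr_db / 10)
    return s + rng.normal(0.0, np.sqrt(noise_power), size=s.shape)


def snr_db(clean, estimate):
    """SNR of an estimate relative to the clean signal, in dB."""
    err = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))


def scale_threshold(sigma, n_samples, i):
    """Threshold of (19) for the i-th IMF, i = 1, 2, ... (natural log assumed)."""
    return sigma * np.sqrt(2 * np.log(n_samples)) / np.log(i + 1)


def soft_threshold(imf, delta):
    """Soft thresholding of (18): zero small samples, shrink the rest by delta."""
    return np.sign(imf) * np.maximum(np.abs(imf) - delta, 0.0)


def enhance_from_imfs(imfs, residue, sigma, n_high):
    """Threshold the first n_high IMFs, then add the other IMFs and the residue.

    imfs: array of shape (n_imfs, n_samples), ordered from high to low frequency.
    """
    n_samples = imfs.shape[1]
    out = residue.copy()
    for idx, imf in enumerate(imfs):
        i = idx + 1                              # scale index used in (19)
        if i <= n_high:
            imf = soft_threshold(imf, scale_threshold(sigma, n_samples, i))
        out = out + imf
    return out


# Usage with toy data (not speech): exercise each helper once.
shrunk = soft_threshold(np.array([-3.0, -1.0, 0.5, 2.0]), 1.0)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.arange(8000) / 8000)
noisy = add_white_noise(clean, 3.0, rng)
```

In the paper's setting, `n_high` would be 5, since IMF1 to IMF5 are the components that are thresholded.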
TABLE I. SNR MEASURE FOR THE ENHANCED SPEECH OBTAINED WITH THREE DIFFERENT METHODS

SNR of noisy      SNR of enhanced speech signal
speech signal     HHT method    Wavelet soft threshold method    Wavelet hard threshold method
-10 dB            24 dB         18 dB                            19 dB
-6 dB             23 dB         21 dB                            22 dB
-3 dB             23 dB         23 dB                            24 dB
0 dB              26 dB         26 dB                            26 dB
3 dB              28 dB         24 dB                            27 dB
6 dB              30 dB         25 dB                            26 dB
10 dB             30 dB         24 dB                            28 dB

Table I shows the SNR of the enhanced speech signal obtained when each of the three speech enhancement methods is applied to the noisy speech signal at different input SNRs. Considering the randomness of the noise samples, each reported SNR is the average over 50 different noise realizations. The experimental results demonstrate that the HHT-based speech enhancement method matches or outperforms the classical wavelet soft-threshold and hard-threshold methods at every tested SNR except -3 dB, where the hard-threshold method is 1 dB higher.

V. CONCLUSION

In this paper, we introduce the basic method of the HHT and the principle of its EMD. Based on the EMD and the filtering characteristic of its components (IMFs), a novel noise removal method is developed. The proposed algorithm has been tested and compared with conventional speech enhancement methods, namely the wavelet soft-threshold and hard-threshold methods. The results show that our algorithm achieved better performance under the tested conditions. It removes more noise and improves the SNR of the speech. After enhancement, the articulation and intelligibility of the speech remain good.

The HHT is a new theory with important theoretical value and wide application prospects. Nevertheless, it is still not perfect and has some problems to be solved, such as the curve fitting problem, the end effect problem, the mode mixing problem, and so on. The performance of the HHT-based speech enhancement algorithm depends strongly on how these problems are dealt with. In this paper, we carried out a pilot study of speech enhancement methods based on the HHT theory. Much work remains to be done to perfect the HHT and to apply it in the speech processing field.

REFERENCES

[1] N. E. Huang et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis", Proc. R. Soc. London A, Vol. 454, 1998, pp. 903-995.
[2] X. Zhu et al., "Gravity wave characteristics in the middle atmosphere derived from the empirical mode decomposition method", Journal of Geophysical Research, Vol. 102, 1997, pp. 16545-16561.
[3] N. E. Huang, H. H. Shih, Z. Shen, S. Long, "The ages of large amplitude coastal seiches on the Caribbean Coast of Puerto Rico", Journal of Physical Oceanography, Vol. 30, No. 8, August 2000, pp. 2001-2012.
[4] A. D. Veltcheva, "Wave and group transformation by a Hilbert spectrum", Coastal Engineering Journal, Vol. 44, No. 4, April 2002, pp. 283-300.
[5] D. G. Goring, "Response of New Zealand waters to the Peru tsunami of 23 June 2001", The Royal Society of New Zealand, Vol. 36, 2002, pp. 225-232.
[6] W. Huang et al., "Nonlinear indicial response of complex nonstationary oscillations as pulmonary hypertension responding to step hypoxia", Proc. Natl. Acad. Sci., Vol. 96, No. 3, March 1999, pp. 1834-1839.
