
Mesleh A, Skopin D, Baglikov S et al. Heart rate extraction from vowel speech signals.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(6): 1243-1251 Nov. 2012. DOI 10.1007/s11390-012-1300-6

Heart Rate Extraction from Vowel Speech Signals


Abdelwadood Mesleh¹, Dmitriy Skopin¹, Sergey Baglikov², and Anas Quteishat¹

¹ Computer Engineering Department, Faculty of Engineering Technology, Al-Balqa Applied University, Amman, Jordan
² Help MediCom Group, Kursk, Russia

E-mail: wadood@bau.edu.jo; m825453@yahoo.com; baglikovs@bk.ru; Anas.Quteishat@fet.edu.jo

Received November 17, 2011; revised May 14, 2012.

Abstract This paper presents a novel non-contact heart rate extraction method from vowel speech signals. The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activities for humans where it is observed that the moment of heart beat causes a short increment (evolution) of vowel speech formants. The short-time Fourier transform (STFT) is used to detect the formant maximum peaks so as to accurately estimate the heart rate. Compared with traditional contact pulse oximeter, the average accuracy of the proposed non-contact heart rate extraction method exceeds 95%. The proposed non-contact heart rate extraction method is expected to play an important role in modern medical applications.

Keywords electrocardiogram, feature extraction, heart rate, short-time Fourier transform, vowel speech signal

1 Introduction

The number of heart patients continues to grow, which motivates researchers to develop tools that monitor the heart rate. During athletic activities, it is also desirable to monitor the heart rate, to achieve optimal results and to ensure personal safety[1]. From a medical point of view, the measurement of heart rate ranges from investigations of central regulation of the autonomic state, to studies of fundamental links between psychological processes and physiological functions, to evaluations of cognitive development and clinical risk[2]. Heart rate is traditionally measured by detecting arterial pulsation. The electrical activity of the heart is measured by an electrocardiogram (ECG). The ECG[1] is an important non-invasive diagnostic tool for assessing the condition of the human heart. Each beat is made up of a series of waves: P-wave, QRS complex, T-wave and occasionally a U-wave (see Fig.1). The signal morphology and timing are indicative of different clinical conditions: for example, changes in the ST segment suggest a poor blood supply to heart muscle, while multiple P-waves indicate low cardiac output and often cause clots in the atria. Recently, many algorithms have been developed to analyze ECG signals using support vector machines[3], self-organizing maps[4], etc. Generally speaking, ECG features can be extracted in the time domain[5-7] or in the frequency domain[8-9] using many feature extraction methods such as the discrete wavelet transform[5-6], the Karhunen-Loeve transform[10], Hermitian basis and other methods[11]. All the mentioned ECG feature extraction methods are based on ECG signals and are noninvasive contact methods of recording the variations of the bio-potential signals acquired from the human skin surface.

Fig.1. Schematic representation of a normal ECG.

Regular Paper
In noninvasive methods, medical experts use electrodes that are placed on the patient's skin to detect bioelectrical signals such as ECG signals.
© 2012 Springer Science + Business Media, LLC & Science Press, China


J. Comput. Sci. & Technol., Nov. 2012, Vol.27, No.6

The demand for contactless heart monitoring has increased lately, especially for long-duration monitoring and for patients with particular conditions, such as burn victims, or infants at risk of sudden infant death syndrome. In this paper, a novel non-contact method of heart rate extraction is proposed. The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activities for humans. As human speech signals can be recorded using standard microphones and transmitted via mobile communication media, the proposed method opens new fields of application such as heartbeat detection and monitoring of remotely located patients. A relevant heart rate detection method[12] is based on a two-dimensional (2-D) spectrum representation; however, it does not provide an automatic measurement of heart rate parameters. In this work, by contrast, the relationship between vowel speech production and heart activities is modeled to automatically estimate heart rate parameters after recording a vowel speech signal. Moreover, this work handles noise and achieves better results.

The rest of this paper is organized as follows. Section 2 introduces the relationship between human vowel speech production and heart activities. Section 3 describes the heart rate extraction from vowel speech signals. Experimental results with discussion and conclusions are presented in Sections 4 and 5, respectively.

2 Human Vowel Speech Production and Heart Activities

Human speech signals[13] contain linguistic, expressive, organic and biological information. The source-filter theory of speech production considers the human acoustic speech output as the combination of a source of sound energy (e.g., the larynx) modulated by a transfer function (filter) determined by the shape of the supralaryngeal vocal tract. The result of this combination is a shaped spectrum with broadband energy peaks. The supralaryngeal vocal tract, which consists of both the oral and nasal airways, serves as a time-varying acoustic filter that suppresses the passage of sound energy at certain frequencies and allows its passage at other frequencies[14]. Formants are those frequencies at which local energy maxima are sustained by the supralaryngeal vocal tract and are determined by the overall shape, length and volume of the vocal tract[15]. Taking into account the fact that the larynx contains muscles covered by many blood vessels that are connected to the human circulatory system, it is concluded that human heart rates are dynamically related to the variations of vocal cord parameters and directly related to the acoustic properties of human speech. As a result, it should be possible to detect changes of speech properties that are related to human heart activities by obtaining the corresponding frequency characteristics of the vowel speech signal and the raw ECG data of the same person.

Fig.2 shows the time domain of the vowel speech signal and the corresponding ECG signal after suppressing P and T waves using a high-pass filter with a 40 Hz cut-off frequency. The two signals belong to the same patient but have different spectra (the ECG signal is extremely oversampled); however, they can be represented together in the time domain on one scale.

Fig.2. Time domain vowel speech and ECG signals of the same male patient. (a) Vowel speech signal (vowel /i:/ as in the word "email"). (b) ECG signal recorded at the same time as the vowel speech signal is pronounced (after suppressing P and T waves using a high-pass filter with a 40 Hz cut-off frequency).

The STFT (short-time Fourier transform)[16-17] is used to study the frequency characteristics of the vowel speech signal; the STFT of the sequence x(m) is defined as:

$$X_n(e^{j\omega_i}) = \sum_{m} x(m)\, w(n-m)\, e^{-j\omega_i m}. \tag{1}$$

Taking into consideration that $X_n(e^{j\omega_i})$ is evaluated for a fixed n, the STFT is the conventional Fourier transform of the windowed signal x(m)w(n − m), evaluated at frequency ω = ω_i. Since w(m) is an FIR (finite impulse response) filter of finite size, if the size of w(m) is large relative to the signal periodicity, then $X_n(e^{j\omega_i})$ gives good frequency resolution. On the other hand,

Abdelwadood Mesleh et al.: Heart Rate Extraction from Vowel Speech Signals


if the size of w(m) is small, then $X_n(e^{j\omega_i})$ gives poor frequency resolution. To extract heart rates from the vowel speech signal, the size of w(m) should be less than the RR-interval of the ECG signal for the same patient. The STFT spectrogram of vowel speech (vowel /i:/) recorded immediately after sit-up exercise, with the corresponding ECG signal of the same patient, is shown in Fig.3 (horizontal lines are speech formants). It can be seen that the heart activity (the moment an R wave appears on the ECG signal) produces a frequency modulation of the vowel speech signal for all formants located within a 16 kHz frequency band (see the blue vertical lines in Fig.3). Accordingly, it is possible to extract the relevant heart rate information (the RR-interval, defined as the time between two sequential R waves of the ECG signal) directly from the spectrogram.
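A minimal sketch of a frame-based magnitude STFT in the spirit of Eq. (1) is given below. It is not the authors' Matlab code: the function name `stft_magnitude` is hypothetical, the window length (2048 samples) and overlap (1800 samples) are the values reported later in Section 4.1, the sampling rate of 44.1 kHz is an assumption (the paper states 44 kHz), and the input is a synthetic 1 kHz tone rather than recorded speech.

```python
import numpy as np

def stft_magnitude(x, window_len=2048, overlap=1800):
    """Magnitude STFT of a 1-D signal using a Hamming window.

    Each column is the DFT of one windowed segment, i.e. a discrete,
    frame-based version of Eq. (1) evaluated at a fixed frame position.
    """
    hop = window_len - overlap                     # step between frames
    window = np.hamming(window_len)
    n_frames = 1 + (len(x) - window_len) // hop
    frames = np.stack(
        [x[i * hop: i * hop + window_len] * window for i in range(n_frames)],
        axis=1,
    )
    # rfft keeps the one-sided spectrum: window_len // 2 + 1 bins from DC to fs / 2
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 44_100                                        # assumed sampling rate in Hz
t = np.arange(fs) / fs                             # one second of samples
tone = np.sin(2 * np.pi * 1000.0 * t)              # synthetic test tone, not speech
spec = stft_magnitude(tone)                        # rows: frequency bins, columns: frames
freqs = np.fft.rfftfreq(2048, d=1 / fs)            # bin centre frequencies in Hz
peak_hz = freqs[np.argmax(spec[:, 0])]             # strongest bin of the first frame
```

With these settings the bin spacing is fs / 2048, roughly 21.5 Hz, so the peak is expected within one bin of 1 kHz. Each row of `spec` plays the role of one horizontal line through the spectrogram.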

Fig.3. Spectrogram of a vowel speech signal and the corresponding ECG signal (the vertical blue lines indicate the frequency modulation of the speech vowel caused by heart activities).

3 Heart Rate Extraction from Vowel Speech Signals

Ideally, the STFT spectrogram includes speech formants without any noise. In practice, however, the heart activities distort all formants; the associated distortion lasts approximately 0.2 seconds and has a magnitude of 100 Hz. Fig.4 shows an ideal STFT spectrogram for a vowel speech signal. It represents a 40 beats per minute (bpm) heart activity, which is the lowest heart rate in real situations[15]. In order to extract the relevant heart rate information from the corresponding STFT spectrogram, a searching algorithm is proposed to horizontally scan the STFT spectrogram from the top (Nyquist frequency) to the bottom (DC) of the original speech signal. Each time the algorithm scans the spectrogram of the speech signal horizontally, a one-dimensional (1-D) signal is obtained. Generally, there are two possible cases of the horizontal scanning (see Fig.4):

When a scanning line passes through a part of the spectrogram beyond the bounds of the formants, it contains background information only and cannot yield a useful 1-D signal (see the scanning line labeled "a" in Fig.4).

When a scanning line passes through any part of a speech formant, it yields a useful 1-D signal (see the scanning lines labeled "b" and "c" in Fig.4).

Fig.4. Ideal STFT spectrogram of the vowel speech signal and the allocation of scanning lines.

Each extracted useful 1-D signal passes through a 5th-order FIR low-pass filter to suppress high-frequency components. Finally, a discrete Fourier transform (DFT) is applied to the filtered 1-D signals. Fig.5 shows the extracted useful 1-D signals in the time and frequency domains for the scanning lines "b" and "c". It is clear that the amplitude of the 4th harmonic is the maximum. Based on Fourier transform properties[17], harmonic number four of a six-second speech signal corresponds to a frequency of 0.67 Hz and to a heart rate of 40 beats per minute (which is exactly the heart rate of the original signal). As a result, it is concluded that heart rates can be extracted using the harmonics with maximum magnitudes, i.e., using an order statistics filter[18] of the STFT spectrum:

$$R = \arg\max_{k} \{ X_k \mid k = 1, 2, \ldots, N/2 \}, \tag{2}$$

where N is the length of the extracted 1-D signal x(n), the operator preceding max denotes "index of" (that is, R is the index k of the largest magnitude), and X_k is the vector of magnitudes of the DFT of the extracted 1-D signal x(n). The DFT X(k) is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{3}$$


Fig.5. Time and frequency domains of the 1-D signal extracted by the scanning lines b and c. It is noted that the amplitude of the 4th harmonic is the maximum that corresponds to 40 bpm heart rate. (a) Time domain of the 1-D signal extracted by the scanning line b. (b) Frequency domain of the 1-D signal extracted by the scanning line b. (c) Time domain of the 1-D signal extracted by the scanning line c. (d) Frequency domain of the 1-D signal extracted by the scanning line c.
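The step from one scanning line to a heart rate, as described by Eqs. (2) and (3), can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `heart_rate_from_scan_line`, the frame rate of 100 frames per second and the synthetic pulse shape are all hypothetical, and the 5th-order FIR low-pass filter is omitted for brevity. The search is restricted to 40-200 bpm, the heart rate range given in the text.

```python
import numpy as np

def heart_rate_from_scan_line(line, duration_s, bpm_range=(40.0, 200.0)):
    """Estimate heart rate (bpm) from one horizontal scan line of a spectrogram.

    line       : 1-D array, one spectrogram row sampled over time
    duration_s : length of the recording in seconds (t_s in the text)
    """
    line = np.asarray(line, dtype=float)
    line = line - line.mean()                  # remove the DC term (k = 0)
    magnitudes = np.abs(np.fft.rfft(line))     # one-sided DFT magnitudes, Eq. (3)
    k = np.arange(len(magnitudes))
    bpm = 60.0 * k / duration_s                # harmonic k lies at k / t_s Hz
    in_range = (bpm >= bpm_range[0]) & (bpm <= bpm_range[1])
    # Eq. (2): index of the largest magnitude, restricted to the search region
    k_best = k[in_range][np.argmax(magnitudes[in_range])]
    return int(k_best), 60.0 * k_best / duration_s

# Synthetic scan line (an assumption, not data from the paper)
duration_s = 6.0
frame_rate = 100.0                             # hypothetical spectrogram frames per second
t = np.arange(int(duration_s * frame_rate)) / frame_rate
beat_hz = 40.0 / 60.0                          # 40 bpm
scan_line = np.maximum(0.0, np.cos(2 * np.pi * beat_hz * t)) ** 8  # short pulse per beat
harmonic, bpm = heart_rate_from_scan_line(scan_line, duration_s)
```

For this synthetic input the strongest harmonic is number four, which for a six-second signal maps to 60 × 4 / 6 = 40 bpm, matching the example in Fig.5.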

In (3), k = 0, ..., N/2 indexes the harmonics of the one-sided spectrum. Applying the order statistics filter (2) and the DFT (3) to each scanning line of the spectrogram (see Fig.4) yields the 2-D spectrum for the heart rate (see Figs. 6-8). In Fig.6, the position on the x axis of the 2-D spectrum (the heart rate frequency, expressed in bpm) is estimated with respect to the Nyquist and sampling theorems. Applying a typical STFT (see (1)) to a vowel speech signal x(m) with a Hamming window of length L samples produces L/2 coefficients, and these coefficients represent the frequencies from 0 (DC) to f_s/2 on the y (vertical) axis of the 2-D spectrum, where f_s is the sampling rate of the recorded vowel speech signal. On the other hand, applying the Fourier transform (see (3)) produces t_s/2L frequency harmonics from 0 (DC) to f_s/2L on the x (horizontal) axis of the 2-D spectrum, where t_s is the duration of the signal x(m) in seconds. Accordingly, the heart frequency is evaluated using two relations: t_r/2L : f_s/2L and x_p : f_hr, where t_r is the recording period of the input vowel speech signal, x_p denotes the position of a candidate heart rate frequency in the 2-D spectrum, f_hr is the estimated heart rate frequency and the operator ":" denotes a relation. Solving these relations for f_hr gives the heart rate: heart rate = 60 × (x_p/t_s).

It is known that the heart rate varies from 40 to 200 bpm, and the formants of human vowel speech vary from 1 to 6 kHz. As a result, the search region (region of interest) is the area bounded by 40 and 200 bpm on the x axis and by 1 and 6 kHz on the y axis. Since heart rate is usually expressed in bpm while the frequency of harmonics is in hertz, the horizontal axis of the 2-D spectrum is organized in bpm units using the relation[2]: bpm = 60 × f_hr, with f_hr in Hz (see Fig.6).

4 Experimental Results and Discussion

4.1 Testing the Robustness of Proposed Heart Rate Extraction Method

To test the robustness and accuracy of the proposed heart rate extraction method using human vowel speech, a heart rate detection system (HRDS) is implemented. The HRDS is able to capture and analyze the frequency characteristics of human vowel speech and ECG signals. The HRDS uses a standard personal computer with a two-channel sound card for analog-to-digital conversion. The left channel of the sound card is connected to a standard microphone with a frequency response range from 100 Hz to 16 kHz, while the right channel is connected to a portable ECG recorder (Cardiette AR600). The vowel speech and the ECG signals are recorded concurrently within six-second periods at a 44 kHz sampling frequency. Matlab code processes the recorded vowel speech and ECG signals. In our experiments, the P and T waves of the ECG signal are suppressed by the microphone amplifier, which contains a high-pass filter with a 40 Hz cut-off frequency.

Fig.6. Heart rate estimation for an ideal spectrogram. (a) Ideal spectrogram noised by machine noise and side talk. (b) Histogram of the spectrogram. (c) Same as (a) after filtering by (4). (d) 2-D spectrum estimation with 75 bpm heart rate for the noisy spectrogram in (a).

Fig.7. Heart rate estimation for ideal and noisy spectrograms. (a) 2-D spectrum extraction for the heart rate estimation with 40 bpm heart rate modulations in Fig.4. (b) 2-D spectrum extraction for the heart rate estimation with 100 bpm heart rate modulations.

To study the frequency characteristics of the vowel speech signal, the STFT window w(m) is set to 2048 samples, which at a 44 kHz sampling frequency gives a 21 Hz spectrum resolution. Moreover, the overlap between windows is set to 1800 samples, which produces a 41-millisecond time resolution. Fig.3 illustrates an example of a real spectrogram for a vowel speech signal; it plots frequency against time, with color indicating the relative strengths of the frequency components (color varies from dark red, indicating low-power components, to orange, indicating high-power components). It is clear that the spectrogram contains speech formants (formants are the observed high power spectral density values in orange). In our heart rate extraction method, the order statistics filter (see (2)) deals with these high-power spectral values and it ignores background speech


Fig.8. Heart rate extraction using different vowel speech signals. (a) A vowel speech signal /i:/. (b) Heart rate estimation of a patient using vowel /i:/ as in the word "email". (c) Vowel speech signal /ɔ:/. (d) Heart rate estimation of a patient using vowel /ɔ:/ as in the word "four".

information. Unfortunately, the spectrogram may contain noise (machine noise or quiet side talk). Fig.6(a) illustrates a noisy ideal spectrogram with 75 bpm heart rate modulations. In general, the formants can be affected by the following noise sources (see Fig.6(a)):

Variation of the vowel speech tone during recording: some volunteers (patients) are not able to keep the same tone of vowel speech during the six-second recording. This problem is common for patients with insufficient respiratory lung volume.

Machine noise: a high-amplitude noise that has certain allocations of harmonics in the frequency domain (see the horizontal lines in the 2-D spectrogram in Fig.6(a)). Machine noise is converted to low-frequency noise in the 2-D spectrum; it appears on the left side of the x-scale in the 2-D spectrum, but it is located outside the bounded search area of the spectrum (outside the region of interest). As a result, it is ignored during analysis.

Side talk: the 6 kHz flashes along the time axis in the 2-D spectrogram in Fig.6(a). Side talk noise may generate high-frequency noise in the 2-D spectrum, located on the right side of the region of interest. It is completely suppressed using the threshold filter (see (4)).

Our heart rate extraction method ignores backgrounds during analysis and treats them as outliers, which appear to be inconsistent with the remaining useful part of the 2-D spectrogram and whose presence may lead to wrong heart rate extraction results. Outliers can be detected using methods based on statistical data distributions, prior knowledge of the nature of the distributions, the expected number of outliers, and the nature of the expected outliers, and can be treated by histogram shape, clustering, entropy, and attribute similarity threshold methods[19-20]. In this work, the histogram of the noisy spectrogram is analyzed (see Fig.6(b)) and a one-sided threshold image filter is implemented to reduce the side talk noise using (4) (see Fig.6(c)):

$$X'(m, n) = \begin{cases} X(m, n), & \text{if } X(m, n) \geq 0.1 \max(X), \\ 0, & \text{if } X(m, n) < 0.1 \max(X), \end{cases} \tag{4}$$
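The one-sided threshold of (4) keeps every spectrogram pixel at or above 10% of the maximum brightness and sets the rest to zero. A minimal sketch is shown below; the function name `threshold_filter` and the pixel values are made up for illustration and do not come from the paper.

```python
import numpy as np

def threshold_filter(spectrogram, fraction=0.1):
    """One-sided threshold: zero every pixel below fraction * max, keep the rest."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    cutoff = fraction * spectrogram.max()
    return np.where(spectrogram >= cutoff, spectrogram, 0.0)

# Made-up pixel values; the maximum is 100, so the cutoff is 10
image = np.array([[0.5, 9.0, 100.0],
                  [12.0, 42.0, 3.0]])
filtered = threshold_filter(image)
```

Here the pixels 0.5, 9.0 and 3.0 fall below the cutoff and are zeroed, while 12.0, 42.0 and 100.0 pass through unchanged.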

In (4), X(m, n) is the pixel value of the original spectrogram image of the vowel speech signal located in the m-th column and n-th row, X'(m, n) is the corresponding pixel of the filtered spectrogram image, and max(X) is the maximum brightness of the original spectrogram image. The filter suppresses spectrogram pixels with brightness less than 10% of the maximum brightness. Fig.6(d) shows the 2-D spectrum estimation with a 75 bpm heart rate for the noisy spectrogram in Fig.6(a). The heart rate extraction result confirms the accuracy of the proposed heart rate extraction method: there is an excellent agreement with the heart activity, which corresponds to a 75 bpm heart rate.

Fig.7(a) shows the 2-D spectrum estimation with a 40 bpm heart rate for the ideal spectrogram in Fig.4. The horizontal axis of the 2-D spectrum represents the heart rate frequency graded in bpm, while the vertical axis represents the frequency of speech formants graded in kHz. There is an excellent agreement between the heart rate of the original signal and the heart rate evaluated from the 2-D spectrum. This agreement is confirmed by the aforementioned harmonic number four of the six-second speech signal, which corresponds to a frequency of 0.67 Hz and to a 40 bpm heart rate. This conforms with our conclusion that human heart rates can be extracted using the harmonics of maximum magnitude. Fig.7(b) shows the 2-D spectrum extraction for the heart rate estimation with a 100 bpm heart rate for a real patient; the two parallel lines at the top of the 2-D spectrogram represent high-frequency noise that is generated by the signal itself, and they are discarded by the proposed heart rate extraction algorithm. As mentioned before, the search region (the region of interest) is bounded from 40 to 200 bpm, and the proposed method searches the spectrogram from 1 to 6 kHz to extract the heart rate. Consequently, the proposed heart rate extraction method evaluates 100 bpm as the average heart rate.

4.2 Testing the Accuracy of Proposed Heart Rate Extraction Method

With reference to the properties of the DFT[17], the first harmonic frequency of a signal is determined by its duration, and all other harmonics are multiples of the first harmonic frequency; in our proposed heart rate extraction method, vowel speech signals are six seconds in length and the corresponding first harmonic frequency is 0.17 Hz. Consequently, the error of heart rate estimation in bpm can be evaluated by E = 60/t_s, where t_s is the duration of the original vowel speech signal in seconds. Given that t_s is not shorter than six seconds, the heart rate error is always acceptable (within 5 bpm, i.e., half of the 10 bpm spacing between adjacent harmonics). Experimentally, the error does not exceed 9% as shown in Table 1.
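As a quick check of these numbers for a six-second recording (a worked example added for illustration; the symbol Δf for the harmonic spacing is not used in the paper):

$$\Delta f = \frac{1}{t_s} = \frac{1}{6\ \text{s}} \approx 0.167\ \text{Hz}, \qquad E = 60 \cdot \Delta f = \frac{60}{t_s} = 10\ \text{bpm}.$$

Neighboring harmonics are therefore 10 bpm apart, so a heart rate assigned to the nearest harmonic is off by at most half of that spacing, 5 bpm. Harmonic k = 4, for instance, lies at 4 × 10 = 40 bpm.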
Table 1. Heart Rate Extraction Using the Proposed Heart Rate Extraction Method

ID   Age (Year)   Oximeter (bpm)   Our Heart Rate Extraction (bpm)   Percentage Error (%)
1    22           105              98                                6.67
2    25           89               85                                4.49
3    27           131              120                               8.40
4    27           95               90                                5.26
5    23           105              100                               4.76
6    25           103              100                               2.91
7    23           98               90                                8.16
8    27           135              130                               3.70
9    26           119              124                               4.20
10   27           123              120                               2.44
11   23           113              120                               6.19
12   23           82               80                                2.44
13   22           115              120                               4.35
14   23           112              120                               7.14
15   23           105              98                                6.67
16   15           129              125                               3.10
17   8            142              150                               5.63
18   7            134              128                               4.48
19   38           111              105                               5.41
20   39           92               90                                2.17
21   37           105              100                               4.76
Average percentage error (Accuracy = 95.08)                          4.92
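The summary figures for Table 1 can be recomputed from its two heart rate columns. The sketch below uses only the Python standard library; the function name `summarize` and the dictionary keys are illustrative choices, and the percentage error is taken relative to the oximeter reading, which is the convention that reproduces the per-row values in the table.

```python
import math

# (oximeter bpm, proposed-method bpm) for volunteers 1-21, copied from Table 1
pairs = [(105, 98), (89, 85), (131, 120), (95, 90), (105, 100), (103, 100),
         (98, 90), (135, 130), (119, 124), (123, 120), (113, 120), (82, 80),
         (115, 120), (112, 120), (105, 98), (129, 125), (142, 150), (134, 128),
         (111, 105), (92, 90), (105, 100)]

def summarize(pairs):
    n = len(pairs)
    ref = [p[0] for p in pairs]
    est = [p[1] for p in pairs]
    # Percentage error of each estimate relative to the oximeter reading
    pct_errors = [100.0 * abs(r - e) / r for r, e in pairs]
    mean_pct_error = sum(pct_errors) / n
    rmse = math.sqrt(sum((r - e) ** 2 for r, e in pairs) / n)
    # Pearson correlation coefficient between the two columns
    mean_ref, mean_est = sum(ref) / n, sum(est) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in pairs)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    var_est = sum((e - mean_est) ** 2 for e in est)
    pearson_r = cov / math.sqrt(var_ref * var_est)
    return {"mean_pct_error": mean_pct_error,
            "accuracy": 100.0 - mean_pct_error,
            "rmse": rmse,
            "pearson_r": pearson_r,
            "max_pct_error": max(pct_errors)}

stats = summarize(pairs)
```

Run on the table data, this gives an average percentage error of about 4.92 (accuracy about 95.08), a root mean square error of about 5.936 and a correlation coefficient of about 0.953; the largest single error is about 8.40%.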

To address the accuracy of the proposed method using different English vowels, a pilot study (see Fig.8) is conducted for randomly selected speakers, each of whom is asked to pronounce different English vowels. Results reveal that the vowel /i:/ (as in the word "email") is the most suitable for our proposed algorithm. Fig.8(a) shows the spectrum of the vowel speech signal for a patient with a 115 bpm heart rate who pronounced the vowel /i:/ (as in the word "email"); the heart rate is estimated as 110 bpm according to the position of points inside the region of interest (Fig.8(b)). Fig.8(c) shows the spectrum of the vowel speech signal /ɔ:/ (as in the word "four") for a patient with a 150 bpm heart rate; the estimated heart rate is 140 bpm according to the position of points inside the region of interest (see Fig.8(d)).

It is known[21] that physical activity increases heart rate, cardiac output, and pulse amplitude. Immediately before the vowel speech signal recording, volunteers and patients are requested to perform a number of sit-up exercises to intensify the influence of heart activity (in our experiments, the heart rate exceeded 120 beats per minute). 21 volunteers (7-39 years old) are requested to pronounce an English vowel (vowel /i:/). Each six-second vowel speech signal is recorded by the mentioned microphone, filtered by the 40 Hz cut-off high-pass filter, sampled at a 44 kHz sampling frequency, transformed by the STFT with the mentioned parameters, and horizontally scanned by the scanning lines to produce the 1-D signals. Each extracted 1-D signal is filtered by a 5th-order FIR low-pass filter and, finally, a fast Fourier transform is applied to the filtered 1-D signal. Noise is suppressed using the threshold filter. Finally, the application of the order statistics filter and the Fourier transform to each scanning line of the spectrogram yields the 2-D spectrum; a bounded region of the 2-D spectrum (the region of interest described in Section 3) is searched to extract the heart rate, and the heart rate is estimated.

Table 1 summarizes the results of applying our non-contact heart rate extraction method and the traditional contact pulse oximeters to the 21 volunteers. The correlation coefficient is 0.953, while the average percentage error is 4.92 and the root mean square error is 5.936. The heart rate estimation results of the proposed method are analyzed by a paired t-test. All analyses and tests are conducted in an explorative manner at a 5% level of significance. The computations are performed with the statistical software tool EasyFit 5.5. The mean (M) and the standard deviation (SD) of the oximeter results are 111.5714 and 16.3266, respectively. On the other hand, the M and SD of the proposed

The 21 volunteers are patients and students. They were randomly selected and when agreed, they recorded their English speech vowel signals. Among them, there are 11 males and 10 females; their ages vary from 7 to 39 years old (see Table 1).


method are 109.1905 and 18.1235, respectively. The hypothesized mean difference is set as a null hypothesis (i.e., M1 − M2, where M1 is the mean of the oximeter's results and M2 is the mean of the proposed method's results), α is set to 0.05, the total degree of freedom is 39.5717, the difference in sample means is 2.3810, the t-test statistic is 0, the two-tailed test lower and upper critical values are −2.0227 and 2.0227, the p-value is 1, the confidence interval ranges from −8.3858 to 13.1477 and the error margin is 10.7668. It is clear that our proposed heart rate extraction method works better than the non-automatic heart rate extraction method[12] in terms of accuracy (95.08% vs 92%). Moreover, our proposed heart rate extraction method handles noise. As a result, the average error is much lower than that of [12] (4.92% vs 15%).

4.3 Discussion

It is known that the error of heart rate evaluation using traditional pulse oximeters (contact-based methods) does not exceed 2%; on the other hand, using our proposed heart rate extraction method (a non-contact method), the average percentage error does not exceed 5% (the best accuracy of the proposed heart rate extraction is 97.82%) unless the vowel speech recording is shorter than six seconds. However, the error does not exceed 9% (the lowest accuracy of the proposed heart rate extraction is 91.60%) for patients with insufficient respiratory lung volume who are not able to keep the same tone of vowel speech during the six-second recording, or when the recording is shorter than six seconds. Finally, it should be noted that contact methods (traditional pulse oximeters and ECG evaluation methods) and non-contact methods (our proposed heart rate extraction method) are not directly comparable. Nevertheless, our proposed heart rate extraction method is applicable especially in situations where contact-based heart rate extraction methods are unavailable or inapplicable, for example, if patients are located in remote regions and recording their speech signal using their mobile phones is the only convenient option.

The proposed heart rate extraction method is not sensitive to the amplitude of formants, nor to the slope of formants, and it is able to accurately extract the heart rate in the presence of noise. The proposed approach is robust and is able to work in noisy environments; it discards noise (machine noise and side talk noise). However, the average error of the proposed heart rate extraction method, although below 5%, is higher than the 2% error of the oximeter. Generally speaking, the proposed heart rate extraction method is promising.

It is known that heartbeats are directly proportional to the level of activity of a person; more blood is needed when a person is exercising than when he or she is at rest. To some extent, the heartbeats of transplanted hearts are also proportional to the level of activity of a person. On the other hand, the heartbeats of artificial hearts are fixed unless they are adapted to the patient's activity. We have not tested the proposed heart rate extraction method on persons with artificial or transplanted hearts (we could not record vowel speech signals for such patients).

5 Conclusions

In the modern mobile communication era, we believe that contactless heart rate monitoring methods are required, especially for heart patients. In this work, a non-contact heart rate extraction method from vowel speech signals is proposed. The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activities for humans. It uses the STFT to estimate heart rates and successfully handles machine noise and side talk. Experimental results indicate that the proposed method is expected to play an important role in modern medical applications. Although it does not outperform traditional pulse oximeters, the accuracy of the proposed heart rate extraction method is acceptable in practice. We do not claim that the proposed method works for persons with artificial or transplanted hearts; dealing with such patients is left for future work. The study of heart pathology using vowel speech signals is also left for future work.

References
[1] Nelson M, Rejeski W, Blair S et al. Physical activity and public health in older adults: Recommendation from the American College of Sports Medicine and the American Heart Association. Medicine & Science in Sports & Exercise, 2007, 39(8): 1435-1445.
[2] Berntson G, Bigger J, Eckberg D et al. Heart rate variability: Origins, methods, and interpretive caveats. Psychophysiology, 1997, 34(6): 623-648.
[3] Georgoulas G, Stylios C, Groumpos P. Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines. IEEE Trans. Biomedical Engineering, 2006, 53(5): 875-884.
[4] Vasios G, Prentza A, Blana D et al. Classification of fetal heart rate tracings based on wavelet-transform and self-organizing-map neural networks. In Proc. the 23rd Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, October 2001, Vol.2, pp.1633-1636.
[5] Linh T, Osowski S, Stodolski M. On-line heart beat recognition using Hermite polynomials and neuro-fuzzy network. IEEE Trans. Instrum. Meas., 2003, 52(4): 1224-1231.
[6] Li S, Ji Y, Liu G. Optimal wavelet basis selection of wavelet shrinkage for ECG de-noising. In Proc. Int. Conf. Management and Service Science, September 2009, pp.1-4.
[7] Hu Y, Palreddy S, Tompkins W. A patient-adaptable ECG beat classifier using a mixture of experts approach. IEEE Trans. Biomedical Engineering, 1997, 44(9): 891-900.
[8] Moraes J, Seixas M, Vilani F, Costa E. A real time QRS complex classification method using Mahalanobis distance. In Proc. Computers in Cardiology, Sept. 2002, pp.201-204.
[9] Papaloukas C, Fotiadis D, Likas A, Michalis L. Automated methods for ischemia detection in long duration ECGs. Cardiovascular Reviews & Reports, 2003, 24(6): 313-319.
[10] Jager F. Feature extraction and shape representation of ambulatory electrocardiogram using the Karhunen-Loève transform. Electrotechnical Review, 2002, 69(2): 83-89.
[11] Cuesta-Frau D, Pérez-Cortés J, Andreu-García G, Novák D. Feature extraction methods applied to the clustering of electrocardiographic signals: A comparative study. In Proc. the 16th Int. Conf. Pattern Recognition, August 2002, Vol.3, pp.961-964.
[12] Skopin D, Baglikov S. Heartbeat feature extraction from vowel speech signal using 2D spectrum representation. In Proc. the 4th Int. Conf. Information Technology, June 2009.
[13] Pickett J. The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Allyn & Bacon, 1998.
[14] Browman C, Goldstein L. Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 1990, 18: 411-424.
[15] Maton A, Hopkins J, McLaughlin C et al. Human Biology and Health. New Jersey, USA: Prentice Hall, 1993.
[16] Allen J, Rabiner L. A unified approach to short-time Fourier analysis and synthesis. Proceedings of the IEEE, 1977, 65(11): 1558-1564.
[17] Cohen L. Time-Frequency Analysis: Theory and Applications. New Jersey, USA: Prentice Hall, 1994.
[18] Gonzalez R, Woods R. Digital Image Processing (3rd edition). Prentice Hall, 2007.
[19] Sezgin M, Sankur B. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 2004, 13(1): 146-168.
[20] James A, Dimitrijev S. Inter-image outliers and their application to image classification. Pattern Recognition, 2010, 43(12): 4101-4112.
[21] Turkbey E, Jorgensen N, Johnson W et al. Physical activity and physiological cardiac remodelling in a community setting: The Multi-Ethnic Study of Atherosclerosis (MESA). Heart and Education in Heart, 2010, 96(1): 42-48.


Abdelwadood Mesleh received his B.Eng and M.Sc. degrees in computer engineering from Shanghai University, China, in 1995 and 1998 respectively. He worked as a research and teaching assistant in the Electrical Engineering Department, Hong Kong University of Science and Technology, China, from 2004 to 2005. He received his Ph.D. degree in feature selection using ant colony optimization (ACO) for Arabic text articles from the Arab Academy for Banking and Financial Sciences, Jordan, in 2008. Since 2008, Dr. Mesleh has been an assistant professor in the Computer Engineering Department, Faculty of Engineering Technology, at Al-Balqa Applied University. His research interests include optimization, fuzzy logic, generic algorithm, ACO, Arabic natural language processing, feature subset selection, Arabic speech recognition, MANETs, parallel processing, cryptanalysis, medical image and signal processing, operating systems etc.

Dmitriy Skopin received his M.Sc. and Ph.D. degrees in computer engineering from Kursk State Technical University, Russia, in 1995 and 1998, respectively. From September 2003 until August 2005 he was an associate professor with the Kursk State Technical University. Since September 2005 he has been a staff member of Al-Balqa Applied University, Jordan. His research interests focus on signal and image processing, advanced programming, and computer graphics.

Sergey Baglikov received his Ph.D. degree in computer engineering from the Kursk State Technical University in 1998. He is currently the president of Help Medicom Group, which specializes in novel medical equipment and high-technology industries. His research interests are condition monitoring using infrared sensors and the control of power electronics, nanotechnologies and digital signal processing.

Anas Quteishat received the BEng degree in electronics from Princess Sumaya University of Technology, Jordan, in 2003. He received his MSc degree in electronic systems design and Ph.D. degree in computational intelligence from University of Science Malaysia in 2005 and 2008 respectively. Currently he is an assistant professor at Al-Balqa Applied University in Jordan. His research interests include neural networks, multi-agent systems, pattern classification and rule extraction.
