Noureddine MEHALLEGUE
Ahmed BOURIDANE
I. INTRODUCTION
Regional accent variation is an important aspect of speech
variability. It reflects a wealth of information about the
geographical, social and ethnic background of the speaker
[1][2]. For this reason, considerable effort has been devoted to
automatically identifying this kind of information from the speech
signal, and dialect and regional accent recognition has recently
become one of the most important research topics in the speech
processing field.
Investigating dialects and regional accents can benefit
speech technology beyond improving speech recognition [3].
It can help speaker recognition by narrowing the search space
at the front end once the features used in Automatic Speaker
Recognition Systems (ASRS) are adapted to regional origin [3].
In the context of immigration screening, for example, it may be
helpful to verify whether an applicant's accent corresponds to
the accents spoken in the region he or she claims to be from [4].
It can also support forensic speaker profiling in judicial or
military situations [5]. Identifying the speaker's accent,
nationality and/or hometown can often lead to important clues
with regard to the
2015 IEEE
where $m$ is the Universal Background Model (UBM) mean
supervector, $T$ is a low-rank matrix that defines the
low-dimensional space, and $w$ is a standard-normally distributed
latent variable representing the i-vector.
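For reference, the three quantities defined above match the total variability model of [35], which is usually written as shown here. The symbols $M$, $m$, $T$ and $w$ follow that reference and are an assumption about the notation used in this paper, since the original equation is not available in this text.

$$
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
$$

In this form, $M$ is the utterance-dependent GMM mean supervector, and the i-vector is the point estimate of $w$ for a given utterance.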
In the experimental part, we follow the same steps as in [35].
The background development dataset is used to train the UBM
(2048 components) and the total variability subspace matrix $T$
(dimension 400). Two channel compensation methods, Linear
Discriminant Analysis (LDA) and Within Class Covariance
Normalization (WCCN), are then applied for dimensionality
reduction (to 100 dimensions) and variability reduction, and an
identity vector is estimated for each Algerian PFTA. LDA attempts
to transform the axes so as to maximize the between-class
variability while minimizing the intra-class variability, whereas
WCCN uses the inverse of the within-class covariance to normalize
the cosine kernel.
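The two compensation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_lda`, `fit_wccn`, `within_class_covariance`), the synthetic data, and the toy sizes (20 dimensions reduced to 4, instead of 400 reduced to 100) are all hypothetical, and random vectors stand in for real i-vectors.

```python
import numpy as np


def within_class_covariance(X, y):
    """Average of the per-class covariance matrices (the WCCN matrix W)."""
    classes = np.unique(y)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[y == c] - X[y == c].mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)


def fit_lda(X, y, n_components):
    """Projection matrix A (dim x n_components) solving Sb v = lambda Sw v."""
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Whiten with Sw = L L^T so the problem becomes a symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    M = L_inv @ Sb @ L_inv.T
    evals, U = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(evals)[::-1][:n_components]  # largest eigenvalues first
    return L_inv.T @ U[:, top]


def fit_wccn(X, y):
    """Matrix B with B B^T = W^{-1}, obtained by Cholesky decomposition."""
    return np.linalg.cholesky(np.linalg.inv(within_class_covariance(X, y)))


# Synthetic stand-in for training i-vectors: 5 classes, 20 dimensions.
rng = np.random.default_rng(0)
sizes = [30, 40, 50, 35, 45]
y = np.repeat(np.arange(len(sizes)), sizes)
class_means = rng.normal(scale=3.0, size=(len(sizes), 20))
X = class_means[y] + rng.normal(size=(len(y), 20))

A = fit_lda(X, y, n_components=4)  # at most n_classes - 1 useful directions
B = fit_wccn(X @ A, y)             # WCCN is estimated in the LDA space
Z = X @ A @ B                      # compensated vectors, one per row
print(Z.shape)
```

After both projections, the average within-class covariance of `Z` is the identity matrix, which is the normalization that WCCN is meant to provide before cosine scoring.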
During testing, the approach applied in the training phase is
used to extract a vector from the test utterance of unknown
accent. The projection matrices of LDA and WCCN are
applied to transform the obtained vector to a low-dimensional
space. Finally, the cosine kernel between two i-vectors $w_1$ and
$w_2$ is computed according to the following equation [35]:

$$
k(w_1, w_2) = \frac{\langle w_1, w_2 \rangle}{\|w_1\| \, \|w_2\|}
$$
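The scoring step described above can be sketched as follows. This is a minimal example that assumes the vectors have already been projected with LDA and WCCN; the helper `identify_accent`, its rule of picking the highest-scoring class, and the toy two-dimensional vectors are illustrative assumptions rather than details given in the paper.

```python
import numpy as np


def cosine_kernel(w1, w2):
    """Cosine of the angle between two (already projected) i-vectors."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    norm_product = np.linalg.norm(w1) * np.linalg.norm(w2)
    if norm_product == 0.0:
        raise ValueError("cosine kernel is undefined for a zero vector")
    return float(w1 @ w2 / norm_product)


def identify_accent(test_vector, accent_models):
    """Score the test vector against each accent model and keep the best one.

    accent_models maps an accent label to its model vector. Choosing the
    highest cosine score is an assumed decision rule for this sketch.
    """
    scores = {
        label: cosine_kernel(test_vector, model)
        for label, model in accent_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy vectors standing in for the two accent models.
models = {"CPFTA": [1.0, 0.0], "EPFTA": [0.0, 1.0]}
label, scores = identify_accent([0.9, 0.1], models)
print(label, scores)
```

Because the kernel depends only on the angle between the vectors, rescaling either i-vector leaves the score unchanged.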
Here, the two terms of the ratio denote the number of correctly
classified utterances and the total number of utterances in the
test dataset, respectively.
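As a small worked example, the rate can be computed as below, assuming it is the plain ratio of correctly classified utterances to the total, expressed as a percentage. The function name `classification_rate` is hypothetical; the counts are the ones reported in the results of this section, and the assumed formula reproduces the reported Ecc values.

```python
def classification_rate(n_correct, n_total):
    """Percentage of test utterances that were classified correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# (correctly classified, total files) pairs from the reported results.
reported_counts = [(18, 29), (223, 326), (1296, 1531), (1812, 2105)]
rates = [round(classification_rate(c, t), 2) for c, t in reported_counts]
print(rates)  # [62.07, 68.4, 84.65, 86.08]
```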
D. Results and discussion
The results of comparing the two Algerian PFTA using the
i-vectors approach are reported in Table I.
TABLE I.
CPFTA (%)    EPFTA (%)
78.48        15.73
21.52        84.27
SNR        Length      Correctly classified   Misclassified   Total no. of files   Ecc (%)
≤ 15 dB    Len ≤ 10s   18                     11              29                   62.07
≤ 15 dB    Len > 10s   223                    103             326                  68.40
> 15 dB    Len ≤ 10s   1296                   235             1531                 84.65
> 15 dB    Len > 10s   1812                   293             2105                 86.08
[18] B. Ma, D. Zhu, and R. Tong, "Chinese Dialect Identification Using Tone
Features Based on Pitch Flux," in 2006 IEEE International Conference
on Acoustics Speech and Signal Processing Proceedings, 2006, vol. 1, no.
3, pp. I-1029-I-1032.
[19] C. Woehrling and P. Boula de Mareüil, "Identification d'accents
régionaux en français : perception et analyse," Revue Parole, vol. 37, p.
55, 2006.
[20] A. Akbari and B. Nasersharif, "A Classifier Combination Approach for
Farsi Accents Recognition," pp. 716-720, 2012.
[21] A. DeMarco and S. J. Cox, "Iterative classification of regional British
accents in i-vector space," in MLSLP, 2012, pp. 1-4.
[22] M. Bahari, N. Dehak, L. Burget, A. M. Ali, and J. Glass, "Non-negative
Factor Analysis of Gaussian Mixture Model Weight Adaptation for
Language and Dialect Recognition."
[23] N. F. Chen, S. W. Tam, W. Shen, and J. P. Campbell, "Characterizing
Phonetic Transformations and Acoustic Differences Across English
Dialects," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 22, no.
1, pp. 110-124, 2014.
[24] M. Al-Ayyoub, M. K. Rihani, N. I. Dalgamoni, and N. A. Abdulla,
"Spoken Arabic dialects identification: The case of Egyptian and
Jordanian dialects," in Information and Communication Systems
(ICICS), 2014 5th International Conference on, 2014, pp. 1-6.
[25] M. H. Bahari, N. Dehak, H. Van hamme, L. Burget, A. M. Ali, and J.
Glass, "Non-Negative Factor Analysis of Gaussian Mixture Model
Weight Adaptation for Language and Dialect Recognition," IEEE/ACM
Trans. Audio, Speech, Lang. Process., vol. 22, no. 7, pp. 1117-1129, Jul.
2014.
[26] H. Wang, C.-C. Leung, T. Lee, B. Ma, and H. Li, "Shifted-Delta MLP
Features for Spoken Language Recognition," IEEE Signal Process. Lett.,
vol. 20, no. 1, pp. 15-18, Jan. 2013.
[27] H. Li, B. Ma, and K. A. Lee, "Spoken language recognition: from
fundamentals to practice," Proc. IEEE, vol. 101, no. 5, pp. 1136-1159,
2013.
[28] B. Ma, H. Li, and R. Tong, "Spoken language recognition using
ensemble classifiers," Audio, Speech, Lang. Process. IEEE Trans., vol.
15, no. 7, pp. 2053-2062, 2007.
[29] H. Bagui, "Aspects of Diglossic Code Switching Situations: A
Sociolinguistic Interpretation," Eur. J. Res. Soc. Sci., vol. 2, 2014.
[30] D. Van Leeuwen, "Accent Recognition Using I-Vector, Gaussian Mean
Supervector and Gaussian Posterior Probability Supervector for
Spontaneous Telephone Speech," Center for processing speech and
images, KU Leuven, Belgium Center for Lang, pp. 7344-7348, 2013.
[31] A. H. Khadidja, "Language Maintenance and Language Shift among
Kabyle Speakers in Arabic Speaking Communities: The Case of Oran,"
2013.
[32] B. M. Meriem, "A Sociolinguistic Profile of French in Algeria: The
Case of Tlemcen Speech Community," University of Tlemcen, 2014.
[33] S. A. Selouani and M. Boudraa, "Algerian Arabic speech database
(ALGASD): corpus design and automatic speech recognition
application," Arab. J. Sci. Eng., vol. 35, no. 2C, p. 158, 2010.
[34] H. Wang, C.-C. Leung, T. Lee, B. Ma, and H. Li, "Shifted-delta MLP
features for spoken language recognition," Signal Process. Lett. IEEE,
vol. 20, no. 1, pp. 15-18, 2013.
[35] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet,
"Front-end factor analysis for speaker verification," Audio, Speech, Lang.
Process. IEEE Trans., vol. 19, no. 4, pp. 788-798, 2011.
[36] L. Burget, O. Plchot, S. Cumani, O. Glembek, P. Matejka, and N.
Brummer, "Discriminatively trained probabilistic linear discriminant
analysis for speaker verification," in Acoustics, Speech and Signal
Processing (ICASSP), 2011 IEEE International Conference on, 2011,
pp. 4832-4835.