Jan Lundberg
Division of Industrial Design, Luleå University of Technology, 971 87 Luleå, Sweden, jan.lundberg@ltu.se
Jan Berg
School of Music, Luleå University of Technology, Box 744, 941 28 Piteå, jan.berg@ltu.se
Specifications of product sound qualities may contain both perceptual and acoustical descriptions. The
perceptual descriptions are most helpful when they contain adequate detail and utilise understandable
wording. To facilitate the product design process the descriptions should also be interpretable as
acoustical quantities. The objectives of the study reported upon here were to investigate how musicians
use verbal descriptions of sound and to interpret these descriptions in terms of commonly used acoustical
quantities. Musicians’ use of verbal descriptions of saxophone sound was investigated through interviews.
The most frequently used words were evaluated through listening tests. The subjects were asked to judge
how well the words described the timbre of test sounds. To find the most significant perceptual
dimensions for the test sounds Principal Component Analysis was used. Four significant dimensions were
found and described by 9 words. To interpret the perceptual dimensions in terms of physically measurable
indices, models for how acoustical quantities relate to the perceptual dimensions were developed.
Dimension 1 was described by full-toned/warm/soft. The psycho-acoustical quantity sharpness correlated
negatively with this dimension. Dimension 2 was described by the term [o]-like. Sharpness and specific
roughness (9-11 Bark) correlated negatively with this dimension. Dimension 3 was described by
sharp/keen/rough. Sharpness and roughness correlated with this dimension. Dimension 4 was described
by the term [e]-like. No model for prediction of this dimension was found. To validate the models the
effect of a changed design of the tone holes of a saxophone was predicted with the model and validated
with new listening tests.
Forum Acusticum 2005 Budapest Nykänen, Johansson, Lundberg, Berg
judgements of the perceptual qualities were made on 11-point scales, ranging from “not at all” (=0) to
“extremely” (=10). A single exception was the estimation of the overall impression, which was made on
an 11-point scale ranging from “extremely bad” (=-5) to “extremely good” (=+5).
In listening test 1, 16 subjects were used. In listening tests 2 and 3, 20 subjects were used.

Table 1: Descriptors used in listening test 1.

Swedish                           English translation
Stor                              Large
Fyllig                            Full-toned [12]
Rå                                Rough
Varm                              Warm
Mjuk                              Soft
Nasal                             Nasal
Kärnfull                          Centred
Vass                              Sharp/keen [12]
Botten                            Bottom
Skarp                             Sharp [12]
Tonal                             Tonal
likt a, som t.ex. i ”mat”*        [a]-like
likt e, som t.ex. i ”smet”*       [e]-like
likt i, som t.ex. i ”lin”*        [i]-like
likt o, som t.ex. i ”ko”*         [u]-like
likt u, som t.ex. i ”lut”*        [u]-like
likt y, som t.ex. i ”fyr”*        [y]-like
likt å, som t.ex. i ”båt”*        [o]-like
likt ä, som t.ex. i ”färg”*       [æ]-like
likt ö, som t.ex. i ”kör”*        [œ]-like

Table 2: Descriptors judged in listening tests 2 and 3.

Swedish                           English translation
Skarp                             Sharp
Rå                                Rough
Stor                              Large
Mjuk                              Soft
Varm                              Warm
likt e, som t.ex. i ”smet”*       [e]-like
likt å, som t.ex. i ”båt”*        [o]-like

* To give a hint of how the vowels should be pronounced, an example of a word was given in the
following way: like a, as for example in “car”. In the English translation, the phonetic description of
the vowel has been given instead.

2.3 Analysis
The analysis was made in three steps: Principal Component Analysis (PCA), Analysis of Variance
(ANOVA) and Linear Regression.
PCA [13] was used to search for the perceptual dimensions in the data from listening test 1. The
estimated qualities defined in Table 1 were used as variables.
The ability of the evaluated descriptors to separate the saxophones and musicians from each other was
examined by ANOVA. Variables not affected by musician, saxophone or interaction between musician
and saxophone were considered to be of low importance in the description of the differences in timbre
found among the analysed saxophone sounds.
Models for prediction of verbal descriptions based on acoustical quantities were developed by linear
regression on data from listening test 1. As dependent variables, the means of all the subjects’
judgements of the verbal descriptions were used. As independent variables, the acoustic quantities
listed in Table 3 were used. A stepwise estimation method was used [13]. The limits used for the
probability of F were 0.05 for entry and 0.1 for removal. For those critical bands [14] where specific
loudness and roughness gave a significant contribution to a model, the correlation with specific
loudness and roughness in neighbouring bands was examined, as well as the correlation with overall
loudness, sharpness and roughness. If significant correlation was found, models based on sums of
neighbouring critical bands were proposed and evaluated, to avoid models relying on information in
single critical bands.
The models were validated by studying the correlation between values predicted by the models and mean
values of judgements for listening tests 2 and 3.

Table 3: Acoustic quantities used as independent variables in the linear regression models.

Index               Description
Fund                Fundamental frequency /Hz
Loudness            Loudness according to ISO 532 B /sone
Sharpness           Sharpness /acum [15]
Roughness           Roughness /asper [16]
Tonality            Tonalness [17]
Width               Spectral width: the order number of the highest partial with an SPL above 20 dB
N'4.5 to N'22.5     Specific loudness per critical band [18] (4.5 Bark-22.5 Bark) /sone/Bark
R'4.5 to R'22.5     Specific roughness per critical band [18] (4.5 Bark-22.5 Bark) /asper/Bark

3 Results and discussion
3.1 Multivariate Analysis
Principal Component Analysis (PCA) was done on listening test 1 to identify prominent perceptual
dimensions. Scatter plots of the loadings are found in Figure 2. The r2-values of each component are
presented in Table 4 together with interpretations of the components.
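As an illustration of the PCA step, the following minimal sketch extracts four components from a matrix of descriptor judgements and applies a Varimax rotation. It is not the software used in the study: the function names `pca_loadings` and `varimax` are hypothetical, the ratings are randomly generated placeholders (rows are judgements, columns are 20 descriptors), and the use of the correlation matrix is an assumption.

```python
import numpy as np


def pca_loadings(ratings, n_components):
    """Unrotated PCA loadings from the correlation matrix of the ratings."""
    corr = np.corrcoef(ratings, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    keep = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (variables x components) loading matrix."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the Varimax criterion; the SVD gives the nearest rotation.
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if previous > 0.0 and s.sum() < previous * (1.0 + tol):
            break
        previous = s.sum()
    return loadings @ rotation, rotation


# Placeholder data (assumption): 64 judgement rows, 20 descriptors, 4 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(64, 4))
mixing = rng.normal(size=(4, 20))
ratings = latent @ mixing + 0.5 * rng.normal(size=(64, 20))

loadings = pca_loadings(ratings, n_components=4)
rotated, rotation = varimax(loadings)
# Share of total variance carried by each rotated component.
explained = (rotated**2).sum(axis=0) / ratings.shape[1]
print(np.round(explained, 2))
```

Descriptors with large absolute values in a column of `rotated` would be the salient variables of that component. The rotation redistributes variance between components but leaves the total explained variance unchanged.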
Four pronounced and describable perceptual dimensions were found: 1. warm/soft. 2. back vowel similes
([u], [o], [a]). 3. sharp/keen/rough. 4. front vowel similes ([i], [y], [e], [œ], [æ]).

[Figure 2: two scatter plots of the descriptor loadings, the first with Component 1 on the horizontal
axis and Component 2 on the vertical axis, the second with Component 3 on the horizontal axis.]
Figure 2: Four Varimax rotated components found by PCA from listening test 1. Descriptors in bold did
separate the two musicians and/or the two saxophones.

Table 4: r2-values of the four Varimax rotated components found by PCA from listening tests based on
tone stimuli.

Dim    r2X    Salient variables, positive loading    Salient variables, negative loading
1      19%    full-toned, warm, bottom               -
2      15%    [a]-like, [o]-like, [u]-like           -
3      14%    sharp/keen, rough, sharp               soft, warm
4      10%    [i]-like, [y]-like                     -
Sum    58%

3.2 Analysis of Variance (ANOVA)
For listening test 1, the effect of two factors, musician (musician 0 or 1) and saxophone (saxophone 0
or 1), on the use of the descriptors in Table 1 was investigated by ANOVA. Variables not affected by
musician, saxophone or interaction between musician and saxophone were considered to be of low
importance in the description of the differences in timbre found among the analysed saxophone sounds.
Adjectives and vowel similes with p-values below 0.05 were deemed suitable for describing common
differences between the sounds. This resulted in 7 adjectives and 2 vowel similes, 9 descriptors in
total (Table 5).

Table 5: p-values for the ANOVA of data from listening test 1. Only variables with p-values below
0.05 are shown. p-values below 0.05 are in bold.

Sharp           <0.00005    0.05    0.0003
[a]-like        0.89        0.31    0.04
[o]-like        0.61        0.15    0.03
Overall imp     0.59        0.54    0.0001

Descriptors representing each component from the PCA were selected for further analysis. Component 1
was represented by soft and warm, component 2 by [o]-like, component 3 by sharp and rough and
component 4 by [e]-like. Component 4 was considered to represent front vowels, even though [e]-like
and [æ]-like were not salient on this component. None of the front vowel similes showed significant
variations between the two saxophones or the two musicians at the 0.05 level. [e]-like was the front
vowel simile with the lowest p-value (0.12 for the factor musician). Therefore, it was chosen as a
representative for front vowels, and hence also for component 4, even though it was not salient in
this component. The selected descriptors were used as variables in listening tests 2 and 3. For
listening test 2 the same factors as for listening test 1, musician and saxophone, were examined by
ANOVA. The results are presented in Table 6.
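The two-factor ANOVA with interaction can be sketched as follows for a single descriptor. This is a minimal illustration under stated assumptions, not the procedure used in the study: it treats the judgements as independent observations in a balanced musician x saxophone design, the function names are hypothetical, and the judgements are invented. To stay within the standard library, the p-value is obtained from the regularised incomplete beta function instead of a statistics package.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def _mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """cells maps (musician, saxophone) to equally long lists of judgements."""
    mus = sorted({m for m, _ in cells})
    sax = sorted({s for _, s in cells})
    n = len(next(iter(cells.values())))
    grand = _mean([y for ys in cells.values() for y in ys])
    mean_m = {m: _mean([y for s in sax for y in cells[m, s]]) for m in mus}
    mean_s = {s: _mean([y for m in mus for y in cells[m, s]]) for s in sax}
    cell_mean = {key: _mean(ys) for key, ys in cells.items()}
    ss = {
        "musician": n * len(sax) * sum((mean_m[m] - grand) ** 2 for m in mus),
        "saxophone": n * len(mus) * sum((mean_s[s] - grand) ** 2 for s in sax),
        "interaction": n * sum((cell_mean[m, s] - mean_m[m] - mean_s[s] + grand) ** 2
                               for m in mus for s in sax),
    }
    df = {"musician": len(mus) - 1, "saxophone": len(sax) - 1}
    df["interaction"] = df["musician"] * df["saxophone"]
    ss_err = sum((y - cell_mean[key]) ** 2 for key, ys in cells.items() for y in ys)
    df_err = len(mus) * len(sax) * (n - 1)
    ms_err = ss_err / df_err
    result = {}
    for effect, ss_effect in ss.items():
        f_ratio = (ss_effect / df[effect]) / ms_err
        result[effect] = (f_ratio, f_sf(f_ratio, df[effect], df_err))
    return result


# Invented judgements of one descriptor, keyed by (musician, saxophone).
sharp = {
    (0, 0): [3, 4, 3, 4], (0, 1): [7, 8, 7, 8],
    (1, 0): [3, 4, 4, 3], (1, 1): [8, 7, 8, 7],
}
anova = two_way_anova(sharp)
for effect, (f_ratio, p_value) in anova.items():
    print(effect, round(f_ratio, 2), p_value)
```

A descriptor would be retained when any of the three p-values falls below 0.05, which mirrors the selection rule described in the text. A repeated-measures design with subject as a factor would need a different error term.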
Table 6: p-values for the ANOVA of data from listening test 2. p-values below 0.05 are bold.

Variable        Factor: Saxophone    Factor: Musician    Interaction musician-saxophone
Overall imp.    0.18                 0.53                0.48
Sharp           <0.00005             0.83                0.51
Rough           0.20                 0.51                0.53
Soft            <0.00005             0.39                0.17
Warm            <0.00005             0.36                0.43
[e]-like        0.09                 0.06                0.89
[o]-like        0.001                0.15                0.57

For listening test 3, the effect of one factor, saxophone with or without the plates surrounding the
edges of the tone holes, was analysed by ANOVA. The results are presented in Table 7. In listening
test 3 all selected variables could be used for discrimination between the two saxophones. The
modified saxophone was judged sharper, rougher, less soft, less warm, more [e]-like and less [o]-like
than the unmodified saxophone. No significant difference in overall impression was found.

Table 7: p-values for the ANOVA of data from listening test 3. p-values below 0.05 are bold.

Variable        Factor: Saxophone
Overall imp.    0.26
Sharp           0.04
Rough           0.004
Soft            0.0001
Warm            0.0002
[e]-like        0.05
[o]-like        <0.00005

3.3 Linear regression
A stepwise estimation method [13] was used to find the linear regression models best explaining the
connection between the perceptual descriptors and acoustic indices. The results are found in Table 8.
To avoid models relying on information in single critical bands, models based on sums of correlating
neighbouring critical bands were proposed, see Table 9.
For listening test 3, an outlier in the judgements of warm, soft and sharp was identified. When this
sound was excluded from the analysis the correlations (r2) were 0.88 for warm1, 0.92 for warm2, 1.00
for warm3, 0.86 for warm4, 0.85 for warm5, 0.72 for soft1, 0.92 for soft2 and 0.92 for sharp4. The
outlier has the highest specific loudness in the region between 4 and 9 Bark. It was perceived to be
the warmest, softest and the least sharp sound. The specific loudness between 4 and 9 Bark might be
important for the perception of warm, soft and sharp, and this is not considered in the models.
Another sound was identified as an outlier in the predictions of [o]-like in listening test 3. When
this sound was excluded from the analysis the correlation (r2) for [o]-like1 was 0.98 and for
[o]-like2 also 0.98. The outlier was considerably sharper than the other sounds. It did not diverge
from the other sounds with respect to [o]-likeness. This indicates that the models for [o]-like
overestimate the importance of sharpness.

Table 8: The linear regression models suggested by stepwise estimation.

Listening test:                                                   1        2         3
Model                                                             r2adj    r2        r2
Variables correlating with component 1
warm1 = 19.6 - 12.0 sharpness                                     0.76     0.77**    0.44
warm2 = 19.3 - 11.1 sharpness - 26.0 R'13.5                       0.84     0.92**    0.40
warm3 = 14.7 - 5.55 sharpness - 54.7 R'13.5 - 1.79 N'21.5         0.96     0.83**    0.26
soft1 = 10.6 - 2.27 N'18.5                                        0.84     0.72**    0.27
Variable correlating with component 2
[o]-like1 = 18.4 - 9.3 sharpness - 23.2 R'9.5                     0.75     0.76**    0.71*
Variables correlating with component 3
sharp1 = 5.94 + 1.74 N'14.5 + 1.72 N'18.5 + 41.2 R'4.5            0.99     0.59*     0.61
rough1 = -10.9 + 11.2 sharpness + 36.2 R'15.5                     0.91     0.59*     0.81*
Variable correlating with component 4
[e]-like1 = 1.43 + 0.90 N'14.5 + 13.8 R'9.5                       0.78     -0.07     0.00
[e]-like2 = 1.23 + 0.37 N'14.5 + 24.2 R'9.5 + 0.12 N'21.5         0.92     -0.26     0.00
Overall impression
overall imp.1 = 2.53 - 48.2 R'5.5 - 358 R'22.5                    0.82     0.09      0.25
overall imp.2 = 2.07 - 28.5 R'5.5 - 466 R'22.5 - 28.3 R'14.5      0.91     0.32      0.15
* Correlation is significant at the 0.05 level (2-tailed)
** Correlation is significant at the 0.01 level (2-tailed)

Table 9: Linear regression models based on sums of neighbouring critical bands.

Listening test:                                                                                       1        2         3
Model                                                                                                 r2adj    r2        r2
Variables correlating with component 1
warm4 = 19.1 - 11.2 sharpness - 1.53 \sum_{i=9.5}^{20.5} R'_i                                         0.78     0.88**    0.46
warm5 = 15.8 - 7.09 sharpness - 3.25 \sum_{i=9.5}^{20.5} R'_i + 0.569 \sum_{i=18.5}^{22.5} N'_i       0.81     0.81**    0.29
soft2 = 20.3 - 12.5 sharpness                                                                         0.68     0.81**    0.55
Variable correlating with component 2
[o]-like2 = 18.4 - 9.45 sharpness - 11.1 (R'9.5 + R'10.5)                                             0.74     0.77**    0.67*
Variables correlating with component 3
sharp2 = -19.9 + 19.3 sharpness                                                                       0.74     0.86**    0.69*
sharp3 = -12.4 + 9.53 sharpness + 1.93 N'14.5                                                         0.91     0.69*     0.56
sharp4 = -5.85 + 2.62 sharpness + 0.344 \sum_{i=13.5}^{18.5} N'_i                                     0.92     0.92**    0.26
rough2 = -9.83 + 10.5 sharpness + 2.44 roughness                                                      0.80     0.77**    0.81*
* Correlation is significant at the 0.05 level (2-tailed)
** Correlation is significant at the 0.01 level (2-tailed)

In listening tests 1 and 2 there were no significant differences in the judgements of [e]-like between
different sounds. Therefore a model built on one of these tests cannot be expected to show good
precision in predictions on other sets of sounds. In listening test 3 there were significant
differences in the judgements of [e]-like between different sounds. Stepwise
estimation of [e]-like based on this test resulted in the model: [e]-like = -2.00 + 2.58 N'18.5, with
r2adj = 0.91. This model remains to be validated on a new set of sounds.

4 Conclusions
The sequence of interviews and tests reported in this paper forms a systematic approach for the
identification of salient perceptual dimensions. In addition, the approach enables development of
models for prediction of how sounds are perceived based on psychoacoustic measurements. All the
selected descriptors (sharp, rough, soft, warm, [e]-like and [o]-like) were useful for describing
differences in saxophone timbre and could be modelled by psychoacoustic quantities. Four perceptual
dimensions were formed: 1. warm/soft, 2. back vowel similes, 3. sharp/rough, 4. front vowel similes.
The psychoacoustic indices sharpness and roughness are among the most prominent qualities of saxophone
sound, but it is also possible to identify critical bands where specific loudness and roughness are of
particular importance. Of special interest are the prediction models for “sharp” and “rough”. The best
model for “sharp” uses the psychoacoustic quantity sharpness, but it emphasises the region between 13
and 19 Bark (2000-5300 Hz), where the hearing is most sensitive. This suggests that psychoacoustic
sharpness underestimates the importance of this frequency region. The best prediction model for
“rough” is a mix of roughness and sharpness. This suggests that the Swedish word for rough, “rå”, is
interpreted as a mix between psychoacoustic roughness and sharpness.
When the task was to describe sounds, the use of untrained listeners in the tests gave results similar
to those of saxophone players. When the task was to judge preference, the untrained listeners failed
to reach consensus, while saxophone players succeeded.

References
[1] G. Pahl, W. Beitz, ‘Engineering Design: A Systematic Approach’, Springer-Verlag, London,
ISBN 3-540-19917-9 (1996)
[2] J. Blauert, U. Jekosch, ‘Sound-Quality Evaluation – A Multi-Layered Problem’, Acustica-Acta
Acustica, Vol. 83. pp. 747-753 (1997)
[3] R. Guski, ‘Psychological Methods for Evaluating Sound Quality and Assessing Acoustic
Information’, Acustica-Acta Acustica, Vol. 83. pp. 765-774 (1997)
[4] J. Berg, ‘Systematic Evaluation of Perceived Spatial Quality in Surround Sound Systems’,
Doctoral thesis 2002:17, Luleå University of Technology, Sweden, ISSN: 1402-1544 (2002)
[5] M.C. Gridley, ‘Trends in description of saxophone timbre’, Perceptual and Motor Skills, Vol. 65.
pp. 303-311 (1987)
[6] R.A. Kendall, E.C. Carterette, ‘Verbal Attributes of Simultaneous Wind Instrument Timbres. 1.
VonBismarck Adjectives’, Music Perception, Vol. 10. pp. 445-468 (1993)
[7] V. Rioux, ‘Methods for an Objective and Subjective Description of Starting Transients of some
Flue Organ Pipes - Integrating the View of an Organ-builder’, Acustica-Acta Acustica, Vol. 86.
pp. 634-641 (2000)
[8] J. Stephanek, Z. Otcenasek, ‘Psychoacoustic aspects of violin sound quality and its spectral
relations’, Proc. 17th International Congress on Acoustics, Italy-Rome (2001)
[9] B.S. Rosner, J.B. Pickering, ‘Vowel Perception and Production’, Oxford University Press, Oxford,
ISBN 0-19-852138-3 (1994)
[10] A. Nykänen, Ö. Johansson, ‘Development of a Language for Specifying Saxophone Timbre’, Proc.
Stockholm Music Acoustic Conference 2003, Stockholm, Vol. 2. pp. 647-650 (2003)
[11] J. Blauert, K. Genuit, ‘Evaluating sound environments with binaural technology – some basic
considerations’, J. Acoust. Soc. Jpn. (E), Vol. 14. No. 3. pp. 139-145 (1993)
[12] A. Gabrielsson, H. Sjögren, ‘Perceived sound quality of sound-reproducing systems’, J. Acoust.
Soc. Am., Vol. 65. pp. 1019-1033 (1979)
[13] J.F. Hair, R.E. Anderson, R.L. Tatham, W.C. Black, ‘Multivariate Data Analysis’, 5ed. Prentice
Hall International, London, ISBN 0-13-930587-4 (1998)
[14] E. Zwicker, ‘Subdivision of the audible frequency range into critical bands
(Frequenz-gruppen)’, J. Acoust. Soc. Am., Vol. 33. p. 248 (1961)
[15] G. Von Bismarck, ‘Sharpness as an attribute of timbre of steady sounds’, Acustica, Vol. 30.
pp. 159-192 (1974)
[16] W. Aures, ‘Ein Berechnungsverfahren der Rauhigkeit’, Acustica, Vol. 58. pp. 268-280 (1985)
[17] W. Aures, ‘Berechnungsverfahren für den sensorischen wohlklang beliebiger schallsignale’,
Acustica, Vol. 59. pp. 130-141 (1985)
[18] E. Zwicker, H. Fastl, ‘Psychoacoustics – Facts and Models’, 2ed. Springer Verlag, Berlin,
ISBN 3-540-65063-6 (1999)