Sato - Acoustic and Visual Stimuli

ACTA ACUSTICA UNITED WITH ACUSTICA, Vol. 98 (2012) 749-759
DOI 10.3813/AAA.918556
Effects of Acoustic and Visual Stimuli on Subjective Preferences for Different Seating Positions in an Italian Style Theater
Shin-ichi Sato1), Shuo Wang1), Yuezhe Zhao1), Shuoxian Wu1), Haitao Sun1), Nicola Prodi2), Chiara Visentin2), Roberto Pompoli2)
1) State Key Laboratory of Subtropical Building Science, South China University of Technology, Guangzhou 510640, China. ssato@untref.edu.ar
2) Dipartimento di Ingegneria, Università degli Studi di Ferrara, via Saragat 1, 44100 Ferrara, Italy
Summary This study investigates the effects of acoustic and visual stimuli on subjective preferences for different seating positions in an Italian style theater. The sound and visual fields of ten positions in the theater were simulated in two laboratories, one equipped with a large, wide (7 m × 3 m) screen with headphones for the sound source, and the other equipped with a 42" LCD monitor with 6 pairs of stereo loudspeakers. The sound signal was an anechoic soprano vocal music piece accompanied by a keyboard, convolved with binaural impulse responses at the ten seating positions. The visual stimuli were based on a CAD (computer-aided design) model of the theater, which included images of the singer, stage sets, and an audience. Subjective judgments by paired comparison tests were conducted in the two laboratories with different participant groups. By comparing the results of experimental trials using 1) visual stimuli only, 2) acoustic stimuli only, and 3) both acoustic and visual stimuli, the effects of the acoustic and visual stimuli on seat preference could be examined. The results of the experiments with the two different stimulus presentation systems were similar despite having different participant groups. A subjective scale analysis showed a significant correlation with the sound level of the soprano signal as well as the clarity C80 for the stage (vocal) source.
PACS no. 43.55.-n, 43.55.Gx, 43.55.Hy
1. Introduction
To realize excellent sound fields in concert halls and theaters, it is important to identify the acoustical parameters that influence subjective evaluations of the visual and auditory experiences of such places. The current research, which attempts to find the optimum acoustic conditions for concert halls and theaters, can be divided into two approaches. One is to evaluate live performances using questionnaires. Beranek investigated the subjective attributes of the musical-acoustical qualities of concert halls using such a method [1]. The respondents evaluated the listening conditions in the audience areas of fully occupied halls at regular concerts with a full symphony orchestra performing. Barron also conducted a subjective study of live concerts in British concert halls using a questionnaire [2]. The respondents evaluated several subjective attributes in addition to their overall impressions of the concert hall sound fields. Similarly, Hidaka and Beranek, and Farina collected subjective evaluations of opera houses using questionnaires [3, 4]. Their questionnaires were delivered to respondents, who were asked to evaluate the acoustics of the opera houses they had experienced and knew well. The advantage of conducting evaluations during live performances is that the respondents can actively evaluate the experience as it happens during the performance. However, this method also has some drawbacks. First, it is not easy to compare the sound quality of different halls because of the short-term nature of listeners' acoustic memories; second, a certain amount of time usually passes between one experience and another. Finally, the music performances to be compared are often not the same. Therefore, in the abovementioned studies, experts such as music conductors, music critics, and acoustical consultants were chosen as the questionnaire respondents.
The second approach is the recording of subjective judgments made in laboratories using auralized or synthesized sound fields. Schroeder et al. investigated the subjective preference evaluations of European concert halls [5]. Anechoic recordings of a symphony orchestra were played from a loudspeaker on the stage and recorded by a dummy head placed in each seat. These recordings were played
Received 11 April 2011, accepted 31 July 2012.
now at Ingeniería de Sonido, Universidad Nacional de Tres de Febrero, Valentín Gómez 4752, Caseros, Buenos Aires, Argentina
S. Hirzel Verlag EAA
back in a laboratory so that the recorded signals could be reproduced directly into the participants' ears. Kimura and Sekiguchi conducted such music hearing tests using sound reproductions of recordings made using a dummy head in actual auditoriums [6]; their recordings were made in a manner similar to those used in the study by Schroeder et al. The test signals were presented through headphones to participants in a laboratory, and several subjective attributes in addition to an overall evaluation were evaluated. Ando made systematic investigations to identify orthogonal factors in subjective preferences [7]. Several anechoic music signals were used to simulate sound fields, which consisted of a few early (lateral) reflections and reverberations from several loudspeakers, and subjective preferences were evaluated using paired comparison tests.
One of the advantages of such subjective evaluations made in laboratory settings is that the same music signals can be presented to a large number of subjects. Further, this method makes it possible to compare the acoustic qualities of different halls instantaneously. Finally, acoustical parameters can be strictly controlled if synthesized sound fields are used. However, one of the problems with this method is that the signal presented to the listeners is a short segment of music and thus is not comparable to a live orchestra performance. This is a compromise to be accepted in order to shorten the duration of the subjective tests. In addition, in most cases not all listeners are experts, and the ability to make a judgment also depends on one's experience with performance spaces in general, and specifically with the hall under investigation. Furthermore, even for experienced listeners, it may be difficult to compare the sound fields of different theaters because each one has peculiar frequency characteristics as well as a unique mix of values of the acoustical parameters.
Finally, the behavior of the hall with respect to different music programs is also a factor, since the preferred conditions could vary with the program. To address the above concerns, an improved laboratory test should keep the procedure simple enough not to confuse less experienced listeners while still remaining effective. The results of the subjective tests are assessed in terms of consistency and agreement of the responses to demonstrate the accuracy of the experiments.
In most studies of auditory tests in laboratories, participants are asked to concentrate on the presented sound stimuli in dark environments without visual cues. However, actual concert hall performances always include both visual and acoustic stimuli. Therefore, subjective preferences in concert halls and theaters are affected not only by acoustic information but also by visual information such as the view from the seating area, lighting, performers, and stage set. As Barron points out [2], the elimination of visual cues may not be desirable because listeners make some judgments relative to their position in the hall. Therefore, there is a need to investigate the effects of both acoustic and visual stimuli on the subjective evaluations of performance spaces.
Larsson et al. investigated the influence of visual simulation fidelity using sounds only, sounds with still pictures, a virtual room with sounds, and sounds replayed over headphones in the room [8]. They showed that the ratings of acoustical attributes (auditory source width, perceived room size, and distance to the sound source) were greatly affected by the presentation method of the stimuli. Cabrera et al. investigated auditory and visual spatial impressions of two auditoria [9]. In their visual experiment, the subjects were asked to judge the spaciousness, envelopment, stage dominance, intimacy and target distance of grayscale projected photographs taken from various positions in the auditoria. In their auditory experiment, subjects were asked to judge the apparent source width, envelopment, intimacy and performer distance of auralized orchestral recordings of the same positions of the auditoria. Although they did not conduct a multi-modal experiment, interactions were shown by the different audio and visual distances, as well as the envelopment and intimacy responses. Jeon et al. investigated the effects of acoustic and visual stimuli on seat preferences in an opera house [10]. Subjective assessments of visual-only, acoustic-only, and acoustic-visual preferences for several seat positions in the opera house were made through paired comparison tests and subjective ratings. The visual stimuli consisted of still photographs, and the auditory stimuli were male vocal signals convolved with binaural impulse responses at each position. The authors showed that visual preference is predicted by the distance from the stage, the size of the photographed stage view, the open angle value of the seats in relation to the longitudinal axis, and visual obstructions. Acoustic preference can be mainly predicted by the sound pressure level.
In cross-modal experiments, both acoustic and visual preferences were found to contribute to the overall impressions of seat quality in the opera house, but acoustic cues were more influential than visual cues. Valente and Braasch investigated the influence of visual cues on the judgment of spatial impressions in rooms [11]. Videos of five solo music and speech performers in addition to the auralized acoustic signals of a general-purpose hall were used as the test stimuli. The participants were asked to adjust the direct-to-reverberant energy ratio (D/R ratio) of the binaural impulse responses according to their expectations considering the visual cues. They were also asked to rate the apparent source width (ASW) and listener envelopment (LEV). Results showed that the participants overestimated the D/R ratio both with and without the visual stimuli. When there was a visual target, the participants estimated much greater direct sound energy levels than those of the actual acoustical condition. The participants also scaled the scene with a greater ASW or LEV when presented with a receiver position further away from the source, which revealed more of the room's volume.
The previous research results investigating the interaction between acoustic and visual stimuli in theaters are still limited in their applicability and generalizability. Therefore, there is a need to further investigate the effects of
acoustic and visual information on the subjective evaluations of concert hall and theater experiences. The current study investigates how both acoustic and visual stimuli affect the subjective evaluation of seat preferences in a typical Italian style theater, a type chosen because its acoustic qualities and views vary considerably with the listening position. Modern opera theaters are designed to realize a good acoustic environment throughout the audience area (both the main floor and the balcony seats) as well as in the space for performers. Italian style opera theaters, on the other hand, have quite different acoustic and visual environments in the stalls and in the boxes. The reason for that is to be traced back to their design criteria which, despite including the acoustical concepts known at the time of construction, seldom were influenced by socio-economical motivations. Moreover, the actual acoustic parameters of theaters representing each type have been measured and summarized (for example, [12, 13]).
For the current study, laboratory tests under controlled conditions were used, implemented as paired comparison tests. As in past studies, only one music fragment, representative of the operatic music style, was used for the convolutions. The acoustic and visual fields of ten positions in the theater were reproduced in two laboratories with different acoustic and visual stimuli presentation systems. The experiments used presentations of 1) visual stimuli only, 2) acoustic stimuli only, and 3) both acoustic and visual stimuli, and the scale values of the ten positions were constructed for each test. In particular, Experiment 1 was conducted at the State Key Laboratory of Subtropical Building Science, South China University of Technology (China), and included all three modes, whereas Experiment 2 was conducted at the Department of Engineering, University of Ferrara (Italy), and used only modes 2) and 3).
Through a comparison of the results of the three tests, the effects of the acoustic and visual stimuli on seat preferences could be investigated and discussed.
Figure 1. Listening positions chosen for the subjective tests (Positions 1-5 are on the stalls, and Positions 6-10 are in the boxes).
Figure 2. 1/3 octave band spectrum of the sound signals (soprano and keyboard). Horizontal axis: 1/3 octave band centre frequency [Hz], 125 to 16000; vertical axis: relative amplitude [dB], 0 to -80.
2. Procedure
2.1. Acoustical stimuli
The opera theater used for the experiments was the Teatro Comunale in Ferrara, Italy (Figure 1). The theater has a truncated elliptical plan and 800 seats (two thirds of which are in five tiers of boxes), with a hall of 5000 m² and a stagehouse of 8500 m³. This theater was used for the validation of the guidelines for acoustical measurements [13] as well as for investigations of the subjective evaluations of vocal and instrumental sounds [14, 15, 16]. Figure 1 shows the ten positions (five on the stalls and five in the boxes) chosen for the subjective evaluations. Part of the anechoically recorded music piece Romanza "Tormento" by P. Tosti was used as the source signal. This was the same music signal used in [14, 15, 16]. The soprano vocal and piano keyboard accompaniment were channeled separately. The duration of
the original signal was 16 s. The first 8 s of the original signal were used for the subjective tests to shorten the total duration of the paired comparison experiment. The first 8 s have a spectrum more similar to that of the total signal than the latter 8 s. Figure 2 shows the one-third octave band spectrum of the soprano and keyboard accompaniment signals. To conduct the subjective tests, the sound fields of the ten seating positions were reproduced by convolving the above anechoic music signals with the binaural impulse responses recorded at each position. The vocal signal was convolved with the binaural impulse responses measured with the source on the stage, and the keyboard signal was convolved with the binaural impulse responses measured with the source in the orchestra pit. Then, these two signals were mixed. The impulse responses were recorded in the hall under unoccupied conditions using an artificial head and torso (Brüel & Kjær type 4100), a loudspeaker (LEM SPL), and a swept-sine signal. During the impulse response measurements, the stage did not contain any scenery. Curtains were not lowered at the back of the
Table I. Acoustic and geometrical parameters at the ten seating positions. In Experiment 1 Leq and Lmax were elevated by 5 dB from the values in the table.

Position                               1      2      3      4      5      6      7      8      9     10
Leq (total) [dBA]                   74.8   73.2   74.5   75.1   73.8   71.6   71.1   68.9   71.1   72.5
Leq (soprano) [dBA]                 74.1   72.4   73.4   74.3   72.8   70.3   70.4   67.4   69.8   69.6
Leq (keyboard) [dBA]                66.4   65.4   68.0   67.5   67.1   65.7   63.0   63.6   65.3   69.4
Leq (soprano-keyboard) [dBA]         7.8    7.0    5.4    6.8    5.7    4.7    7.4    3.8    4.6    0.2
Lmax (total) [dBA]                  84.7   82.6   84.8   85.0   84.1   82.0   80.8   78.9   81.6   83.5
Lmax (soprano) [dBA]                82.8   81.0   82.7   83.6   82.3   79.5   79.6   75.8   79.0   77.9
Lmax (keyboard) [dBA]               80.3   77.2   80.6   79.2   79.4   78.4   74.6   76.0   78.1   82.2
Lmax (soprano-keyboard) [dBA]        2.5    3.8    2.1    4.4    2.9    1.1    4.9   -0.2    0.9   -4.3
IACC (total)                        0.63    0.3    0.2    0.5   0.18    0.3   0.25    0.2   0.32   0.28
IACC (soprano)                      0.71   0.39    0.2   0.57   0.19   0.37   0.23   0.22   0.29   0.16
IACC (keyboard)                     0.33   0.39   0.44   0.28   0.33   0.27   0.55   0.17   0.58   0.42
IACC (soprano/keyboard)             2.15   1.00   0.45   2.04   0.58   1.37   0.42   1.29   0.50   0.38
C80 (stage) [dB]                     6.2    4.6    5.8    5.0    5.2    4.3    7.2    1.1    5.7    2.9
C80 (pit) [dB]                      -0.9   -3.2    2.7   -0.7    0.4    1.8    2.8    0.7    5.4    3.5
C80 (stage-pit) [dB]                 7.1    7.7    3.1    5.7    4.8    2.4    4.5    0.4    0.3   -0.5
EDT (stage) [s]                     1.16   1.09   0.89   1.20   0.88   1.13   0.97   1.14   1.13   1.00
EDT (pit) [s]                       1.31   1.21   0.93   1.26   1.03   1.28   1.16   1.22   0.93   1.01
EDT (stage/pit)                     0.89   0.90   0.96   0.95   0.85   0.88   0.84   0.93   1.22   0.99
D [m]                               5.80   11.4   14.5    6.3     15     11   17.5   11.8     18   17.5
Horizontal angle [deg]                 3      1      1     32     13     45     17     15     17     27
Vertical angle [deg]                 -20     -9     -7    -20     -7     -1     -1     30     15     35
stage. There were no musical instruments or chairs in the orchestra pit. The loudspeaker reproducing the vocal signal was located on the stage just under the proscenium, and the other loudspeaker reproducing the keyboard signal was located in the orchestra pit, 0.7 m in front of the rear wall (under the stage overhang). In the previous subjective preference tests conducted inside the same theater, these source positions on the stage and in the pit were found to be preferred to the other positions [14]. The heights of the loudspeakers on the stage and in the orchestra pit were 1.5 and 1.2 m above the floor level, respectively. The unoccupied RT (T20) is 1.38 s, while the estimated occupied RT according to Hidaka et al. [17] is 1.22 s. Even though the signals under occupied conditions were not available, the relationship between the variation of the acoustical parameters across seat positions and seat preference can help to clarify the critical variable for seat preference. The A-weighted sound pressure level Leq of the soprano signal at Position 1 was set to 79 dBA in Experiment 1 and 74 dBA in Experiment 2. These values are fully compatible with a realistic sound level for a singer on stage, assuming a sound power level for the soprano singing forte according to Meyer [18]. In both cases the sound pressure level difference between the seating positions was maintained from the impulse response measurements. The acoustical parameters calculated from the binaural impulse responses (early decay time EDT, reverberation time T20, and clarity C80) and from the convolved signals of the anechoic music signals and the binaural impulse responses (the equivalent sound pressure level Leq, the maximum sound pressure level Lmax, and the interaural cross-correlation IACC) are shown in Table I. The effect of multiple sources on the subjective judgments was also investigated. The balance parameters were calculated by subtracting the keyboard (or the pit) value from the soprano (or the stage) value for Leq, Lmax, and C80, and by taking the ratio of the soprano (or the stage) value to the keyboard (or the pit) value for IACC and EDT, as Robinson et al. did [19]. The parameters calculated from the binaural impulse responses were averaged over the left and right signals and over two octave frequency bands (500 and 1000 Hz). Leq and Lmax showed a significant correlation (r = 0.97, 0.99, 0.96, and 0.95 for the total (soprano + keyboard values), soprano, keyboard, and balance (soprano - keyboard values), p < 0.01). The ranges of the parameters are 6.5 dBA, 0.43 s, 7.4 dB, and 0.4 for Leq, EDT, C80 and IACC, respectively. These values are larger than the difference limens [20, 21, 22]. A greater range of the acoustical parameters could be obtained if the acoustics of different theaters were compared; however, comparing the acoustics and vision of different theaters is not the purpose of this study. The geometrical parameters, i.e., the source-receiver distance D, the horizontal angle from the longitudinal center axis, and the vertical angle, are also listed in Table I.
In Experiment 1, headphones were used to present the sound stimuli to the participants. Inverse filters were applied to the anechoic music signal convolved with the binaural impulse response for each position to cancel the effect of the sound field between the headphones and the ears. Because the testing room was usually used for 3D virtual reality presentation systems with stereo images, the room was equipped with two projectors, which produce a high level of machine noise. The noise level of the two projectors at
Figure 3. Visual stimuli of the 10 positions (Positions 1 to 10).
the listener's position was 49.8 dBA. Therefore, noise cancelling headphones (Sennheiser PXC450) were used for the listening tests; their use reduced the noise level to 37.0 dBA. For this reason it was decided to set the reference level in Position 1 to 79 dBA so that a dynamic range of 42 dBA was available.
In Experiment 2, a transaural-dipole system consisting of six pairs of stereo loudspeakers was used. Two pairs of loudspeakers were placed frontally at the height of the listener's ears with a span of 35° and 90°, respectively (the former pair was close to the monitor sides), while the other four pairs were arranged on the floor and ceiling, at the vertexes of a cube having an angular span of 90 degrees with respect to the listener. All six pairs are fed with the same binaural signals, but because each pair is processed for cross-talk cancellation independently of the other pairs, a specific set of inverse filters is developed for each pair depending on its location. This allows the acoustical parameters measured in the real room to be reproduced closely at the listener position. A similar system of six pairs of stereo loudspeakers was used in a previous intelligibility study [23] to auralize the acoustic environment of classrooms, and the enhancement of the spatialization was confirmed. In [23], two pairs were placed in front of and behind the listener, respectively, while in this study the loudspeakers at the height of the listener's ears were placed only in front, because locating them at the back produced strong reflections from the LCD monitor to the listener. The inverse filters were made from the impulse responses recorded using an artificial head and torso (Brüel & Kjær type 4100). Thus, non-personalized HRTFs were used for both systems. The accuracy of the reproduction of the acoustical parameters in both systems was confirmed in terms of the difference between the actual and the virtual (reproduced) sound fields. All the parameters tested (T20, EDT, C50, C80, ts, and IACC) showed these differences to be within the difference limens determined in [20, 21, 22].
2.2. Visual stimuli
The 3D computer model of the theater was created using CAD (computer-aided design) software (3DMAX) and
was rendered using a real time rendering system based on OPENGL. As shown in Figure 3, the views from the abovementioned ten seating positions were extracted as color images in JPEG format. These static images became the visual stimuli used for the subjective evaluations. They included the singer on stage, stage sets, and an audience with appropriate lighting. For simplicity, the orchestra pit and the keyboard were not shown in the visual stimuli, and the stage set was reproduced according to pictures taken during a different opera performance, so the scene did not exactly match the music. Although the acoustical stimuli were measured under unoccupied conditions, a sparse audience was added to the visual stimuli. In Experiment 1, the abovementioned images (with 3331 × 1080 pixels) were projected onto a large, wide screen (6.9 m width and 2.9 m height), which covered the peripheral view of the subject and gave a quite enveloping sense of immersion. The distance between the center of the screen and the listener's head was 2.2 m. In Experiment 2, the same images (but with a resolution of 1920 × 1080 pixels) were shown on a 42" LCD monitor in front of the subject. The distance between the center of the screen and the listener's head was 2 m. Pictures of the two systems are shown in Figure 4. The field of view of the visual stimuli was fixed. Thus, a closer seat position in the theater gives fewer visual objects. In summary, in Experiment 1 the visual presentation performance of the system was optimized, whereas some compromise had to be accepted as regards the audio rendering. In Experiment 2, on the contrary, the audio was more accurate but a simpler visual presentation was implemented. In this way the robustness of the preference with respect to the accuracy of presentation was also investigated.
2.3. Subjective tests
As described in the Introduction, the ability to make a preference judgment depends on one's experience.
Therefore, the participants of Experiment 1, who were not familiar with the theater, were compared with the participants of Experiment 2, who were from the city of the theater and had attended performances there. In Experiment 1, responses from 21 participants (17 males and 4 females, aged 23-40 years, no auditory deficit reported) were collected. Because the lyrics of the music were written in Italian, and most of the subjective test participants were not Italian speakers, the background and the meaning of the lyrics were explained to the participants prior to the tests. Paired comparison tests were conducted to obtain subjective responses from the participants. To prevent the results of the combined acoustic and visual stimuli test from affecting those of the acoustic stimuli only and the visual stimuli only tests, the three test conditions were conducted in the following order: 1) visual stimuli only, 2) acoustic stimuli only, and 3) acoustic and visual stimuli. The three tests were performed consecutively. Prior to the main test
Figure 4. Laboratory test settings. Top: Experiment 1; bottom: Experiment 2.
for each music signal, all 10 position views with their accompanying musical sounds were presented to the participants to familiarize them with the stimuli; trial sessions with three pairs of the visual stimuli only condition were conducted. If the ten stimuli are tested simultaneously in a single session of paired comparison tests, there are 45 stimulus combination pairs (N(N - 1)/2, where N = 10). To shorten the total duration of the experiment, each test session was divided into two sub-sessions, i.e., the stall group (Positions 1, 2, 3, 4, 5, and one box position, 9) and the box group (Positions 6, 7, 8, 9, 10, and one stall position, 2). Fifteen stimulus combination pairs (N(N - 1)/2, where N = 6) were arranged in random order for each sub-session. The duration of the stimulus presentation in the visual stimuli only condition was 6 s, and the duration of the stimulus presentation in the acoustic stimuli only condition and in the visual and acoustic combined condition was equal to the duration of the music signal (= 9.0 s). A silent interval of 0.6 s was placed between the two stimuli in a pair, during which a solid black image was projected onto the screen. After the presentation of two different stimuli, each participant was asked to judge which of the two (former or latter) in the pair was preferable. Previous studies showed that paired comparison tests with no ties can give poor results if the population can be separated into two groups of participants with opposite preferences for a given pair, or if the stimuli are perceived as rather similar by participants [24, 25]. Therefore, the item "no difference" (or "difficult to judge") was added to the possible responses to increase the accuracy of the paired comparison test. The data were collected with paper response sheets on which the participants checked their choices. Each pair of stimuli was separated by an interval of 5 s. In the acoustic stimuli only condition, the message "Pair x is being presented" (x = 1, 2, ..., 30) (in Chinese) appeared in the center of the screen during the presentation of the pair. All the text instructions were in white letters on a black background. Three participants took the test at the same time. The total duration of the tests was 40-45 min, including the instruction and exercise session. The audio and visual stimuli were presented in slideshow style, created using HSP software [26].
In Experiment 2, responses from 61 participants (38 males and 23 females, aged 21-80 years, no specific hearing deficit reported) were collected. As outlined above, in this case the visual stimuli only condition was not tested. In the second sub-session for the box group in the acoustic stimuli only and the acoustic and visual stimuli tests, the stall position (Position 2) was not included; thus, 15 and 10 stimulus combination pairs were presented to the participants for the stall and the box groups, respectively. During the silent interval between the two stimuli in a pair, a solid gray image was shown on the LCD monitor. The data were collected by a wireless system based on touchscreen mobile terminals. After the presentation of two different stimuli, each participant was asked to make a judgment from the alternatives appearing on the handheld touchscreen. As in Experiment 1, more than two alternatives, including ties, were offered to increase the accuracy of the paired comparison test.
The five alternatives were as follows: 1) A (the former) is much more preferred to B (the latter), 2) A is preferred to B, 3) A and B are equally preferred, 4) B is preferred to A, and 5) B is much more preferred to A (in Italian). After the choice was made, the presentation of the next pair started automatically. Only one participant took the test at a time, and the total duration of the tests was about 30 min, including the initial instruction and practice sessions.
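The pair counts quoted in this section follow directly from N(N - 1)/2. The short Python sketch below is only an illustration of that bookkeeping, not the software used in the study; the function name `make_pairs` and the fixed random seed are hypothetical, and the order of the two stimuli within a pair is left as generated because the text does not say how it was chosen.

```python
import itertools
import random


def make_pairs(positions, seed=None):
    """Return all unordered pairs of seat positions in random presentation order."""
    pairs = list(itertools.combinations(positions, 2))
    random.Random(seed).shuffle(pairs)  # random order of pairs within a sub-session
    return pairs


# Sub-sessions as described in Section 2.3
stall_group = [1, 2, 3, 4, 5, 9]        # five stall positions plus box Position 9
box_group_exp1 = [6, 7, 8, 9, 10, 2]    # Experiment 1: five box positions plus stall Position 2
box_group_exp2 = [6, 7, 8, 9, 10]       # Experiment 2: Position 2 not included

single_session = make_pairs(range(1, 11), seed=0)
stall_pairs = make_pairs(stall_group, seed=0)
box_pairs_exp1 = make_pairs(box_group_exp1, seed=0)
box_pairs_exp2 = make_pairs(box_group_exp2, seed=0)

print(len(single_session), len(stall_pairs), len(box_pairs_exp1), len(box_pairs_exp2))
# prints: 45 15 15 10
```

Splitting the ten positions into two groups of six reduces the 45 pairs of a single session to 30 pairs per test in Experiment 1, which matches the range x = 1, ..., 30 of the on-screen pair counter.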
Figure 5. Average preference scale values for the three test conditions (visual only, acoustic only, visual and acoustic) plotted against seat position (1 to 10). Top: Experiment 1; bottom: Experiment 2 (acoustic only, visual and acoustic).
3. Results
The answers provided by each participant were converted into numbers (possible values: -1, 0, +1 for Experiment 1, and -2, -1, 0, +1, +2 for Experiment 2). The number of circular triads, in which a participant preferred stimulus A to B, B to C, and C to A, was counted to test the consistency of responses for each participant. If P_AB, P_AC, and P_BC are the answers given when the pairs A-B, A-C, and B-C were presented, then there are two cases in which a circular error can be said to have occurred [25]:

$$
\begin{aligned}
&P_{AB} > 0 \ \text{and}\ P_{BC} > 0 \ \text{and}\ P_{AC} < 0,\\
&P_{AB} < 0 \ \text{and}\ P_{BC} < 0 \ \text{and}\ P_{AC} > 0.
\end{aligned}
\tag{1}
$$
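The triad check in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the inequalities are taken as strict (a tie never counts towards a circular triad), the rate is normalized by all N(N - 1)(N - 2)/6 triads of a sub-session, and the names `circular_error_rate` and `responses` are hypothetical.

```python
from itertools import combinations


def circular_error_rate(responses, items):
    """Fraction of triads (A, B, C) that satisfy one of the two cases in Eq. (1).

    responses[(a, b)] > 0 means a was preferred to b, < 0 means b was
    preferred to a, and 0 is a tie. Each pair is stored once, in either order.
    """
    def p(a, b):
        # Look the answer up in whichever order the pair was stored.
        if (a, b) in responses:
            return responses[(a, b)]
        return -responses[(b, a)]

    triads = list(combinations(items, 3))
    circular = 0
    for a, b, c in triads:
        pab, pbc, pac = p(a, b), p(b, c), p(a, c)
        # Assumption: strict inequalities, so ties are never circular.
        if (pab > 0 and pbc > 0 and pac < 0) or (pab < 0 and pbc < 0 and pac > 0):
            circular += 1
    return circular / len(triads)


# A preferred to B, B to C, but C to A: the only triad is circular.
example = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): -1}
print(circular_error_rate(example, ["A", "B", "C"]))  # prints: 1.0
```

If the original analysis treated ties differently, or normalized by the maximum attainable number of circular triads instead, the rates would differ from this sketch.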
Next, the rate of circular errors, that is, the ratio of the number of circular triads to the number of possible triads, was calculated for each participant. The average circular error rates for the visual stimuli only, the acoustic stimuli only, and the acoustic and visual stimuli conditions in Experiment 1 were 0.084, 0.107, and 0.062, respectively. The circular error rates for the acoustic stimuli only and the acoustic and visual stimuli conditions in Experiment 2 were 0.252 and 0.181, respectively. The responses of the participants who exhibited a circular error rate of less than 0.2 were used to calculate the subjective scale for each test. All 21 participants of Experiment 1 and 43 of the 61 participants of Experiment 2 were used for the analysis. First, the scale values of the preferences for each test were obtained according to [27, 28] for the responses of Experiment 1, while the average score was calculated for the responses of Experiment 2. Then, the two sets of scale values (tests for the stall and box positions) for each test were reconstructed so that the scale value difference between Positions 2 and 9 was the average for the tests of the stall and box positions. The scale values of the different test conditions are shown in Figure 5. Despite the different participant groups (one familiar with the theater and the other not) and the different test systems, the results of the two experiments were generally similar. A mixed collection of participants could be expected to produce a mixed result, that is, the scale values for different seating positions should show only slight differences. However, the actual difference between the maximum and the minimum scale values showed that the participants could distinguish the acoustic and visual conditions of each seat with a certain consistency even though they belonged to different groups. The correlation coefficients between Experiments 1 and 2 for the acoustic stimuli only and the acoustic and visual stimuli
conditions were 0.96 (p < 0.01) and 0.98 (p < 0.01), respectively. In each test, the scale value for Position 1 was the highest, while those for Positions 8 and 10 were the lowest. The stalls positions were generally preferred to the box positions in terms of both the acoustic and visual conditions. In the boxes, participants were more sensitive to the vertical difference than to the front-rear one. As shown in Figure 5a, Position 8 shows a larger difference in scale values among the different tests than the other positions. Its scale value for the visual stimuli only condition is not as low as those for the acoustic stimuli only and the acoustic and visual stimuli conditions. Position 8 is visually preferred because it is close to the singer, but it is not acoustically preferred because the position is outside the directivity of the loudspeaker simulating the singer. In Experiment 1, the correlation coefficients between the scale values for the visual and acoustic combined stimuli condition and those for the visual stimuli only and acoustic stimuli only conditions were 0.91 (p < 0.01) and 0.96 (p < 0.01), respectively. The scale values of the acoustic stimuli only condition were thus more similar to those of the visual and acoustic combined stimuli condition. The range of scale values for the visual stimuli only condition was the smallest, while the range for the visual and acoustic combined stimuli condition was the greatest. By combining the acoustic and visual information, the participants' judgments became easier and more stable than those in the visual stimuli only and acoustic stimuli only conditions in terms of the circular error rate and the range of scale values. Table II shows the correlation coefficients between the objective parameters and the scale values of the three conditions for Experiment 1 and the two conditions for Experiment 2.
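The correlation coefficients reported here and in Table II can be illustrated with a short Python sketch of the Pearson coefficient together with a significance marker. The critical values below are the standard two-tailed thresholds for 8 degrees of freedom, which assumes that each correlation is computed over the ten seating positions; that assumption and all names are illustrative, and the data at the end are made up.

```python
import math

# Two-tailed critical values of |r| for n = 10 points (df = 8).
# Assumption: one data point per seating position.
R_CRIT_05 = 0.632
R_CRIT_01 = 0.765


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def significance_marker(r):
    """Return '**' for p < 0.01, '*' for p < 0.05, '' otherwise (n = 10)."""
    if abs(r) >= R_CRIT_01:
        return "**"
    if abs(r) >= R_CRIT_05:
        return "*"
    return ""


# Hypothetical scale values and levels for ten positions (not measured data).
scale_values = [1.2, 0.8, 0.5, 0.3, 0.1, -0.2, -0.4, -0.9, -0.5, -0.9]
levels_db = [74.0, 72.5, 71.0, 70.5, 70.0, 69.0, 68.5, 66.0, 68.0, 66.5]
r_example = pearson_r(levels_db, scale_values)
marker_example = significance_marker(r_example)
```

Under this assumption, a coefficient of magnitude 0.61 would not reach significance, whereas 0.96 would be marked at the 1% level.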
Since there is evidence showing that both distance and direction may be perceived through audition alone [29], distance and angle parameters were included in the correlation analyses for the acoustic stimuli only preference. The objective parameters that showed a significant correlation with the subjective scales were similar across the experiments. The sound levels Leq and Lmax, and C80, showed a significant correlation with the subjective scales. Because the scale values of the three conditions in Experiment 1 were highly correlated, even the subjective scale for the visual stimuli only test showed a significant correlation with the acoustic parameters. The vertical angle showed the highest correlation with the subjective scale in every test. This is because the vertical angle has a high correlation with the values of Leq, Lmax, and C80. Then, regression analysis was conducted to determine the relationship between the subjective scales and the acoustical and visual parameters. All possible combinations of two parameters, except for pairs of parameters that were significantly correlated with each other, were examined. The combination of Leq (total) and Leq (balance) produced the greatest significant correlation coefficient between the observed and calculated values:

$$\mathrm{sv} \approx a_1 L_{eq}(\mathrm{total}) + a_2 L_{eq}(\mathrm{balance}). \tag{2}$$
Regarding the acoustic stimuli only condition, the partial regression coefficients a1 and a2 in equation (2) were 0.30 (p < 0.05) and 0.30 (p < 0.01), respectively, for Experiment 1, and 0.29 (p < 0.01) and 0.27 (p < 0.01), respectively, for Experiment 2. The coefficients of determination r² of the regression model were 0.86 (p < 0.01) and 0.88 (p < 0.01), respectively. Regarding the acoustic and visual combined stimuli condition, the partial regression coefficients a1 and a2 in equation (2) were 0.18 (p < 0.05) and 0.34 (p < 0.01), respectively, for Experiment 1, and 0.25 (p < 0.01) and 0.31 (p < 0.01), respectively, for Experiment 2. The r² values were 0.89 (p < 0.01) and 0.91 (p < 0.01), respectively. Even when C80 (stage), IACC (stage), D, or one of the angle parameters was added to equation (2) as a regression variable, the r² values increased little and the added variables were not significant. Regarding the subjective scale for the visual stimuli only condition, the r² value for any combination of the three geometrical parameters was smaller than the correlation coefficient with the vertical angle only.
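A two-parameter regression of the form of equation (2) can be sketched in Python as follows. This is an illustrative least-squares fit, not the authors' procedure: it assumes that the scale values and both predictors are mean-centered before fitting (the equation shows no intercept), and the function name and the data in the usage lines are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y ~ a1*x1 + a2*x2 on mean-centered data.

    Returns (a1, a2, r2), where r2 is the coefficient of determination.
    Centering all three variables is an assumption of this sketch.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]

    # Entries of the 2x2 normal equations.
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; the fit is not unique")

    a1 = (s1y * s22 - s2y * s12) / det
    a2 = (s2y * s11 - s1y * s12) / det

    ss_res = sum((t - a1 * a - a2 * b) ** 2 for a, b, t in zip(c1, c2, cy))
    ss_tot = sum(t * t for t in cy)
    return a1, a2, 1.0 - ss_res / ss_tot


# Hypothetical values: total level, level balance, and scale value per seat.
leq_total = [0.0, 1.0, 2.0, 3.0]
leq_balance = [1.0, 0.0, 1.0, 0.0]
scale = [0.3 * t + 0.2 * b + 1.0 for t, b in zip(leq_total, leq_balance)]
a1_hat, a2_hat, r2_hat = fit_two_predictors(leq_total, leq_balance, scale)
```

Excluding predictor pairs that are strongly correlated with each other, as described above, keeps the determinant in this calculation well away from zero.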
4. Discussion
The acoustical balance between a singer on the stage and the orchestra in the pit is one of the main concerns with respect to the quality of opera houses. Barron [30] summarizes the aims of the design of an opera house in qualitative terms, stating that the speech should be intelligible, and that the orchestral sound should have a suitable clarity and convey an adequate sense of reverberance. Both the voice of the singer on the stage and the sound of the orchestra in the pit should reach every listener with sufficient loudness. Of utmost importance are the balance between the singer and the orchestra and the fact that the acoustics of the theatre must favor the singer. Barron defined the balance index B at a listener's position as the difference in sound level between the sources on the stage and in the pit. Prodi and Velecka [15] investigated the scale value of B by psycho-acoustic tests. They found that the most appropriate range of B values covers the interval from −2.0 to +2.3 dBA. Sato and Prodi [16] investigated the effects of acoustical parameters other than the level on subjective judgments of balance between a soprano singer and a keyboard inside different theaters. Their multiple regression analyses between the scale values and acoustical parameters show that the EDT of the stage source has a major influence on the scale values of balance. The contribution of the stage parameters to the scale value is greater than that of the pit parameters. Robinson et al. also investigated the stage-to-pit ratio of several acoustical parameters [19]. Their listening test results showed that the stage-to-pit ratios of the IACC, C80, and the level were significantly correlated with subjective preference. Table II also shows the correlation coefficients between the subjective scale and the balance parameters (the difference between the soprano (or stage) and the keyboard (or pit) values). Leq (balance), Lmax (balance), and C80 (balance) showed a significant correlation with
Table II. Correlation coefficients between the objective parameters and the scale values obtained by subjective judgments. ** p < 0.01; * p < 0.05. Acu: acoustic stimuli only; Vis: visual stimuli only; Acu+Vis: acoustic and visual stimuli.

| Parameter | Exp. 1 Vis | Exp. 1 Acu | Exp. 1 Acu+Vis | Exp. 2 Acu | Exp. 2 Acu+Vis |
|---|---|---|---|---|---|
| Leq (total) | 0.29 | 0.75 | 0.64 | 0.78 | 0.74 |
| Leq (soprano) | 0.52 | 0.88 | 0.81 | 0.90 | 0.88 |
| Leq (keyboard) | 0.42 | 0.12 | 0.07 | 0.15 | 0.06 |
| Leq (soprano-keyboard) | 0.90 | 0.79 | 0.89 | 0.79 | 0.84 |
| Lmax (total) | 0.11 | 0.64 | 0.49 | 0.67 | 0.61 |
| Lmax (soprano) | 0.54 | 0.90 | 0.82 | 0.91 | 0.88 |
| Lmax (keyboard) | 0.38 | 0.12 | 0.04 | 0.15 | 0.07 |
| Lmax (soprano-keyboard) | 0.82 | 0.74 | 0.80 | 0.73 | 0.76 |
| IACC (total) | 0.31 | 0.46 | 0.49 | 0.62 | 0.60 |
| IACC (soprano) | 0.55 | 0.61 | 0.66 | 0.74 | 0.74 |
| IACC (keyboard) | 0.05 | 0.10 | 0.05 | 0.00 | 0.041 |
| IACC (soprano/keyboard) | 0.42 | 0.36 | 0.45 | 0.54 | 0.52 |
| C80 (stage) | 0.52 | 0.72 | 0.69 | 0.67 | 0.69 |
| C80 (pit) | 0.52 | 0.44 | 0.50 | 0.46 | 0.50 |
| C80 (stage-pit) | 0.76 | 0.81 | 0.84 | 0.80 | 0.84 |
| EDT (stage) | 0.19 | 0.01 | 0.11 | 0.19 | 0.17 |
| EDT (pit) | 0.43 | 0.20 | 0.32 | 0.32 | 0.34 |
| EDT (stage/pit) | 0.32 | 0.23 | 0.27 | 0.22 | 0.23 |
| D | 0.51 | 0.48 | 0.56 | 0.61 | 0.61 |
| Horizontal angle | 0.37 | 0.26 | 0.36 | 0.15 | 0.27 |
| Vertical angle | 0.80 | 0.95 | 0.95 | 0.96 | 0.97 |
the subjective scales. For the soprano source signal, the correlation with Lmax (soprano) was greater than that with Leq (soprano); but for the balance between the soprano and keyboard source signals, the correlation with Leq (balance) was greater than that with Lmax (balance). It is reasonable that balance is more correlated with Leq than with Lmax, because this means that the comparison of the two signals is more stable than the evaluation of each signal individually. The signals we used were highly non-stationary, and for music signals Leq is actually a better estimate, even though it leaves out a lot of detail. Not only the soprano sound level but also the level balance between the two sources affected the subjective scale, whereas only the stage parameters were significant in determining the subjective scale of the balance preferences in a previous study conducted under fixed (constant) sound levels [16]. The level balance between the vocalist and keyboard can be controlled by the conductor and musicians; however, each position yields a different balance value. Leq (balance) showed a positive correlation with the subjective scale, that is, a louder soprano sound was preferred in this particular experimental condition. Even though in Experiment 1 the presentation level was elevated by 5 dB due to background noise, the relationship between the sound level and the scale value showed a tendency for greater sound levels to be preferred. Moreover, greater sound levels were also associated with closer seat positions. Seat preference may be related to the source-receiver distance. Zahorik et al. summarized that the parameters which influence distance perception are sound intensity, the direct-to-reverberant energy ratio, spectrum, and binaural cues [29].
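The difference between Leq, Lmax, and the level balance discussed above can be illustrated with a simplified Python sketch. It assumes unweighted sample sequences, a reference value of 1, and a block-wise maximum in place of a standardized time weighting for Lmax; the function names and signals are hypothetical and do not reproduce the measurement chain of this study.

```python
import math


def leq(samples, ref=1.0):
    """Equivalent continuous level in dB: 10*log10(mean(p^2) / ref^2).

    The samples must not all be zero, since log10(0) is undefined.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / ref ** 2)


def lmax(samples, block, ref=1.0):
    """Maximum of the block-wise levels (simplified stand-in for Lmax)."""
    if block <= 0 or block > len(samples):
        raise ValueError("block must be between 1 and len(samples)")
    starts = range(0, len(samples) - block + 1, block)
    return max(leq(samples[i:i + block], ref) for i in starts)


def balance(stage_level_db, pit_level_db):
    """Level difference between the stage source and the pit source in dB."""
    return stage_level_db - pit_level_db


# Hypothetical non-stationary signal: a loud passage followed by a quiet one.
signal = [1.0] * 4 + [0.1] * 4
leq_example = leq(signal)        # averaged over the whole excerpt
lmax_example = lmax(signal, 4)   # dominated by the loud passage
```

For such a non-stationary signal the block-wise maximum is set by a single loud passage, whereas Leq averages over the whole excerpt, which is consistent with the argument that a balance based on Leq is the more stable quantity.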
When the sound levels were analyzed in more detail by frequency band, the 1, 2, and 4 kHz bands of Leq (total) and Lmax (soprano) showed a significant correlation with the subjective scales, while for Leq (keyboard), the 125, 250, and 500 Hz bands showed a significant correlation with the subjective scale. These results correspond to the main frequency components of the two signals, as shown in Figure 2. Because this method makes possible the instantaneous comparison of the acoustics of different seats, the results showed that listeners made judgments according to the sound level first. The early-to-late energy ratio C80 was then also a significant factor. Because a vocal signal (accompanied by a keyboard) was used, a sound field with a larger C80 (clearer sound) was preferred. An evaluation of a live performance might show the early-to-late reverberant ratio to be the more critical variable. In a previous study, which investigated the perceived balance between a soprano singer and a keyboard in different theaters under constant sound level conditions [16], the EDT for the stage source was found to have a major effect on the scale values of balance. In our study, on the other hand, EDT was found to have only a minor effect. We believe this is because the variance of EDT inside a single theater is limited compared with the variance between different theaters. On the other hand, C80 for the stage source showed a significant positive correlation with the scale values for the acoustic only and the combined acoustic and visual tests. The clarity of the source signal, and especially that of the soprano sound, is one of the criteria for a preferred condition. IACC has a positive correlation with the
scale value for the three tests even though the correlations were not significant. These findings confirm the previous results with different theatres: when two sources are competing and one is probably more important for the listener, higher clarity and/or higher IACC are required to accomplish preferred listening. Again, this is somewhat in contrast with the common knowledge gained from one source only. The use of non-personalized HRTFs is a limitation of our study; however, the two experiments in the present study used different stimulus presentation systems, that is, headphones and a multi-loudspeaker system. The subjective scales obtained from these different test systems were nevertheless found to be similar. Therefore, we could conclude that the subjective scale was validated even though personalized HRTF data were not used. Moreover, the significant factors found to contribute to seat preference, namely the sound levels Leq and Lmax, and C80, are not directly connected to the HRTF. Whether the parameters related to spatial impression, such as IACC, are significant for seat preference must be examined using personalized HRTF impulse responses in future experiments. The three geometrical parameters showed a negative correlation with the subjective scales, indicating that a closer position is preferred to a remote one, that a position closer to the center axis is preferred to a side position, and that a frontal sightline to the singer is preferred to a sightline looking down at the singer. In the present experimental conditions, the sightlines of the visual stimuli were fixed on the singer on the stage, and thus the participants did not need to turn their heads, even when sitting in the side box positions. The highest positions in the gallery showed the lowest preference levels, although it is generally said that the gallery position has better acoustics due to the useful reflection from the large ceiling plane.
The reason for the low observed preference probably stems from the directional characteristic of the loudspeaker that was used in the acoustical measurement to construct the reproduced sound field. A fixed (still) loudspeaker could not exactly mimic the upward sound projection of an actual soprano singer during a performance. The visual stimuli were static, not dynamic. The limitation of using still images can be seen not only in this study but also in most of the other published studies on audio-visual perception and interaction in rooms. In our study, the task of the participants was to judge their preferences according to different seat positions. Even static pictures are useful for this purpose. The use of motion pictures synchronized to the music is a further task to be attempted in subsequent studies. Presumably, visual preference will play a greater role in experiments with appropriately animated (dynamic or interactive) visual scenes. In any case, an interesting finding of the present study is the substantial independence of the position preference judgments from the participant groups and the means of presentation used in the two experiments. In other words, despite the compromise on audio (Experiment 1, the participants
were not familiar with the theater) and on visual images (Experiment 2, the participants were familiar with the theater), the agreement of the two sessions was remarkable. This can be seen as proof of the robust criteria adopted by the testers in their choices as far as position preference was investigated in an Italian style opera theater, which has clear sound and visual field differences among the listening positions. It should be further investigated whether such robust criteria can be adopted for sound fields with more subtle differences among the positions, as for instance in a modern opera hall.
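For reference, the clarity index C80 used throughout this discussion is the ratio, in decibels, of the energy arriving in the first 80 ms of an impulse response to the energy arriving afterwards. The Python sketch below illustrates that definition under simplifying assumptions: the impulse response is broadband, starts at the arrival of the direct sound, and is given as a plain list of samples. The function name and the test responses are hypothetical.

```python
import math


def clarity_c80(impulse_response, sample_rate_hz):
    """Clarity C80 in dB: 10*log10(energy in 0-80 ms / energy after 80 ms).

    Assumes that index 0 of impulse_response is the direct-sound arrival.
    """
    n80 = int(round(0.080 * sample_rate_hz))
    early = sum(h * h for h in impulse_response[:n80])
    late = sum(h * h for h in impulse_response[n80:])
    if early == 0 or late == 0:
        raise ValueError("early and late energy must both be non-zero")
    return 10.0 * math.log10(early / late)


# Hypothetical responses sampled at 1 kHz: equal early and late energy,
# and a response whose late part is 20 dB weaker than its early part.
c80_equal = clarity_c80([1.0] * 80 + [1.0] * 80, 1000)
c80_clear = clarity_c80([1.0] * 80 + [0.1] * 80, 1000)
```

A larger value indicates that more of the energy arrives early, which corresponds to the clearer sound that was preferred for the vocal signal in this study.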
5. Conclusions
To investigate the effects of acoustic and visual stimuli on the subjective preferences for different seating positions in an Italian style theater, the results of subjective evaluations using visual stimuli only, acoustic stimuli only, and a combination of acoustic and visual stimuli were compared. By combining the acoustic and visual information, the participants' judgments became easier and more stable than those in the acoustic stimuli only conditions in terms of the circular error rate and the range of scale values. The correlation between the scale values for the acoustic stimuli only and the combined acoustic and visual stimuli conditions was greater than that between the visual stimuli only and the combined acoustic and visual stimuli conditions. The participants mainly used acoustic information to judge seat preferences. Stall positions were preferred to box positions in terms of both acoustics and vision. The sound level (Leq) was the dominant factor in seat preference. In this experimental condition, a larger sound level, which is associated with a position closer to the stage, was found to contribute to greater seat preference. The level difference between the two source signals, that is, the soprano and the piano keyboard, was also significantly correlated with seat preference. The seat positions which receive a more dominant soprano vocal signal are preferred to other positions. This preference for a dominant soprano was also reflected in the C80 results. A higher C80 for the stage source, and a larger difference between the stage and the pit C80, both of which are related to higher clarity of the soprano signal, were associated with a higher seat preference. Regarding the visual preference, the seats close to the symmetrical axis were preferred to the side positions in the stalls, while the seats of the lower tiers were preferred to the other positions in the boxes.
Among the geometrical parameters, the source-receiver distance showed a negative correlation coefficient in every test, but it was not significant. The vertical angle showed the highest correlation with the subjective scale in every test. This is because the vertical angle has a significant correlation with the values of Leq, Lmax, and C80. The limitations of this study are as follows. The auditory stimuli were presented binaurally without personalization. The auralized sound stimuli were made from impulse responses measured with loudspeakers and do not exactly simulate the (dynamic) sound radiation of the soprano and the
keyboard. The visual stimuli were static images and were decoupled from the movement of the subject's sight.
Acknowledgement
The authors would like to thank Mr. Andrea Carletti and Ms. Bruna Grasso of Teatro Comunale di Ferrara (Italy) for providing the scenery pictures and initial drawings used to create the CG model and the visual stimuli. The authors would also like to thank the participants in the subjective tests. This work was supported by the National Natural Science Foundation of China (No. 50938003) and the Internationalization Project 2010 from the University of Ferrara.
References
[1] L. L. Beranek: Music, acoustics, and architecture. John Wiley, New York, 1962.
[2] M. Barron: Subjective study of British symphony concert halls. Acustica 66 (1988) 1–14.
[3] T. Hidaka, L. L. Beranek: Objective and subjective evaluations of twenty-three opera houses in Europe, Japan, and the Americas. J. Acoust. Soc. Am. 107 (2000) 368–383.
[4] A. Farina: Acoustic quality of theatres: correlation between experimental measures and subjective evaluation. Applied Acoustics 62 (2001) 889–916.
[5] M. R. Schroeder, D. Gottlob, K. F. Siebrasse: Comparative study of European concert halls: correlation of subjective preference with geometric and acoustic parameters. J. Acoust. Soc. Am. 56 (1974) 1195–1201.
[6] S. Kimura, K. Sekiguchi: Study on criteria for acoustical design of rooms by subjective evaluation of room acoustics. J. Acoust. Soc. Jpn. 32 (1976) 606–614.
[7] Y. Ando: Calculation of subjective preference at each seat in a concert hall. J. Acoust. Soc. Am. 74 (1983) 873–887.
[8] P. Larsson, D. D. Vastfall, M. Kleiner: Auditory-visual interaction in real and virtual rooms. Proceedings of the Forum Acusticum, 3rd EAA European Congress on Acoustics, Sevilla, 2002, PSY05005IP.
[9] D. Cabrera, A. Nguyen, Y. J. Choi: Auditory versus visual spatial impression: A study of two auditoria. Proceedings of the 10th International Conference on Auditory Display (ICAD), Sydney, 2004, 235–242.
[10] J. Y. Jeon, Y. H. Kim, D. Cabrera, J. Bassett: The effect of visual and auditory cues on seat preference in an opera theater. J. Acoust. Soc. Am. 123 (2008) 4272–4282.
[11] D. L. Valente, J. Braasch: Subjective scaling of spatial room acoustic parameters influenced by visual environmental cues. J. Acoust. Soc. Am. 128 (2010) 1952–1964.
[12] L. Tronchin, A. Farina: Acoustics of the former teatro La Fenice in Venice. J. Audio Eng. Soc. 45 (1997) 1051–1062.
[13] R. Pompoli, N. Prodi: Guidelines for acoustical measurements inside historical opera houses: procedures and validation. J. Sound Vib. 232 (2000) 281–301.
[14] S. Sato, H. Sakai, N. Prodi: Subjective preference judgments for source locations on the stage and the orchestra pit in an opera house. J. Sound Vib. 258 (2002) 549–561.
[15] N. Prodi, S. Velecka: A scale value for the balance inside a historical opera house. J. Acoust. Soc. Am. 117 (2005) 771–779.
[16] S. Sato, N. Prodi: On the subjective evaluation of the perceived balance between a singer and a piano inside different theatres. Acta Acustica united with Acustica 95 (2009) 519–526.
[17] T. Hidaka, N. Nishihara, L. L. Beranek: Relation of acoustical parameters with and without audiences in concert halls and a simple method for simulating the occupied state. J. Acoust. Soc. Am. 109 (2001) 1028–1042.
[18] J. Meyer: Some problems of opera house acoustics. Proc. 12th International Congress on Acoustics, Vancouver, 1986.
[19] P. W. Robinson, N. Xiang, J. Braasch: Investigations of architectural configurations and acoustic parameters for multiple sources. Proc. 20th International Congress on Acoustics, Sydney, 2010.
[20] T. J. Cox, W. J. Davies, Y. W. Lam: The sensitivity of listeners to early sound field changes in auditoria. Acustica 79 (1993) 27–41.
[21] I. Bork: A comparison of room simulation software. The 2nd round robin on room acoustical computer simulation. Acta Acustica united with Acustica 86 (2000) 943–956.
[22] ISO/CD 3382: Acoustics. Measurement of reverberation time. Part 1: Performance spaces. International Organization for Standardization, Geneve, 2003.
[23] N. Prodi, C. Visentin, A. Farnetani: Intelligibility, listening difficulty and listening efficiency in auralized classrooms. J. Acoust. Soc. Am. 128 (2010) 172–181.
[24] N. T. Gridgeman: Pair comparison, with and without ties. Biometrics 15 (1959) 382–388.
[25] E. Parizet: Paired comparison listening tests and circular error rates. Acta Acustica united with Acustica 88 (2002) 594–598.
[26] http://hsp.tv/
[27] L. L. Thurstone: A law of comparative judgment. Psychological Review 34 (1927) 273–289.
[28] H. Gulliksen: A least squares solution for paired comparisons with incomplete data. Psychometrika 21 (1956) 125–134.
[29] P. Zahorik, D. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.
[30] M. Barron: Auditorium acoustics and architectural design. E&FN Spon, London, 1993.
Received 11 April 2011, accepted 31 July 2012.

S. Hirzel Verlag EAA

Sato et al.: Effects of stimuli on preferences

[Figure: horizontal axis labelled "1/3 octave band centre frequency [Hz]"]

Figure 3. Visual stimuli of the 10 positions.

Figure 4. Laboratory test settings. Top: Experiment 1; bottom: Experiment 2.
