
THE ASSESSMENT OF SPEECH INTELLIGIBILITY IN ROOM ACOUSTICS
FOR EFFICIENT APPLICATION IN COMPUTER MODELLING AND
IMPROVED ENCLOSED SPACES

Volume 1 of 2: Text

Christos Nestoras

Supervisors:
Dr Stephen Dance
Prof Bridget Shield

A thesis submitted in partial fulfilment of the requirements of London South Bank
University for the degree of Doctor of Philosophy

September 2009

Preface


Reviewed by:

First supervisor: Dr Stephen Dance, Senior lecturer, London South Bank University
Second supervisor: Prof Bridget Shield, Professor of Acoustics, London South Bank University
External examiner: Prof John Turner, Pro Vice-Chancellor, Portsmouth University
External examiner: Mr Peter Mapp, Principal consultant, Peter Mapp Associates
Internal examiner: Mr Ken Rotter, Senior lecturer, London South Bank University
Abstract
The aim of this study is the development of computer models that are capable of
consistently predicting primary and speech-intelligibility-specific parameters for variable
source configurations within lecture rooms. The study comprises four main parts:
the consistent measurement of the acoustic environment in rooms using a generic
measurement methodology; a series of low-level measurements to establish the effect of a
continuously reducing S/N on the measurement results; the development and validation
of efficient and consistent computer models; and the definition of a new hybrid method
for the objective validation of auralization.

Multiple computer simulations have been run for ten test rooms and validated
using measurements taken for different source types and positions. Two contributions to
model development have been made: the simulation validation/calibration
methodology using EDT, and the development of accurate computer models based on a
single reference parameter, i.e. EDT.

The acoustic characteristics within ten test rooms have been measured and analyzed for
four source configurations to determine the level of consistency among the different
methodologies. The assessment also provided validation data for the computer modelling
purposes. Complementary measurements were taken using an open loop system to
quantify its effectiveness and accuracy compared to traditional closed loop measurement
systems.

Finally, an objective validation method was developed to enable an efficient, clear-cut
assessment of auralization quality, in particular for speech intelligibility parameters. A
comparative assessment of auralizations from ten test rooms was consequently
undertaken to determine the quality level achieved. The procedure was complemented
by a typical subjective evaluation via listening tests for a broader view of the results.


Acknowledgements

The core financial support for this work was provided by an LSBU research scholarship.
The Royal Academy of Engineering is also gratefully acknowledged for the award of a
travel grant to promote the dissemination of information and development of research
networks, among others. A number of people have contributed their time and effort
over the duration of this project. I would like to thank primarily my supervisor, Dr Stephen
Dance, for his constructive criticism and continuous support throughout this study. His
enthusiastic approach has been invaluable. I also thank Prof Bridget Shield for her critical view,
particularly in the early stages of the study, and her continuous encouragement. Mr Lars
Morset of Morset Sound Development Norway (WinMLS) provided direct support when
needed with software licensing issues. Dr Bengt-Inge Dalenbäck of CATT Acoustics
Sweden offered his expert knowledge in different simulation aspects. Broad technical
support was provided by the technicians in the Acoustics Group lab. I would like to thank
our retired technician and current enterprise consultant in particular, Mr Salih Hassan, for
his help on various occasions during the acoustic measurement sessions. I also thank Mr Louis
Gomez, KTP associate, for the long hours in field tests for the London Underground and
the synergy created to facilitate the audition of concept ideas in actual working
environments. I want to thank all who offered their time to take part in the subjective
testing sessions. Numerous people, too many to list here, have also offered their
moral support, a contribution that I highly value. This project would have been
impossible without the support of my family. I want to thank Naoko for her patience and
understanding.

LIST OF TABLES

Page

Table 2.1. Matrix used for the determination of MTF in the 14 modulation frequencies and seven octave bands 29
Table 2.2. STI scale and equivalent subjective perception of speech intelligibility (current) 32
Table 2.3. STIPA modulation frequencies 34
Table 2.4. Weighting factors adopted by STIPA 34
Table 3.1. Room list for room acoustics measurements 60
Table 3.2. Average BGNL over ten rooms (Leq, 1min) 61
Table 3.3. Acoustic parameters measured in ten test rooms and statistical summary 61
Table 3.4. Standard deviation for T30 and EDT among the four source types in Room 1 63
Table 4.1. Sample threshold efficient S/N ratios using a sine sweep in a test room 97
Table 4.2. Comparison of STI for reference and experimental (marginal) conditions 98
Table 4.3. Threshold efficient S/N for the six system configurations derived from marginal T30 data in N11
(reverberation chamber) 128
[...] T30 data in Room 10 128
Table 5.1. Example mean values for single omni directional source (prediction against measurement) 142
Table 5.2. T30 for actual and predicted conditions (simple) in Room 8 144
Table 5.3. EDT for actual and predicted conditions (simple) in Room 8 144
Table 5.4. C50 for actual and predicted conditions (simple) in Room 8 144
Table 5.5. T30 for actual and predicted conditions (CAD) in Room 8 145
Table 5.6. EDT for actual and predicted conditions (CAD) in Room 8 145
Table 5.7. C50 for actual and predicted conditions (CAD) in Room 8 145
Table 5.8. Example of average error for Simple model using an alternative source configuration in Room 8 (sound
system, SS4) 146
Table 5.9. Example of average error for CAD model using an alternative source configuration in Room 8 (sound
system, SS4) 146
Table 5.10. Comparison of prediction data to room acoustics measurements in Room 1 (omni source), averaged over all
receiver positions 150
Table 5.11. Comparison of prediction data to room acoustics measurements in Room 1 (sound system), averaged over
all receiver positions 150

Table 5.28. Comparison of prediction data to room acoustics measurements in Room 10 (omni source), averaged over
Table 6.1. Result variation example between the two auralization validation methods (averaged over ten test rooms for
omni source 1) 173
Table 6.2. Result variation example between the two auralization validation methods (averaged over ten test rooms for
omni source 2) 173
Table 6.3. Prediction and auralization data comparison example (Room 2, S1) 175
Table 6.4. Prediction and auralization data comparison example (Room 2, SS4) 176
Table 6.5. Example of prediction and auralization based acoustic parameter differences for Simple model, Single
source (S1) in Room 8 178
Table 6.6. Example of prediction and auralization based acoustic parameter differences for CAD model, Single source
(S1) in Room 8 178
Table 6.7. Example of prediction and auralization based acoustic parameter differences for Simple model, Multi
source (SS4) in Room 8 179
Table 6.8. Example of prediction and auralization based acoustic parameter differences for CAD model, Multi source
(SS4) in Room 8 179
Table 7.1. Assessment uncertainty via error margins, calculated by considering the data origin at the assessment stages
190

LIST OF FIGURES

Page

Figure 2.1 Impulse response example 9
Figure 2.2 Delta function pulse in time domain 10
Figure 2.3 Continuous signal as a function of Delta function pulses 11
Figure 2.4 Exponentially swept sine example, a) Input b) System response to input 12
Figure 2.5 STIPA signal sample incorporating a pink spectrum, a) Signal spectrum, b) Signal time history (5sec), c)
Sample of typical speech time history (5sec) 15
Figure 2.6 Multi sloped sound decay accounting for a 60dB level drop 17
Figure 2.7 Multi sloped sound decay accounting for a 30dB level drop 18
Figure 2.8 Typical audio and modulation spectra for speech 24
Figure 2.9 Input/Output comparison with respect to modulation depth and resulting MTF spectrum 29
Figure 2.10 Effects of binaural hearing, I) ILD, based on level difference (for higher frequencies), II) ITD, based on
phase difference (for lower frequencies) 35
Figure 2.11 The Common Intelligibility Scale as determined by the IEC 44
Figure 2.12 Ray tracing principle and example sound propagation paths 47
Figure 2.13 Image source principle 48
Figure 2.14 Image source and resultant sound ray examples in a room 50
Figure 3.1 Sound source configurations (I-IV) 58
Figure 3.2 Examples of classroom population used in the study 60
Figure 3.3 Source efficiency in terms of C50- Room 1 66

Figure 3.13 STI for four source configurations in Room 1, I) Primary - II) Post processed 70
Figure 3.16 STI for three source configurations in Room 4, I) Primary - II) Post processed 71
Figure 3.20 STI for four source configurations in Room 8, I) Primary- II) Post processed 72
Figure 3.23 Relation of Clarity to MTI in ten test rooms, I) C50 to MTI without background noise, II) C80 to MTI
without background noise, III) C50 to MTI with background noise, IV) C80 to MTI with background
noise 75
Figure 3.24 MTI relation to space reverberance in ten test rooms (no noise) 76
Figure 3.25 Relation of EDT to T30 for four source configurations in ten test rooms (S1, S2, SS4loudspeakers,
SS2loudspeakers) 77
Figure 3.26 Relation of EDT to T30 for four source configurations after excluding Rooms 9-10 (S1, S2,
SS4loudspeakers, SS2loudspeakers) 77
Figure 3.27 Relation of C50 to EDT in ten test rooms 78
Figure 3.28 Measurement system, I) Closed loop configuration, II) Open loop configuration 79
Figure 3.29 Comparison of T30 and EDT results for Closed-Open loop systems as an average over all measuring
positions in Room 1 (Four source configurations) 81
positions in Room 4 (Three source configurations) 84
positions in Room 8 (One source configurations) 88
Figure 4.1 Six system configurations (I-VI) 96
Figure 4.2 Test room (reverberation chamber) schematic with source-receiver positioning 97
Figure 4.3 T30 and EDT accuracy/performance decay in octave bands for a series of measurements (-1dB in signal
level per measurement), Room 1, Signal from Sound system (no simulated noise) 101
level per measurement), Room 2, Signal from Sound system (no simulated noise) 102
level per measurement), Room 3, SS (signal), SS (noise) 103
level per measurement), Room 5- SS (signal), SS (noise) 105

Figure 4.13 MTI data for Room 1 113
Figure 4.23 STI in Room 1 (no frequency weighting) 123
Figure 4.33 Threshold efficient S/N trend in relation to T30, 125-8kHz octave band data in ten test rooms 126
Figure 4.34 Threshold efficient S/N trend in relation to EDT, 125-8kHz octave band data in ten test rooms 126
Figure 4.35 Test rooms used in the assessment of result repeatability, I) N11, II) Room 10 128
Figure 5.1 Example room geometry, I) Simple representation II) Enhanced detail 135
Figure 5.2 Measured directivity response for Yamaha HS50M monitors at 1m in free field conditions
(balloon) 136
Figure 5.3 Measured directivity response for Yamaha HS50M monitors at 1m in free field conditions
(polar) 137
Figure 5.4 Top view of test rooms for the validation of T30 calibration methodology 141
Figure 5.5 Example of model detail resolution in Room 8, I) Via coordinate system, II) Via CAD software 143
Figure 5.6 Mean EDT for actual and predicted conditions (simple) in Room 8 144
Figure 5.7 STI comparison for actual and predicted conditions (simple) in Room 8 144
Figure 5.8 Mean EDT for actual and predicted conditions (CAD) in Room 8 145
Figure 5.9 STI comparison for actual and predicted conditions (CAD) in Room 8 145
Figure 5.10 Example STI error in multi source conditions (S4) for Simple model in Room 8 147
Figure 5.11 Example STI error in multi source conditions (S4) for CAD model in Room 8 147
Figure 5.12 Model detail example, I) Full, II) Optimized 148
Figure 5.13 Geometry representation of Room 1 in simulation software 150
Figure 6.1 Objective validation of auralization schematic 170
Figure 6.2 Binaural recording setup 171
Figure 6.3 Head and torso simulator in measurement position 171
Figure 6.4 Comparison of T30 from prediction output and auralization validation (simple) in Room 8 178
Figure 6.5 Comparison of EDT from prediction output and auralization validation (simple) in Room 8 178
Figure 6.6 Comparison of T30 from prediction output and auralization validation (CAD) in Room 8 178
Figure 6.7 Comparison of EDT from prediction output and auralization validation (CAD) in Room 8 178

Abbreviations and definitions

Abbreviation

Definition

%Alcons
Percentage articulation loss of consonants, see p.22. A parameter for
the assessment of speech intelligibility.
AES
Audio engineering society
AI
Articulation index, see p.22. A parameter for the assessment of
speech intelligibility.
ANSI
American national standards institute
BGNL
Background noise level
BRIR
Binaural room impulse response; see also RIR. BRIR relates to a
binaural listener model that involves two impulse responses at a
listener position. Binaural denotes two-channel (stereophonic) audio
processing equivalent to human hearing.
BSI
British Standards Institution
C
Clarity index (dB). C describes the clarity of the signal on
propagation, at a receiver position. This energy ratio is defined as
the ratio of early to late arriving sound. The index e.g. 50 relates to
the time threshold for defining late arriving sound (in milliseconds).
A 50ms limit is commonly used for speech intelligibility
applications.
CAD
Computer aided design. Denotes software implementation for 3D
model generation, commonly for architectural design purposes. In
the context of the study, 'CAD' also denotes a computer model
incorporating a somewhat more detailed definition of room
geometry.
CIS
Common intelligibility scale. Published by the IEC, CIS relates a
number of speech intelligibility measures on a single scale, also
including common subjective assessment methods.
CVC
Consonant-vowel-consonant. Denotes words formed by a
consonant-vowel-consonant sound sequence. CVC word lists are used in
subjective speech intelligibility assessments such as MRT, DRT, PB
word lists etc. (see related glossary entries).
D
Definition (%). D is an energy ratio (early energy / total energy)
relating to the distinctness of sound in a room. The measure defines
a condition as a percentage and has been found to have good
correlation with speech intelligibility.
DRT
Diagnostic rhyme test. DRT is a subjective method for the
assessment of speech intelligibility.
EDT
Early Decay Time. Reverberation time derived from the first 10 dB
of level decay, normalized to a 60dB decay. See p.18.
G
Strength (dB). G is a measure relating to the overall energy
transferred from the sound source to the receiver after subtracting
the influence of the direct field.
HATS
Head and torso simulator. HATS is typically used for binaural
recordings/measurements to account for the influence of the human
body on sound propagation, as would be perceived by a listener.
HRTF
Head related transfer function. HRTF relates to a filter characteristic
that simulates the influence of the human head on sound propagation
when reaching a listener.
I/O
Input/output
IACC
Interaural cross correlation. IACC is an acoustic parameter relating
to spaciousness. It is obtained using a binaural receiver and provides
information on the correlation between the signals received at the
two ears. See p. 19.
ILD
Interaural level difference. ILD is a psychoacoustic effect based on
the signal level difference at the ears due to shadowing effects
caused by the head. See section 2.5.
ISO
International organization for standardization.
ITD
Interaural time difference. ITD is a psychoacoustic effect based on
the phase shift caused by the interaural time delay. It is effectively a
function of time delay for sound arriving at the two ears due to the
relative position of the source with regard to the listener. See section
2.5.
ITU
International Telecommunication Union.
JND
Just noticeable difference. JND is used to describe the perception
threshold for changes in a particular condition on a subjective basis.
L10
L10 is a statistical measure to describe the sound level exceeded for
10% of the measurement duration. Effectively, L10 is a description
of peak noise.
LAeq
Leq measured in dBA. See also Leq.
Leq
Leq (dB) is the continuous noise level equivalent of a sound event,
as an average over a period of time.
m(F)
Modulation transfer function. m(F) describes/quantifies a
transmission path by the decrease of the modulation depth, via a
comparison of the test signal's modulation index, m_i, to the
modulation index at the receiver, m_o, as a function of modulation
frequency. The result relates to a subjective perception of speech
intelligibility. See section 2.4.3.
MLS
Maximum length sequence. MLS is a type of test signal also used in
acoustic measurements, described as a periodic pseudorandom
signal. It is considered as largely efficient on a number of aspects
relating to acoustic measurements. See section 2.2.1.2.
MRT
Modified rhyme test. MRT is a subjective method for the assessment
of speech intelligibility.
MTF
Modulation transfer function. The abbreviation is normally used to
refer to the general MTF theory. See also m(F) and section 2.4.3.
MTI
Modulation transfer index. MTI is the average TI per octave band k.
A direct averaging of MTI values will result in a basic STI, however
a number of corrections are normally applied for a more realistic
STI. See TI, STI entries and section 2.4.3.
P.A.
Public address system. P.A. refers to a sound system intended for
public address.
PB word lists
Phonetically balanced word lists. PB word lists comprise purposely
designed word lists forming the basis for a subjective methodology
for the assessment of speech intelligibility.
RIR
Room impulse response. The impulse response of a system (room) is
the mathematical function that describes the output signal when the
input is excited by a unit pulse.
RT
Reverberation time. RT is defined as the time taken for a sound to
decay by 60dB (T60) after the excitation source has ceased.
S/N
Signal to noise ratio. S/N is a fundamental parameter in acoustic
measurements, directly relating to speech intelligibility.
SII
Speech intelligibility index, see p.22. A parameter for the
assessment of speech intelligibility.
SIL
Speech interference level, see p.23. A parameter for the assessment
of speech intelligibility.
SNR
See S/N.
SPL
Sound pressure level. SPL denotes the rms sound pressure of a
signal with reference to the threshold of hearing.
STD
See σ.
STI
Speech transmission index. STI is one of the primary acoustic
parameters for the assessment of speech intelligibility. It is based on
the determination of m(F) values at 98 data points. See section
2.4.3.1.
STIPA
Speech transmission index for public address systems. STIPA is an
STI derivative specifically aimed at assessing speech intelligibility
for transmission channels involving a sound system, see section
2.4.3.2. The abbreviation is also used to refer to the test signal
purposely developed for use with the method, see section 2.2.1.4.
T30
Reverberation time as defined by a 30dB decay (T30) normalized to
a 60dB decay equivalent. See also RT.
Threshold efficient S/N
An S/N that is marginally efficient in terms of the signal level
required for an accurate measurement. See also S/N.
TI
Transmission index. TI is determined from the apparent S/N ratio,
specific for octave band k and modulation frequency f. It forms the
basis for the derivation of MTI and subsequent STI. See MTI, STI
entries and section 2.4.3.
Ts
Centre time (ms). Ts is an acoustic parameter defined as the time of
the centre of gravity of the squared impulse response, see p.21.

σ
Standard deviation. σ is a statistical measure relating to the
dispersion of a group of values from the mean.
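
Several of the impulse-response parameters defined above (EDT, T30, C, D and Ts) can be illustrated with a short numerical sketch. The Python example below is not the measurement software used in this study; it is a minimal illustration under stated assumptions. The function names are hypothetical, the decay curve is obtained by backward (Schroeder) integration, T30 is assumed to be fitted between -5 dB and -35 dB of the decay curve, EDT between 0 dB and -10 dB, and the test signal is an idealized noise-free exponential decay rather than a measured room response.

```python
import numpy as np

def schroeder_decay_db(h):
    """Backward-integrated energy decay curve in dB (0 dB at t = 0)."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def decay_time(h, fs, upper_db, lower_db):
    """Fit a line to the decay curve between two levels and
    normalize the fitted slope to a 60 dB decay (seconds)."""
    edc = schroeder_decay_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= upper_db) & (edc >= lower_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second, negative
    return -60.0 / slope

def early_late_metrics(h, fs, limit_ms=50.0):
    """Clarity C (dB, early/late energy) and Definition D (%, early/total)."""
    n = int(round(limit_ms * 1e-3 * fs))
    early = np.sum(h[:n] ** 2)
    late = np.sum(h[n:] ** 2)
    c = 10.0 * np.log10(early / late)
    d = 100.0 * early / (early + late)
    return c, d

def centre_time_ms(h, fs):
    """Centre time Ts: centre of gravity of the squared impulse response (ms)."""
    t = np.arange(len(h)) / fs
    e = h ** 2
    return 1000.0 * np.sum(t * e) / np.sum(e)

# Assumed test signal: an ideal exponential decay with a 0.8 s reverberation time.
fs = 48000
t = np.arange(int(1.6 * fs)) / fs
h = 10.0 ** (-3.0 * t / 0.8)  # amplitude envelope falls by 60 dB in 0.8 s

print("T30 (s):", decay_time(h, fs, -5.0, -35.0))
print("EDT (s):", decay_time(h, fs, 0.0, -10.0))
print("C50 (dB), D50 (%):", early_late_metrics(h, fs, 50.0))
print("Ts (ms):", centre_time_ms(h, fs))
```

For this idealized single-slope decay, T30 and EDT should both come out close to the 0.8 s used to build the signal. In measured rooms the two generally differ, since the decay is rarely a single slope.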

Contents

CONTENTS

Page

ABSTRACT.................................................................................................................................................................... iii

CHAPTER 1.................................................................................................................................................................... 4
INTRODUCTION........................................................................................................................................................... 4
1.1 Aims.................................................................................................................................................................. 4
1.2 Outline of thesis ................................................................................................................................................ 6

CHAPTER 2.................................................................................................................................................................... 8
LITERATURE REVIEW............................................................................................................................................... 8
2.1 Introduction....................................................................................................................................................... 8
2.2 The impulse response theory............................................................................................................................. 8
2.2.1 Test signals.............................................................................................................................................. 10
2.2.1.1 Dirac or Delta function pulse........................................................................................................... 10
2.2.1.2 Maximum Length Sequence (MLS) ................................................................................................ 11
2.2.1.3 Sine sweep (exponential)................................................................................................................. 12
2.2.1.4 STIPA signal ................................................................................................................................... 14
2.2.1.5 Other signals.................................................................................................................................... 15
2.2.2 Acoustic Parameters................................................................................................................................ 16
2.2.2.1 Reverberation Times Indexes (RT) ................................................................................................. 16
2.2.2.2 Spaciousness Parameters ................................................................................................................. 19
2.2.2.3 Energy Ratios .................................................................................................................................. 20
2.2.2.4 Levels .............................................................................................................................................. 21
2.2.2.5 Speech Intelligibility Parameters..................................................................................................... 22
2.3 Fundamental attributes of speech.................................................................................................................... 23
2.4 Speech intelligibility measurement methodologies ......................................................................................... 24
2.4.1 Statistical (Direct) Measures of Speech Intelligibility............................................................................. 25
2.4.2 Objective (Indirect) Measures of Speech Intelligibility........................................................................... 26
2.4.3 The Modulation Transfer Function (MTF).............................................................................................. 27
2.4.3.1 The Speech Transmission Index (STI) ............................................................................................ 30
2.4.3.2 The STI Public Address (STIPA) method....................................................................................... 33
2.5 Implications of binaural hearing...................................................................................................................... 34
2.6 Sound fields in enclosed spaces for speech intelligibility ............................................................................... 36
2.6.1 Sound fields in natural acoustics ............................................................................................................. 37
2.6.2 Sound system assisted sound fields ......................................................................................................... 38
2.6.3 (University) classroom acoustics............................................................................................................. 40
2.6.4 Relations between measures of speech intelligibility .............................................................................. 42
2.7 Sound field simulation .................................................................................................................................... 44
2.7.1 Geometrical acoustics ............................................................................................................................. 44
2.7.1.1 Ray tracing (stochastic) ................................................................................................................... 46
2.7.1.2 Image source method (deterministic) .............................................................................................. 48
2.7.1.3 Hybrid models (deterministic-stochastic)........................................................................................ 51
2.7.2 Auralization............................................................................................................................................. 52
2.8 Summary and conclusions............................................................................................................................... 53

CHAPTER 3.................................................................................................................................................................. 56
ROOM ACOUSTICS MEASUREMENTS................................................................................................................. 56
3.1 Introduction..................................................................................................................................................... 56

3.2 Acoustic conditions for the different source configurations............................................................................ 57
3.2.1 Measurement methodology ..................................................................................................................... 57
3.2.2 Equipment list ......................................................................................................................................... 59
3.2.3 Test rooms............................................................................................................................................... 59
3.2.4 Results..................................................................................................................................................... 61
3.3 Measurement output accounting for typical speech and BGNL...................................................................... 70
3.4 Parameter interrelations................................................................................................................................... 73
3.4.1 Clarity (C) energy ratios versus STI........................................................................................................ 74
3.4.2 Room reverberance (EDT, T30) versus STI ............................................................................................. 75
3.4.3 EDT versus T30 ........................................................................................................................................ 76
3.4.4 EDT versus C50 ....................................................................................................................................... 77
3.5 Comparison of measurements for closed/open loop........................................................................................ 78
3.5.1 Open loop measurement methodology.................................................................................................... 78
3.5.2 Supplementary equipment list for open loop measurements ................................................................... 79
3.5.3 Two system data comparison (Closed-Open loop data) .......................................................................... 79
3.5.4 Comments on Open loop measurement sessions..................................................................................... 91
3.6 Conclusions..................................................................................................................................................... 92

CHAPTER 4.................................................................................................................................................................. 94
LOW LEVEL MEASUREMENTS ............................................................................................................................. 94
4.1 Introduction..................................................................................................................................................... 94
4.2 Measurement methodology............................................................................................................................. 95
4.2.1 Threshold efficient signal to noise ratio (S/N) measurements ................................................................. 95
4.2.2 Noise source incorporation...................................................................................................................... 95
4.3 Initial investigation and screening sessions..................................................................................................... 97
4.4 Low level measurements in ten test rooms...................................................................................................... 99
4.4.1 Measure interrelations and performance ................................................................................................. 99
4.4.2 Correlation of T30 (and EDT) with threshold efficient S/N................................................................... 125
4.4.3 Repeatability of results with/without simulated noise floor .................................................................. 127
4.5 Conclusions................................................................................................................................................... 130

CHAPTER 5................................................................................................................................................................ 131
COMPUTER MODELLING OF TEST SPACES ................................................................................................... 131
5.1 Introduction................................................................................................................................................... 131
5.2 Preparation methodology .............................................................................................................................. 133
5.2.1 Model design......................................................................................................................................... 133
5.2.1.1 Model detail resolution.................................................................................................................. 134
5.2.1.2 Source directivity .......................................................................................................................... 135
5.2.1.3 Definition of absorption and scattering coefficients ...................................................................... 137
5.2.2 Model validation/calibration methodology............................................................................................ 139
5.3 Experimental results...................................................................................................................................... 140
5.3.1 Basis for model validation/calibration................................................................................................... 140
5.3.1.1 Test methodology - Room acoustics measurements ...................................................................... 140
5.3.1.2 Test methodology - T30 calibration ............................................................................................... 140
5.3.1.3 Test methodology - Results via output comparison...................................................................... 141
5.3.1.4 Session conclusions....................................................................................................................... 142
5.3.2 Model resolution ................................................................................................................................... 142
5.3.2.1 Assessment preparation and the impact of detail resolution.......................................................... 142
5.3.2.2 Assessment result for a single omni directional source ................................................................. 143
5.3.2.3 Use of Alternative Source Configurations..................................................................................... 146
5.3.2.4 Discussion ..................................................................................................................................... 147
5.3.2.6 Session conclusions....................................................................................................................... 149
5.4 Prediction results........................................................................................................................................... 149
5.4.1 Room 1 data .......................................................................................................................................... 150
5.4.2 Room 2 data .......................................................................................................................................... 151
5.4.3 Room 3 data .......................................................................................................................................... 152
5.4.4 Room 4 data .......................................................................................................................................... 153
5.4.5 Room 5 data .......................................................................................................................................... 154
5.4.6 Room 6 data .......................................................................................................................................... 155
5.4.7 Room 7 data .......................................................................................................................................... 156
5.4.8 Room 8 data .......................................................................................................................................... 157
5.4.9 Room 9 data .......................................................................................................................................... 158
5.4.10 Room 10 data ........................................................................................................................................ 159
5.5 Discussion..................................................................................................................................................... 160
5.5.1 Prediction results................................................................................................................................... 160
5.5.2 Simulation calibration using reference EDT ......................................................................................... 161
5.5.3 Model design and preparation ............................................................................................................... 162
5.5.3.1 Detail resolution requirement ........................................................................................................ 163
5.5.3.2 Room acoustics modelling guideline............................................................................................. 163
5.5.4 Comments on modelling sessions ......................................................................................................... 164
5.6 Conclusions................................................................................................................................................... 164

CHAPTER 6................................................................................................................................................................ 167
AURALIZATION....................................................................................................................................................... 167
6.1 Introduction................................................................................................................................................... 167
6.2 Objective validation of auralized responses .................................................................................................. 168
6.2.1 Objective validation of auralized responses using a swept sine ............................................................ 169
6.2.2 Evaluation of results.............................................................................................................................. 170
6.3 Subjective validation of auralized responses................................................................................................. 171
6.3.1 Recording room responses .................................................................................................................... 171
6.3.2 Equipment list ....................................................................................................................................... 171
6.3.3 Comparison of recordings to predicted auralization.............................................................................. 172
6.4 Auralization study in ten test rooms.............................................................................................................. 172
6.4.1 Comparison of objective validation methods ........................................................................................ 173
6.4.2 Objective assessment of auralizations ................................................................................................... 174
6.4.3 Subjective assessment of auralizations.................................................................................................. 176
6.5 Auralization accuracy and relation to model detail ....................................................................................... 177
6.5.1 Objective assessment of convolution quality from simple and CAD models................................... 177
6.5.2 Assessment of auralization realism by subjective means ...................................................................... 179
6.6 Conclusions................................................................................................................................................... 180

CHAPTER 7................................................................................................................................................................ 183
SUMMARY AND CONCLUSIONS.......................................................................................................................... 183
7.1 Overview....................................................................................................................................................... 183
7.2 Room acoustics measurement methodology ................................................................................................. 183
7.3 Low level measurements............................................................................................................................... 185
7.4 Development of an optimized methodology for improved computer models................................................ 186
7.5 Auralization accuracy assessment ................................................................................................................. 187
7.6 Further work.................................................................................................................................................. 189
7.7 Overall conclusions....................................................................................................................................... 189

REFERENCES............................................................................................................................................................ 191
Chapter 1 - Introduction

CHAPTER 1
Introduction

1.1 Aims
The aim of this study is to develop and validate an efficient methodology comprising
acoustic measurements and computer modelling for the prediction of acoustic parameters,
in particular speech intelligibility, in lecture rooms.

The focus of the research is principally on post evaluation of conditions, i.e. improving
existing environments, for a number of different source configurations including multi
source conditions (sound system). An assessment on this basis depends on the numerical
output of the prediction that will be used to identify existing or potential acoustic
conditions. Prediction consistency is in turn dependent on the accuracy of room acoustics
measurements that are typically used as a reference for performance, enabling calibration
and fine tuning of the simulation. An auralization of an enclosed space provides a means
for a fast subjective evaluation of the room acoustics. Thus precision in the prediction of
the room response is a fundamental requirement.

Room acoustics measurements, being the first stage of an assessment, require a consistent
measurement methodology that is sufficiently generic to account realistically for the
actual potential conditions within a space, e.g. one that is not invalidated by loudspeaker
distortion or low signal to noise ratios (S/N). The measurement results could then be used
as a general indicator of acoustic conditions, rather than being invalidated on this basis.
Accordingly, it is necessary to identify a reliable methodology
to obtain reference measurement results that accurately represent a space and can thus be
suitably post processed to derive speech intelligibility specific parameters for different
conditions i.e. different speech level, speaker gender and background noise level
(BGNL).

Validated computer models can be used to predict the acoustic environment in
significantly altered conditions e.g. alternative source types and positions. This option
requires a simulation that is capable of retaining consistency with measurements under the
altered conditions; a suitable simulation validation method is essential for this approach
to be effective. The level of detail incorporated in a computer model is an additional
factor in the development process that has a direct effect on the resultant simulation
efficiency e.g. processing speed and prediction accuracy. Commonly used rules of thumb
for the required detail resolution concern larger rooms and fundamentally do not
apply to typical university classrooms; consequently, there is no clear guidance for model
construction of such spaces. It is thus necessary to establish the influence of such factors
on the simulation efficiency to identify the most effective approach.

In a similar context, the auralization generated via a computer simulation is typically
assumed to be consistent with the predicted acoustic parameters, accurately representing
a given space. However, an objective validation methodology is required to
enhance confidence in the result, as subjective methodologies involving listening tests are
usually neither cost effective nor easy to implement. Subjective testing also cannot give
absolute results. Currently, one such objective methodology exists, but it is limited to assessing the
accuracy of an intermediate product in the auralization process i.e. the room impulse
response. An improved methodology is required to enable an assessment of the end
product of the process, the auralized material. This would provide enhanced confidence
in the consistency of the auralization in relation to the prediction outcome, and
consequently to the measured parameters as well. The current method is furthermore
limited by the capabilities of the required host measurement software, as only a Dirac
pulse can be used in the necessary deconvolution process. Increased flexibility of an
objective validation method in these terms would be beneficial by enabling a broader
application of the method, leading to more accurate auralization based assessments.

This study addresses the processes, from room acoustics measurements to auralization via
computer simulations, as linked by a prospective speech intelligibility predictive
assessment for improved enclosed spaces.

1.2 Outline of thesis
The work in this study has been divided into seven chapters. Chapter 1 gives a synopsis of the
work undertaken, introducing the primary objectives. Chapter 2 gives a general review of
the literature related to the study. The fundamental concept of the impulse response
theory and measurement techniques is primarily addressed, while the main acoustic
parameters used in the study are presented and defined in this context. The notion of
speech intelligibility and basic psychoacoustic and phonological attributes of speech
signals are given to complement the related measurement techniques. Subjective
measurement methodologies are also briefly outlined. The main objective
measurement techniques currently in use, i.e. STI and STIPA, are presented in detail, following an
introduction to the underlying modulation transfer function (MTF) theory. The
implications of speech intelligibility in the acoustic conditions of enclosed spaces are
addressed in terms of both natural acoustics and sound system assisted sound fields. The
interrelation between measures relating to speech intelligibility is reviewed in the context
of classroom acoustics. The chapter concludes with an introduction to computer
modelling in terms of ray tracing, image source methods and hybrid approaches, as
related to the study.

Chapter 3 introduces the room acoustics measurements in ten test rooms. The primary
objective is to establish the acoustic character of the rooms considered, while EDT, T30,
C and STI are analyzed to determine their interrelationship. Using four source
configurations in turn, the effect of the sound source type and position on the derived acoustic
parameters is examined to determine the consistency of measurement data
obtained under these different conditions. Open loop based measurement data is
compared on the same basis against the typical closed loop system equivalent.

In Chapter 4, low level measurements in ten test rooms are examined to determine the
effect of a continuously reducing S/N on the associated measurement. The relation of S/N
to room reverberance is thus examined to discern an efficient approach in obtaining
usable data from low level measurements.

Chapter 5 introduces the development of computer models for ten test rooms. Focus is
primarily on two aspects of the simulation process i.e. the validation/calibration in terms
of an appropriate reference parameter and the required detail resolution in a model. T30
and EDT are examined as possible reference parameters for the validation/calibration
process. An efficient methodology is presented and the influence of the reference
parameter, T30 or EDT, on the resultant simulation accuracy is evaluated.

The required detail resolution is determined with the use of models incorporating
different levels of detail for the same room. Enhanced detail for smaller rooms appears
advantageous in terms of consistency with measurements. However, given the wide
availability of CAD software which can significantly simplify the generation process, the
balance in terms of detail level, processing speed and overall efficiency is often disturbed
by users. Issues relating to the model development time and resultant processing speed
are thus addressed to streamline the process of model development.

Chapter 6 discusses the need for an objective auralization validation process, as opposed
to typical subject based approaches. A new hybrid method is proposed and compared
with what is currently the only available objective practice, to establish the gain offered by the
new approach. Auralizations in ten test rooms are subsequently assessed to examine
consistency with the prediction's numerical output. The influence of the model detail
resolution is further examined through auralizations. The sessions are complemented by a
subjective assessment via listening tests for a broader view of the results.

Finally, a synopsis of the work achieved in this study and suggestions for further work
are given in Chapter 7.

Chapter 2 - Literature review

CHAPTER 2
Literature review

2.1 Introduction
Speech intelligibility is a complex concept: it is a dynamic process whose key
parameters relate directly to the potential acoustic performance of a room. Given the
scope of the current study, different methodologies for the measurement of acoustic
parameters, and speech intelligibility in particular, are presented and the discussion
expands to include the general framework of the impulse response theory and related
measurement techniques. Acoustic parameters related to the study are briefly reviewed
and the MTF theory, the theoretical basis for key measures of intelligibility, is
introduced, following a synopsis of critical elements of speech sounds. A review and
discussion on classroom acoustics, as well as an introduction to computer modelling and
auralization conclude the chapter.

2.2 The impulse response theory
The impulse response of a system is the mathematical function that describes the output
signal when the input is excited by a unit pulse [1] (Figure 2.1). In acoustical science the
theory is applied in an analogous way by regarding an enclosed space as a filter having an
input that is the test signal source S(t) and an output S'(t), represented by a given receiver
position within the room.

Figure 2.1. Impulse response example

For a system that is linear and time invariant (LTI) the distinct alterations that the
traveling signal undergoes become the means to determine and characterize the acoustic
properties of the space under consideration. S'(t), in this context, is considered as a
function of time made up of sequential intensity components, given that for a linear system an
infinite number of ideal pulses (see section 2.2.1.1) arriving at different time delays can
be used to represent the decaying signal. A reverse integration technique, originally
introduced by Schroeder in 1965 for RT estimation [2], formed the basis for further
processing methodologies to allow derivation of the majority of acoustic parameters from
an impulse response measurement, as the latter is considered an accurate and complete
description of the acoustic properties of a transmission system [1]. S'(t) can therefore be
calculated by convolving the source signal S(t) with the impulse response g(t), as given by:

$$S'(t) = \int_{-\infty}^{t} g(t - t')\, S(t')\, dt' \qquad (2.1)$$
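
As an illustration of Eq. 2.1, the following is a minimal discrete-time sketch in which the integral becomes a sum over samples. The sampling rate, the two-tap impulse response and the tone burst are all hypothetical values chosen for illustration and are not taken from the measurements in this study.

```python
import numpy as np

fs = 8000  # sampling rate in Hz (assumed for illustration)

# Hypothetical impulse response g: direct sound plus one reflection 10 ms later.
g = np.zeros(fs // 10)
g[0] = 1.0
g[80] = 0.5  # 80 samples correspond to 10 ms at 8 kHz

# Hypothetical source signal S: a 20 ms tone burst at 500 Hz.
t = np.arange(fs // 50) / fs
source = np.sin(2 * np.pi * 500 * t)

# Discrete counterpart of Eq. 2.1: S'[n] = sum over k of g[n - k] * S[k]
received = np.convolve(source, g)
```

The received signal is the source followed, 10 ms later, by a copy of itself at half amplitude, which is the behaviour expected of a filter with this impulse response.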

Parameters accounting for the directional response of a room moreover can be based on a
binaural room impulse response (BRIR) measurement using a pair of matching
microphones, or a head and torso simulator (HATS) to more closely approximate a
human listener. The obvious advantage for the second case is that wave effects attributed
to sound transmission around the body of the listener are accounted for without the need
for a correction factor i.e. a head related transfer function (HRTF). This is particularly
useful if considering binaural speech intelligibility measurements, see section 2.4.3.1.
In the context of the aforementioned, HATS based measurements can also be used for the
purposes of a realistic recording/auralization, see section 6.3.
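
The reverse integration technique referred to earlier in this section can be sketched as follows. This is a simplified illustration on a synthetic, noise-free exponential decay with an assumed reverberation time of 0.8 s; the function names and the evaluation ranges (-5 to -35 dB for T30, 0 to -10 dB for EDT) follow common practice and are assumptions of this sketch rather than a description of the software used in the study.

```python
import numpy as np

def schroeder_decay_db(h):
    """Backward-integrated energy decay curve of h, in dB relative to its start."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def decay_time(decay_db, fs, start_db, stop_db):
    """Fit a line to the decay between start_db and stop_db, extrapolated to 60 dB."""
    t = np.arange(len(decay_db)) / fs
    mask = (decay_db <= start_db) & (decay_db >= stop_db)
    slope, _ = np.polyfit(t[mask], decay_db[mask], 1)
    return -60.0 / slope

fs = 8000       # sampling rate in Hz (assumed)
rt_true = 0.8   # reverberation time of the synthetic decay in seconds (assumed)
t = np.arange(int(2 * rt_true * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) * t / rt_true)  # amplitude falls 60 dB in rt_true seconds

decay_db = schroeder_decay_db(h)
t30 = decay_time(decay_db, fs, -5.0, -35.0)
edt = decay_time(decay_db, fs, 0.0, -10.0)
```

For an ideal exponential decay both values coincide with the true reverberation time; in measured responses they generally differ, which is why both are treated as separate parameters.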

2.2.1 Test signals
In measuring a room's impulse response, a test signal by definition consists of an
impulsive sound that is reproducible with respect to sound power radiation (in terms of
directivity and frequency content). The quality of the test signal in this sense directly
relates to measurement consistency.

2.2.1.1 Dirac or Delta function pulse
The ideal form of a test signal is a Dirac, i.e. Delta function, pulse which,
theoretically, can be defined as an impulsive signal of infinitely short duration, infinitely high
power and unit energy (Figure 2.2, Eq 2.2). This also implies an infinitely broad
spectrum.

$$\int_{-\infty}^{+\infty} \delta(t)\, dt = 1 \qquad (2.2)$$

Figure 2.2. Delta function pulse in time domain
Where δ(t) = 0 when t ≠ 0, and δ(t) approaches infinity when t = 0.

A loudspeaker, nonetheless, faces physical constraints in reproducing very short
sounds, rendering Delta functions unfeasible as an actual test signal. However,
any signal can be regarded as a close succession of short impulses (Figure 2.3), a
characteristic that is central to methodologies using reverse integration (as discussed in
the opening of section 2.2).
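
Both properties discussed here have simple discrete-time counterparts, sketched below under the assumption of a sampled signal of 64 points (an arbitrary length chosen for illustration). The unit sample plays the role of the Delta function: its spectrum is flat, and any sampled signal can be rebuilt as a sum of scaled, time-shifted unit samples.

```python
import numpy as np

n = 64  # number of samples (arbitrary, for illustration)
delta = np.zeros(n)
delta[0] = 1.0

# The discrete unit impulse has unit magnitude in every frequency bin.
spectrum_magnitude = np.abs(np.fft.rfft(delta))

# Any sampled signal equals a sum of scaled, time-shifted unit impulses.
rng = np.random.default_rng(0)
signal = rng.standard_normal(n)
rebuilt = sum(signal[k] * np.roll(delta, k) for k in range(n))
```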

Figure 2.3. Continuous signal represented by a series of Delta function pulses

2.2.1.2 Maximum Length Sequence (MLS)
The Maximum Length Sequence (MLS) is a periodic pseudorandom signal, widely
accepted as an efficient test signal in a number of respects. Effectively comprising
Delta function pulses, MLS has the desired property that its frequency spectrum over
one period is as flat as that of an ideal pulse, resulting in a clean output (no residual noise). The fast
Hadamard transform (FHT) is used to perform the required correlation process, as first
described within the context of room acoustics by Alrutz and Schroeder [3]. The
deterministic nature of MLS enables synchronous averaging, thus allowing an increase in
signal to noise ratio (S/N). This can be achieved as, theoretically, exactly the same results
are expected for subsequent periods of the MLS when the measurement is repeated under the
same conditions. By repetition, each sequence will add up in phase with the previous ones
while background noise that is not correlated over the number of periods considered will
effectively be reduced by 3dB per doubling of the number of averages taken [4]. The S/N
gain in this case is given by:

$$\Delta = 10 \log_{10} N \ \text{dB} \qquad (2.3)$$

where N is the number of averages.
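
The gain of Eq. 2.3 can be checked numerically with a short simulation. The sketch below is illustrative only: the period length, the number of averages and the use of a random two-level signal as a stand-in for one period of a deterministic test signal are all assumptions, and the background noise is modelled as uncorrelated Gaussian noise.

```python
import numpy as np

def averaging_gain_db(n_averages):
    """Theoretical S/N gain of Eq. 2.3 for n_averages synchronous averages."""
    return 10.0 * np.log10(n_averages)

rng = np.random.default_rng(0)
period = 4096      # samples per period (assumed)
n_averages = 16    # number of synchronous averages (assumed)

# Stand-in for one period of a deterministic test signal.
signal = np.sign(rng.standard_normal(period))
# Each repetition records the same signal plus independent background noise.
recordings = signal + rng.standard_normal((n_averages, period))
averaged = recordings.mean(axis=0)

noise_power_single = np.mean((recordings[0] - signal) ** 2)
noise_power_averaged = np.mean((averaged - signal) ** 2)
measured_gain_db = 10.0 * np.log10(noise_power_single / noise_power_averaged)
```

With 16 averages the theoretical gain is about 12 dB, i.e. four doublings at roughly 3 dB each, and the simulated gain should lie close to this value.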

Similarly, S/N gain will also relate to the order of the MLS as it determines the
length of the signal. Pre-emphasis applied to the signal (in the form of e.g. a pink noise
spectrum) further increases the potential gain and noise rejection capabilities of the
system while it is particularly useful for the common situation where the background
noise spectrum is not flat [5].
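
The flat spectrum property of MLS noted in this section can be illustrated with a very short sequence. The sketch below generates one period of an order 4 MLS (15 samples) from a 4-bit shift register; the register taps and the initial state are choices made for this example, and practical measurements use much higher orders.

```python
import numpy as np

def mls_order4():
    """One period (15 samples) of an order-4 MLS from a 4-bit shift register."""
    state = [1, 1, 1, 1]
    bits = []
    for _ in range(15):
        bits.append(state[3])
        feedback = state[3] ^ state[2]
        state = [feedback] + state[:3]
    return 1 - 2 * np.array(bits)  # map bits {0, 1} to levels {+1, -1}

mls = mls_order4()
spectrum = np.fft.fft(mls)
# Circular autocorrelation obtained from the power spectrum.
autocorr = np.real(np.fft.ifft(np.abs(spectrum) ** 2))
```

The circular autocorrelation equals 15 at zero lag and -1 at every other lag, so it approximates a single pulse, and the power spectrum is constant across all bins other than DC.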

Limitations for the technique are found when considering system non-linearities and time
variances. The former can be considered negligible in the context of building acoustics [4],
while more care should be taken to avoid time variances, as measurement errors can be
introduced. A practical aspect of MLS signals however is that absolute level can be easily
determined using a sound level meter (SLM), a characteristic that is essential for
measurements in need of level calibration.

The use of MLS cannot be recommended for deriving a general purpose impulse
response but should rather be considered for specialized tasks such as the measurement of
speech parameters on an absolute level basis [6]. Given a level calibrated system, nonetheless,
such tasks can also be carried out by post-processing of the measurements. These
can be obtained using any suitable form of excitation signal, so that the
use of MLS signals is again not a necessity in this context.

2.2.1.3 Sine sweep (exponential)
The exponential sine sweep test signal (or log-Time Stretched Pulse) is a sine wave
sweeping exponentially in time over the desired frequency range (Figure 2.4).

Figure 2.4. Exponentially swept sine example, a) Input b) System response to input

The idea of using sine waves for impulse response measurement is not new (see
Griesinger [7]). However, the method was refined and further developed to its current
form by Farina [8], who showed that an exponentially (rather than linearly) varying
frequency sweep allows a system's linear impulse response and separate impulse
responses for each harmonic distortion order to be deconvolved simultaneously. The method
is based on generating and using an accurate inverse filter f(t) that is capable of packing
the input signal S(t) into a delayed Dirac delta function δ(t). Deconvolution of the
system's impulse response, and separate distortion components, can then be performed
by convolving the output signal S'(t) with the inverse filter f(t). Another study relating
harmonic distortion induced errors and signal level has shown that a reduced error in the
impulse response can be expected with the use of an exponential sweep, as compared to a
linear one, consequently implying that a higher signal level can be used while
improving confidence in the measurement [9]. Considering, moreover, specific
applications such as recordings for auralization purposes that require in excess of 90dB
S/N [10], the exponentially swept sine would be the only option for accurately obtaining
such results.
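
The inverse filter approach can be sketched as follows. This is a minimal illustration of the general technique and not the implementation used in this study: the sweep range (50 Hz to 4 kHz), duration, sampling rate and function names are assumptions, the "system" is a hypothetical pure delay with a gain of 0.5, and the separation of harmonic distortion orders is not shown.

```python
import numpy as np

def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0))
    # Time-reversed sweep with an envelope that falls by 6 dB for every octave
    # the reversed sweep descends, compensating the sweep's falling spectrum.
    inverse = sweep[::-1] * np.exp(-t * rate / duration)
    return sweep, inverse

def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

fs = 16000  # sampling rate in Hz (assumed)
sweep, inverse = exponential_sweep(50.0, 4000.0, 1.0, fs)

# Hypothetical system: a pure delay of 40 samples with a gain of 0.5.
delay = 40
recorded = np.concatenate([np.zeros(delay), 0.5 * sweep])

# Convolving the recorded output with the inverse filter packs the sweep
# into a pulse located at (sweep length - 1) plus the system delay.
impulse_response = fft_convolve(recorded, inverse)
peak_index = int(np.argmax(np.abs(impulse_response)))
```

The fixed offset of one sweep length is the delay referred to in the text; removing it leaves the impulse response of the system under test.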

Furthermore, time aliasing problems are dealt with by including a segment of silence at
the end of the sweep. Also, as opposed to MLS, if the time window for data
analysis is inadvertently not made long enough, the (late) decay tail is simply lost during the
deconvolution process, rather than folding back to the beginning of the impulse response and
altering its character.

Another advantage in terms of time variance is that a sine sweep can be made longer to
compensate for minor time variances of a system. By avoiding the multiple average
approach the resulting impulse response is not affected (as opposed to MLS) by the time
variation, as only a single measurement is involved. The effective S/N can be
significantly improved in this case as more energy is spread over a longer time. BS EN
ISO 18233: 2006 [11] recommends the use of a single very long sweep to enhance
confidence in the measurement.

It has been experimentally demonstrated that a longer signal duration is superior to the
averaging technique. Nonetheless, new advancements in the sine sweep methodology [12],
although not implying preference, provide a fix to reduce the error to an acceptable level
for cases where averaging cannot be avoided.

Finally, another significant feature is that sine sweeps do not require a tight
synchronization of the sampling clock for the signal generation and recording units. As
such, measurements can be easily performed even if using external sources (i.e. different
than the measurement processing unit). It has been shown that common mismatches
between the sampling clocks of playback and recording devices in such cases do not
result in estimation errors of the system's impulse response [8].

The assumption that exponential sine sweep based measurements are the most
advantageous choice for the majority of transfer function measurement conditions is
supported in the literature [10].

2.2.1.4 STIPA signal
The STIPA signal was formed for use with the STIPA method, intended for the
prediction of speech intelligibility of sound systems (see section 2.4.3.2). The signal is
pseudo-random noise, spectrally shaped to approximate a generic speech spectrum
(Figure 2.5), having the distinctive attribute of incorporating the modulation frequencies
(two modulation frequencies per octave, as defined in IEC 60268-16 [13]) that are used in
the MTF theory (see section 2.4.3). This characteristic, while facilitating measurements
without the requirement for source-receiver synchronization in the time scale, also
enables a significant simplification of the measurement procedure [14, 15, 16], giving the
signal its main, or perhaps its only significant, advantage over the use of other test signals.
On the other hand, because the two modulation frequencies are used simultaneously,
the negative effect of fluctuating or impulsive noise in a given measurement session is
increased [13].

The STIPA signal is less affected, or not affected at all, by certain types of typical
online signal processing found in sound system configurations (with some exceptions, e.g.
compression) that normally affect signals like MLS and sine sweeps (see Mapp [17]).
However, disabling the particular units during measurements (which is not always a
straightforward action) solves the problem for the latter.

Figure 2.5. STIPA signal sample incorporating a pink spectrum, a) Signal spectrum, b) Signal time history (5sec),
c) Sample of typical speech time history (5sec)

As previously noted [18], potential differences exist between STIPA signals compiled by
different manufacturers (e.g. pink or speech shaped spectrum). Mapp performed a series
of measurements comparing a number of STIPA meters and showed that the signal
differences among the manufacturers could to some extent be responsible for
discrepancies in what would be expected to be comparable results [19]. In the same
publication it is pointed out that the problem was acknowledged by the manufacturers,
who subsequently revised their signals, thus eliminating any concerns in this respect.

2.2.1.5 Other signals
Alternative test signals that were widely used prior to the introduction of more elaborate
options typically include random noise, filtered to give a white (flat) or pink (-3 dB per
octave) spectrum; the latter employed to improve the S/N for lower frequency bands. Due
to the random nature of the signal in this case, a measurement of adequate length was
needed so as to reduce residual noise in the impulse response. These methods are, however,
considered largely outdated and are not normally used.

Different types of source can be used in an attempt to approximate an ideal pulse (e.g.
starting pistol, balloons). These commonly do not give the required accuracy with
respect to directivity control and frequency content reproducibility, thus compromising
measurement consistency. From a practical standpoint, however, such sources may be
suitable for screening purposes, as they simplify the measurement procedure to an extent.

It should be noted that different spectra can be applied to particular test signals (e.g. MLS,
Sine sweep etc.) to account for male or female type sources. This is done in an attempt to
more closely approximate the effect of different talkers by increasing the S/N at the
desired frequency ranges for a more efficient measurement methodology. Post processing
of the data, however, can also allow for a more explicit approach if necessary.

2.2.2 Acoustic Parameters
There is a multitude of parameters that can be used to acoustically describe a space.
While most of these parameters can be directly derived from the space's impulse
response, a number of additional measures complement the principal record, considering
also the existence of different categories that have a more specialized focus. The
following paragraphs discuss the parameters that are most relevant in the scope of the
current study while incorporating in addition some epigrammatic historical information
on the process of acoustic measure advancements.

2.2.2.1 Reverberation Time Indexes (RT)
Reverberation time (RT) is defined as the time taken for a sound to decay by 60dB ($T_{60}$)
after the excitation source has ceased (Figure 2.6). According to classical theory as
proposed by Sabine in 1922, RT relates to the volume and total absorption of the
enclosed space considered, by the well known Sabine's empirical formula of:

$$RT_{60} = \frac{0.161\,V}{A}$$ (2.4)
where V is the volume of the enclosure and A is the sum of absorption units derived from
the different surfaces within the room (air absorption is also included when very large
spaces are considered).
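
As a minimal sketch of Eq. 2.4, the following Python fragment sums the absorption units of a set of surfaces and returns the Sabine estimate. The function name, the room dimensions and the absorption coefficients are illustrative assumptions and are not taken from the test rooms of this study; air absorption is not included.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine estimate (Eq. 2.4): RT60 = 0.161 * V / A, in seconds.

    surfaces is an iterable of (area_m2, absorption_coefficient) pairs;
    A is the sum of area * coefficient over all surfaces.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical 10 m x 8 m x 3 m room; the coefficients are illustrative only.
example_room = [
    (80.0, 0.30),   # floor
    (80.0, 0.60),   # ceiling
    (108.0, 0.10),  # four walls: 2 * (10 + 8) * 3 m^2
]
example_rt60 = sabine_rt60(240.0, example_room)  # about 0.47 s
```

For the assumed room the total absorption is 82.8 units, which gives an RT of roughly 0.47 s.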

The calculation of RT indexes through classical theory is subject to inherent limitations
in the Sabine formula, which introduces errors for conditions that depart from the simple
case. Thus, other researchers have attempted to redefine RT to enhance confidence in the
results. Refining specific aspects that are characteristic of the space considered, such
approaches include the Norris-Eyring formula, cited in Skarlatos [20] (for large spaces
having the same absorption on all surfaces), the Fitzroy formula [21] (for spaces where
surfaces incorporate differing absorption coefficients) and the Sette-Millington formula,
cited in Kang [22] (for the estimation of absorption coefficients that always result in
values less than one, and also for RT estimation in spaces where surfaces incorporate
differing absorption coefficients). However, none of the early formulas are able to
account for the early reflections and direct sound of a sound field or for non-diffuse
field characteristics.

Going back to the definition of $T_{60}$, it is often difficult in practice to obtain a decay of
this magnitude due to dynamic range restraints. For this reason, $T_{20}$ and $T_{30}$ are commonly
derived (through linear regression on the decay) to quantify RT, as they are the
equivalent decays between -5dB to -25dB and -5dB to -35dB respectively, normalized to
a 60dB decay (Figure 2.7). The first 5dB of the decay are excluded in this case to avoid
the influence of potential strong early reflections and/or direct sound. RT indexes can
provide some information on the diffusivity of a space, as similar values among different
indexes would suggest a linear decay and therefore the presence of optimum diffusion.

Figure 2.6. Multi sloped sound decay accounting for a 60dB level drop

Figure 2.7. Multi sloped sound decay accounting for a 30dB level drop

Early Decay Time (EDT) is another measure relating to reverberance, defined in 1970 by
Jordan [23] as the equivalent RT measured over the first 10dB of decay (0dB to -10dB).
It is a parameter of particular value with its significance summed in the fact that it
represents the time taken for the early reflections to reach the receiver. It therefore
comprises a more meaningful description of the early part of the sound decay. EDT is
considered an important cue in judging the character (and from a psychoacoustics point
of view, the size) of a room. Its importance on the basis of subjective perception (also of
early energy in general) is widely acknowledged and exploited in the area of speech
intelligibility (see sections 2.4.3 and 3.4.2).

As discussed in sections 2.2 and 2.2.1.1, RT indexes can be derived from the impulse
response through reverse integration. Following from Eq. 2.1, given that the excitation
source ceases at t=0, the sound intensity at a receiver position is given by:

$$S'(t) = \int_{-\infty}^{0} g(t - t')\,s(t')\,dt' = \int_{t}^{\infty} g(x)\,s(t - x)\,dx = \int_{t}^{\infty} g(x)\,s(x - t)\,dx$$ (2.5)

To obtain the sound pressure level decay range that is necessary for the RT derivation,
sound intensity is converted by:

$$SPL(t) = 10\log\left\{ \int_{t}^{\infty} g(x)\,s(x - t)\,dx \right\}$$ (2.6)

It follows that the range of reverberation time indices, $T_x$, and EDT can be calculated
from the decay corresponding to a given receiver position.
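
The derivation of decay indices from an impulse response can be sketched as follows. This is only an illustration, assuming a noise-free, purely exponential synthetic response with a nominal RT of 1 s; the function names and the 1 kHz sample rate are hypothetical, and no octave band filtering or noise compensation is attempted. The evaluation ranges used are 0dB to -10dB for EDT, -5dB to -25dB for T20 and -5dB to -35dB for T30.

```python
import math


def schroeder_decay_db(ir):
    """Backward-integrated energy decay of an impulse response, in dB re t = 0."""
    remaining = 0.0
    tail = [0.0] * len(ir)
    for n in range(len(ir) - 1, -1, -1):  # integrate from the end backwards
        remaining += ir[n] ** 2
        tail[n] = remaining
    return [10.0 * math.log10(e / tail[0]) if e > 0.0 else -math.inf
            for e in tail]


def decay_time(decay_db, fs, upper_db, lower_db):
    """Fit a line to the decay between two levels and scale it to 60 dB."""
    pts = [(n / fs, level) for n, level in enumerate(decay_db)
           if lower_db <= level <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_lv = sum(lv for _, lv in pts) / len(pts)
    slope = (sum((t - mean_t) * (lv - mean_lv) for t, lv in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


# Synthetic, purely exponential response with a nominal RT of 1 s (assumed data).
fs = 1000
ir = [math.exp(-math.log(1000.0) * n / fs) for n in range(2 * fs)]
curve = schroeder_decay_db(ir)
edt = decay_time(curve, fs, 0.0, -10.0)
t20 = decay_time(curve, fs, -5.0, -25.0)
t30 = decay_time(curve, fs, -5.0, -35.0)
```

For a strictly exponential decay the three indices coincide; differences among them in measured data point to a multi sloped decay.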

2.2.2.2 Spaciousness Parameters
Spaciousness parameters relate to the subjective spatial impression and are measured by
signal differentiation between the two ears thus, requiring two impulse responses per
listening position.

The Inter-Aural Cross Correlation (IACC), suggested by Damaske and Ando in 1972 [24],
is one of the primary parameters associated with spaciousness (obtained using a
binaural receiver). IACC provides information on the correlation between the signals
received at the two ears, while it has been shown by its authors to relate to the directional
perception of a source (high IACC value at a single delay time) and the general
perception of field diffuseness (low IACC suggests high diffusivity). The measure could
thus be used for a general sound field quality assessment while also potentially assisting
in source positioning and aiming, particularly for multi source conditions to optimize the
subjective perception of sound quality. Early reflections (for spaciousness) or late
reverberant sound (for diffuseness) or both are accounted for, depending on the
integration limits, and the measure can be generically defined in a normalized form as:

$$\varphi_{rl}(\tau) = \frac{\int_{0}^{t_0} g_r(t)\,g_l(t + \tau)\,dt}{\left\{ \int_{0}^{t_0} [g_r(t)]^2\,dt \int_{0}^{t_0} [g_l(t)]^2\,dt \right\}^{1/2}}$$ (2.7)

where $g_r$ and $g_l$ are the impulse responses corresponding to the right and left ear.
$\varphi_{rl}$ denotes the correlation function, for which the maximum of its absolute value
within the range $|\tau| < 1$ms is called the IACC [1, 24].
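
A minimal sketch of the IACC calculation on sampled data is given below. It assumes two equally sampled ear responses integrated over their full length; the function name and the test pulses are hypothetical. The boundary lag of exactly 1 ms is kept in the search, which is an assumption of this sketch and can be dropped if a strict inequality is required.

```python
import math


def iacc(g_r, g_l, fs, max_lag_ms=1.0):
    """Maximum of the normalised interaural cross-correlation for lags up to 1 ms."""
    n = min(len(g_r), len(g_l))
    norm = math.sqrt(sum(x * x for x in g_r[:n]) * sum(x * x for x in g_l[:n]))
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                acc += g_r[t] * g_l[u]  # g_r(t) * g_l(t + tau)
        best = max(best, abs(acc) / norm)
    return best


# Idealised single-pulse responses at an assumed 10 kHz sample rate.
right = [0.0] * 10 + [1.0] + [0.0] * 30
left_half_ms = [0.0] * 15 + [1.0] + [0.0] * 25   # arrives 0.5 ms later
left_two_ms = [0.0] * 30 + [1.0] + [0.0] * 10    # arrives 2 ms later
```

With these idealised pulses a 0.5 ms interaural delay still yields full correlation, whereas a 2 ms delay falls outside the search window.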

Additional spaciousness parameters would include the lateral energy fraction (LF) that is
a ratio of the energy from early lateral reflections over the total energy arriving at the
receiver within the first 80ms. Similar parameters are the early lateral energy fraction
(LEF) and Lateral Fraction Cosine (LFC).

2.2.2.3 Energy Ratios
Energy ratios are closely related to the acoustical character of a space and speech
intelligibility. Their measurement usually requires omni-directional equipment, though
variable directionality is allowed depending on the measurement purpose (e.g. for speech
intelligibility measurement). Energy ratios are derived from the impulse response, g(t), on
the basis of comparing the useful energy (typically for speech, the direct sound and the
first 50 ms of energy, though the time limit is often open for interpretation) to the total or
late arriving energy (arriving after the time threshold defining early energy).

Definition (D) or Deutlichkeit as proposed by Thiele [25] is one of the earliest
attempts to use energy ratios in quantifying the effect of a space on room acoustics, and
relates to the distinctness of sound. The measure defines a condition as a percentage; the
higher the value, the better the definition of sound, and D has been found to correlate
well with intelligibility (cited in Kuttruff [1]). With the direct sound included in both
parts of the ratio, D can be defined as:

$$D = \frac{\int_{0}^{50ms} [g(t)]^2\,dt}{\int_{0}^{\infty} [g(t)]^2\,dt} \times 100\%$$ (2.8)

The Clarity index (C) or Klarheitsmass was introduced by Reichardt et al. [26] to
characterize the transparency/clarity of music performances. However, the measure has
been found [27] to directly relate to speech intelligibility given a lower time limit, 50ms
for $C_{50}$ (originally 80ms), despite the fact that it cannot account for background noise or
noise masking effects. A number of alternative indices (e.g. 35ms) based on the
differentiating ear integration time when relating to signal level or frequency have been
suggested in the literature (e.g. see Bradley [28]); nonetheless it is the 50ms threshold that
prevailed as a widely accepted useful energy time threshold for speech applications. The
Clarity index is defined as in Eq. 2.9 (original) and Eq. 2.10 (for speech):

$$C = 10\log_{10}\left\{ \frac{\int_{0}^{80ms} [g(t)]^2\,dt}{\int_{80ms}^{\infty} [g(t)]^2\,dt} \right\}\ \mathrm{dB}$$ (2.9)

$$C_{50} = 10\log_{10}\left\{ \frac{\int_{0}^{50ms} [g(t)]^2\,dt}{\int_{50ms}^{\infty} [g(t)]^2\,dt} \right\}\ \mathrm{dB}$$ (2.10)

The Centre time ($T_s$) or Schwerpunktszeit was first introduced by Kurer [29] in 1969
and can be defined as the time of the centre of gravity of the squared impulse response.
Given that a decrease in the value of $T_s$ is proportional to the reflection time delay when
compared to direct sound, a degree of (negative) correlation has been shown to exist with
speech intelligibility (cited in Kuttruff [1]).

$$T_s = \frac{\int_{0}^{\infty} [g(t)]^2\,t\,dt}{\int_{0}^{\infty} [g(t)]^2\,dt}$$ (2.11)
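
The energy ratios of Eqs. 2.8 to 2.11 can be sketched on a sampled impulse response as follows. The code assumes that the response starts at the arrival of the direct sound and uses a synthetic exponential decay (nominal RT of 1 s) as input; the function name, dictionary keys and sample rate are illustrative assumptions only.

```python
import math


def energy_ratios(ir, fs):
    """D50 (%), C50 and C80 (dB) and centre time Ts (s) from an impulse response."""
    energy = [x * x for x in ir]
    total = sum(energy)

    def early(limit_ms):
        # energy arriving before the given time limit
        return sum(energy[:int(round(limit_ms * 1e-3 * fs))])

    e50, e80 = early(50.0), early(80.0)
    return {
        "D50": 100.0 * e50 / total,                     # Eq. 2.8
        "C80": 10.0 * math.log10(e80 / (total - e80)),  # Eq. 2.9
        "C50": 10.0 * math.log10(e50 / (total - e50)),  # Eq. 2.10
        "Ts": sum(n / fs * e for n, e in enumerate(energy)) / total,  # Eq. 2.11
    }


# Synthetic exponential decay with a nominal RT of 1 s (assumed data).
fs = 1000
ir = [math.exp(-math.log(1000.0) * n / fs) for n in range(2 * fs)]
ratios = energy_ratios(ir, fs)
```

For this idealised decay roughly half of the energy arrives within the first 50ms, so D is close to 50% and the speech clarity index is close to 0dB.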

2.2.2.4 Levels
Strength (G) or Strength factor is a measure relating to the overall energy transferred
from the source to the receiver after subtracting the direct field influence. This is
generically defined as:

$$G = 10\log_{10}\left\{ \frac{\int_{0}^{\infty} [g(t)]^2\,dt}{\int_{0}^{\infty} [g_A(t)]^2\,dt} \right\}\ \mathrm{dB}$$ (2.12)
where $g_A(t)$ is the counterpart impulse response measured in an anechoic room at a
distance of 10m.

The Late Lateral Strength (LG) is an alternative measure of the energy transferred to
the receiver after a given time threshold (usually 80ms) and relating to the sense of
listener envelopment. As such, only octave bands up to 1kHz are considered so as to
account for the fact that higher frequencies do not contribute much in listener
envelopment [1]. $LG_{80}$ can be defined as:

$$LG_{80} = 10\log_{10}\left\{ \frac{\int_{80ms}^{\infty} [g(t)]^2\,dt}{\int_{0}^{\infty} [g_A(t)]^2\,dt} \right\}\ \mathrm{dB}$$ (2.13)

2.2.2.5 Speech Intelligibility Parameters
Several parameters have been adopted in the process of evaluating speech intelligibility.
While the subjective (direct) methodologies (see section 2.4.1) are considered the most
accurate, they carry a number of disadvantages as well as advantages, and computer based
(indirect) methods give significant advantages in terms of practicality and thus simplicity
of measurement. A separate section is dedicated to the more widely used MTF theory
and its STI derivatives (see section 2.4.3) as they comprise the main approach in speech
applications. The following paragraphs present some of the earlier parameters relating to
intelligibility, complementing at this stage the C and D energy ratios as previously
discussed.

The Articulation Index (AI) was based on the work of French and Steinberg at Bell
Laboratories in 1947 [30] and reconsidered by Kryter in 1962 [31]. Commonly referred to
nowadays as the speech intelligibility index (SII), it comprises one of the first attempts
to measure intelligibility [32, 33]. The basis of AI theory suggests that a speech
communication system can be divided in twenty frequency bands (later reduced to five
bands for simplicity), each carrying a different contribution to intelligibility. With the
total contribution being the sum of the individual bands' contributions, S/N ratios are
derived for each of the latter and weighted to yield a result. AI assessment varies from 0
to 1 with 0.3 or below considered unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good
and above 0.7 excellent.

The Percentage Articulation Loss of Consonants (%ALcons) was originally
developed from the early findings of Peutz in 1971 [34]. Having determined that
intelligibility depended on RT, room volume and the distance between the talker
and listener, Peutz concluded that it was the loss of consonants that mostly reduced
intelligibility. In this sense, the %ALcons measure refers to the percentage of consonants
that will be lost during the transmission of speech through a system. Optimal values are
highlighted at 5%ALcons, with 10% considered good and more than 15% unacceptable.
The main problem here, however, is that the result is derived only from the 1/3 octave
band centered at 2000Hz. All other frequency bands are simply ignored. Moreover, there
is no relation to parameters other than RT and also the measurement procedure does not
include vowels, something that could cause a systematic error with respect to word tests
[35]. For these reasons, and despite the fact that the method gained popularity in
earlier years, it does not generally comprise a reliable assessment of intelligibility for the
majority of potential conditions.

The Speech Interference Level (SIL) is a simple method for predicting or assessing
speech intelligibility in cases of direct communication in noisy environments [36]. In
obtaining a result, the vocal effort of the speaker (accounting for the Lombard effect) and
the distance from the listener are taken into account, though the method is normally not
used where other methods can be applied. SIL is defined as the difference between the
speech level ($L_{S,A,L}$) and the speech interference level of noise ($L_{SIL}$), both determined at
the listener's position for the octave bands in the range of 500Hz to 4kHz. Fair speech
communication is ensured if the difference in levels, $SIL = L_{S,A,L} - L_{SIL}$, is > 10dB. BS
EN ISO 9921: 2003 [36] gives a more detailed description of the method.
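
The level difference described above can be illustrated with a short sketch. It assumes that the speech interference level of noise is taken as the arithmetic mean of the noise levels in the four octave bands from 500Hz to 4kHz; the function names and the example levels are hypothetical, and the procedure for estimating the speech level at the listener (vocal effort, distance) is not reproduced here.

```python
def speech_interference_margin(speech_level_db, noise_octave_levels_db):
    """SIL = L_S,A,L - L_SIL, with L_SIL taken as the arithmetic mean of the
    noise levels in the 500 Hz, 1 kHz, 2 kHz and 4 kHz octave bands."""
    if len(noise_octave_levels_db) != 4:
        raise ValueError("expected levels for the 500 Hz to 4 kHz octave bands")
    l_sil = sum(noise_octave_levels_db) / 4.0
    return speech_level_db - l_sil


def is_fair_or_better(sil_db):
    """Criterion quoted in the text: SIL greater than 10 dB."""
    return sil_db > 10.0


# Illustrative levels only, not measured data.
sil = speech_interference_margin(60.0, [50.0, 48.0, 46.0, 44.0])  # 60 - 47
```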

2.3 Fundamental attributes of speech
Speech is a continuous waveform having a wide frequency range starting from below
125Hz and extending to above 8kHz. Fundamental frequencies (100Hz - 400Hz) vary
among genders being, on average, about 100Hz for men and 200Hz for women. The
latter are the basis for a series of changing harmonics, called formants, that are created at
integer multiples of the fundamental frequency and are partly the means to determine the
character of an individual's voice.

Formants create the various vowel sounds and the transitions among them and are
considered as a relatively efficient way to generate sound [37]. Their amplitude can be up
to 27dB higher than consonants which are all impulsive in character. Consonants are
separated in two categories, voiced or unvoiced, the latter being particularly quiet.
Common durations for vowels and consonants are 90ms and 20ms respectively.
Considering that energy in speech is carried mainly by vowels the latter emerge as a key
parameter in potential masking effects over the consonants as intelligibility is subject
mainly to consonant comprehension. It should be noted that the audible spectrum of
speech is not flat, with a common trend suggesting the presence of more energy in lower
frequencies (Figure 2.8).

Sound that is heard in speech is organized into words or smaller elements that form the
latter, phonemes. Common rates (frequencies) of phonemic production are much lower
than the audible frequency range, corresponding to only a few Hz. Audible speech sounds,
organized into packets of phonemic information, can be modeled by amplitude
modulating a wide-band signal. Thus a second spectrum in speech, defining the rate at
which humans utter phonemes, is highlighted as the modulation spectrum. This can be
represented by fourteen frequencies spaced at one-third octave intervals ranging from
0.63Hz to 12.5Hz, with a spectrum that has higher values at the middle frequencies
(Figure 2.8) [13]. Given its impact in the context of speech communication, the modulation
spectrum comprises a key parameter of the MTF theory and resultant STI method.
Figure 2.8. Typical audio and modulation spectra for speech

2.4 Speech intelligibility measurement methodologies
Speech intelligibility is a continuous and dynamic process, the assessment of which can
be based on a multitude of different elements for a wide range of conditions. On this
basis, direct and indirect methods comprise the means to measure this function through a
number of different approaches.

2.4.1 Statistical (Direct) Measures of Speech Intelligibility
Direct measurement methods consist of human based techniques that make use of the
subjective impression of trained listener subjects. The test procedures involve listening to
sets of words individually, in pairs or within carrier phrases. Stimuli, produced by
trained speakers, consist of specific word lists designed to assess specific aspects of
speech transmission. The outcome of the test is judged on the basis of evaluating speech
comprehension by the listeners. Derivation of any results, however, unavoidably involves
the use of statistical theory.

Both BSI [36] and ANSI [38] describe the use of three methods, though with minor
alterations among the two documents. These methods are summarized and presented in
the following paragraphs:

The Modified Rhyme Test (MRT) [39] is based on a dataset of 50 six-word
(monosyllabic English) lists. The structure of the words is based on a consonant-vowel-
consonant (CVC) sequence although CV or VC structures are also present in some
instances. Words within a list are audibly related primarily in terms of a vowel sound that
is included in every word. Each word is initiated (or terminated) by the same consonant
phoneme or phoneme cluster and terminated (or initiated) using six different phonemic
elements thus, forming a list of six word elements e.g. bat-bad-back-bass-ban-bath.
Listeners are asked to identify the word spoken (out of six), usually within the context of
a carrier phrase. Results are based on the indication of errors in initial and final consonant
sound comprehension and can be analyzed in terms of the words correctly identified,
words not correctly identified and the frequency of repeated errors in specific consonant
comprehension situations.

The Diagnostic Rhyme Test (DRT) (Voiers 1977, cited in Steeneken [40]) is similar to
MRT but uses a list of 192 CVC words (monosyllabic English) arranged in 96 rhyming
pairs. The words themselves audibly differ only in the initial consonant phoneme (e.g.
key-tea) and listeners are asked to identify the word spoken out of a pair, without the use
of carrier phrases.

The Phonetically Balanced (PB) word list test (Egan 1948, cited in Katz [41]) comprises
20 lists having 50 CVC syllables/words each. Here, the initial and final consonants in each
list appear in accordance with their frequency of use in these positions. The words within
a given list are presented in a different (random) order each time the list is used, while
every word appears within the same carrier phrase. The specific method normally
requires more training for listeners and talkers and is particularly sensitive to the S/N,
with small changes in the latter producing larger changes in the intelligibility results.

Subjective methods are by far the most accurate and reliable type of measurement of
speech intelligibility. With the psychoacoustics of speech perception not yet being clearly
defined, the obvious advantage over their objective counterparts can be highlighted.
However, the testing procedure is considered difficult to implement and can result in an
expensive scheme. Consequently, the objective counterpart alternatives are normally
preferred.

2.4.2 Objective (Indirect) Measures of Speech Intelligibility
With computer based measures being an efficient alternative for assessing speech
intelligibility, several parameters have been adopted in the process. A number of these
parameters have been discussed in section 2.2.2, most notably the Clarity and
Deutlichkeit indexes, EDT and SII, which are considered a valuable indication of a space's
acoustic characteristics. In the following section, however, the theoretical background and
the consequent method that is typically employed in a speech intelligibility assessment,
namely the STI method, is presented.

2.4.3 The Modulation Transfer Function (MTF)
Audio signals such as speech or music that propagate within an enclosure will reach a
receiver position at various time delays given their reflected energy in addition to the
direct field. This typically results in a smoothing of their spectrum when compared to the
original signal, while at the same time interfering noise and other effects (e.g. echoes)
will alter the signal in a comparable manner. Comprising essentially a smearing effect (as
described by Houtgast et al. [42, 43]), the amount of reverberation (and unwanted effects)
relates to the extent of smoothness in the original envelope at the input.

A signal that incorporates a simple, well defined envelope can be used to excite the input,
thus enabling a subsequent comparison to the envelope of the received signal, or output,
to quantify the smearing effect. However, long-established signals using step function or
pulse envelopes are unable under the MTF approach to produce a result that relates to the
subjective perception of intelligibility for situations where decay curves deviate
substantially from strictly exponential. In this sense, an alternative approach for assessing
a room's objective characteristics was suggested by Houtgast et al. [44], as previously
introduced within the field of optics (see Houtgast et al. [42]); the new approach
reintroduced in this case the use of a sine-wave modulated envelope with the aim to
obtain a closer correlation to subjective impression.

The theory was based on the assumption of an enclosure being a linear system.
Consequently, modulations are defined by the intensity envelope of the signal, given that
on this basis only, reverberation and/or interfering noise will affect solely the depth of
modulation of a sinusoidal signal without changing its shape [13]. With the modulation
being dependent on the modulation frequency applied, see Table 2.1, a transmission path
can be quantified by the decrease of the modulation depth, by way of a comparison of the
test signal's modulation index, $m_i$, to the modulation index at the receiver, $m_o$, as a
function of modulation frequency, Eq. 2.14 (Figure 2.9).

$$m(F) = \frac{m_o}{m_i}$$ (2.14)

This is defined as the Modulation Transfer Function (MTF) [42] and can be interpreted
in terms of an apparent S/N ratio as in Eq. 2.15, to form the theoretical basis for
derivation of a group of parameters, closely relating to speech intelligibility [42, 43, 13].

$$SNR_{App} = 10\log\left(\frac{m(F)}{1 - m(F)}\right)\ \mathrm{dB}$$ (2.15)
MTF correctly accounts for the influence of reverberation, distinct echoes and interfering
noise, while effectively forming a low pass filter in its domain.
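
The conversion of Eq. 2.15 and its inverse can be sketched as below; the function names are illustrative. A modulation transfer value of 0.5 corresponds to an apparent S/N of 0dB, which is a convenient check.

```python
import math


def apparent_snr_db(m):
    """Eq. 2.15: apparent S/N in dB for a modulation transfer value 0 < m < 1."""
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    return 10.0 * math.log10(m / (1.0 - m))


def modulation_from_snr(snr_db):
    """Inverse of Eq. 2.15: the m(F) that corresponds to a given apparent S/N."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
```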

Later publications by the authors of MTF have also described refined versions with
applications to telecommunications systems [45], and an extended theoretical approach for
use of the method as a prediction tool in room acoustics, utilizing information from
design specifications (e.g. room volume and RT) [46]. Calculation of results in this case
is in line with statistical room acoustics where m(F) can be mathematically derived via
the RT (T) and S/N (comprised of the total reverberant level of speech over the noise
level at the receiver position), as shown in Eq. 2.16.

$$m(F) = \left[1 + \left(\frac{2\pi F\,T}{13.8}\right)^{2}\right]^{-1/2} \cdot \left[1 + 10^{-(S/N)/10}\right]^{-1}$$ (2.16)
where F is the modulation frequency applied. The two terms in the second part of the
equation relate independently to the effect of reverberation and interfering noise.
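
A minimal sketch of Eq. 2.16 is given below, keeping the reverberation and noise terms separate so that their individual influence can be inspected. The function and parameter names are assumptions made for illustration.

```python
import math


def mtf_statistical(mod_freq_hz, rt_s, snr_db):
    """Eq. 2.16: product of a reverberation term and a noise term."""
    reverberation = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverberation * noise
```

The reverberation term acts as a low pass filter on the modulation frequency, whereas the noise term reduces all modulation frequencies by the same factor (0.5 at an S/N of 0dB).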

In close succession, Plomp et al. introduced their approach for the prediction of MTF
using an image source computer model [47], while shortly after Rietschote et al. published
a paper in a comparable context, using ray tracing models [48] (see section 2.7 for
computer simulation). Schroeder [49] had earlier presented his view on obtaining the MTF
through a Fourier transform of the squared impulse response for linear time-invariant
systems. This formed the basis for use of the MTF in computer modeling and speech
applications in particular, based on its main derivative the STI. This single figure of merit
formed the means to establish the efficiency of the different approaches as described,
when compared to subjective methods of assessment; commonly PB scores have been
used. The STI method is presented in detail in the following section.
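
The impulse response route to the MTF can be sketched as the magnitude of the Fourier transform of the squared impulse response, normalised by its total energy. The code below is an illustration under the assumptions of a noise-free, linear time-invariant system and a synthetic exponential decay (nominal RT of 1 s); the function name and sample rate are hypothetical, and a direct summation is used instead of an FFT for brevity.

```python
import cmath
import math


def mtf_from_impulse_response(ir, fs, mod_freq_hz):
    """m(F) = |sum g^2(t) exp(-j 2 pi F t)| / sum g^2(t) for a sampled response."""
    energy = [x * x for x in ir]
    spectrum = sum(e * cmath.exp(-2j * math.pi * mod_freq_hz * n / fs)
                   for n, e in enumerate(energy))
    return abs(spectrum) / sum(energy)


# Synthetic exponential decay with a nominal RT of 1 s (assumed data).
fs = 1000
ir = [math.exp(-math.log(1000.0) * n / fs) for n in range(2 * fs)]
m_2hz = mtf_from_impulse_response(ir, fs, 2.0)
```

For an exponential decay the result agrees closely with the reverberation term of Eq. 2.16, about 0.74 at a modulation frequency of 2Hz for an RT of 1 s.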

Figure 2.9. Input/Output comparison with respect to modulation depth and resulting MTF spectrum

Modulation frequency, f (Hz)    Octave Band (Hz)
                                125    250    500    1k     2k     4k     8k
0.63                            m
0.8
1
1.25
1.6
2
2.5
3.15
4
5
6.3
8
10
12.5

Table 2.1. Matrix used for the determination of MTF in the 14
modulation frequencies and seven octave bands
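
The matrix of Table 2.1 can be enumerated with a short sketch, which also shows how the nominal modulation frequencies relate to a base-ten one-third octave series. The names used are illustrative assumptions; the nominal values are those listed in the table.

```python
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def empty_mtf_matrix():
    """One slot per (octave band, modulation frequency) pair of Table 2.1."""
    return {(band, f): None
            for band in OCTAVE_BANDS_HZ for f in MODULATION_FREQS_HZ}


def exact_third_octave_series(first_index=-2, count=14):
    """Base-ten one-third octave values 10**(n/10) behind the nominal figures."""
    return [10.0 ** (n / 10.0) for n in range(first_index, first_index + count)]
```

The dictionary holds 7 x 14 = 98 entries, one for each m value required by the full method.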

2.4.3.1 The Speech Transmission Index (STI)
The STI methodology is based on establishing the MTF values, m(F), for 98 data points.
These are obtained from a combination of 14 modulation frequencies in the range of
0.63Hz to 12.5Hz (Table 2.1), using 1/3 octave intervals, and seven octave bands in the
range of 125Hz to 8kHz, giving a typical range for speech as demonstrated in Figure 2.8.
On the basis that the phonemes comprising speech are characterized by their distinctive
frequency spectrum, speech clarity requires that the spectral differences of the phonemes
are preserved [13]. The spectral differences of a signal when compared at the input and
output of a transmission system (considered as a filter) are characterized by the product
of the MTF measurement for the seven octave bands. Unwanted effects such as distortion
of the (speech) signal from reverberation or interfering noise result in a reduction of the
sine wave modulated envelope depth; thus, the effect on the signal is reflected in the
change of these fluctuations within the envelope function. As such, the MTF values
enable derivation of a single parameter directly relating to the subjective impression of
speech intelligibility [42, 50, 51], namely the STI.

Following from Eq. 2.15, calculation of the STI involves the determination of
transmission indices (TI) from the apparent S/N ratio, specific for octave band k and
frequency f as shown in Eq. 2.17. As STI is linearly related to S/N ratios in a 30dB range
from -15dB to 15dB [13], all related S/N ratios are adjusted to fit within the latter range
and thus allow for STI values in the range of 0 to 1.

$$TI_{k,f} = \frac{SNR_{k,f} + 15}{30}$$ (2.17)
The 14 transmission indices, obtained for each octave band, are then averaged to give the
modulation transfer index ($MTI_k$), specific for octave band k, Eq. 2.18.

$$MTI_k = \frac{1}{14}\sum_{f=1}^{14} TI_{k,f}$$ (2.18)
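
The steps from a modulation transfer value to the modulation transfer index of one octave band can be sketched as follows. The function names are illustrative; the apparent S/N is limited to the -15dB to 15dB range before scaling, and values of m at or beyond 0 and 1 are mapped to the range limits, which is an assumption of this sketch for numerical robustness.

```python
import math


def transmission_index(m):
    """Eq. 2.15 followed by Eq. 2.17, with the S/N limited to -15..+15 dB."""
    if m <= 0.0:
        snr_db = -15.0
    elif m >= 1.0:
        snr_db = 15.0
    else:
        snr_db = 10.0 * math.log10(m / (1.0 - m))
        snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


def modulation_transfer_index(m_values):
    """Eq. 2.18: mean transmission index over one band's modulation frequencies."""
    return sum(transmission_index(m) for m in m_values) / len(m_values)
```

A band in which all fourteen m values equal 0.5 therefore yields a modulation transfer index of 0.5.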

While the final STI value is obtained through a weighted summation of the modulation
transfer indices for the seven octave bands, the revised form of the latter ($STI_r$) [50]
requires a number of corrections in the calculation process to account for auditory
masking and reception threshold. With the methodology to this stage remaining unaltered,
weighting and redundancy corrections are applied as follows:

Initially, the intensities of octave band specific audio masking effects ($I_{am,k}$) are
determined using Eq. 2.19

$$I_{am,k} = I_{k-1} \cdot amf$$ (2.19)
where $I_{k-1}$ is the intensity of the masking signal in band k-1, i.e. an octave lower than k,
and amf is the auditory masking factor related to the absolute reception threshold by means of
lower limit of the masking noise level within each octave band ($I_{rs,k}$) and dependent on
masking signal level (as defined in BS EN ISO 60268-16 [13]).

Applying the corrections for auditory masking and reception threshold gives the
corrected version of m, Eq. 2.20

$$m'_{k,f} = m_{k,f} \cdot \frac{I_k}{I_k + I_{am,k} + I_{rs,k}}$$ (2.20)
where $m_{k,f}$ is the modulation index for octave band k and frequency f, $m'_{k,f}$ is the
corrected index and $I_k$ is the signal's intensity in octave band k.

Subsequently, the effective S/N ratio for octave band k and modulation frequency f takes
a revised form to represent the corrections made, as shown in Eq. 2.21.

$$SNR_{k,f} = 10\log\left(\frac{m'_{k,f}}{1 - m'_{k,f}}\right)\ \mathrm{dB}$$ (2.21)

Using the corrected S/N ratios, updated $TI_{k,f}$ and subsequently updated $MTI_k$ values can
be derived. Ultimately, the revised speech transmission index, $STI_r$, is obtained through a
weighted summation of the modulation transfer indices for the seven octave bands and
the corresponding redundancy corrections (see BS EN ISO 60268-16 [13]), as shown in Eq.
2.22

$$STI_r = \sum_{k=1}^{7} \alpha_k\,MTI_k - \sum_{k=1}^{6} \beta_k \sqrt{MTI_k \cdot MTI_{k+1}}$$ (2.22)
where $\alpha_k$ represents the octave weighting factor and $\beta_k$ represents the redundancy
correction factor related to the contribution of adjacent frequency bands. As the particular
corrections are (speaker) gender dependent, optimal gender related weighting and
redundancy factors along with threshold corrections are given in BS EN ISO 60268-16 [13].

$STI_r$ has been found to give marginally better results (4-6% in terms of CVC word scores)
[52] when compared to the 1980 version. Results are currently rated as shown in the STI
scale, relating STI scores to the subjective perception of intelligibility (Table 2.2).

Table 2.2. STI scale and equivalent subjective perception of speech intelligibility (current)
STI                                        0 - 0.3    0.31 - 0.45    0.46 - 0.6    0.61 - 0.75    0.76 - 1
Subjective perception of intelligibility   Bad        Poor           Fair          Good           Excellent
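
The rating scale of Table 2.2 can be expressed as a simple lookup. The sketch below assumes that an STI value is first rounded to two decimals, so that values falling between the printed band limits (e.g. between 0.30 and 0.31) are assigned unambiguously; the function name is illustrative.

```python
def sti_rating(sti):
    """Qualitative label from Table 2.2 for an STI value between 0 and 1."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    value = round(sti, 2)  # assumption: the table is read at two decimals
    for upper, label in [(0.30, "Bad"), (0.45, "Poor"), (0.60, "Fair"),
                         (0.75, "Good"), (1.00, "Excellent")]:
        if value <= upper:
            return label
```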

What is apparent is that the scale could prove to be insufficient to cover the variety of
conditions in which its use may be constructive and is rather linked to a more general
approach on the topic. As has been argued by a number of researchers [53, 54, 55, 56],
alternative or perhaps additional scales might be more appropriate for different cases,
including assessments based on non-native or hearing impaired listeners and cases where
the application scope is rather different. In a typical example, environments concerned
about safety generally require an STI of 0.50 to comply with the safety code for
emergency announcements. In such cases a potential result of 0.60 would clearly qualify
the transmission system for its intended use and render it fundamentally successful;
however, considering the current scale the outcome would simply be rated as fair.
Moreover, a condition that might be labelled as good may in fact be only fair for a
hearing impaired or non-native listener (see van Wijngaarden et al. [53, 54]). Considering
analogous examples, alternative rating systems emerge as necessary for a better
interpretation of results in an appropriate context for individual cases.

Taking another approach, STI could also be described as a metric assessing a worst case
scenario (i.e. potentially underestimating intelligibility scores in some cases) on the basis
of the monaural character of measurements that disregard any advantages of binaural
hearing in speech comprehension. This concern is particularly relevant for situations
where speech and interfering noise originate from different directions. To address this
gap a binaural model has been proposed [57, 58] based on a simplified approach of the
binaural hearing mechanisms (see section 2.5). The study confirms the validity of a
"better ear" approach from binaural measurement set ups, using the better result (per
octave band) from each ear. The specific method was shown to be comparable to the
binaural model of STI and thus constructive in the interpretation of results for a number
of conditions.

In concluding this section, a number of limitations apply to the STI method due to the
form of analysis (inherent limitations of the MTF theory) and typical test signals that are
used. The method should not be used to assess transmission channels that introduce
frequency shifts or frequency multiplication, or channels that include vocoders [13]. A
speech-based signal is recommended in the literature [59] in special cases, given advances
in developing such stimuli for efficient use, see van Wijngaarden et al. [59] and
Drullman et al. [60].

2.4.3.2 The STI Public Address (STIPA) method
The STIPA method is effectively a simplification of the STI, aimed at the assessment of
Public Address systems. STIPA applies 12 modulation frequencies, two to each of the
octave bands considered, see table 2.3. For male speech the octaves centered at 125Hz
and 250Hz are combined (although this is not always the case for the test signals
compiled by different manufacturers [18]), while the former band is simply ignored for
female speech. Frequency weighting relating to talker gender is adopted, see table 2.4.
Each frequency band is modulated by the two corresponding modulation frequencies
simultaneously, thus increasing the negative effect of fluctuating or impulsive noise [13].
However, the method does not require the measurement stimulus to be synchronized
with the receiver (due to the use of spectrally shaped, pseudorandom noise that
incorporates the signal modulations), therefore simplifying the implementation technique [61].

Octave band (Hz)                    125-250   500     1k      2k      4k      8k
First modulation frequency (Hz)     1         0.63    2       1.25    0.8     2.5
Second modulation frequency (Hz)    5         3.15    10      6.25    4       12.5
Table 2.3. STIPA modulation frequencies

Octave band (Hz)              125-250   500     1k      2k      4k      8k
Male, weighting factor α      0.127     0.23    0.233   0.309   0.224   0.173
Male, redundancy factor β     0.078     0.065   0.011   0.047   0.095   -
Female, weighting factor α    0.117     0.223   0.216   0.328   0.25    0.194
Female, redundancy factor β   0.099     0.066   0.062   0.025   0.076   -
Table 2.4. Weighting factors adopted by STIPA
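
To show how the values of tables 2.3 and 2.4 could be combined into a single index, the following Python sketch is offered under stated assumptions. It assumes the usual STI formulation, in which each modulation transfer value m is converted to an apparent S/N limited to ±15dB, the two values per band are averaged, and the band results are combined as a weighted sum minus a redundancy correction between adjacent bands. Reading the two male rows of table 2.4 as weighting and redundancy factors is itself an assumption, and the function names are hypothetical.

```python
import math

# Table 2.4, male speech; bands: 125-250, 500, 1k, 2k, 4k, 8k Hz
ALPHA_MALE = [0.127, 0.230, 0.233, 0.309, 0.224, 0.173]  # assumed: weighting factors
BETA_MALE = [0.078, 0.065, 0.011, 0.047, 0.095]          # assumed: redundancy factors


def transmission_index(m):
    """Map a modulation transfer value m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)        # avoid log(0) and division by zero
    snr = 10.0 * math.log10(m / (1.0 - m))   # apparent signal-to-noise ratio in dB
    snr = min(max(snr, -15.0), 15.0)         # limit to the +/-15 dB range
    return (snr + 15.0) / 30.0


def stipa_index(m_values, alpha=ALPHA_MALE, beta=BETA_MALE):
    """m_values: one (m_first, m_second) pair per octave band, in table order."""
    mti = [(transmission_index(m1) + transmission_index(m2)) / 2.0
           for m1, m2 in m_values]
    weighted = sum(a * x for a, x in zip(alpha, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1])
                     for k, b in enumerate(beta))
    return weighted - redundancy


# Hypothetical measurement: every modulation is reduced to half its depth
example = stipa_index([(0.5, 0.5)] * 6)
```

With these factors the weighted sum and the redundancy correction differ by exactly one, so a channel that fully preserves all modulations gives an index of 1 and a channel that removes them gives 0.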

Limitations of STIPA, as described in BS EN 60268-16 [13], dictate that the method
should not be used for public address systems that introduce vocoders, frequency shifts or
frequency multiplication (though it has been shown by Mapp [17] that measurements using
a STIPA signal are affected little, if at all, by frequency shifters). Also, its use should be
avoided with systems that show strong nonlinear distortion or when impulsive
background noise is present. It should be noted that STIPA (and STI) cannot correctly
account for poor P.A. frequency response.

2.5 Implications of binaural hearing
The advantages introduced by binaural hearing as a consequence of the human head
shape and distance between the ears are based on a group of functions, including cross
correlation processing of the signal at the two ears (see IACC in section 2.2.2.2), among
others. Two effects in this sense, the Interaural Time Difference (ITD) and Interaural
Level Difference (ILD), are mainly responsible for the widely acknowledged localization
abilities of a binaural listening system, primarily in the horizontal plane (see Colburn [62]).

ITD is a function of time delay, as measured for sound arriving at the two ears, due to the
relative position of the source with regard to the listener (Figure 2.10 (I)). It is based on
the phase shift caused by the interaural time delay, thus the function is employed by the
human perceptual model at lower frequencies. An upper frequency limit relates to the
directional angle of the source, e.g. 743Hz for a 90° angle. Above this frequency the ear
simply cannot resolve the phase differences, while the limit increases with more acute
source-listening angles [63]. ILD is based on the level difference at the ears due to
shadowing effects caused by the head (Figure 2.10 (II)). Given that objects whose size is
small compared to the wavelength will appear acoustically invisible, the function is
effectively introduced into human perception above a threshold frequency value, relating to
the size of the listener's head.
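
The dependence of the ITD frequency limit on source angle can be sketched numerically. The Python example below is a rough illustration only: it assumes a spherical head model in which ITD = (a/c)(θ + sin θ), and takes the limit as the frequency whose half period equals the ITD. The head radius (0.09m) and speed of sound (344m/s) are illustrative assumptions, chosen because they give a value close to the 743Hz quoted above for 90°; the derivation used in [63] may differ.

```python
import math


def itd_seconds(angle_deg, head_radius=0.09, c=344.0):
    """Interaural time difference for a spherical head (assumed model)."""
    theta = math.radians(angle_deg)
    return head_radius / c * (theta + math.sin(theta))


def itd_frequency_limit(angle_deg, head_radius=0.09, c=344.0):
    """Frequency at which half a period equals the ITD (assumed criterion)."""
    return 1.0 / (2.0 * itd_seconds(angle_deg, head_radius, c))


limit_90 = itd_frequency_limit(90.0)   # source directly to one side
limit_30 = itd_frequency_limit(30.0)   # more acute angle gives a higher limit
```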

Figure 2.10. Effects of binaural hearing: I) ITD, based on phase difference (for lower frequencies);
II) ILD, based on level difference (for higher frequencies)

Bronkhorst and Plomp [64, 65] studied the advantages of binaural hearing in noisy
environments (in terms of speech intelligibility) considering ITD and ILD, for normal
and hearing impaired listeners. Their results confirmed the advantage of a binaural
system (over monaural) for both categories of listeners, showing however a reduced
effect for hearing impaired listeners. It was also shown that the advantage of normal
hearing subjects over the hearing impaired was mainly based on ILD, the ITD benefit on
intelligibility being equal for both categories. Hawley et al. [66] support the advantage of
binaural hearing for speech intelligibility, estimating it at up to 7dB in terms of speech
reception threshold level for two or three interferers (speech or time-reversed speech).
An advantage of up to 4dB was found when using a single interferer of
a number of different types.

It is evident that the contribution of a binaural system to speech intelligibility should be
considered, due to a potentially significant advantage over a monaural basis. This
realization is reflected in recent advances in the measurement of STI (i.e. binaural STI),
comprising an attempt to better model binaural perception and thus obtain a better
prediction of intelligibility in terms of STI (see section 2.4.3.1). Moreover, in recent
studies [62] Colburn argued that an additional parallel mechanism exists in binaural
hearing, based on the combination of binaural information to extract monaural
information such as the overall level per frequency band, among others. While the
function of sound perception is generally not yet clearly defined for the wide range of
listening environments, Colburn's suggestions are supported to an extent by the studies
on the binaural STI model, which is potentially an approximation of the parallel mechanism
as described; notably, the better ear approach described by van Wijngaarden [58]
essentially comprises a utilization of monaural information from a binaural dataset.

2.6 Sound fields in enclosed spaces for speech intelligibility
Intelligibility can vary significantly even for spaces that are closely related in terms of
architectural distinctiveness. Several factors that have an effect have been identified
through a detailed assessment of the architecture of the spaces involved (see section 2.6.1).
Conditions, however, become more complex with the introduction of a sound
reinforcement system (see section 2.6.2) as the transmission channel between the source
and receiver is extended with additional components. Subsequently, the parameters
relevant to the resolution of a problem can be separated into two groups, the first one
relating entirely to architectural design and the second addressing the implications
introduced by a public address (P.A.) system. The following sections discuss the
objective reasoning, based on these categories, behind the resultant sound field for natural
acoustics and sound system assisted conditions.

2.6.1 Sound fields in natural acoustics
Several parameters, involving a space's architecture, need the attention of the designer
during the commissioning process. Considering intelligibility issues in particular, the
intrusion of extraneous noise that inevitably interferes with speech signals commonly
comprises one of the main problems faced. High background noise level (BGNL) can
generate masking effects in the sound field. The negative contribution has been shown to
be dependent on the direction, with respect to the listener, of both the speech signal and
interfering noise and subsequently their angular separation, see Bronkhorst and Plomp
[64, 65]. Low values of the resultant S/N ratio are mainly responsible for impaired acoustic
performance [67].

The significance of BGNL in terms of S/N has been extensively reported in the literature,
particularly for educational spaces that to a large extent share common characteristics in
their design, under the classroom acoustics category (see section 2.6.3). S/N ratios
however are subject to additional parameters since reverberation, high level echoes and
very early or late reflections can also be considered as unwanted elements. It can be
pointed out that reverberant sound usually carries content of lower frequency when
compared to the original signal. As low frequency noise is (also) considered a more
effective masker at high levels or when noise is louder than speech, reverberation time
and energy ratios generally appear as key parameters in describing a space. Nonetheless,
in terms of acoustically small spaces, such as typical classrooms at low frequencies,
conditions are somewhat altered given that wave effects dominate the sound field.
Acoustic resonances are generally separated and reflect more of the room's individual
acoustics. Sound decay is more profoundly frequency dependent; thus, reverberation time
alone does not comprise in such a case an adequate descriptor of the acoustic conditions [68]
(i.e. additional measures are needed to characterize the room and obtain a better
description of the acoustical environment).

Furthermore, very early reflections can result in blurring speech, consequently impairing
subjective perception. Echoes, also responsible for a negative effect on intelligibility, are
attributed to analogous functions. Energy from previously uttered syllables arriving at a
later time when compared to direct sound can mask or obscure the sound of subsequent
syllables, the time delay and level of an echo being key factors in the process. Haas, in
earlier studies using a single echo [69], showed that for a time delay of 1-30ms reflections
will pleasantly add up to the sound source and essentially not be perceived as an echo,
while longer delays will disturb the subjective perception of speech. However, it was
highlighted that the particular function is a dynamic one, depending on the RT, where a
longer RT will increase the immunity of the system to longer time delays. It is worth
mentioning that typically a 50ms time limit is used in speech applications to define useful
energy, while it has been shown that, although not necessary, an 80ms limit can also on
occasions give acceptable results [28, 70], i.e. correlate with subjective intelligibility testing.
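
The role of the 50ms and 80ms limits can be illustrated with a short Python sketch that computes an early-to-late energy ratio from an impulse response. This is a minimal example under stated assumptions: the function name is hypothetical, the direct sound is located crudely as the sample of maximum amplitude, and the impulse response is a synthetic exponential decay with an RT of 0.5s rather than measured data.

```python
import numpy as np


def early_to_late_ratio(ir, fs, limit_ms=50.0):
    """Early-to-late energy ratio in dB for a given early time limit."""
    ir = np.asarray(ir, dtype=float)
    onset = int(np.argmax(np.abs(ir)))   # crude direct sound detection (assumption)
    split = onset + int(round(limit_ms * 1e-3 * fs))
    energy = ir ** 2
    early = energy[onset:split].sum()    # useful energy up to the limit
    late = energy[split:].sum()          # detrimental energy after the limit
    return 10.0 * np.log10(early / late)


# Synthetic impulse response: amplitude envelope decaying by 60 dB over rt seconds
fs = 8000
rt = 0.5
t = np.arange(int(2.0 * fs)) / fs
ir = np.exp(-6.9 * t / rt)

c50 = early_to_late_ratio(ir, fs, 50.0)
c80 = early_to_late_ratio(ir, fs, 80.0)
```

For a purely exponential decay the ratio follows directly from the RT, which is consistent with the later observation that early-to-late ratios can be predicted from the RT in small rooms.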

A number of additional parameters beyond the scope of the research should be
considered including focusing effects from concave planes and audible room modes,
functions that can potentially distort speech signals. Parameters, moreover, relating
purely to the subjective side of the various testing schemes would extend to the listener's
acuity and the talker's gender, enunciation and rate of speech [71].

2.6.2 Sound system assisted sound fields
A number of additional factors exist, addressing specifically the technical characteristics
of a system and directly affecting intelligibility. The following paragraphs discuss the
topic, while effectively constituting a complement to section 2.6.1.

In assessing a transmission channel, the various components of a sound system such as
microphones, amplifiers and loudspeakers, together with their frequency responses, form
the overall system response. This is a characteristic that is central to
any sound system design. In psychoacoustic terms, an adequately flat frequency response
is essential [72], as the loudness quality of a signal, which directly affects intelligibility, is
subject mainly to the technical characteristics of the reproduction system; this assumes
an initially good quality signal. A wide response would also be preferred; however,
speech does not need to sound natural in order to be intelligible. In the case of a high
quality system the ideal response would be in the range of 80Hz to 10kHz for male voices
and accurate consonant reproduction [72]. Lower frequency response, below 80Hz, can be
cut off as it falls out of the speech range, while having the potential to result
in particularly destructive masking at high levels [72].

The directivity factor of microphones and especially loudspeakers has an important role
in the sense of minimizing meaningless reception (for microphones) and dispersion (for
loudspeakers) of sound, thus allowing more effective control of reverberation. In the case
of loudspeakers, S/N can be optimized by adjusting the radiation angle/direction,
therefore more effectively controlling the direct and near fields. Taking another approach,
the directivity and number of the loudspeakers will directly affect the coverage in terms
of SPL of a given system. A uniform coverage and consequently comparable
performance for a number of positions within a space is an essential characteristic in a
well performing space.

Considering loudspeaker interaction, a phenomenon that can be naturally generated in
rooms is comb filtering. The effect is a common filtering technique used by sound
engineers to attenuate (or emphasize) the multiples of a given fundamental frequency
through phase cancellation, in effect, a delay process controlling the gain at the feedback
path. Its significance lies in the fact that interacting loudspeakers can inadvertently affect
the character of the produced sound field; the effect is, however, generally undesirable for
speech applications [72].
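
The comb filtering produced by two interacting loudspeakers can be illustrated with a simple feedforward model, in which the listener receives a signal plus one delayed copy of it. The Python sketch below is an illustration under assumptions, not a description of any specific system: the function name, the path difference and the equal gain of both arrivals are hypothetical.

```python
import numpy as np


def comb_magnitude(freq_hz, delay_s, gain=1.0):
    """|H(f)| for a signal summed with one delayed copy: H = 1 + g*exp(-j*2*pi*f*tau)."""
    f = np.asarray(freq_hz, dtype=float)
    return np.abs(1.0 + gain * np.exp(-2j * np.pi * f * delay_s))


c = 343.0                         # speed of sound in m/s (assumed)
path_difference = 0.343           # m, hypothetical extra path from the second loudspeaker
tau = path_difference / c         # arrival delay, about 1 ms
first_notch = 1.0 / (2.0 * tau)   # cancellation at odd multiples of this frequency

notch_level = comb_magnitude(first_notch, tau)        # close to 0 (cancellation)
peak_level = comb_magnitude(2.0 * first_notch, tau)   # close to 2 (reinforcement)
```

With a 1ms delay the first notch falls at about 500Hz, i.e. well inside the range that matters for speech.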

Distortion is an additional function affecting signals passing through a transmission
channel, as every electrical component has certain restraints and limitations in its domain.
When operation extends beyond these threshold conditions, the signal passing through
the system becomes subject to distortion effects. In the case of audio equipment, restraints
commonly occur for high signal levels where amplifier, microphone or loudspeaker
clipping will result in distorting a signal. The implications of the effect are most obvious
in this case, as additional noise is introduced to the system (i.e. additional masking
potential), therefore compromising speech intelligibility.

Intentional clipping and gain compensation in terms of signal processing i.e. compression,
forms a common technique in sound applications. However, its use on speech signals, as
suggested in the literature [73], is limited to situations with unfavorable S/N ratios (below
0dB) and low fidelity systems.

2.6.3 (University) classroom acoustics
Classroom acoustics is a complex topic with a particular focus on speech intelligibility
issues. Given the objective reasoning behind unintelligible speech in rooms, a significant
amount of research has been undertaken on the topic. Following from previous sections,
Houtgast established in his 1980 study for typical classrooms [67] that an S/N of at
least 15dB, with regard to the (speech) signal, is needed for good speech
communication conditions. Bradley [70] and Sato [74] supported the assumption that the
control of BGNL (consequently the resultant S/N) is of more importance than the effects
of room acoustics on signal propagation. The latter study also highlighted the need, given
the Lombard effect, for having very low noise levels in unoccupied conditions so as to
subsequently achieve acceptable levels in occupied conditions, while Hodgson [75],
though mainly based on RT products, emphasized the need to account for occupancy
effects at the design stage. Bradley et al. [76] in their studies on elementary school
classrooms confirmed the need for 15dB S/N for near ideal conditions, while
also pointing out that an acoustical environment appearing acceptable for adults may in
fact be less adequate for younger listeners. An S/N that increases with decreasing
age of the listener was found to be necessary to achieve results comparable to those of
adult listeners. Statistical analysis of the same data for a range of classrooms revealed a
maximum acceptable BGNL of 34dB, being very close to current British [77] and
American [78] recommendations for classroom design. Previous studies on a smaller
sample of classrooms resulted in a 30dB BGNL value, although accounting for hearing
impaired students and all age groups. Bistafa et al. [79] had previously suggested a BGNL
of 25dB and 20dB below speech levels at 1m in front of the talker, respectively for ideal
and acceptable conditions. Based on their statistical analysis from a range of classrooms
the recommended BGNL in combination with their optimum suggested RT (0.4-0.5sec)
would be expected to result in excess of 15dB S/N within typical classrooms.

Hodgson developed in 1999 [80] a long term BGNL (and speech level) prediction method
using an empirical approach. Different methods can potentially be employed in
measuring absolute values. Shield et al. [81], having established in their study of primary
schools that the main source of noise in a classroom is the students themselves, measured
noise levels using $L_{Aeq}$ as it was found to best represent classroom activity noise. The
average of multiple $L_{Aeq,2min}$ values was believed to give a good indication of the fluctuating
noise in classrooms during the day, and was thus held to be a highly efficient methodology.
Mapp [82], taking another approach, suggested an octave band $L_{10}$ for the measurement of
BGNL given that the spectrum of the noise would have a significant effect in assessing the octave
contributions of the speech signal to intelligibility. It follows that new legislation would
need to account in such a case for a finer approach in terms of providing more detailed
recommendations for optimum BGNL i.e. recommend values for individual octave bands
rather than as a general level.

In terms of RT, Bradley in 1986 [88] performed a comparative analysis of intelligibility
metrics in classrooms and concluded that an RT of 0.4-0.5 seconds is optimum. It is
worth noting that additional parameters such as EDT, $T_s$ and early-to-late ratios could be
predicted from the RT within 1dB for typical classrooms i.e. comparatively small sized
rooms. The same RT range was later supported by Bistafa [79], again for typical
classrooms and in combination with a recommended BGNL, while Sato [74], who
measured occupied rooms, found that reverberation values were about 10% greater when
compared to the counterpart unoccupied conditions. Other researchers however [83, 84]
showed that defining an optimum value of RT is a dynamic process highly dependent on
the speaker-listener-noise source relationship. On this basis it was established that an RT
of zero or near zero is optimum when the noise source is further than the speaker and non
zero when the noise source is between the speaker and listener for both normal and
hearing impaired subjects i.e. some reverberation is needed to enhance the speech signal
in adverse conditions.

Bradley in his 2003 study [85] highlighted the importance of early reflections for efficient
speech communication for both normal and hearing impaired listeners, while also
suggesting the Early Reflection Benefit (ERB) measure for the assessment of rooms in
this context. An increase of up to 9dB was found in the effective S/N, an effect to be
considered in any design. A later study by Yang [84] supported the results and suggested
that an enhancement of early reflections within a sound field rather than a stricter control
on RT would be more beneficial for speech intelligibility. Taking another approach, Sato [74]
suggested that a design should also aim for a reduction of late-arriving reflection levels
for more distant listeners, e.g. at the back of a room, to levels well below the direct sound
and early reflections. This was believed to be a more direct/efficient methodology,
compared to RT control, in ensuring adequate speech communication conditions.

2.6.4 Relations between measures of speech intelligibility
The interrelations among measures for speech intelligibility and their individual relation
to the latter have been studied in the literature on the basis of quantifying their
mathematical relation, to enable inter-comparison and/or translation of results, and to
establish the potential correlation to intelligibility i.e. efficiency/quality of measure.
Bradley [27, 28] and Bistafa [86] have studied these interrelations for a range of conditions
and established a mathematical connection in a number of cases. Most notably, the $C_{50}$
ratio has been found to be linearly related to STI for simulated controlled natural
acoustics conditions, incorporating negligible background noise. An important side
outcome from this work is the derivation of a just noticeable difference (JND) relating to
the subjective perception of level changes to a sound field, subsequently enabling parallel
data processing methodologies that allow for an interpretation and further processing of
results using either of the measures. It should be noted nonetheless that in the $C_{50}$-STI
relation individual errors of up to 0.1 STI within a given dataset were found to occur for
sound system assisted conditions, as previously established by Mapp [87]. This was
deduced from experiments covering a wide range of acoustic environments.

For small rooms, such as classrooms, using either a 50ms or 80ms time limit for defining
early energy can be expected to result in similar trends when compared to STI [28], thus
implying that for speech applications an efficient assessment can be made using either
limit. It was established by Bradley [88] that EDT and measures using the 50ms limit (as
opposed to 80ms) could be predicted with reasonable accuracy (within 1dB) from the
reverberation times of a small room. The outcome concerning EDT is supported by Mapp [87]
under low RT in sound system assisted conditions. The impact of the latter in the
context of the current study is that either RT or EDT could be used to describe a (small)
room, thus giving a significant advantage and, as such, further confidence in the
computer simulation of an enclosed space, see sections 5.3.1 and 5.5.2.

The most widely used metric for speech intelligibility assessment, STI, has been
subjected to strong criticism on a number of occasions and on different grounds [89, 18, 90].
Nonetheless, more commonly the measure has been found to give a high correlation to
subjective testing results; an outcome that has been extensively reported in the literature,
see section 2.4.3.1. Thus, STI is generally acknowledged as an efficient methodology.

As a final point and given the multitude of available intelligibility measures, a direct
comparison of the outcomes from different methods would enable an assessment on
common grounds where necessary. In 1998 the International Electrotechnical Commission
(IEC) published the Common Intelligibility Scale (CIS) [91], relating a number of
measures on a single scale (Figure 2.11) while also including common subjective
methods (see section 2.4.1 for more details on subjective assessment methods).

The result is given as a CIS value, ranging from 0 (unintelligible) to 1 (excellent
intelligibility) and can comprise either a direct statement or form the basis for a later
translation of the outcome of a given methodology (original result), to describe the
assessment product.

Figure 2.11. The Common Intelligibility Scale as determined by the IEC [91]

2.7 Sound field simulation
Sound propagation within enclosed spaces is the main focus in sound field simulation, as
used in this study. Widely used approaches based on the geometrical acoustics
framework are discussed in this section in the context of related simulation models,
applicable for the prediction of room acoustics and subsequent auralization techniques.

2.7.1 Geometrical acoustics
Primary functions in room acoustics such as reverberation or clarity are defined as time
domain effects. As such, the concept of geometrical acoustics appears as a suitable
framework for describing and simulating a sound field. The approach further enables a
relatively simple calculation of sound pressure levels and reverberation times [68], also
lending itself to a straightforward visualization of sound propagation.

In geometrical acoustics sound is interpreted as rays or sound particles carrying sound
energy. Travelling in straight lines, rays or particles are reflected off the room boundaries,
losing energy at each reflection depending on the acoustic properties of the boundary
hit. The intensity of the ray also decreases by $1/r^2$ for the distance travelled, given that
each ray corresponds to a small portion of a spherical wave (where r is the distance from
the centre of origin). A high number of emission angles must be used to efficiently
simulate a source, the emission pattern relating to the source directivity. The number of
propagation paths hitting a receiver position can be used to determine the energy received
at that point in the timeline, thus facilitating an estimation of the impulse response.

In geometrical acoustics the description of the sound field is reduced to energy, transition
time and direction of rays [92]; thus the theory is generally valid if broadband signals are
used and the wavelengths of the frequencies considered are much shorter than the room
dimensions, i.e. rooms are acoustically large. While this prerequisite is normally satisfied
in room acoustics, a more stringent approach would involve the use of a cut off
frequency. The latter, also known as the Schroeder limit [93], can be defined in the metric
system as:
$$ f_s = c\sqrt{\frac{6}{A}} \approx 2000\sqrt{\frac{T}{V}} \ \text{(Hz)} \tag{2.23} $$
where c is the speed of sound, A is the absorption units, T is reverberation time and V is
the room volume.

The Schroeder limit relates to the density of eigenfrequencies in a room and enables a
subdivision of the room behaviour into different regions/frequency ranges. At frequencies
exceeding $f_s$ the density of eigenvalues is high enough to assume a strongly overlapping
modal distribution. As such, it is a convenient way to characterize a room as
(acoustically) small or large in terms of its prospective frequency response, thus
indirectly evaluating the applicability of geometrical acoustics theory.
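
As a worked example of the Schroeder limit, the Python sketch below evaluates both forms of Eq. 2.23 for a hypothetical classroom-sized room. The room volume and RT are illustrative values, the first form is assumed to read c·sqrt(6/A), and the absorption A is estimated from Sabine's equation (A = 0.161 V/T), which is an assumption introduced here.

```python
import math


def schroeder_limit(rt_s, volume_m3):
    """Approximate form of Eq. 2.23: 2000 * sqrt(T / V), in Hz."""
    return 2000.0 * math.sqrt(rt_s / volume_m3)


def schroeder_limit_from_absorption(absorption_m2, c=343.0):
    """First form of Eq. 2.23 (as assumed here): c * sqrt(6 / A), in Hz."""
    return c * math.sqrt(6.0 / absorption_m2)


rt, volume = 0.5, 200.0             # hypothetical classroom: RT in s, volume in m^3
fs_approx = schroeder_limit(rt, volume)
absorption = 0.161 * volume / rt    # Sabine's equation (assumption)
fs_from_a = schroeder_limit_from_absorption(absorption)
```

For this example the limit lies at roughly 100Hz, so the octave bands most relevant to speech sit above it, while the lowest bands approach the region where individual room modes matter.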

Geometrical acoustics based simulation methods can be divided into two categories:
stochastic and deterministic methods, from which the two main room acoustics
simulation algorithms i.e. ray tracing and image sources respectively, originate.

2.7.1.1 Ray tracing (stochastic)
Digital simulation methods of room acoustics were first reported in the 1960s by
Schroeder et al. [94]. The first publication detailing a ray tracing technique was published
in 1968 by Krokstad et al. [95], followed by a considerable increase in general interest and
consequently a rapid development of the field in later years.

In ray tracing sound is radiated as clusters of energy particles forming rays, in effect
simulating for the purposes of post processing, temporal delta functions. A sound source
(point source) is defined in a three dimensional space emitting an adequately large
number of rays in any direction, depending on the source directivity pattern. The rays,
carrying a certain amount of energy will propagate in the room, bouncing off its
boundaries when hit. On impact, rays are reflected to a different direction depending on
the type of reflection occurring since wave scattering is considered, while each impact
with a boundary reduces the particle energy according to the boundary's absorption
characteristics. Propagation continues until a ray's energy is eliminated by surface and air
absorption or until a predefined ray truncation time limit is reached (figure 2.12).

A receiver is defined as a spherical volume in the space for the simple reason that the
probability of a ray intersecting a point receiver, thus triggering detection, is zero (a
surface receiver is also possible). Receiving positions sense particle movement through
the volume while the energy (intensity) carried and time elapsed from emission from the
source are also considered. When ray tracing is complete the impulse response at a
receiver position can be derived by counting events i.e. rays having intersected the
detection volume, in combination with the timing and energy history information.
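
The procedure described above can be sketched in a few lines of Python. The example below is a deliberately simplified illustration, not the algorithm of any particular simulation package: it assumes a rectangular room with one corner at the origin, purely specular reflections (no scattering), a single absorption coefficient for all surfaces and no air absorption. All names and parameter values are hypothetical. Geometric spreading is not applied to each ray explicitly; it is represented statistically, since fewer rays intersect the receiver sphere as the path length grows.

```python
import numpy as np


def trace_rays(room, source, receiver, radius=0.5, alpha=0.2,
               n_rays=2000, t_max=0.2, dt=0.005, c=343.0, seed=0):
    """Energy histogram (bin width dt) at a spherical receiver in a shoebox room."""
    rng = np.random.default_rng(seed)
    room = np.asarray(room, dtype=float)
    receiver = np.asarray(receiver, dtype=float)
    hist = np.zeros(int(round(t_max / dt)))
    # Omnidirectional source: uniformly distributed random directions
    dirs = rng.normal(size=(n_rays, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    for d in dirs:
        d = d.copy()
        pos = np.array(source, dtype=float)
        energy, path = 1.0 / n_rays, 0.0
        # Stop at the truncation time or when the ray energy is negligible
        while path < c * t_max and energy > 1e-6 / n_rays:
            # Distance along the ray to the wall faced on each axis
            dist = np.where(d > 0, (room - pos) / d, -pos / d)
            axis = int(np.argmin(dist))
            seg = max(float(dist[axis]), 0.0)
            # Does this straight segment pass through the receiver sphere?
            to_rcv = receiver - pos
            along = float(np.dot(to_rcv, d))
            if 0.0 <= along <= seg and np.linalg.norm(to_rcv - along * d) <= radius:
                k = int((path + along) / c / dt)
                if k < hist.size:
                    hist[k] += energy
            # Move to the wall, reflect specularly, apply surface absorption
            pos = pos + seg * d
            path += seg
            d[axis] = -d[axis]
            energy *= 1.0 - alpha
    return hist


# Hypothetical 6 x 5 x 3 m room, source and receiver positions in metres
hist = trace_rays((6.0, 5.0, 3.0), (1.0, 1.0, 1.0), (4.0, 3.0, 1.5))
```

The returned histogram is a coarse energy-time curve; its limited temporal resolution reflects the dependence on the density of detection events discussed in the following paragraph.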

The density of ray detection events (carrying timing and angle of incidence information)
is directly related to the temporal resolution of the simulation. On this basis, ray tracing
techniques cannot efficiently meet the requirement for an acceptable sampling rate for
audio signal processing and auralization [92]. The number of rays used is, furthermore, a
factor closely connected with the resulting simulation accuracy and run time required. A
disadvantage of ray tracing thus emerges as the increased computation time, resulting
predominantly from the increased number of rays required for an accurate simulation,
particularly of large spaces. Considering a prediction of SPL only, earlier research has
shown that the source-receiver distance is of primary importance in determining the
number of rays required for an accurate prediction [96]. For multiple sources and evenly
distributed receiver positions in the areas of interest it was furthermore demonstrated that
a significantly reduced number of rays could be used, reducing the required run time by
the same factor. The number of rays required has been shown [96, 97] to be more important
for sound distribution parameters than for SPL while in addition being frequency
dependent, requiring more rays for higher frequencies [97]. Accordingly, integral
parameters such as clarity, strength or definition could be quickly estimated if necessary,
but only if assuming that reference to the absolute sound field characteristics is not
essential in the assessment [92].

Figure 2.12. Ray tracing principle and example sound propagation paths

Given that an acceptable level of simulation accuracy is reached, an increased
computation time by means of additional rays is not justified unless the aim is
to reduce the stochastic element in the prediction outcome. Additional computing
effort cannot compensate for the inherent limitations imposed by the simulation
algorithms, but can somewhat assist in averaging out potential errors.
Algorithmic limitations occur due to the absence of models for wave effects such as
diffraction and interference, limiting the efficiency of a method
independently of the computation load. While it has been shown that the resultant sound
field approximation is adequate for the purpose of assessing room acoustic conditions via
simulations [92], additional limitations relate to the model input data, which are only
approximately defined. As such, room geometry and boundary acoustic characteristics
can have a significant impact on the derived results.

2.7.1.2 Image source method (deterministic)
The image source method was used efficiently to generate synthetic reverberation in
small rooms by Allen and Berkley [98] in 1979. Nonetheless, their method was generally
restricted to simple room geometries i.e. rectangular rooms, given the intention to
establish a practical and easy to use method. Borish [99] in 1984 published his work based
on the Allen-Berkley model, extending the image source model to arbitrary polyhedra.

Image sources are created by mirroring the source sound ray at the reflecting plane, as
illustrated in figure 2.13. The mirror image creates a new virtual sound source (S)
emitting/reflecting the original ray specularly i.e. at a direction matching the angle of
incidence. The process is repeated for all new sources creating higher order reflections
until a predefined order of image sources is reached, see figure 2.14. The ray intensity is
reduced upon reflection from the incidence wall by a factor of 1-α (relative to the impact
intensity, α being the absorption coefficient of the wall)
to account for the energy lost due to surface absorption. A deterministic energy spreading
by distance is further considered.
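
The mirroring step can be illustrated for the simplest case. The Python sketch below builds the six first order image sources of a rectangular room with one corner at the origin and lists the resulting arrivals at a receiver, with the energy reduced by 1-α per reflection and by 1/r² for distance. It is a minimal sketch under stated assumptions: function names, positions and the single absorption coefficient are hypothetical, and no audibility test is included because all first order images in a rectangular room are valid.

```python
import math


def first_order_images(source, room):
    """Mirror the source at each of the six walls of a shoebox room (Lx, Ly, Lz)."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]   # reflection at the wall plane
            images.append(tuple(image))
    return images


def arrivals(source, receiver, room, alpha=0.2, c=343.0):
    """(delay in s, relative energy) for direct sound and first order reflections."""
    r = math.dist(source, receiver)
    result = [(r / c, 1.0 / r ** 2)]                  # direct sound
    for image in first_order_images(source, room):
        r = math.dist(image, receiver)
        result.append((r / c, (1.0 - alpha) / r ** 2))
    return sorted(result)


room = (6.0, 5.0, 3.0)
source = (1.0, 1.0, 1.0)
receiver = (4.0, 3.0, 1.5)
events = arrivals(source, receiver, room)
```

Sorting the events by delay gives the time structure of a (first order) energy impulse response; higher orders follow by mirroring each image again at the remaining walls.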

Figure 2.13. Image source principle

The reflection order chosen can be seen as an analogy to the maximum truncation time
defined in ray tracing. Nonetheless, the specific aspect (indirectly defining a truncation
limit) is more critical in the image source method since the number of new virtual sources
increases exponentially [92]; each of the higher order images is mirrored by all walls in the
enclosure, however not all created images are valid. Accordingly, the predefined
reflection order limit significantly affects the computation time for the simulation, thus
imposing a considerable efficiency limitation on the method. The expression N(N-1)
i-1
gives the number of images for reflection order i (for i 1) and N room surfaces.
Allowing for a reflection order limit of i
0
, the total number of images is given by adding
these expressions up to the highest reflection order
[1]
, see Eq. 2.24:
0
0
( 1)
( )
2
i
N
N i N
N
1
=
(2.24)
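
To give a feel for the growth described by Eq. 2.24, the short Python sketch below adds the
per-order counts and compares the sum with the closed form. The function names are
illustrative assumptions; only the two expressions themselves come from the text.

```python
def images_at_order(n_surfaces, order):
    """Number of image sources created at a single reflection order i >= 1."""
    if order < 1:
        raise ValueError("reflection order must be at least 1")
    return n_surfaces * (n_surfaces - 1) ** (order - 1)


def total_images(n_surfaces, max_order):
    """Total number of images up to max_order, by direct summation."""
    return sum(images_at_order(n_surfaces, i) for i in range(1, max_order + 1))


def total_images_closed_form(n_surfaces, max_order):
    """Closed form of the same sum (Eq. 2.24); requires more than two surfaces."""
    if n_surfaces <= 2:
        raise ValueError("closed form requires N > 2")
    # (N-1)^i0 - 1 is always divisible by N-2, so integer division is exact.
    return n_surfaces * ((n_surfaces - 1) ** max_order - 1) // (n_surfaces - 2)


# A six-sided (rectangular) room up to reflection order 3, 5 and 10.
counts = [total_images(6, order) for order in (3, 5, 10)]
```

For a six-sided room the count already exceeds fourteen million images at order 10, which
illustrates why the reflection order limit dominates the computation time.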

The image source method assumes purely specular reflections, thus the source can be
traced back from the receiver, i.e. S-R positions are interchangeable. After image sources
are created, an audibility test has to be performed in order to determine the relevance of
every image in relation to a specific receiving point; the S-R reciprocity characteristic is
utilized on this basis. In effect, the route of the image sources, specific for each S-R
combination, is traced back from the receiving point to the source to ensure that the
sequence of surfaces impacted is the reverse of the original path [100] and thus that
images are valid, i.e. images will be able to reach the receiver. Pre-processing techniques
are used to exclude images that are not relevant on this basis, see Vorländer [92], and thus
reduce the computation load.

An impulse response can be constructed for audible images at a receiving point
considering the time delay of each contribution and associated amplitude. Given a
superposition of all significant contributions at the point of interest, the distance of the
image (virtual) sources to the point receiver, see figure 2.14, will determine the time
structure of the impulse response and partially the event amplitude; the latter considers in
addition the absorption coefficient of the surfaces crossed by the straight line connecting
the virtual source to the receiver. Under the assumption that all sources (original and
virtual) emit the same signal simultaneously, the estimation of the impulse response is
performed by summing the energy contributions of all rays impacting the defined point
receiver, including direct sound from the original source [1]. A considerable advantage of
the method over ray tracing can be highlighted at this stage, as there are no limitations
relating to the temporal resolution of the system, thus imposing no restrictions on the
choice of sampling rate. Consequently, audio signal processing and auralization can be
competently facilitated.
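
The construction of an energy impulse response from a set of audible images can be
sketched as follows. This is a simplified illustration under stated assumptions: a speed
of sound of 343 m/s, spherical spreading expressed as 1/d^2 relative to unit source
power, and a hypothetical input format in which each image is a pair of (distance to the
receiver, list of absorption coefficients of the surfaces crossed). It is not the
implementation used by any particular simulation package.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def energy_impulse_response(images, fs, length_s):
    """Sum the energy contributions of audible image sources into time bins.

    images   : list of (distance_m, [alpha_1, alpha_2, ...]) tuples; the direct
               sound is the entry with an empty absorption list
    fs       : sampling rate in Hz (freely selectable for this method)
    length_s : length of the response in seconds
    """
    n_samples = int(round(length_s * fs))
    response = [0.0] * n_samples
    for distance, alphas in images:
        # Arrival time follows from the image-to-receiver distance.
        index = int(round(distance / SPEED_OF_SOUND * fs))
        if index >= n_samples:
            continue
        # Spherical spreading, then one (1 - alpha) factor per surface crossed.
        energy = 1.0 / distance ** 2
        for alpha in alphas:
            energy *= 1.0 - alpha
        response[index] += energy
    return response


# Direct sound at 3.43 m and one first order image at 6.86 m (alpha = 0.5).
h = energy_impulse_response([(3.43, []), (6.86, [0.5])], fs=1000, length_s=0.05)
```

Because each arrival time is computed directly from geometry, the sampling rate can be
chosen freely, which is the advantage over ray tracing noted above.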

Figure 2.14. Image source and resultant sound ray examples in a room (S: point source, R: point receiver)

The image source method is exact only for the case of entirely reflecting surfaces.
However, for small angles of ray incidence, i.e. angles differing considerably from
grazing incidence, it comprises a good approximation for the more common case of
absorptive room boundaries [92].

2.7.1.3 Hybrid models (deterministic-stochastic)
Ray tracing and image sources have been presented in the previous sections. Focusing on
the advantages and disadvantages of the two algorithms, and as such the range of
conditions applicable, a number of models have been developed as a combination of
both, see Vorländer [101], Dalenbäck (CATT) [102], Maercke and Martin [103], Naylor
(ODEON) [104]; hybrid methods appear as a logical advance for simulation tasks in room
acoustics.

Image source methods alone are considered insufficient for typical acoustic conditions as
wave scattering is not accounted for. Scattered/diffused energy has been found to
dominate the sound field after a few reflections, see Vorländer [92], thus comprising an
important aspect of room acoustic conditions. Hodgson [105] also presented evidence of
diffuse reflections in rooms, portraying effectively diffuse sound as a critical element in
computer simulations. Results are supported in the same context by Howarth and Lam [106]
and three international round-robin projects on room simulation, see Vorländer [107]
and Bork [108, 109, 110], while Torres et al. [111] have further shown that the specific
characteristic is audible in the end result, i.e. auralization. Nonetheless, using image
sources, a high temporal resolution that facilitates good quality sampling rates for audio
processing can be achieved.

In contrast, ray tracing techniques offer effectively the opposite attributes, as scattering
can be included in the simulation, while however the temporal resolution of the
simulation system is limited.

Hybrid models were subsequently developed to facilitate a high temporal resolution and
inclusion of scattering effects. Based on the idea that a specularly reflecting ray tracing
procedure can be used to efficiently find audible image sources, i.e. a faster forward
(from the source to the receiver) audibility check, a faster simulation time compared to
classic image source methods is achieved. Different ray models include cone and
pyramid structures, thus geometrically expanding during propagation. In this way it is
ensured that the contribution of image sources is counted only once, while the latter is
estimated in a deterministic manner considering the distance from the source.

Hybrid models can be complemented by a stochastic element for the late part of the
sound decay (>100ms) to further improve the simulation run time. This is possible as the
level of detail involved in the early response part is not needed in estimating late
reverberation i.e. a high temporal resolution is unnecessary. The approximation provided
by a stochastic approach such as ray tracing is sufficient in this case, allowing for a
significant improvement in the simulation run time.

2.7.2 Auralization
An auralization can be considered as an end result since its generation is based on the
predicted, in this case, room impulse responses (RIRs). The term is used to describe the
process and product of achieving an audible example of the acoustics of a space by
means other than directly experiencing the actual space or listening to a recording.
Different methodologies can be used to generate an auralization; nonetheless, hybrid
computer models are a practical and efficient approach in this sense.

The use of auralization comprises a wide range of applications, such as training and
education [112], most notably however as an efficient and direct way to demonstrate the
prospective result of an acoustics project to both acoustics experts and non-experts.

An auralization is generated for a specific receiver model, commonly a human listener.
The predicted RIR has to be suitably processed to obtain a Binaural RIR (BRIR) at the
receiver position of interest, consequently enabling a convolution with anechoic material.
Nonetheless, a number of limitations apply in the auralization technique, mainly relating
to the RIR prediction methodology, see section 2.7.1. Notably, the lack in some cases of
an accurate sound source model is considered an important setback [113], though the
problem is lessened for sources approximating a point source, such as a human talker.
Similarly, the directivity and frequency response input required for loudspeakers is a key
factor in the process. A near field source response, as used in this study, is normally
required so as to better approximate sound propagation in small rooms, see section
5.2.1.2.
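
The convolution step mentioned above can be illustrated with a minimal Python sketch. The
function names and the short example sequences are assumptions for illustration; a direct
time domain convolution is used for clarity, whereas practical tools would normally rely on
FFT based convolution for responses of realistic length.

```python
def convolve(signal, impulse_response):
    """Direct (time domain) linear convolution of two sample sequences."""
    output = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        if sample == 0.0:
            continue  # silent samples contribute nothing
        for j, tap in enumerate(impulse_response):
            output[i + j] += sample * tap
    return output


def auralize(anechoic, brir_left, brir_right):
    """Convolve anechoic material with a binaural RIR pair.

    Returns the left and right ear signals as two separate lists.
    """
    return convolve(anechoic, brir_left), convolve(anechoic, brir_right)


# Toy example: a two-sample "anechoic" signal and very short ear responses.
left, right = auralize([1.0, 0.5], [1.0, 0.0, 0.25], [0.5])
```

Each output channel has a length of len(anechoic) + len(BRIR) - 1 samples, so the
reverberant tail of the room extends beyond the end of the dry material.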

It was previously concluded that some differences between auralizations might be due to
the limitations imposed on the procedure; however, it was assumed that some acoustic
properties, e.g. speech intelligibility, are well represented [113]. Overall, up to date
auralization techniques appear to be capable of producing material that closely
approximates the actual conditions, also in terms of realism. This assumption is supported
by research undertaken by the author [114].

2.8 Summary and conclusions
This chapter reviewed key literature relating to the different aspects of this study.
Epigrammatically, impulse response (and related parameters) measurement
methodologies, speech intelligibility, classroom acoustics and computer prediction
methods have been discussed.

Hesitation in the selection of the type of test signal that is most appropriate for a given
measurement session is a common condition. From the multitude of signals that have
been reviewed here it is evident that an exponential sine sweep and MLS are most
appropriate for the purposes of the study. However, a potential advantage in the current
context of MLS over sine sweeps (i.e. providing the option for level calibration) is not
required here. The current study is based on an approach considering the general potential
of a space thus, absolute speech levels are typically not used and are rather compensated
for during post processing, where necessary. In contrast, advantages of an exponential
sine sweep, such as the possibility to omit potential distortion artifacts from the
measurement system, are utilized here in the attempt to more efficiently control the
measurement conditions and provide a consistent basis for comparison of room
performance. The current methodology, as discussed in more detail in later sections,
indicates that the measurement system used is capable of measuring the room's actual
potential, without being limited by artifacts of this type.

The parameters used to describe a space are another factor of ambiguity since not all
measures are relevant in a given context. RT alone is not an adequate descriptor of
conditions, particularly for small rooms, while speech intelligibility specific descriptors
are by definition liable to lead to misinterpretation of results. A number of parameters
could also be directly related to RT (for small rooms), thus reducing the value of
information in these terms for a room's performance. Parts of this study aim to clarify the
interrelations of these parameters in connection to the influence of the acoustic
environment. A combination of parameters is typically used (as later discussed in more
detail), mainly RT, EDT, C, D, $T_s$ and STI, to analyze the case studies and draw
conclusions on the topic.

For the STI case, a binaural model has been discussed in combination with a brief review
on the advantages of binaural hearing systems in the context of speech intelligibility.

A discussion / review on different studies considering classroom acoustics revealed that
the specific room category comprises a special group of spaces, given also the absolute
need for good acoustic performance, based largely on the acoustically small
characteristic i.e. longest wavelength considered exceeds the room dimensions. Given
this attribute, the different interrelations of the acoustic parameters have been discussed,
forming a basis for this study to further expand on university lecture rooms in particular.

Computer modelling and room auralizations are used in this study as a post evaluation
tool. Emphasis is given on model accuracy in terms of both direct numerical output and
auralization. For the purposes of the current study, a ray tracing model combined with the
image source theory for the initial parts of the prediction (hybrid deterministic-stochastic
model) emerges as a consistent/reliable methodology. Assuming an accurate auralization,
a particularly efficient means of demonstrating results to both acoustics experts and
non-experts can be highlighted. Validation of the auralization, however, is an ongoing
research topic closely related to model quality and performance. As such, the study
expands further on the topic with the aim to derive an efficient methodology in these
terms.

CHAPTER 3
Room Acoustics Measurements

3.1 Introduction
Previous research [27, 28, 86, 87, 88, 115] investigated a large number of rooms within the
context of classroom acoustics and university lecture rooms for comparative purposes.
Objectives ranged from defining optimum RTs within classrooms to prospective
interrelations of acoustic parameters that are typically used in describing acoustic
conditions and speech intelligibility in particular. In the context of the current study, it
has been reported that EDT and energy ratio (using a 50ms limit for early energy) values
for small spaces / classrooms are closely related to $T_{30}$ and thus could be predicted
consistently [88]. However, it is not clear if the result is valid for fitted spaces, while
little information relating to the measuring positions is given, a factor that could have an
effect on the measurement output. Another approach would include using a multi source sound
system; however, the effect of such a configuration on general acoustic parameters, e.g. RT,
has not been addressed in comparison to standard acoustical measurements using a single
source.

This chapter is primarily based on the room acoustics measurements for a combination of
ten university lecture theatres/classrooms that are considered typical within university
premises. The measurement methodology is described in detail for the number of
different conditions examined and results are analyzed in terms of the type of source used
to determine the level of usability for a consistent assessment via either configuration.
Acoustic performance is further discussed in the context of measure interrelations.

Given the increased practicality of a simplified measurement procedure for in situ
conditions, the chapter concludes with an investigation into closed and open loop (see
section 3.4) measurement methodologies. Open loop systems have previously been
successfully utilized in large underground spaces, such as ticket halls and train platforms
[116, 117], consequently significantly simplifying the measurement procedure. In the
attempt to establish the efficiency of this type of simpler implementation (open loop) the
latter is discussed in terms of feasibility of use within the context of classroom acoustics.

3.2 Acoustic conditions for the different source configurations
The following sections discuss the assessment in the ten case studies, considered under a
range of source configurations. The measurement methodology and rationale are initially
presented, followed by a comparison of the acoustic conditions in terms of related
parameters.

3.2.1 Measurement methodology
Measurements undertaken for the current part of the study aimed at a general assessment
of the spaces considered, enabling nonetheless further post processing where necessary
e.g. accounting for absolute speech and background noise levels. Natural acoustics were
assessed with the use of a single dodecahedron (omni directional) loudspeaker for two
positions, while the sound system assisted conditions were based on a portable sound
system set up (SS) common in all rooms, to eliminate inconsistencies due to different
system characteristics. Both source types approximated a flat frequency response while
the directivity pattern and aiming of the distributed system was not considered at this
stage. Two conditions of the portable sound system consisted of a combination of two or
four monitor loudspeakers, positioned respectively at the two front or main four corners
of the audience area within a room. Overall, a total of four source configurations were
considered (Figure 3.1): two omni directional source positions, two sound system
formations (2 x loudspeaker, 4 x loudspeaker).


Figure 3.1. Sound source configurations (I-IV)

Room acoustics measurements were based on the WinMLS 2004 (i) [6] platform and a
combination of a sound source (single omni directional or distributed configuration) with
an omni directional receiver. The receiver positions and omni directional source positions
were set at a height of 1.1m-1.2m and 1.7m-1.8m, respectively, the former aiming at the
stage (see Appendix 1). Positioning of the distributed system in terms of height depended
largely on the distance from the ceiling in each case, given an angled audience area for a
number of rooms; a height setting in the range of 1.8m-2.35m was thus used, as
appropriate.

(i) WinMLS is a sound card based software platform for audio, acoustics and vibration
measurements using a PC.

An exponential sine sweep test signal was preferred since potential distortion artefacts
originating at the sound system's transfer function would be excluded from the outcomes
and thus not influence the measurement consistency. In the same sense, matching the system
characteristics across all rooms could eliminate inconsistencies related to differing
system performance and to the system itself. A 10 second swept sine was used in all cases
and multiple impulse response measurements in line with BS ISO 3382-2: 2008 [118] were
taken. The typical divergence from the standard procedure involved the use of a number of
receiver positions close to reflective surfaces, i.e. desks. This variation aimed at
assessing a more realistic environment; however, alternative positioning was chosen where
possible. Samples of BGNL were recorded throughout the sessions (typically 5 samples in
every room for a 4 hour session) with a class 1 sound level meter at the centre of the room
or at the low level measurement position, as shown in the detailed room information in
Appendix 1. BGNL was estimated in terms of $L_{eq,1min}$ in unoccupied conditions.
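
For orientation, an exponential sine sweep of the kind referred to above can be generated
with the short Python sketch below. Only the 10 second duration is taken from the text; the
20Hz-20kHz range, the 48kHz sampling rate and the function names are assumptions made for
illustration and do not describe the WinMLS settings actually used.

```python
import math


def exponential_sweep(f_start, f_end, duration, fs):
    """Generate an exponential (logarithmic) sine sweep as a list of samples."""
    n_samples = int(round(duration * fs))
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [
        math.sin(scale * (math.exp((i / fs) * rate / duration) - 1.0))
        for i in range(n_samples)
    ]


def instantaneous_frequency(f_start, f_end, duration, t):
    """Frequency of the sweep at time t; it rises exponentially with time."""
    return f_start * math.exp((t / duration) * math.log(f_end / f_start))


# Assumed settings: 20 Hz to 20 kHz in 10 s at a 48 kHz sampling rate.
sweep = exponential_sweep(20.0, 20000.0, 10.0, 48000)
```

Since the frequency rises exponentially, each octave occupies the same length of time. On
deconvolution, harmonic distortion products appear ahead of the linear impulse response and
can be windowed out, which is the property relied upon in the text.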

All room acoustics measurements were performed in fitted rooms and unoccupied
conditions at typical working hours during the day.

3.2.2 Equipment list

- Norsonic 140 sound level meter (SLM)
- B&K calibrator Type 4230
- Dell Latitude PC D610
- Digigram VXpocket v2 sound card
- WinMLS 2004 professional measurement software for Windows environment
- Audio SR707 power amplifier
- Audio splitter (1 input, 4 outputs)
- Dodecahedron omni directional loudspeaker
- Yamaha HS50M studio monitor loudspeaker on tripod stand (x4)
- Earthworks M30BX omni directional measurement microphone

3.2.3 Test rooms
The range of test rooms considered in this study consisted of fitted lecture
theatres and classrooms, typically found within university premises (see figure 3.2). The
majority of the rooms can be described as small to medium sized spaces, while two larger
rooms were also included in the analysis. Overall, ten rooms were examined, generically
categorized as seen in Table 3.1. Details on the construction type, fittings and
architectural characteristics of the rooms, along with different views and source/receiver
positioning details, can be found in Appendix 1.

Figure 3.2. Examples of classroom population used in the study

Room                       Capacity (seating)   Volume (m³)   Size category
Room 1 (B336)              30                   138           Small
Room 2 (B337)              30                   156           Small
Room 3 (T214)              50                   260           Small
Room 4 (K604)              30                   148           Small
Room 5 (B361)              40                   218           Small
Room 6 (B302)              80                   242           Medium
Room 7 (B307)              140                  250           Medium
Room 8 (Lecture room A)    110                  267           Medium
Room 9 (B360)              62                   356           Large
Room 10 (Event theatre)    240                  753           Large
Table 3.1. Room list for room acoustics measurements
3.2.4 Results
Table 3.2 presents the average BGNL in terms of $L_{eq,1min}$ measured over the ten test
rooms. The overall linear and A-weighted levels varied from 39.7dB-57.8dB and
34.1dBA-48.4dBA, respectively; a number of rooms having at times increased exposure to
low frequency noise. $T_{30,1kHz}$, $EDT_{1kHz}$, $C_{50,1kHz}$, $C_{80,1kHz}$,
$T_{s,1kHz}$, $MTI_{1kHz}$ and STI values and a statistical summary of the results over all
rooms are given in table 3.3 to establish the general character of the rooms considered.
$T_{30}$ as such varied from 0.41s-0.86s while EDT was measured within the range of
0.36s-0.76s. It has been previously suggested [88] that for the specific size category, EDT
closely approximates $T_{30}$ values (with an exception at lower frequencies), consequently
optimum RT recommendations could be given solely in terms of $T_{30}$. The current
measurement results partially support these conclusions as for lower frequency bands
$T_{30}$ was typically found to be considerably different from EDT. The differences
observed, based on average values over all receiver positions and source configurations,
approached 31% and 16% for the 125Hz and 250Hz octave bands, respectively. Larger
differences could be expected for individual source setups, i.e. for data processed prior
to averaging for all source configurations, and more so for individual receiver positions.

Frequency (Hz)   125    250    500    1000   2000   4000   8000   Overall level
Linear           37.3   34.8   34.1   33.9   32.6   27.8   24.1   42.1dB
A-weighted       21.2   26.2   30.9   33.9   33.8   28.8   23.0   38.8dBA
Table 3.2. Average BGNL over ten rooms ($L_{eq,1min}$)
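
The relation between the rows of Table 3.2 can be checked with a short Python sketch. It
assumes that the A-weighted row follows from the linear row by applying the standard octave
band A-weighting corrections, and that the overall levels are energetic sums of the band
levels; the variable and function names are illustrative.

```python
import math

# Linear octave band levels from Table 3.2 (125 Hz to 8 kHz), in dB.
LINEAR_LEVELS = [37.3, 34.8, 34.1, 33.9, 32.6, 27.8, 24.1]

# Standard A-weighting corrections for the same octave bands, in dB (assumed).
A_WEIGHTING = [-16.1, -8.6, -3.2, 0.0, 1.2, 1.0, -1.1]


def overall_level(band_levels):
    """Energetic (power) sum of band levels, returned in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in band_levels))


a_weighted = [lin + corr for lin, corr in zip(LINEAR_LEVELS, A_WEIGHTING)]

overall_linear = round(overall_level(LINEAR_LEVELS), 1)
overall_a = round(overall_level(a_weighted), 1)
a_weighted_rounded = [round(level, 1) for level in a_weighted]
```

Under these assumptions the sketch reproduces both the A-weighted band levels and the two
overall levels listed in the table.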

Table 3.3. Acoustic parameters measured in ten test rooms and statistical summary
Room             1      2      3      4      5      6      7      8      9      10     Min    Average  Max    STD
EDT 1kHz (s)     0.39   0.41   0.36   0.76   0.54   0.46   0.5    0.46   0.59   0.46   0.36   0.49     0.76   0.12
T30 1kHz (s)     0.41   0.45   0.45   0.86   0.62   0.5    0.53   0.53   0.82   0.52   0.41   0.57     0.86   0.15
Ts 1kHz (ms)     26     27.4   32.8   54.3   36.5   29.6   42     37.1   43.8   30.8   26     36.0     54.3   8.73
C50 1kHz (dB)    7.6    7.3    6.9    2.1    5.2    6.4    4.3    5.2    4.2    6.4    2.1    5.6      7.6    1.70
C80 1kHz (dB)    13     11.9   12     5.5    8.7    10.3   8.6    9.9    7.7    10.3   5.5    9.8      13     2.25
D50 1kHz (%)     84.8   84     82.3   65     76.3   81.7   71.5   74.7   71.8   80.8   65     77.3     84.8   6.504
MTI 1kHz         0.79   0.78   0.81   0.65   0.74   0.77   0.76   0.76   0.72   0.76   0.65   0.75     0.81   0.044
STI              0.77   0.75   0.79   0.66   0.70   0.74   0.74   0.76   0.70   0.78   0.66   0.70     0.79   0.041
BGNL (dBA)       39.2   39.4   36.7   39.9   48.4   41.8   37.4   34.1   40     39.6   34.1   39.7     48.4   3.7

Previous results are only partially confirmed in the current investigation since, for higher
frequency bands, various degrees of differentiation between $T_{30}$ and EDT were found. It
is of interest that in a number of cases the two parameters are in close approximation only
for individual source setups, an effect that can differ between receiver positions
within the same room, see Appendix 2. Taking into account the experimental output as
reported by Bradley in terms of the $T_{30}$-EDT similarity for small enclosures [88], the
current outcomes suggest that fittings within the spaces considered have a significant
effect on the measurement results, predominantly for the EDT parameter. A correlation
analysis between the two measures revealed a wide range of results, as would be
expected given partly unpredictable EDT results.

A comparison of results was further performed in terms of reverberation times to
establish the effect of the source type on the room response and assess the feasibility of
substituting source types with alternatives when necessary, or simply when more
practical to do so. Considering the data as shown in Appendix 2 it can be deduced that the
majority of the test rooms produced a primarily diffuse sound field for the higher
frequency range, with the partial exception of Room 10. Accordingly, the $T_{30}$ output
suggests that the source type does not significantly influence the measurement results on
this basis.

Tables 3.4-3.13 show the standard deviation for the $T_{30}$ and EDT variation among the
four source configurations as an average over all receiver positions in a room. For
comparison purposes, the standard deviations are also interpreted as an equivalent
percentage (%) relating in each case to the mean value at the particular data point.
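
The calculation behind each table entry can be sketched in Python as follows. The four
input values are hypothetical, and the use of the sample standard deviation (n-1 divisor)
is an assumption, since the text does not state which estimator was applied.

```python
import statistics


def spread_among_sources(values):
    """Standard deviation among source configurations and its percentage of the mean.

    values: one room-averaged value per source configuration for a single
            octave band (e.g. T30 in seconds).
    """
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 divisor); an assumed choice.
    std = statistics.stdev(values)
    return std, 100.0 * std / mean


# Hypothetical T30 values (s) for the four source configurations in one band.
std_t30, pct_t30 = spread_among_sources([0.40, 0.42, 0.41, 0.41])
```

Expressing the deviation as a percentage of the mean allows a direct comparison with the
JND criterion for reverberation parameters, which is itself defined in relative terms.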


Standard deviation of T30 and EDT among the four source types (over all receivers)

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.069   0.052   0.022   0.019   0.019   0.011   0.021
T30                0.055   0.02    0.02    0.006   0.006   0.006   0.041
% EDT              11.7    9.3     5.6     4.8     4.2     2.6     6.2
% T30              7.7     3.5     4.7     1.5     1.3     1.3     10.3
Table 3.4. Standard deviation of T30 and EDT among the four source types in Room 1

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.087   0.041   0.027   0.029   0.013   0.022   0.045
T30                0.071   0.044   0.014   0.002   0.003   0.008   0.047
% EDT              12.8    7.0     6.3     7.2     2.8     4.8     11.8
% T30              8.7     7.1     3.0     0.4     0.6     1.5     9.9
Table 3.5. Standard deviation of T30 and EDT among the four source types in Room 2

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.046   0.024   0.051   0.027   0.018   0.022   0.021
T30                0.068   0.007   0.009   0.023   0.032   0.038   0.008
% EDT              7.8     4.8     13.1    7.4     4.1     5.7     7.0
% T30              11.5    1.3     2.2     5.1     5.6     7.6     2.2
Table 3.6. Standard deviation of T30 and EDT among the four source types in Room 3

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.258   0.053   0.042   0.038   0.019   0.008   0.062
T30                0.293   0.157   0.028   0.011   0.009   0.014   0.029
% EDT              26.6    4.6     4.2     5.0     2.9     1.2     11.9
% T30              33.8    14.0    2.8     1.3     1.2     1.9     4.9
Table 3.7. Standard deviation of T30 and EDT among the four source types in Room 4

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.179   0.081   0.039   0.016   0.056   0.042   0.059
T30                0.107   0.048   0.019   0.006   0.008   0.01    0.049
% EDT              12.2    9.0     6.5     3.0     10.3    7.7     12.2
% T30              5.9     4.5     2.9     1.0     1.2     1.4     7.1
Table 3.8. Standard deviation of T30 and EDT among the four source types in Room 5

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.052   0.061   0.017   0.026   0.012   0.052   0.025
T30                0.028   0.072   0.013   0.01    0.02    0.065   0.055
% EDT              7.7     9.5     3.3     5.7     2.2     10.1    6.3
% T30              4.8     11.1    2.5     2.0     3.3     10.8    11.3
Table 3.9. Standard deviation of T30 and EDT among the four source types in Room 6

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.093   0.044   0.053   0.03    0.033   0.048   0.047
T30                0.067   0.021   0.007   0.01    0.007   0.012   0.026
% EDT              14.8    7.3     9.9     6.0     6.2     8.6     10.7
% T30              7.3     3.3     1.3     1.9     1.2     2.0     5.2
Table 3.10. Standard deviation of T30 and EDT among the four source types in Room 7

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.109   0.083   0.038   0.042   0.042   0.046   0.06
T30                0.089   0.023   0.003   0.009   0.02    0.007   0.035
% EDT              17.0    15.4    7.6     9.1     8.6     9.4     16.5
% T30              10.1    3.6     0.6     1.7     3.4     1.3     8.2
Table 3.11. Standard deviation of T30 and EDT among the four source types in Room 8

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.073   0.031   0.044   0.04    0.108   0.233   0.089
T30                0.035   0.014   0.018   0.012   0.007   0.007   0.05
% EDT              7.2     3.7     7.4     6.8     15.3    30.6    19.1
% T30              3.1     1.5     2.8     1.5     0.7     0.7     5.9
Table 3.12. Standard deviation of T30 and EDT among the four source types in Room 9

Octave band (Hz)   125     250     500     1000    2000    4000    8000
EDT                0.056   0.036   0.019   0.024   0.045   0.08    0.082
T30                0.031   0.032   0.001   0.003   0.037   0.098   0.107
% EDT              11.5    6.4     3.6     5.2     10.4    20.6    25.6
% T30              4.9     5.3     0.2     0.6     6.5     16.7    22.9
Table 3.13. Standard deviation of T30 and EDT among the four source types in Room 10

Overall, low standard deviations were established for the $T_{30}$ case over all rooms, with
values well below the 5% (1 JND, ISO 3382-1 after Vorländer [92]) error for the majority of
the experimental data. A notable exception can be seen at higher frequencies in Room 10
where errors approached 23% at 8kHz. Room 10, however, was the largest room in the
investigation; having produced only a quasi-diffuse sound field, the discrepancies
observed for the 4kHz and 8kHz octave bands can be justified. In the EDT case, larger
differences were observed given the different source configurations and related
positioning within the room. Averaged results in tables 3.4-3.13 however suggest that a
reasonably accurate assessment can be made with an error margin below 10% (2 JND).
Allowing for the exceptions in large rooms (Rooms 9-10) at higher frequencies, the smaller
rooms in particular gave confidence in the consistency of the measurements.

Consequently, a room assessment in terms of $T_{30}$ and EDT can be performed using either
of the source types to characterize a room on general performance.

Figures 3.3-3.12 show the measured values in terms of $C_{50}$ for four source
configurations as an average over the measuring positions. $C_{50}$ values depended largely
on the relative positioning of the receiver with respect to the source(s), while
optimization of source aiming or positioning was not considered.
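
For reference, the early-to-late energy ratio with a 50ms limit and the corresponding
early-to-total percentage (D50) can be obtained from an impulse response as sketched
below in Python. The function name and the toy impulse response are assumptions for
illustration; the response is assumed to start at the arrival of the direct sound and to
contain some late energy.

```python
import math


def clarity_and_definition(impulse_response, fs, limit_ms=50.0):
    """Return (C in dB, D in %) for the given early/late time limit.

    impulse_response: pressure samples, starting at the direct sound arrival
    fs              : sampling rate in Hz
    """
    split = int(round(limit_ms / 1000.0 * fs))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    clarity = 10.0 * math.log10(early / late)        # early-to-late ratio
    definition = 100.0 * early / (early + late)      # early-to-total ratio
    return clarity, definition


# Toy response at fs = 1 kHz: direct sound plus one reflection after 100 ms.
ir = [0.0] * 200
ir[0] = 1.0
ir[100] = 0.5
c50, d50 = clarity_and_definition(ir, fs=1000)
```

In this toy case the late reflection carries a quarter of the early energy, giving a
clarity of about 6dB and a definition of 80%.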

[Figures 3.3-3.12: Average C50 over all receivers for four source conditions (dB), plotted
per octave band (125Hz-8kHz); vertical axis: C50 difference (dB); series: S1, S2,
SS4Loudspeakers, SS2Loudspeakers]

Figure 3.3. Source efficiency in terms of C50 - Room 1
Figure 3.4. Source efficiency in terms of C50 - Room 2
Figure 3.5. Source efficiency in terms of C50 - Room 3
Figure 3.6. Source efficiency in terms of C50 - Room 4
Figure 3.7. Source efficiency in terms of C50 - Room 5
Figure 3.8. Source efficiency in terms of C50 - Room 6
Figure 3.9. Source efficiency in terms of C50 - Room 7
Figure 3.10. Source efficiency in terms of C50 - Room 8
Figure 3.11. Source efficiency in terms of C50 - Room 9
Figure 3.12. Source efficiency in terms of C50 - Room 10

Overall, results revealed that for a number of rooms (e.g. Rooms 1-4) source
configurations involving the SS produced a more consistent sound field with higher
clarity (50ms) values. In smaller flat rooms nonetheless, only small differences were
found among the source types. In many cases SS based configurations did not perform
better when compared to the omni source, something that was attributed to potential
comb filtering effects in the room. Directivity and aiming also had a significant impact on
the results as no optimization was performed in these terms; this would predominantly
lead to an underestimation of the effectiveness of the SS in either formation.

Taking another approach, in larger rooms where an increased distance of the SS to the
receivers was regularly the case, omni directional source configurations produced a better
result, see Rooms 7-9, particularly when positioned in the middle and in close proximity
to all receiver positions, see Room 6.

Overall, $C_{50}$ results were subject to the relative source-receiver distance; however,
the differences observed among source types did not exceed 2-3dB in most cases (note: the
$C_{50}$ JND is 1.1dB), being below the practically detectable limit of 3dB [27].

The STI results are discussed in the following section in comparison to measurements
post processed to account for BGNL and speech level within the rooms.

3.3 Measurement output accounting for typical speech and BGNL
The incorporation of typical speech level and actual or expected BGNL enabled an
additional assessment of the measurements on a more realistic basis. Current results were
post processed to account for the two variables using typical speech levels [119] and the
mean value of background noise, as measured in the ten test rooms. It is worth noting that
the measured BGNL closely approximated findings as reported by Hodgson [120] in
relation to typical noise levels in university classrooms.

Figures 3.13-3.22 show the derived STI in ten rooms for the different source
configurations. The average difference among source types was in the JND range (0.02
STI) with no optimization in terms of sound system positioning or aiming. On a more
realistic approach, speech level and BGNL were incorporated in the measurements, producing
significantly reduced STI values, see figures 3.13-3.22. It should be noted that post
processed results constituted effectively a consistent conversion of the initial output with
nearly identical interrelations among receiver position performance, since the same
speech and noise levels were used in every case. Nonetheless, with results being more
realistic, the STI reduction was mainly attributed to the higher frequency bands that were
more profoundly affected by background noise ($L_{Aeq}$ 55dB for a standard spectrum [6]
was used for speech). Overall, a consistent STI assessment can be made using either
source configuration to generically describe a room, the average differences being in the
JND range; a realistic model for speech and BGNL individually for the receiver positions
is not considered in this case.
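
The way in which a given S/N reduces the modulation transfer values, and hence the STI,
can be illustrated with the simplified Python sketch below. It applies the commonly used
noise correction to a single modulation transfer value and converts it to a transmission
index; octave band weighting, redundancy corrections and masking are deliberately omitted,
and the function names are assumptions. It is not the post processing routine used to
produce the results reported here.

```python
import math


def apply_snr_to_mtf(m, snr_db):
    """Reduce a noise-free modulation transfer value m for a given S/N in dB."""
    return m / (1.0 + 10.0 ** (-snr_db / 10.0))


def transmission_index(m):
    """Convert a modulation transfer value into a transmission index (0 to 1)."""
    # Keep m strictly inside (0, 1) so that the logarithm stays finite.
    m = min(max(m, 1e-12), 1.0 - 1e-12)
    apparent_snr = 10.0 * math.log10(m / (1.0 - m))
    # The apparent S/N is limited to the range -15 dB to +15 dB.
    apparent_snr = max(-15.0, min(15.0, apparent_snr))
    return (apparent_snr + 15.0) / 30.0


# A modulation transfer value of 0.8 with and without a 10 dB S/N.
ti_clean = transmission_index(0.8)
ti_noisy = transmission_index(apply_snr_to_mtf(0.8, 10.0))
```

Since the same correction is applied at every receiver position when a single speech and
noise level is assumed, the ranking of positions is largely preserved, in line with the
observation above.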

Figure 3.13. STI for four source configurations in Room 1, I) Primary - II) Post processed

Figure 3.16. STI for three source configurations in Room 4, I) Primary - II) Post processed

Figure 3.20. STI for four source configurations in Room 8, I) Primary - II) Post processed


The methodology used for measuring general room performance is satisfactory for room
classification. Moreover, compared to level calibrated measurements, the sessions were
not limited to a single measurement task as results originated from general impulse
response data. An assessment, nonetheless, on the efficiency of the source type in this
respect would require the process to account for the effect of the source-receiver distance
on the resultant speech level, assuming the source types considered have been optimized
for their intended use. Practical limitations in handling the portable sound system did not
allow for an in-depth assessment on this basis.

Detailed data relating to the room acoustics measurements can be found in Appendix 2.

3.4 Parameter interrelations
In the following sections, key acoustic parameters are analyzed to establish any
association between them and to determine its extent and related limitations.
Considering speech intelligibility parameters in particular, different elements of the
acoustic conditions are used to obtain a result. For example, clarity energy ratios make
use of the room effect on acoustic behaviour while ignoring background noise,
effectively the S/N. Other parameters such as the STI comprise a more elaborate
approach in an attempt to account for all the variables that affect acoustic performance.
The conditions present in a space during a measuring session will thus unavoidably affect
the output in different ways for different measures. As such, care needs to be taken when
comparing dissimilar parameters or making an assumptive assessment, based on a
particular methodology.

3.4.1 Clarity (C) energy ratios versus STI
STI is a measure describing speech intelligibility using a single number for seven
octave bands, and therefore corresponds to more than a single C value. In order to
enable a comparison in octave band level detail, the modulation transfer index (MTI) is
considered as the equivalent octave band STI. The benefit of octave band
weighting and redundancy corrections is however not considered, and therefore results
could underestimate the potential relationship. In assessing the relation between the C
energy ratio and STI it should be noted that the former does not account for the
influence of BGNL. Thus, the particular interrelation is subject to change in every
environment, depending on the noise conditions present.
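
To make the clarity energy ratio concrete, the sketch below computes C50 and C80 as the ratio of early to late energy in an impulse response. It is illustrative only: the impulse response is a synthetic, noise-free exponential decay with an assumed 0.5s reverberation time, it is assumed to start at the arrival of the direct sound, and the function name is hypothetical. No background noise term appears anywhere in the calculation, which is the reason C cannot reflect the BGNL.

```python
import math


def clarity_db(ir, fs, early_ms):
    """Early-to-late energy ratio in dB, split at early_ms after the direct sound."""
    split = int(round(fs * early_ms / 1000.0))
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10.0 * math.log10(early / late)


# Synthetic decay (assumption): amplitude falls by 60 dB over rt seconds.
fs, rt = 1000, 0.5
ir = [10.0 ** (-3.0 * (n / fs) / rt) for n in range(2 * fs)]
c50 = clarity_db(ir, fs, 50.0)
c80 = clarity_db(ir, fs, 80.0)
```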

Figure 3.23 shows the relation of C50 and C80 to MTI for two conditions, with and without
background noise. For noiseless conditions the relation of the two measures was
approximately linear, coinciding with earlier results by Bradley [27], while C80 appeared to
be better related to MTI. The related correlation coefficients were nonetheless
comparable, with values of 0.91 and 0.96 for the pairs C50-MTI and C80-MTI respectively,
see figure 3.23 (I-II). In the conditions accounting for BGNL the particular associations
break down, as the measures compared are effectively modified into two fundamentally
different measures. Considering that the particular relation, see figure 3.23 (III-IV), could
be altered even within the same room under different noise conditions, a comparison of C
to STI when accounting for BGNL would appear to be of minor significance unless some
level of consistency can be expected in terms of the background noise character.

When the S/N is high enough to render the effect of BGNL negligible in a practical
application, it would be possible to predict the speech intelligibility in terms of STI from
the C50 and C80 datasets with a high level of accuracy. Therefore, for a high signal level
condition C might also be used as a direct descriptor of speech intelligibility. When
considering marginal conditions nonetheless, the relationship would be invalidated to a
large extent, as seen in the example in figure 3.23 (III-IV). Consequently, the latter
contaminated relationship could be used, if established, to ascertain BGNL as a
significant factor in the acoustical conditions.

Figure 3.23. Relation of Clarity to MTI in ten test rooms, I) C50 to MTI without background noise, II) C80 to MTI
without background noise, III) C50 to MTI with background noise, IV) C80 to MTI with background noise

3.4.2 Room reverberance (EDT, T30) versus STI
The relation of room reverberance to STI followed a similar trend. Considering EDT and
T30, an evident relation of reverberance to the MTI was found for noiseless (or adequate
S/N) conditions, see figure 3.24. EDT was more closely related to MTI, as expected, with
a correlation coefficient of 0.98 (0.85 for T30) and a near linear interaction, while a
similar degree of agreement was further found for all four source configurations.

Figure 3.24. MTI relation to space reverberance in ten test rooms (no noise)

The close relationship between the measures became less evident with the incorporation
of background noise, resulting in a correlation coefficient of 0.67 for both reverberation
indices. The resulting relationship would again be subject to the character of noise, being
the only altered variable between the two conditions, subsequently having an effect on
the value of a potential comparison in these terms similar to the C case.
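
For reference, the two reverberation indices compared here can be derived from an impulse response by backward (Schroeder) integration followed by a line fit over different parts of the decay. The sketch below is a generic illustration under stated assumptions, not the WinMLS implementation: EDT is fitted over 0 to -10dB and T30 over -5 to -35dB, both extrapolated to a 60dB decay, and the impulse response is a synthetic, noise-free exponential decay with an assumed reverberation time of 0.5s.

```python
import math


def schroeder_decay_db(ir):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB at t = 0."""
    remaining = 0.0
    edc = [0.0] * len(ir)
    for i in range(len(ir) - 1, -1, -1):
        remaining += ir[i] * ir[i]
        edc[i] = remaining
    return [10.0 * math.log10(max(e, 1e-300) / edc[0]) for e in edc]


def decay_time(edc_db, fs, upper_db, lower_db):
    """Least-squares slope between two decay levels, extrapolated to 60 dB."""
    points = [(i / fs, level) for i, level in enumerate(edc_db)
              if lower_db <= level <= upper_db]
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in points)
    return -60.0 * sxx / sxy  # -60 dB divided by the fitted slope in dB/s


# Synthetic decay (assumption): amplitude falls by 60 dB over rt seconds.
fs, rt = 1000, 0.5
ir = [10.0 ** (-3.0 * (n / fs) / rt) for n in range(2 * fs)]
edc_db = schroeder_decay_db(ir)
edt = decay_time(edc_db, fs, 0.0, -10.0)
t30 = decay_time(edc_db, fs, -5.0, -35.0)
```

For a purely exponential decay the two values coincide; in a real room the early part of the decay depends more strongly on the source-receiver arrangement, so the two fits can diverge.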

3.4.3 EDT versus T30
Section 3.2.1 has considered the interrelations of T30 and EDT values, with any
conclusions partially supporting earlier studies [88] suggesting similarity of the two
measures for small rooms. Figure 3.25 shows the relation of the two measures for four
source configurations, showing a partial similarity of the two measures. A direct effect
would be that T30 would not be suitable for use as a baseline to predict EDT and C50,
among other measures, within fitted university classrooms and lecture theatres. However,
considering alternate source configurations appeared to influence the T30-EDT
relationship. Figure 3.25 illustrates the closer connection between the reverberation
indices when the portable sound system is used as a source, particularly in the
SS2loudspeakers case. While all four configurations produced a relatively small deviation in
terms of correlation between parameters, it should be noted that excluding the two larger
rooms of the study (Rooms 9-10) from the statistical analysis resulted in an enhanced
interrelation for all conditions, see figure 3.26. Consequently, this produced a better
similitude between the different conditions giving further confidence in assessing smaller
sized rooms, as discussed in section 3.2.1.

Figure 3.25. Relation of EDT to T30 for four source configurations in ten test rooms (S1, S2, SS4loudspeakers,
SS2loudspeakers)

Figure 3.26. Relation of EDT to T30 for four source configurations after excluding Rooms 9-10 (S1, S2,
SS4loudspeakers, SS2loudspeakers)

3.4.4 EDT versus C50
Examining the relation between EDT and C50 resulted in a clear trend, as expected, of
increasing clarity with decreasing values of EDT, see figure 3.27. In effect, an indication
of clarity can be obtained via the particular trend. Results hold for any of the four
source configurations, while a closer relationship was established for smaller rooms in
the study. It should be noted nonetheless that earlier studies [87] have established a
significantly reduced correlation for longer EDT, having considered sound system
measurements and a wider range of conditions, not typical within classrooms.

Figure 3.27. Relation of C50 to EDT in ten test rooms (R² = 0.874)

3.5 Comparison of measurements for closed/open loop
In this section, data relating to the core measurement system (closed loop, see figure 3.28
(I)) as described in section 3.2.1 is compared to the equivalent data resulting from an open
loop system configuration (figure 3.28 (II)). An open loop measurement methodology has
previously been applied successfully in large underground spaces [116, 117]; however, the
method has not been assessed within smaller rooms. Measurements can be taken if
necessary using a portable receiver with recording capabilities (therefore reducing
cabling), suggesting that an open loop measurement system can be a more practical
methodology under more restrictive circumstances.

An assessment of closed and open loop measurement systems is performed to determine
the efficiency of an open loop methodology in the context of classroom acoustics.

3.5.1 Open loop measurement methodology
An open loop system (figure 3.28 (II)) typically consists of a processing unit, a sound
source and a receiver with recording capabilities. For the purposes of this investigation,
the software platform of the closed loop system was initially replaced by a multi-track
recorder for the first stage of the session. A 10 second exponential sine sweep was arranged
in a track, with visual time markers at the beginning and end of the test signal. A secondary
track was set to simultaneously record the system's microphone input during
reproduction of the sine track, in turn from the four source configurations as described in
section 3.2.1. The time markers were later used within an audio editor to accurately
identify and extract the precise time limits in the recorded sample, incorporating the test
signal with the room response. The extracted samples were subsequently converted into
raw time domain data, i.e. the WinMLS binary format. Three measurements were
processed per receiver position (per source configuration) to ensure consistency among
tests, while post processing of selected data allowed the derivation of the acoustic
parameters of interest for the 125Hz-8kHz octave band range.
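
As an illustration of the test signal, an exponential sine sweep can be generated as sketched below. The start and end frequencies, sampling rate and short duration are hypothetical values chosen to keep the example small; the study used a 10 second sweep, and the exact sweep limits are not stated in this section.

```python
import math


def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting duration seconds."""
    n = int(round(duration * fs))
    rate = duration / math.log(f2 / f1)  # time constant of the frequency growth
    return [math.sin(2.0 * math.pi * f1 * rate * (math.exp((i / fs) / rate) - 1.0))
            for i in range(n)]


def instantaneous_frequency(f1, f2, duration, t):
    """Frequency of the sweep at time t; it rises by a fixed ratio per unit time."""
    return f1 * (f2 / f1) ** (t / duration)


# Hypothetical parameters: 100 Hz to 1 kHz in 0.1 s at 8 kHz sampling.
sweep = exp_sweep(100.0, 1000.0, 0.1, 8000)
f_end = instantaneous_frequency(100.0, 1000.0, 0.1, 0.1)
```

Since each octave occupies the same length of time in such a sweep, the recorded response has to be processed with exactly the same signal parameters that were used to generate it.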

Figure 3.28. Measurement system, I) Closed loop configuration, II) Open loop configuration

3.5.2 Supplementary equipment list for open loop measurements
- Sony Vegas Pro 8, professional multi-track audio editing software
- Sony Sound Forge 9, professional digital audio production suite

3.5.3 Two system data comparison (Closed-Open loop data)
In the following paragraphs, data derived from the open loop measurement configuration
in ten test rooms is compared to its closed loop equivalent to validate the approach. T30,
EDT, C50 and STI are considered, while for the purposes of the current assessment the
closed loop system is assumed to be the most consistent and is therefore used as the
reference for performance.


Figures 3.29-3.38 show a comparison of T30-EDT results for the two systems as an
average over the measuring positions per source configuration. Overall, different degrees
of error were produced for the two measures, EDT typically being more accurate.
Discrepancies of up to 19% in EDT were observed in the octave bands of interest, excluding
lower frequencies, while considerable mismatches were found for a number of distinct
data points among the test rooms. In terms of T30, differences approached 26% in a
similar manner, with further distinct erroneous data points within the results. For the
majority of the T30-EDT experimental data, accurate matches to the corresponding
reference values were produced, with a maximum error of up to 10% and normally less
than 5%.
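
The percentage figures above express the deviation of the open loop value from the closed loop reference. A minimal sketch of such a check is given below; the octave band values and the 10% tolerance are hypothetical and are not taken from the measured data.

```python
def relative_error_percent(reference, measured):
    """Deviation of a measured value from the reference, in percent."""
    return abs(measured - reference) / reference * 100.0


def flag_bands(closed, open_loop, tolerance_percent):
    """Return the octave bands whose open loop value exceeds the tolerance."""
    return [band for band in closed
            if relative_error_percent(closed[band], open_loop[band]) > tolerance_percent]


# Hypothetical EDT values in seconds per octave band (Hz).
closed_edt = {500: 0.60, 1000: 0.62, 2000: 0.58}
open_edt = {500: 0.61, 1000: 0.70, 2000: 0.58}
flagged = flag_bands(closed_edt, open_edt, 10.0)
```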

Similarly, C50 results (see Appendix 3) were typically within 2dB for all octave bands in
any source configuration. A number of exceptions, most notably in Room 10, were
observed, with the average error in this worst case approaching 3dB. Notably, a JND of
1.1dB has been previously established for C50; however, a 3dB value was further suggested
as more suitable for everyday situations [27].

STI results were within the JND for all receiver positions and for all the source
configurations, with the exception of a small number of erroneous data points. A
maximum error of 0.06 STI was however found for the latter case, i.e. double the JND.
The average over all receiver positions in the test rooms suggested an STI
per source configuration within the JND for either arrangement.

Figure 3.29. Comparison of T30 and EDT results for Closed-Open loop systems as an average over all
measuring positions in Room 1 (Four source configurations)

[Figures 3.29-3.38: one figure per test room, one panel per source configuration (S1, S2, SS4loudspeakers,
SS2loudspeakers), each titled "Comparison of average T30 and EDT for Closed Open loop". Axes: octave band
(125Hz-8kHz) against T (sec). Series: closed loop T30, open loop T30, closed loop EDT, open loop EDT. Three
source configurations are shown for Room 4 and one for Room 8.]

3.5.4 Comments on Open loop measurement sessions
Measurements can generally be performed using a suitable receiver e.g. a sound level
meter with recording capabilities. However, time synchronization of the recorded test
signal should be considered since a misalignment could result in significant errors. In the
current methodology this issue was resolved by using time markers at the recording stage
to accurately identify the position of the test signal and recorded response within the
audio track.
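
In this study the alignment relied on visual time markers placed in the multi-track session. As an alternative that was not used here, the offset of the test signal within a recording could be estimated by cross-correlation with the reference signal. The sketch below is a brute force illustration with a hypothetical short signal; it assumes that the recording contains the complete reference and is not intended for full length sweeps without a faster correlation method.

```python
def find_offset(recording, reference):
    """Return the sample offset at which reference best matches recording."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(reference) + 1):
        score = sum(r * recording[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical example: the reference is embedded 7 samples into the recording.
reference = [0.0, 1.0, -1.0, 0.5]
recording = [0.0] * 7 + reference + [0.0] * 5
offset = find_offset(recording, reference)
aligned = recording[offset:offset + len(reference)]
```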

The open loop methodology requires the settings relating to the test signal setup
within the measurement software to remain unchanged, i.e. the signal used to record the
room responses in open loop mode should match the internal software measurement
settings at the post processing stage, when using the recordings. As such, minor
differences in e.g. the length of the silence segment following the test signal or the
system's sampling rate and bit depth have been found to result in additional errors.

An important negative contribution is similarly made at the post processing stage by
altering the frequency range used to initially produce the test signal, since small
variations in the range of a few Hertz can produce significant error. A signal propagating
in an enclosure will unavoidably alter its spectrum characteristics; however, this process
is not accounted for in system calibration. An additional uncertainty factor is thus
introduced at the input signal and potentially an open loop system is more notably
affected. Ultimately, there is a significantly increased number of aspects within a
measurement session when compared to a closed loop system that could invalidate the
results.

Overall, it was established that an open loop methodology could be used as an alternative
to a closed loop system for the case where circumstances restrict the measurement
session in terms of practicality. Individual errors, inappropriately high on occasions, did
occur within the data sets from the ten test rooms, suggesting that the method would
conceivably not be suitable for a detailed assessment. Considering an averaged output
over the measuring positions facilitated a reasonably accurate room characterization.
Notably for a speech intelligibility assessment, the open loop methodology produced an
acceptable level of confidence.

For a multifaceted assessment such as use of room acoustic data in computer modelling,
the potential accumulation of errors and subsequent expansion of error margins would
ultimately need to be accounted for to retain confidence in an assessment.

3.6 Conclusions
The current analysis has considered ten test rooms covering a typical range of acoustic
conditions found within university classrooms. Previous research on T30-EDT
measurements for small rooms could be partially confirmed. However, the relation of T30
and EDT appeared to have different degrees of variation, being highly dependent on the
source configuration in use. The effect was attributed to the fittings present within the
rooms, thus the EDT could be expected to deviate from T30 in different cases. This
assumption was supported by a correlation analysis where a wide range of results was
observed for the relation between T30 and EDT.

Any of the four source configurations could be used for a consistent room assessment in
terms of T30. Larger differences were observed between source types for EDT, based on
the relative source-receiver positions. However, average values over the measuring
positions suggested the feasibility of a reasonably accurate assessment. Only smaller
rooms in the investigation (Rooms 1-8) gave confidence for an acceptable error margin.

C50 values depended on the relative source-receiver distance. Nonetheless, differences
among source types did not exceed 2-3dB in most cases, being marginally over the JND;
a realistic JND is defined as 3dB. A high correlation coefficient was established for the
relation between C50 and EDT, thus an indication of clarity could be obtained in small
rooms from the particular trend. The measured differences among source types for STI
were in the JND range (0.02) in all but two cases. However, an assessment aiming at a
direct STI evaluation would require the utilization of the source type and setup normally
used in a space, followed by post processing of data to account for a realistic S/N at
individual receiver positions.

Good correlation was established between Clarity and STI (0.91 and 0.96 for C50 and C80
respectively) for noiseless or adequate S/N measurement conditions. Speech intelligibility
in terms of STI can thus be predicted from clarity datasets with confidence. For high S/N
in an actual case, clarity ratios can be used as a direct descriptor of intelligibility. Given
that the interrelation breaks down when background noise is considerable, BGNL can be
identified as a significant factor for the acoustical conditions of a space in such a case. A
similar influence of background noise was found for the relation between room
reverberance (EDT, T30) and STI, particularly for EDT (correlation=0.98).

In terms of a more practical measurement approach, it was established that an open loop
measurement methodology could potentially be used as an alternative to closed loop
systems. Individual errors within the derived datasets suggested however that the method
would not be suitable for a direct assessment for individual receiver positions and is
rather aimed at a general room characterization. The increased number of aspects that
could invalidate a measurement was highlighted; nonetheless, an acceptable level of
confidence for a speech intelligibility assessment in a room was established.
Chapter 4 Low level measurements

CHAPTER 4
Low Level Measurements

4.1 Introduction
There are two reasons primarily responsible for a low resultant S/N during a measurement
session: excessive background noise levels, or an intentional reduction in the signal level
to minimize annoyance caused to the public by a high test signal level. A significant error
may consequently be introduced in the resulting data since a measurement methodology
requires, among other parameters, a minimum S/N for an accurate outcome. Threshold
efficient values of S/N are highly affected by additional factors, including the general
acoustic characteristics of the space considered, the size of the space and relative
positioning of the source-receiver configuration, the test signal in use and the
measurement duration. Speech intelligibility measurements require a high degree of
accuracy since output variations can result in unacceptable errors. Noise sensitive spaces
in particular are where conditions might be marginally acceptable, thus small errors will
have a large impact on the space assessment procedure and subsequent action.

With the aim of establishing a point of reference, the following sections consider the
parameter interrelations in terms of T30, EDT and STI when approaching marginal
accuracy conditions. A single acoustic parameter measured by any means may be used
for further evaluation of an acoustical environment by utilizing results in the validation of
a computer simulation [114, 121], see section 5.2.2. Assuming acceptable precision of the
reference parameter e.g. T30 or EDT, an acceptable level of model prediction accuracy
can be further achieved. Choosing a suitable reference parameter is vital for the
subsequent computer model prediction methodology, particularly when related values
originate from measurements performed at marginal conditions. The applicability of the
low level measurement session outcomes is thus put into focus to assess the potential
gain in assessment efficiency in these terms.

4.2 Measurement methodology
In the following sections the measurement methodology for undertaking low level
measurements and the accuracy verification procedure are presented.

4.2.1 Threshold efficient signal to noise ratio (S/N) measurements
Impulse response measurements within ten test rooms were performed using a WinMLS
based platform combined with the source-receiver configurations as described in previous
sections (also see Appendix 1). Multiple measurements using a 10 second exponential
swept sine test signal were performed in descending 1dB steps, to the point where an
outcome became marginal (threshold efficient S/N). The reference to establish an
accurate performance consisted of a high S/N measurement, obtained prior to
commencing the process, while the original RT (T30) curve was used to monitor the
output during a session. The low level resultant data from the series of measurements
was later compared to the reference T30 curve to reflect the effect of a reduced S/N on
mainly intelligibility related measures. Depending on the session, two different source
configurations were used, a dodecahedron omni directional source and a four loudspeaker
portable sound system formation, as previously described in section 3.2.1.
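
The stepping procedure can be summarised as a search for the lowest S/N at which the result still agrees with the high S/N reference. The sketch below illustrates this logic only; the 5% tolerance on T30, the function name and the example values are hypothetical assumptions and do not represent the acceptance criterion applied during the sessions.

```python
def threshold_snr(reference_t30, steps, tolerance=0.05):
    """Lowest S/N (dB) before T30 first departs from the reference.

    steps holds (snr_db, t30) pairs ordered from high to low S/N.
    Returns None when even the first step is outside the tolerance.
    """
    threshold = None
    for snr_db, t30 in steps:
        if abs(t30 - reference_t30) / reference_t30 > tolerance:
            break  # outcome has become marginal, stop descending
        threshold = snr_db
    return threshold


# Hypothetical session: T30 stays close to 1.00 s until the 17 dB step.
steps = [(20, 1.00), (19, 1.01), (18, 0.98), (17, 1.20), (16, 1.02)]
result = threshold_snr(1.00, steps)
```

Stopping at the first failure, rather than accepting a later step that happens to agree, reflects the possibility of coincidentally correct results once the outcome has become marginal.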

The measurement methodology in the initial screening sessions considered a
dodecahedron sound source in a single test room (reverberation chamber, N11) for two
RT conditions, referred to as high and low.

4.2.2 Noise source incorporation
A pink noise source was used to eliminate the risk of the test signal level not being high
enough to excite the room. Moreover, existing and potentially varying BGNL was
expected to limit consistency in the results, given that the investigation concerned
marginal conditions i.e. close to the noise floor. Pink noise was thus used in an attempt to
achieve a steady state BGNL. The incorporation of the background noise into the system
was achieved using an audio mixer. A secondary portable computer was used as a noise
generator and mixed into the system output. The different test configurations consisted
of the test signal being reproduced by either of the two source types, alone or mixed with
simulated noise. In examining the risk of measurement inconsistencies due to
reproduction of signal and noise from the same loudspeaker, a further two combinations
were later considered with the signal and noise reproduced by individual sources (see
section 4.4.3). Thus, six system configurations (Figure 4.1 (I-VI)) were used
overall, referenced as:
- Signal from Omni directional (no simulated noise)
- Omni (signal), Omni (background noise)
- Omni (signal), SS4loudspeakers (background noise)
- Signal from Sound system (no simulated noise)
- SS4loudspeakers (signal), SS4loudspeakers (background noise)
- SS4loudspeakers (signal), Omni (background noise)

Figure 4.1. Six system configurations (I-VI)

4.3 Initial investigation and screening sessions
Early investigations by the author [122] considered a sine sweep test signal for two RT
conditions within a single test room (reverberation chamber) with variable absorption
characteristics, see Figure 4.2. It was found that the threshold S/N is not a static
parameter, and is highly affected by several variables, including primarily RT. A
correlation analysis performed on the measurement results revealed correlation
coefficients up to 0.7 between threshold efficient S/N and T30. An increasing threshold S/N
for increasing RT values was observed, see table 4.1.

Results however could not be assumed as straightforwardly repeatable, since correlation
coefficients up to 0.8 were observed under the same conditions when the measurements
were repeated. The particular values approached perfect correlation when the 125Hz and
250Hz octave bands were excluded, generally appearing to be problematic for the test
room.

                Reverberation chamber
             High RT              Low RT
F (Hz)   RT (s)   S/N (dB)    RT (s)   S/N (dB)
125      2.42     15          1.31     12.1
250      2.35     18          1.32     11
500      2.35     14          1.27     12.3
1k       2.59     16          1.41     16.3
2k       2.51     9.7         1.37     14
4k       2.21     9.8         1.3      4.1
8k       1.54     4.9         1        5.5
Table 4.1. Sample threshold efficient S/N ratios for Sine sweep in a test room
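
The association between RT and threshold S/N in table 4.1 can be examined with a Pearson correlation coefficient across the seven octave bands. The sketch below uses the tabulated values; whether the coefficients reported in the text were computed across octave bands in this way is an assumption, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Values from table 4.1, 125 Hz to 8 kHz octave bands.
rt_high = [2.42, 2.35, 2.35, 2.59, 2.51, 2.21, 1.54]
snr_high = [15, 18, 14, 16, 9.7, 9.8, 4.9]
rt_low = [1.31, 1.32, 1.27, 1.41, 1.37, 1.3, 1.0]
snr_low = [12.1, 11, 12.3, 16.3, 14, 4.1, 5.5]

r_high = pearson(rt_high, snr_high)
r_low = pearson(rt_low, snr_low)
```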

Figure 4.2. Test room (reverberation chamber) schematic with source-receiver positioning (V=105m³, 10m² removable
absorptive material)

The positioning of the source/receiver pair within the test room was considered as a
primary reason for differentiated results. Receiver positions close to the single absorbing
surface and, to a lesser extent, close to reflective surfaces were found to be responsible
for unpredictable system behavior (some random results) and low correlation coefficients.
It was found that absorptive material should preferably be distributed uniformly across the
room surfaces for increased confidence in the measurement, given marginal conditions.
Overall, conditions enhancing diffusivity in the sound field were found to be most
suitable. Similarly, source positions promoting a diffuse field were found to be essential.

The accuracy level, in terms of STI values derived from low S/N measurements, was
verified by comparison to the high S/N reference values, see table 4.2. The modulation
transfer index (MTI) was central in the investigation as the values among individual
octave bands do not necessarily relate to a single measurement. Good agreement was
found for all conditions, demonstrating the efficiency of the test signal in the current test
room configurations and for the levels used. It should be noted however that good
agreement was further found in the screening session for lower S/N ratios, derived from
impulse response measurements that were less accurate in terms of T30 (inadequate signal
strength to calculate T30). This effect was repeated throughout the sessions and is later
discussed in more detail, see section 4.4.1. Later experimentation followed a finer approach
where marginal MTI values were considered individually, referencing the S/N required to
achieve an accurate MTI (per octave) rather than an accurate T30.

Modulation transfer index (MTI)
F(Hz)                            125     250     500     1k      2k      4k      8k      Average (unweighted STI)
Reference (High RT)              0.46    0.43    0.4     0.43    0.39    0.42    0.52    0.44
Marginal conditions (High RT)    0.46    0.43    0.4     0.42    0.39    0.43    0.55    0.44
Reference (Low RT)               0.58    0.61    0.56    0.56    0.52    0.55    0.62    0.57
Marginal conditions (Low RT)     0.57    0.62    0.57    0.57    0.52    0.57    0.64    0.58
Table 4.2. Comparison of STI for reference and experimental (marginal) conditions


The screening session overall suggested that BGNL and its fluctuating character were key
parameters, affecting the measurements for the duration of the process. These
characteristics prevented accuracy for particular measurements, given also that results
were analyzed in 1dB steps.

It should be noted that the calculation of the related S/N was based on a BGNL estimation
using the last 10% of the measured impulse response; the particular data sets thus cannot
be considered decisive. Moreover, the late part of the sound decay within a room was
assumed to be linear; this was often not the case, consequently affecting the estimation
of the threshold efficient S/N. The significance and practical application of the particular
datasets in terms of absolute values is thus limited. However, the results give an
indication of the functions involved when approaching the threshold efficient S/N. In the
following sections a more detailed approach to the topic is presented to verify and
complement the findings and assumptions of the initial investigation.
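
The S/N estimate described above can be sketched as follows. This is a minimal
illustration, not the processing used in the study: the text specifies only that the BGNL
was taken from the last 10% of the impulse response, so the use of the peak of the squared
response as the signal level, the function name estimate_snr_db and the synthetic
response are assumptions made for the example.

```python
import math

def estimate_snr_db(impulse_response, noise_fraction=0.10):
    """Estimate S/N (dB) from a sampled impulse response.

    The noise floor is the mean square of the last `noise_fraction` of the
    response (the 10% tail mentioned in the text). Taking the peak squared
    sample as the signal level is an assumption of this sketch.
    """
    n = len(impulse_response)
    tail_start = n - max(1, int(n * noise_fraction))
    tail = impulse_response[tail_start:]
    noise_power = sum(x * x for x in tail) / len(tail)
    peak_power = max(x * x for x in impulse_response)
    if noise_power == 0.0:
        raise ValueError("noise floor is zero; S/N is undefined")
    return 10.0 * math.log10(peak_power / noise_power)

# Synthetic example: exponential decay plus a constant-amplitude noise floor
# about 40dB below the direct sound (values are illustrative only).
ir = [math.exp(-0.05 * i) + 0.01 * (-1) ** i for i in range(1000)]
snr = estimate_snr_db(ir)
```

Because a non linear late decay biases the tail estimate, a value obtained in this way is
indicative rather than absolute, as noted above.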

4.4 Low level measurements in ten test rooms
The low level measurement methodology has been applied in ten test rooms of varying
size category, following the room acoustics measurements in Chapter 3. The majority of
the measurements were based on a system configuration using the portable sound system
as a source for both the signal and simulated background noise.

The initial scope of the investigation at this stage concerned the interrelation between
T30 and EDT for a reducing signal level in the series of measurements, i.e. a reducing
S/N. Finally, reference to MTI and the resultant STI is made for a finer assessment of the
connection between the measures involved.

4.4.1 Measure interrelations and performance
The accuracy performance of T30 and EDT for a series of measurements at reducing
signal level (-1dB per measurement) was initially examined against reference values
originating from high S/N measurements for a broader view of the performance reduction
rate, see figures 4.3-4.12. Without particular emphasis at this stage on the associated S/N,
the graphs for each test room relate individually to the seven octave bands considered.
Under ideal (in practice unattainable) conditions, accuracy would be maintained
throughout the series, i.e. a straight line within the graphs starting at the reference value.
Values that deviate (>2JND) from the trend set by the reference measurement can thus be
identified as erroneous.
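
The identification of erroneous values can be sketched as below. This is an illustration
only: the relative JND of 5% is the figure commonly quoted for reverberation measures
and is assumed here rather than taken from the text, and the function names and the
sample series are hypothetical.

```python
def is_erroneous(value, reference, jnd_fraction=0.05, n_jnd=2):
    """True if `value` deviates from `reference` by more than n_jnd JNDs.

    The JND is expressed as a fraction of the reference value; the 5%
    default is an assumption of this sketch.
    """
    return abs(value - reference) > n_jnd * jnd_fraction * reference

def last_accurate_index(series, reference, **kwargs):
    """Index of the last measurement before the first erroneous value.

    `series` is ordered by decreasing signal level (-1dB per step).
    Returns -1 if the very first measurement is already erroneous.
    """
    for i, value in enumerate(series):
        if is_erroneous(value, reference, **kwargs):
            return i - 1
    return len(series) - 1

# Hypothetical T30 series (s) for one octave band, reference value 1.30 s.
t30_series = [1.30, 1.31, 1.28, 1.33, 1.52, 0.95]
threshold_step = last_accurate_index(t30_series, reference=1.30)
```

The S/N associated with the measurement at the returned index would then correspond to
the threshold efficient S/N for that octave band.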

Considering the relation in each case of the two measures to the reference state, it was
evident that EDT gave more consistent results in the series of measurements, i.e. reduced
value fluctuations when compared to T30, see figures 4.3-4.12, thus supporting previous
results [123].

In Rooms 1 and 2, being similar in terms of shape and construction materials, differences
were observed for lower frequency bands where T30 in Room 1 gave significantly
inconsistent results as opposed to EDT. The effect was attributed to the single
construction difference between the rooms, a small coupled space in Room 1. The
aperture of the latter (2m²) was partly blocked during the measurement session by a
flat surface positioned against the host wall, thus forming a smaller aperture.
However, considering wave diffraction effects, results would coincide with the
fluctuating character of T30 values (at lower frequencies) as is typical for coupled spaces.
It is of interest here that EDT was unaffected by this factor, resulting in a more stable
outcome.

In Room 4 (Figure 4.6) where measurements extended further beyond threshold
conditions, the performance reduction rate among octaves showed a clearer relation to the
BGNL in the room. Having a characteristic background noise spectrum with higher level
at lower frequency octaves and assuming a flat frequency response for the measurement
system, the associated low level plots demonstrated the correlation between the two
functions.

[Figures 4.3-4.12: per-octave plots annotated with the S/N at the T30 limit and the EDT
limit; plot content not recoverable]

Figure 4.3. T30 and EDT accuracy/performance decay in octave bands for a series of measurements
(-1dB in signal level per measurement), Room 1, Signal from Sound system (no simulated noise)

Figure 4.4. T30 and EDT accuracy/performance decay in octave bands for a series of measurements
(-1dB in signal level per measurement), Room 2, Signal from Sound system (no simulated noise)

Figure 4.5. T30 and EDT accuracy/performance decay in octave bands for a series of
measurements (-1dB in signal level per measurement), Room 3, SS (signal), SS (noise)

Figure 4.6. T30 and EDT accuracy/performance decay, Room 4 [remainder of caption lost in extraction]

Figure 4.7. T30 and EDT accuracy/performance decay in octave bands for a series of measurements
(-1dB in signal level per measurement), Room 5, SS (signal), SS (noise)

Figure 4.8. T30 [remainder of caption lost in extraction]

Figure 4.9. T30 [remainder of caption lost in extraction]

Figure 4.10. T30 [remainder of caption lost in extraction]

Figure 4.11. T30 [remainder of caption lost in extraction]

Figure 4.12. T30 [remainder of caption lost in extraction]

A side outcome of the Room 10 results was that, for both T30 and EDT, accurate
measurements were not obtained at lower frequencies when simulated background
noise was reproduced by the omnidirectional source. In this particular setup the omni
source was positioned in between, and closer to the receiver than the nearest loudspeaker
of the portable sound system. Accordingly, noise was of a higher level from that particular
direction at the receiver, as opposed to noise simulation from the portable sound system
where the overall level measured at the receiver position was a cumulative result, arriving
approximately from all directions. A higher signal level would be needed from the sound
system to overcome noise in such configurations; however, the same level would possibly
be excessive at different receiver positions. As such, the reproduction of background noise
from multiple loudspeakers around the room rather than a single omnidirectional source
appeared to be more efficient for low level measurements. Better coverage was achieved,
consequently implying more consistent outcomes for the different system configurations
at marginal conditions.

Overall, from the series of measurements performed EDT values emerged as more
consistent. EDT showed smaller value fluctuations and appeared to be significantly
more accurate for particular octave bands among the datasets; the fact that EDT is
generally shorter than T30 would certainly influence performance in favour of the EDT
measure, see section 4.4.2. The EDT relies on the initial level drop rather than the full
decay range used by T60 (or x dB decay for Tx); thus, for a non linear sound decay in
particular, the EDT will have an advantage in producing a representative value under
threshold conditions, as consistently evidenced in figures 4.3-4.12. Under strongly non
linear decays the effect would further depend on the degree of non linearity in
relation to the signal level. In an example within Room 3 (Figure 4.5) unstable T30
measurement behaviour was observed at the 1kHz octave band, where the difference
between T30 and EDT values increased for a limited signal level range; this occurred in
between consistent T30 measurements. For a linear decay, conditions would primarily
depend on the signal level and on whether the decay range necessary for the estimation
of the measures considered is attained.
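
The difference in decay range between the two measures can be illustrated with a short
sketch. It assumes the usual evaluation ranges (0 to -10dB for EDT, -5 to -35dB for T30,
both extrapolated to 60dB) applied to a Schroeder backward-integrated decay curve; the
function names and the synthetic impulse response are hypothetical, and the code is not
the analysis software used in the study.

```python
import math

def schroeder_decay_db(ir):
    """Backward-integrated energy decay curve in dB, normalised to 0dB."""
    energy, total = [], 0.0
    for x in reversed(ir):
        total += x * x
        energy.append(total)
    energy.reverse()
    return [10.0 * math.log10(e / energy[0]) for e in energy if e > 0.0]

def decay_time(decay_db, fs, upper_db, lower_db):
    """Time for a 60dB decay, from a least-squares line fitted to the
    part of the curve between upper_db and lower_db."""
    pts = [(i / fs, d) for i, d in enumerate(decay_db)
           if lower_db <= d <= upper_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope

def edt(decay_db, fs):
    return decay_time(decay_db, fs, 0.0, -10.0)

def t30(decay_db, fs):
    return decay_time(decay_db, fs, -5.0, -35.0)

# Synthetic, noise-free exponential decay with a 1.0 s reverberation time.
fs = 1000
ir = [math.exp(-6.908 * i / fs) for i in range(3 * fs)]
curve = schroeder_decay_db(ir)
edt_value, t30_value = edt(curve, fs), t30(curve, fs)
```

For this ideal linear decay both measures return the same value. Once a noise floor is
added, the -35dB point required by T30 is lost before the -10dB point required by EDT,
which is consistent with the lower threshold S/N observed for EDT.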

The examination of the functions relating to the main intelligibility measure, STI, was
based on the analysis of MTI, effectively an octave band specific STI, in relation to the
T30 and EDT values; for screening purposes, MTI values were used to derive an
unweighted version of STI for a given measurement, so as to enable a comparison
between reference and experimental conditions.
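
The unweighted STI used for screening is the arithmetic mean of the seven octave band
MTI values. A minimal sketch, using the reference and marginal MTI values for the high
RT condition from table 4.2 (the function and variable names are hypothetical):

```python
def unweighted_sti(mti_values):
    """Arithmetic mean of the octave band MTI values (no band weighting)."""
    return sum(mti_values) / len(mti_values)

# MTI per octave band, 125Hz to 8kHz, from table 4.2 (High RT).
reference_mti = [0.46, 0.43, 0.40, 0.43, 0.39, 0.42, 0.52]
marginal_mti = [0.46, 0.43, 0.40, 0.42, 0.39, 0.43, 0.55]

reference_sti = round(unweighted_sti(reference_mti), 2)
marginal_sti = round(unweighted_sti(marginal_mti), 2)
```

Both values round to 0.44, matching the averages listed in the table, even though three of
the individual octave band values differ between the two conditions.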

The MTI values corresponding to the series of low level measurements were assessed in
view of the T30 and EDT values involved, see figure 4.3. In an example for the 250Hz
octave band in Room 1, MTI (Figure 4.13) appeared to be more closely related to EDT,
given that the initial large fluctuations in T30 did not significantly affect the resultant MTI
at the corresponding measurements. Comparable behavior was observed for all the rooms
considered, see figures 4.13-4.22, where EDT clearly outweighed T30 in terms of
significance in an MTI assessment.

The extent of the influence of a single erroneous octave band in terms of MTI appeared
to be subject to the averaging procedure at the final STI estimation, see figures 4.13-4.22
for MTI and figures 4.23-4.32 for STI. For example, in Room 1 (figure 4.23) the 250Hz
octave band alone did not appear to notably affect the final STI; accurate STI values were
derived even when the MTI(250Hz) variation exceeded the JND. The effect is more
obvious when considering additional octaves, given that accurate STI values in the series
of measurements extend beyond the first erroneous MTI values occurring at particular
octave bands. Depending on the frequency considered, the octave band weighting that is
normally applied when estimating the final STI value could enhance or reduce the effect.

[Figures 4.13-4.22: per-octave MTI plots annotated with the S/N at the MTI limit; plot
content not recoverable]

Figure 4.13. MTI data in Room 1

Figure 4.14. MTI data for Room 2

[Figures 4.15-4.22, MTI data for the remaining test rooms: captions lost in extraction]

Figure 4.23. STI in Room 1 (no frequency weighting)

[Figures 4.24-4.32, STI for the remaining test rooms: captions lost in extraction]

As the derivation of STI is subject to averaging processes at the MTI (and mF) level,
a somewhat reduced sensitivity of the measure to T30 and EDT fluctuations is to be
expected.
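
The averaging referred to above can be sketched following the general STI procedure of
IEC 60268-16: each modulation transfer value m is converted to an apparent S/N, limited
to +/-15dB, mapped to a transmission index between 0 and 1, and the indices of the
modulation frequencies in an octave band are averaged to give the MTI. The code below
is an illustration under that assumption; function names and the sample m values are
hypothetical, and the auditory masking and band weighting steps of the standard are
omitted.

```python
import math

def transmission_index(m):
    """Map one modulation transfer value m (0 < m < 1) to a
    transmission index between 0 and 1."""
    snr_apparent = 10.0 * math.log10(m / (1.0 - m))
    snr_apparent = max(-15.0, min(15.0, snr_apparent))  # limit to +/-15dB
    return (snr_apparent + 15.0) / 30.0

def modulation_transfer_index(m_values):
    """MTI of one octave band: mean transmission index over the
    modulation frequencies."""
    indices = [transmission_index(m) for m in m_values]
    return sum(indices) / len(indices)

# Hypothetical m values for 14 modulation frequencies in one octave band,
# with a single outlier (0.20) among otherwise identical values.
m_band = [0.50] * 13 + [0.20]
mti = modulation_transfer_index(m_band)
```

In this example the single outlying m value lowers its own transmission index by about
0.20 but the band MTI by only about 0.014, which illustrates the reduced sensitivity
introduced by averaging.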

4.4.2 Correlation of T30 (and EDT) with threshold efficient S/N
The relation between threshold efficient S/N and the T30 and EDT values was examined
for the ten test rooms in order to verify a prospective relation.

Assumptions made in the initial investigations in relation to T30 could be confirmed, with
a number of exceptions as in Room 3 and Room 9, see Appendix 4.1. The potential effect
of the measurement system in this respect was later examined by a series of
measurements aimed at verifying consistency as well as repeatability for a given
configuration, see section 4.4.3. The outcome relating to Room 3 was attributed to the
room's ceiling design, effectively forming a coupled space, given an opening of 0.5m
around the perimeter of the suspended ceiling. The size of Room 9 in combination with
transient noise events exceeding the simulated noise level was mainly responsible for
the differing results in the specific room. Nonetheless, correlation coefficients ranging
from 0.70 to 0.92 were established for the greater part of the experimental conditions,
with a mean of approximately 0.80; see Appendix 4.1 for the full set of results.
Comparable performance was further observed in terms of EDT, see Appendix 4.1, the
only exceptions being Room 1 and Room 3, potentially due to the shape characteristics
(coupled space) in Room 1 and the ceiling design in Room 3, as previously described.
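
A correlation coefficient of the kind reported above can be computed as in the following
sketch. The Pearson coefficient is assumed here, as the text does not state which
coefficient was used, and the sample values are hypothetical rather than measured data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-octave T30 values (s) and threshold efficient S/N (dB).
t30_values = [0.6, 0.8, 1.0, 1.2, 1.4]
threshold_snr = [5.0, 7.5, 8.0, 11.0, 12.5]
r = pearson_r(t30_values, threshold_snr)
```

A value of r close to 1 indicates that the threshold efficient S/N rises consistently with
the reverberation measure across the octave bands of a room.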

The overall results, as shown in figures 4.33-4.34, demonstrate an evident trend in terms
of the relation of threshold efficient S/N to T30 and EDT. For increasing values of the
latter a higher S/N is generally required for an accurate measurement, while the effect
was more evident for the smaller sized test rooms forming the main focus of the current
study. Data points relating to smaller rooms resulted in a better defined relation in the
T30 case, see figure 4.33, than those relating to larger or noisier rooms (when considering
BGNL and transient noise events, see Appendix 5). A smaller spread of results was found
for data relating to EDT accuracy, see figure 4.34, while the advantage of the measure
over T30 for marginal conditions could be highlighted, given the lower S/N appearing
within the data sets.

Figure 4.33. Threshold efficient S/N trend in relation to T30, 125-8kHz octave band data in ten test rooms
[Chart title: Threshold efficient S/N for ascending T30; horizontal axis: S/N (dB), -5 to 30;
vertical axis: 0 to 2; series: Rooms 1-10]

Figure 4.34. Threshold efficient S/N trend in relation to EDT, 125-8kHz octave band data in ten test rooms
[Chart title: Threshold efficient S/N for ascending EDT; horizontal axis: S/N (dB), -5 to 30;
vertical axis: 0 to 2; series: Rooms 1-10]

Higher signal levels are used to obtain adequate S/N at lower frequencies in particular,
due to the effect of BGNL. Given that higher frequencies are more intrusive in terms of
annoyance, a significantly more tolerable level can be achieved by suitable signal
equalization that avoids unnecessarily high levels at particular octave bands, i.e.
equalization according to the BGNL to produce a constant S/N across the frequency
spectrum, facilitating measurement accuracy. The use of equalization on this basis
assumes that speech intelligibility parameters are derived via post processing of the
measurements to suitably account for the influence of speech level and BGNL.
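
The equalization principle can be sketched as follows: the signal level required in each
octave band is the BGNL in that band plus a common target S/N. The band levels and the
20dB target below are hypothetical values chosen for illustration, as are the function and
variable names.

```python
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

def equalized_signal_levels(bgnl_db, target_snr_db):
    """Per-band signal level (dB) giving the same S/N in every band."""
    return {band: level + target_snr_db for band, level in bgnl_db.items()}

# Hypothetical background noise spectrum, higher at low frequencies.
bgnl = dict(zip(OCTAVE_BANDS_HZ, [45.0, 40.0, 35.0, 30.0, 27.0, 25.0, 22.0]))
signal = equalized_signal_levels(bgnl, target_snr_db=20.0)

# Level saved at 8kHz relative to a flat signal set by the worst band (125Hz).
saving_at_8k = max(signal.values()) - signal[8000]
```

With this hypothetical spectrum the 8kHz band is reproduced 23dB below the level that a
flat signal would need in order to reach the same S/N at 125Hz.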

4.4.3 Repeatability of results with/without simulated noise floor
In an attempt to improve control over the variable of fluctuating BGNL, a pink noise source
was incorporated in the reproduction system during measurements for the majority of the
sessions. The efficiency of the approach was examined within two test rooms (a small
reverberation chamber (N11) and Room 10, see figure 4.35), using six system
configurations as described in section 4.2.2, so as to verify the consistency of the
measurements in this respect. With the repeatability of results suffering from occasional
transient noise events exceeding the simulated noise level and from partly non linear
decays within the rooms, it was shown that the incorporation of pink noise provided, on
occasions, a limited advantage in terms of measurement repeatability.

Tables 4.3-4.4 compare the outcome for the six different system configurations.
Considering results in two groups within each table depending on the source type used
for the signal reproduction (i.e. SS or Omni), a close relationship was found among the
datasets. A number of exceptions were observed, relating mainly to the two
configurations not using simulated noise. Accordingly, the incorporation of pink noise
appeared overall to increase confidence in the results. Taking into account the S/N
derivation methodology (i.e. using the last 10% of the impulse response) however, it
should be noted that a deterministic outcome cannot be assumed, given also the
variable of decay linearity that could not be controlled.

Figure 4.35. Test rooms used in the assessment of result repeatability, I) N11, II) Room 10

Threshold efficient S/N (dB)
Measurement configuration            125Hz   250Hz   500Hz   1kHz    2kHz    4kHz    8kHz
SS (signal), SS (noise)              14.3    21.3    13.7    8.2     9.8     4.3     16.5
SS (signal), Omni (noise)            14.6    21.7    13.2    9.3     18.2    5.1     15.3
SS (signal), No simulated noise      20      19.4    14.4    10      10.7    6.3     16.8
Omni (signal), Omni (noise)          11.1    11.5    8.2     17.3    5.8     12.4    6.6
Omni (signal), SS (noise)            9.9     10.6    15.7    13.9    7.3     13.3    5.6
Omni (signal), No simulated noise    17.3    10.6    10      7.5     10.2    10.9    1.6
Table 4.3. [start of caption lost in extraction] T30 data in N11 (reverberation chamber)

Threshold efficient S/N (dB)
Measurement configuration            125Hz   250Hz   500Hz   1kHz    2kHz    4kHz    8kHz
SS (signal), SS (noise)              11.8    8       -0.1    6.4     3.6     6.9     6.4
SS (signal), Omni (noise)            10.5    6.1     -1      6.9     3.3     5.4     7.9
SS (signal), No simulated noise      12.1    0.8     2.4     5       3.8     6.7     -5.7
Omni (signal), Omni (noise)          10.9    13.6    6.4     6.6     5.7     3.6     3.9
Omni (signal), SS (noise)            9.6     12.1    4.5     8.8     4.4     4.9     5.7
Omni (signal), No simulated noise    12.3    13.4    5.3     8.4     4.7     4.6     3.6
Table 4.4. [start of caption lost in extraction] T30 data in Room 10

In a correlation analysis for the two datasets, see Appendix 4.2 for an overview, results
within N11 demonstrated a high correlation of EDT with the threshold efficient S/N,
approaching 0.91 with a mean of 0.81, while being consistently over 0.77. The
equivalent results within Room 10 showed a similar degree of consistency, excluding
however the system configurations using the portable sound system for reproduction of
the test signal. The latter resulted in a considerably less correlated relation between EDT
and threshold efficient S/N due to the size of the space, in connection with the additional
number of sources. It should be remembered that, just as the receiver positioning within a
space highly influences the EDT, so does the source positioning. For marginal conditions,
an increased number of sources would thus potentially increase measurement uncertainty
in terms of EDT.

Comparable results were obtained in terms of T30 for Room 10, where using a single
source demonstrated a more evident relation of threshold efficient S/N to T30. The N11
results produced the same trend in terms of the sound source used, however with
significantly lower correlation values overall. The reverberation chamber appeared to
give less consistent results due to fluctuating noise levels and a relatively long RT in a
small room. A similar effect was also found for the portable sound system in both rooms,
given the size of Room 10 and the long RT in N11.

It was established that using simulated background noise in any of the four possible
configurations did not introduce any setbacks, with comparable results for all four
datasets in terms of threshold efficient S/N derived from marginal T30 data. Considering
the output from the two configurations not using simulated background noise, it was
established that the incorporation of pink noise did provide a limited advantage in terms
of measurement repeatability, while the equivalent results for S/N derived from marginal
EDT data produced a less evident differentiation in performance.

A correlation analysis showed only small deviations when considering the EDT data for
both rooms, with the exception of the SS case in Room 10 (large room). Here, using the
multiple loudspeaker arrangement (SS) for signal reproduction, the use of simulated
background noise resulted in a less correlated relation of threshold efficient S/N to both
T30 and EDT. In terms of T30, system performance for the three remaining datasets gave
no clear differentiation. However, the use of a single (omnidirectional) source rather than
a multi source arrangement (SS) for signal reproduction appeared to give a more
consistent outcome in terms of the relation of T30 to threshold efficient S/N, for both
Room 10 and N11.

4.5 Conclusions
In this chapter low level measurements within ten test rooms have been analyzed in terms
of a reducing S/N and its effect on measured parameters, most notably T30, EDT and STI.

Overall, EDT emerged as the more consistent parameter when compared to T30, having
reduced value fluctuations for the series of measurements performed and appearing to be
significantly more accurate under marginal conditions. The outcomes as presented
demonstrated a trend in terms of the relation of the threshold efficient S/N to T30
and EDT. For increasing values of room reverberance a higher S/N is generally required
for an accurate measurement in the respective terms. Overall, an S/N of 20dB and 12dB
was necessary in this study for an accurate estimation of T30 and EDT respectively, over
all frequency bands with no signal averaging. An advantage of EDT could thus be
highlighted.

STI has been found to be more closely related to EDT than to T30. The different
averaging processes in deriving STI were highlighted, suggesting that errors at the MTI
(or mF) level can potentially be averaged out.

In the following chapter the acoustical conditions of the test rooms are analyzed on the
basis of computer modelling, where the outcomes relating to EDT efficiency are put into
a different context to highlight obtainable advantages in the new terms.
Chapter 5 Computer modelling of test spaces

CHAPTER 5
Computer Modelling of Test Spaces

5.1 Introduction
In Chapter 2 core modelling approaches for the prediction of acoustic conditions in
enclosed spaces were presented. Hybrid models have emerged as the most
appropriate solution for room acoustic investigations [101]. Clear guidelines for the
preparation of models however do not exist. The level of geometric detail needed for an
accurate prediction is one of the key factors that is open to interpretation. Additional
parameters central to the process include absorption and scattering functions relating to
the room surfaces. As such, absorption and scattering coefficients need to be realistically
defined considering the actual room configuration for an acceptably accurate prediction.
Often these coefficients are approximated, leading to potentially significant prediction
errors, particularly when combined with additional variables, e.g. a simplified definition
of room geometry or erroneous source directivity.

An approximation of the actual room geometry is commonly used depending on the size
of the room and its features, initiating from rules of thumb that arguably concern large
spaces, see section 2.7.1. Consequently, they do not apply in smaller classroom type
spaces as will be shown in this chapter.

A large selection of absorption coefficient data is widely available, normally relating to
random incidence of sound. This approximation is considered suitable for simulation
purposes, though random incidence angles present an uncertainty factor that is
nonetheless acceptable [92]. Possible exceptions in the usability of such data are cases
requiring angle dependent data, e.g. flat rooms [92]. Accordingly, classroom type spaces
can comprise a special case for which the estimation of suitable absorption coefficients
becomes more complex. The use of typical values in this case will invalidate the
prediction.

Similarly, a set of scattering coefficients needs to be defined to describe the amount of
energy that will not follow a specular reflection from a surface. Sound diffusion and
scattering are used to characterize diffuse reflection and comprise a more ambiguous term
as their measurement has not been standardized until recent years; data for different
surfaces is thus limited. In 2001, the AES Working Group SC-04-02 published a
document [124] standardizing the measurement procedure to quantify diffusion
coefficients. The latter however was not intended as input for computer models and is
thus generally incompatible with the type of input required by simulation algorithms. In
contrast, the scattering coefficient is compatible with geometric room acoustics models
(Cox, cited in Nironen [125]). The related ISO document [126] describes the measurement
of random incidence scattering coefficients in a reverberation chamber, based on a
procedure suggested by Vorländer and Mommertz [127] in 2000. Edge diffraction leading
to scattering is not accounted for in the calculation although it can be considered as less
important given acoustically large room surfaces. It is worth noting that although
scattering coefficient data is gradually becoming available for different surface types,
prediction input values will still have to be estimated in most cases, see Dalenbäck [128].
This observation is a result of the uncertainties involved in the calculation of scattering,
relating mainly to the given surface size and potential geometry simplifications that need
to consider accordingly an altered acoustic behaviour in these terms. Although not always
the case, including finer geometric details can simplify the scattering considerations in
this respect, predominantly for small rooms.
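
The definition of the scattering coefficient can be written out as a short sketch. The
coefficient is the fraction of reflected energy that is not reflected specularly; in the
reverberation chamber method it is usually derived from the random incidence absorption
coefficient and the so-called specular absorption coefficient. The function and parameter
names are hypothetical and the input values are illustrative, not measured data.

```python
def scattering_coefficient(alpha_s, alpha_spec):
    """Random incidence scattering coefficient.

    alpha_s:    random incidence absorption coefficient of the sample
    alpha_spec: "specular" absorption coefficient, i.e. the energy
                fraction not reflected specularly (absorbed + scattered)
    """
    if not 0.0 <= alpha_s < 1.0 or not alpha_s <= alpha_spec <= 1.0:
        raise ValueError("expected 0 <= alpha_s <= alpha_spec <= 1, alpha_s < 1")
    return (alpha_spec - alpha_s) / (1.0 - alpha_s)

# Illustrative values: 10% of the energy absorbed, 55% not reflected specularly.
s = scattering_coefficient(alpha_s=0.10, alpha_spec=0.55)
```

In this example half of the reflected energy is scattered, the kind of single value per
surface and octave band that a geometric model expects as input.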

Three international round robin projects [107, 108, 109, 110] have been undertaken to test
room acoustics prediction software and assess different aspects of the uncertainties
involved in combination with the resulting prediction accuracy/consistency. Results
indicated that a reasonably accurate outcome, in the range of 1-2 JND for the different
measures or within the inherent experimental uncertainties, can be expected. User input
however remains a critical variable whose influence can be reduced in the design of the
prediction model and in its preparation, i.e. by choosing suitable input data in terms of the
absorption and scattering coefficients. In the same context, different authors have
presented their approaches to estimating the particular prediction input (see Hodgson et
al. [129], Zeng et al. [130], Saher et al. [131] etc.). The applicability of the methods
however is not universal, and is thus questionable in many cases.

In the following sections, the design and preparation of computer models is discussed in
more detail to establish a suitable approach for the types of space used in this study.
Validation of the prediction is similarly a crucial stage, thus the proposed methods are
assessed in terms of their efficiency and resulting accuracy for a number of related
parameters. Ten test rooms are modelled and later examined in view of the experimental
results in section 5.3.

5.2 Preparation methodology
5.2.1 Model design
In large spaces wave effects can be efficiently approximated using statistical theory and
thus, computer modelling can more competently approximate the actual conditions for
such enclosures. Accordingly, modelling small spaces is a challenge, as wave effects are
most prominent in such rooms.
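
One common way of judging where wave effects give way to statistical behaviour is the
Schroeder frequency, approximately 2000*sqrt(T/V) Hz for a reverberation time T in
seconds and a room volume V in m³. It is not part of the methodology described here and
is included only as an illustration; the room volumes and reverberation times below are
hypothetical.

```python
import math

def schroeder_frequency(rt_seconds, volume_m3):
    """Approximate frequency (Hz) above which the room response can be
    treated statistically: f = 2000 * sqrt(T / V)."""
    return 2000.0 * math.sqrt(rt_seconds / volume_m3)

# Hypothetical examples: a small classroom and a large hall.
f_classroom = schroeder_frequency(rt_seconds=0.8, volume_m3=200.0)
f_hall = schroeder_frequency(rt_seconds=2.0, volume_m3=20000.0)
```

For these values the transition lies near 126Hz in the small room but near 20Hz in the
large one, so a much larger part of the 125Hz octave band is affected by wave behaviour
in the classroom sized space.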

A general rule of thumb in designing a computer model is to use a maximum resolution
of 0.5m for the interior of the room [92]. However, it is argued that rules of thumb do not
apply in classroom type enclosures. In small rooms the detail resolution limit can
potentially be extended to marginally finer resolutions for a better interpretation of
conditions (e.g. for receiver positions in close proximity to structures) and a more
efficient account of scattering by the automated algorithms. CAD software can assist in
the design process by offering an improved platform, with subsequent effects on a
number of aspects of the efficiency of the process.


A methodology that enables the actual conditions to be better approximated is the main
objective of this chapter. An efficient approach could thus become simpler to define.

5.2.1.1 Model detail resolution
Representing room geometry in a computer model requires an assessment of the room
and its architectural details, so as to evaluate the design resolution necessary for an
adequate computer representation. With the dimensions and shape of the room as a
starting point, additional parameters can be accounted for, such as the room's fittings;
the related dimensions, location and density information need to be appropriately
assessed, considering the case of a simplified version of the specific geometry
characteristics, to determine the model input in these terms. The time needed to construct
the actual model is also an important consideration. Third party CAD software can
be applied to speed up the process of model construction, influencing the resulting detail
and accuracy of the space representation. Automated processes such as point to point
connections can reduce the number of setbacks in the construction process, e.g.
an open model, misrepresented room symmetry and erroneous object positioning,
among others. Considering that CAD platforms are typically architecture oriented, a
reduced construction time can be expected. Geometry details can be significantly
enhanced, see figure 5.1; however, this can affect the crucial balance between an accurate
representation and prediction efficiency. The processing time needed for the simulation
can be adversely affected as a direct result.

Common outcomes of current methodologies are either overly simplified room
geometries, inadequate for a detailed assessment, or inefficiently complex geometries
resulting from the simpler construction process that CAD software allows. An
experimental session is presented in this context based on typical lecture rooms, i.e. small
rooms, see section 5.3.2. This will establish the potential advantage of a finer design
resolution in the construction of a model, as reflected in the prediction output for the
space type considered.

Figure 5.1. Example room geometry, I) Simple representation, II) Enhanced detail

5.2.1.2 Source directivity
The directional characteristics of the sources used in computer simulations have been
shown to affect, to various degrees, both the numerical output and the subjective
perception of auralized material, see Dalenbäck [132], Wang [133]. For critical conditions in
particular, source directivity can improve or significantly impair speech intelligibility.

A white paper by Dalenbäck [134] suggests that prediction accuracy does not necessarily
rely solely on the angular resolution of directivity data or frequency resolution used in
computer predictions. An accurate representation of the near field of the source could
also be a critical point. It can thus be deduced that for smaller rooms it is vital to utilize
directivity data measured in the near field of the source considered.

Two types of source were used within the rooms as described in Chapter 3. While a
default omni directional model was used for the omni source, near field response data at
an acceptable angular and frequency resolution needed to be established for the studio
monitors. The latter were measured at a distance of 1m from the source in free field
conditions at a 10° resolution (1/1 octave bands), according to the measurement
procedure described in BS EN 60268-5:2003 [135]. The obtained results, as used in the
simulations, are shown in figures 5.2-5.3.

Figure 5.2. Measured directivity response for Yamaha HS50M monitors at 1m in free field conditions (balloon)

Figure 5.3. Measured directivity response for Yamaha HS50M monitors at 1m in free field conditions (polar)

5.2.1.3 Definition of absorption and scattering coefficients
Accurate absorption and scattering coefficients are fundamental in achieving acceptable
prediction accuracy. A number of aspects are involved in the process as described in
section 5.1; surface absorption is primarily considered.

Rules of thumb for defining scattering functions can be followed i.e. a minimum of 20%
default diffusion for average-size smooth flat surfaces or 10% for big flat smooth
surfaces, a high value (80%) for rough surfaces where the roughness scale is of the
order of the wavelength and edge diffusion [128]. At the same time it is clear that
overestimating rather than underestimating scattering coefficients is preferred. These
simple steps, in combination with an adequately detailed room geometry, can serve to
efficiently approximate scattering effects, though minor tuning might be needed post
prediction to rebalance the influence of absorption. It is worth noting that the sensitivity
of a room to scattering coefficients in the context of prediction output may depend largely
on the amount of absorption present; i.e. larger output variations for lower overall
absorption when altering scattering input values [136].
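The rules of thumb above can be collected into a small lookup for assigning initial scattering coefficients to model surfaces. The following Python sketch is illustrative only: the function name, the example surfaces and in particular the `large_area_m2` threshold separating "average-size" from "big" surfaces are assumptions, since the rules of thumb do not quantify surface size. The returned values are starting points that may still need minor tuning.

```python
def default_scattering(surface_area_m2, rough=False, large_area_m2=20.0):
    """Return a default scattering coefficient (0-1) from the rules of thumb.

    large_area_m2 is a hypothetical threshold between "average-size" and
    "big" surfaces; the source text gives no numerical value for it.
    """
    if rough:
        # Rough surfaces with roughness scale of the order of the wavelength
        return 0.80
    if surface_area_m2 >= large_area_m2:
        # Big flat smooth surfaces
        return 0.10
    # Average-size smooth flat surfaces (minimum default)
    return 0.20


# Hypothetical surfaces: name -> (area in m2, rough?)
surfaces = {
    "whiteboard": (4.0, False),
    "ceiling": (60.0, False),
    "bookshelf": (6.0, True),
}
coefficients = {
    name: default_scattering(area, rough)
    for name, (area, rough) in surfaces.items()
}
print(coefficients)
```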

Absorption coefficients can be deduced from measurements if the computer model is to
be used as a post evaluation tool. However, this is not the case for the design condition.
In the latter case a calibration procedure to match the prediction outcome to measured
values (see section 5.2.2) cannot be performed since room acoustics measurements are
unavailable. Predefined values of absorption coefficients thus need to be used and the
user is expected to assess the suitability of any data. A large selection of related
databases is available; however, text book values are to some extent not valid for normal
enclosed spaces due to the unrealistic conditions under which they were measured e.g.
free field. Acceptable statistical approximations in the context of absorption coefficients
are described in section 5.1; however, for small rooms in particular a given dataset will in
many cases not match the actual conditions if individual cases, e.g. a representation of an
audience area, are not suitably evaluated and their acoustical characteristics readjusted
when necessary. An alternative approach is an empirical estimation of absorption, as
described by Hodgson and Scherebnyj [129]. Using acoustical measurements in actual
classrooms, the absorption coefficients for a number of materials were empirically
calculated, in some cases differing significantly from text book values. However,
the applicability of such data is conceivably limited to nearly identical rooms since the
definition of absorption is an absolute case. A combination of carefully selected text book
values with empirical data would potentially be the most efficient approach.

In contrast, when a computer simulation is utilized as a post evaluation tool, the
potential availability of actual room acoustics data enables the option of an objective
prediction accuracy assessment and fine tuning of results i.e. validation/calibration.
Considering the limitations of empirical procedures, an estimation of absorption
coefficients by comparison to actual room acoustics measurements would certainly be the
most appropriate solution.

The process of confirming the prediction consistency by comparison to measured
parameters is referred to as model validation. Similarly, fine tuning a prediction to match
the results to measured parameters is referred to as model calibration. A
validation/calibration procedure in terms of T30 has been found to be an efficient
methodology [114, 121]. Absorption (and scattering) coefficient values are initially
approximated, allowing the fine tuning of the simulation by comparing actual and
predicted acoustic parameters, normally T30, although alternative parameters could be
used [114]. The validation/calibration procedures are described in the following section.

5.2.2 Model validation/calibration methodology
The validation/calibration procedures assume the use of computer models as a post
evaluation tool. As such, approximate scattering and absorption coefficients act as a
starting point. Scattering coefficients must be generically set if no specific data is
available, according to theoretical assumptions i.e. depending on the surface area and
material roughness of related surfaces, see section 5.2.1.3. Fine tuning can be later
performed, if necessary, after establishing a balance in the definition of surface
absorption coefficient values; however further action is rarely required.

A simple calibration procedure can be used to generically adapt the model to the actual
conditions in terms of the absorption coefficients. Reverberation time (T30) values
comprise the determinants of model performance and thus a step by step evaluation is
enabled using multiple receiver positions, typically six. Performance optimization is
initially undertaken for one receiver, followed by sequential reference to data for the
additional receiver positions. Reaching a balance among the prediction setups will enable
the determination of a set of acoustic properties for the room surfaces, which can be
assumed to be correct and thus allow for reliable predictions under the conditions necessary
for different scenarios/experimental setups in the same room.
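The logic of such a calibration loop can be sketched in a few lines of Python. This is not the procedure as run in the simulation software: Sabine's formula (T = 0.161 V / A) is swapped in here as a stand-in predictor so that the example is self-contained, the measured values are averaged over the receivers rather than addressed one receiver at a time, and the room volume, surface areas, starting coefficients, measured T30 values and the 5% tolerance are all hypothetical. One octave band is treated; the same loop would be repeated per band.

```python
SABINE_CONSTANT = 0.161  # s/m, Sabine's formula T = 0.161 * V / A


def sabine_t(volume_m3, areas_m2, alphas):
    """Stand-in predictor: Sabine reverberation time for one octave band."""
    absorption_area = sum(s * a for s, a in zip(areas_m2, alphas))
    return SABINE_CONSTANT * volume_m3 / absorption_area


def calibrate_band(volume_m3, areas_m2, alphas, measured_t30,
                   tolerance=0.05, max_iter=20):
    """Scale the absorption coefficients until the predicted reverberation
    time is within `tolerance` (relative) of the mean measured T30."""
    target = sum(measured_t30) / len(measured_t30)  # mean over receivers
    alphas = list(alphas)
    for _ in range(max_iter):
        predicted = sabine_t(volume_m3, areas_m2, alphas)
        if abs(predicted - target) / target <= tolerance:
            break
        # Too long a predicted decay means too little absorption, and vice versa
        scale = predicted / target
        alphas = [min(0.99, a * scale) for a in alphas]
    return alphas, sabine_t(volume_m3, areas_m2, alphas)


# Hypothetical room: floor, ceiling and walls with text book starting values
calibrated, t30 = calibrate_band(
    volume_m3=150.0,
    areas_m2=[50.0, 50.0, 80.0],
    alphas=[0.05, 0.30, 0.10],
    measured_t30=[0.62, 0.64, 0.63, 0.65, 0.61, 0.63],  # six receivers
)
print([round(a, 3) for a in calibrated], round(t30, 2))
```

With a geometrical acoustics model the prediction differs per receiver, so `sabine_t` would be replaced by a simulation run and the comparison made receiver by receiver as described above; a uniform scale factor is also the simplest possible adjustment, whereas in practice individual surfaces would be tuned separately.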

The calibration procedure can be similarly applied with the use of alternative acoustic
parameters e.g. EDT rather than T30. While T30 has been shown to largely incorporate
most of the room acoustical characteristics [1], the potential advantages of using EDT in
particular are later examined and discussed in more detail, see sections 5.4 and 5.5.

5.3 Experimental results
In the context of small enclosed spaces, this section presents experimental results relating
to the validation/calibration procedures. The focus is on the acoustic parameter to be used
in the process, so that the prediction output and room acoustics measurements can be
compared. Results are further complemented by an assessment of the level of detail
resolution that is required by a model for an accurate prediction in the terms used.

5.3.1 Basis for model validation/calibration
The key procedures were examined in practice in a pilot study with the use of three test
rooms. The latter were objectively assessed onsite and their geometries modelled so as to
enable a comparison of results, thus allowing for an evaluation of the methods used in the
approach. Four source configurations utilizing an omni directional speaker and a portable
sound system were used in each room, as described in Chapter 3. This provided the basis
for the prediction and measurement sessions. In the following sections, the test
methodology is described and key points are discussed in view of the results obtained.

5.3.1.1 Test methodology - Room acoustics measurements
Room acoustics measurements were performed in the three test rooms using a WinMLS
2004 [6] based measurement system combined with an omni directional sound
source and receiver pair. A swept sine test signal was utilized to excite the space and multiple
measurements, based on BS ISO 3382 [118], were taken for six receiver and two source
positions.

5.3.1.2 Test methodology - T30 calibration
A T30 based approach was used as described in section 5.2.2 to match the prediction
output to room acoustics measurements; the procedure considered only one source
configuration.

5.3.1.3 Test methodology - Results via output comparison
The calibration procedure precedes a direct comparison of prediction output to actual
measured values. Consequently, the simulation quality can be assessed in terms of any
number of parameters that are included in the process.

The calibrated models for the three test rooms (figure 5.4) achieved a good result in terms
of the measured T30, i.e. prediction and measurement output was comparable. To validate
the simulation's response to parameters other than the reference used for calibration,
additional measures were examined, with the C50 and STI measures being of most interest
due to their high correlation to speech intelligibility. Table 5.1 shows a comparison
between actual and predicted values.

For the single source case it was found that the resulting values were comparable, giving an
STI (and C50) difference marginally over the JND. A somewhat altered character was
observed for the sound system assisted conditions in some instances; any differences,
however, were attributed to the input in terms of the source characteristics and the simple
room geometry.

The incorporation of a sound system generally requires accurate performance
characteristics at hand to ensure a realistic comparison. However, given a consistent
model, experimentation for different scenarios is enabled in any case since predictions
can reveal to a large extent the room potential and/or limitations on a relative basis. The
calibration procedure provided confidence that the models would perform, to an extent,
consistently under different configurations, such as for alternative source types.

Figure 5.4. Top view of test rooms for the validation of T30 calibration methodology

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
EDT[s]    0.58   0.61   0.56   0.60   0.61   0.47   0.67   0.53   0.73   0.57   0.71   0.54
T15[s]    0.58   0.61   0.55   0.55   0.61   0.57   0.7    0.64   0.76   0.68   0.72   0.67
T30[s]    0.60   0.65   0.58   0.58   0.64   0.63   0.72   0.77   0.78   0.80   0.73   0.73
C50[dB]   4.2    2.8    4.4    3.5    3.6    5.8    3      5.3    2.2    4.2    2.6    4.5
SPL[dB]   76.8   N/A    76.6   N/A    77.3   N/A    78.1   N/A    78.6   N/A    78.2   N/A
STI       0.68   0.73   Rating: Good   Good
STIrMal   N/A    0.73   Rating: N/A    Good
STIrFem   N/A    0.74   Rating: N/A    Good

Table 5.1. Example mean values for single omni directional source (prediction against measurement)
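To make the comparison against the JND explicit, the prediction error can be expressed in multiples of the JND. The Python sketch below does this for the T30, C50 and STI values transcribed from Table 5.1. The JND values used (5% relative for T30, 1 dB for C50 and 0.03 for STI) are assumptions chosen for illustration and are not taken from this section; the values adopted elsewhere in the study should be substituted where they differ.

```python
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000]

# (prediction, measurement) pairs per octave band, transcribed from Table 5.1
T30_S = [(0.60, 0.65), (0.58, 0.58), (0.64, 0.63),
         (0.72, 0.77), (0.78, 0.80), (0.73, 0.73)]
C50_DB = [(4.2, 2.8), (4.4, 3.5), (3.6, 5.8),
          (3.0, 5.3), (2.2, 4.2), (2.6, 4.5)]
STI = (0.68, 0.73)

# Assumed JND values (not stated in this section)
JND_T30_RELATIVE = 0.05  # 5% of the measured value
JND_C50_DB = 1.0
JND_STI = 0.03


def error_in_jnds(predicted, measured, jnd, relative=False):
    """Signed prediction error expressed in multiples of the JND."""
    step = jnd * measured if relative else jnd
    return (predicted - measured) / step


t30_errors = [error_in_jnds(p, m, JND_T30_RELATIVE, relative=True)
              for p, m in T30_S]
c50_errors = [error_in_jnds(p, m, JND_C50_DB) for p, m in C50_DB]
sti_error = error_in_jnds(*STI, JND_STI)

for band, t30_e, c50_e in zip(BANDS_HZ, t30_errors, c50_errors):
    print(f"{band:>5} Hz  T30: {t30_e:+.1f} JND  C50: {c50_e:+.1f} JND")
print(f"STI: {sti_error:+.1f} JND")
```

A magnitude below 1 indicates a difference that would nominally not be perceived under the assumed JND; the sign shows whether the model over- or under-predicts.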

5.3.1.4 Session conclusions
The validation procedure appears to be a reliable way to adapt the models to the actual
conditions. The process, performed in terms of the RT (T30) values at the receiver
positions, confirmed, to some extent, that the specific parameter largely incorporates the
general room characteristics for computer modeling purposes.

Smaller differences, found for T15 and EDT in the comparison, suggested that
optimization relying solely on T30 is principally a simplified though efficient version of
the process. In a more detailed approach, further references e.g. EDT and SPL
might help the simulation to perform at a higher accuracy level.

5.3.2 Model resolution
In the following sections, the influence of the detail level incorporated in a computer
model is examined in terms of the resulting prediction accuracy and overall efficiency.

5.3.2.1 Assessment preparation and the impact of detail resolution
The computer models used in this assessment were constructed using two different
approaches to examine the efficiency of a particular design in terms of level of detail. For
reference purposes, the first approach used a coordinate system in text file format that is
typical within modelling software packages. Considering the time needed to obtain a final
working version, see figure 5.5 (I), basic architectural detail was incorporated with a
construction time ranging from ~24-96 hours.

In the second approach, third party CAD software (i) [137] was used to construct and later
export [138] the models in a format compatible with the prediction software. Given a faster
working routine, a greater amount of room detail could be incorporated in the models, see
figure 5.5 (II), within a significantly reduced construction time. Subsequent effects on
simulation run time and quality of results were recorded and are later referenced in more
detail. Model construction required ~4-8 hours for a final working version of the rooms.

Figure 5.5. Example of model detail resolution in Room 8, I) Via coordinate system, II) Via CAD software

Computer simulations were performed in CATT Acoustics v8.0f (ii) [128], where all models
were debugged prior to use. A calibration procedure based on T30 values was used to
generically adapt the simulations to actual conditions, resulting in the derivation of two
data sets for the test conditions. A direct comparison of the predictions to actual values
was thus possible.

5.3.2.2 Assessment result for a single omni directional source
The output from the prediction process was compared to the equivalent output from
actual measurements to determine the model efficiency in the terms used (T30, EDT, C50
and STI). The models analyzed, abbreviated to "simple" and "CAD", achieved a diversity
of results, with the accuracy level overall being greater for the "CAD" model. Exemplar
data presented in this section for Room 8 (tables 5.2-5.7, figures 5.6-5.9) demonstrate the
differences for the two models that, at various instances, both provided a satisfactory
result.

(i) The CAD software platform was selected by considering processing features, simplicity of use and
availability. The selected software was Google SketchUp Pro v.6 incorporating the Rahe-Kraft exporting plug-in.
(ii) CATT Acoustics v8.0f is a leading simulation software platform, selected on account of its hybrid nature, processing features and
availability.

Given that validation was performed in terms of T30 values, both models achieved a good
result on this basis (tables 5.2 and 5.5). Nonetheless, for additional parameters and/or altered
conditions, e.g. different source type and position (see also section 5.3.2.3),
the prediction accuracy of the "simple" model was found to be adversely affected.

Table 5.2. T30 for actual and predicted conditions (simple) in Room 8
Table 5.3. EDT for actual and predicted conditions (simple) in Room 8

Figure 5.6. Mean EDT for actual and predicted conditions (simple) in Room 8

Table 5.4. C50 (simple) in Room 8
Figure 5.7. STI comparison for actual and predicted conditions (simple) in Room 8

Table 5.5. T30 (CAD) in Room 8
Table 5.6. EDT for actual and predicted conditions (CAD) in Room 8

Figure 5.8. Mean EDT for actual and predicted conditions (CAD) in Room 8

Table 5.7. C50 (CAD) in Room 8

Figure 5.9. STI comparison for actual and predicted conditions (CAD) in Room 8

In the EDT comparison high accuracy was observed for the "CAD" model, while the
"simple" model appeared to lose consistency in these terms (tables 5.3 and 5.6). The
EDT prediction had a direct effect on the derivation of STI, as seen in figures 5.7-5.9.
With the STI (and C50) measure being of most interest, the comparison to measured
values showed STI prediction errors up to twice the JND for the "simple" model with
discrepancies over 0.04 at all receiver positions. For the "CAD" model in contrast, values
were within the specified limits at all but one receiver position. C50 results appeared
more satisfactory for both models, however with noticeably higher precision for the
"CAD" model, see tables 5.4 and 5.7. Receiver position 6 resulted in a larger error than
the trend for the prediction for both models; this, however, was attributed to a limited effect of
localized BGNL on the consistency of room acoustics measurements.

5.3.2.3 Use of Alternative Source Configurations
Experimentation using source configurations other than the original omni directional
arrangement (S1) aimed at establishing the simulation consistency for additional
configurations. Given that validation normally takes place for a single condition i.e. one
source and a set of receiving positions, the combination of validation comprehensiveness
and model detail can be used as an indication of the prospective prediction consistency
when using alternative configurations.

Given the current methodology, the "simple" and "CAD" models were examined to
establish differences in performance on this basis. Simulating a sound system installation
of four directional loudspeakers, a new set of predictions was performed without
additional fine tuning and compared to the measured values (tables 5.8 and 5.9, see also
section 5.4). The variation in each case when compared to measurements showed that the
"simple" model achieved reasonable accuracy but was notably influenced by the
altered environment. Discrepancies in terms of Ts, C50 and C80 were marginally
acceptable; however, an error of 0.04 was found in the predicted STI. A more elaborate
initial validation could potentially be used in this case to enhance performance. Smaller
magnitude differences were found for the "CAD" model, where most notably the STI
discrepancy was 0.01.

F[Hz]     125     250     500     1000    2000    4000
Ts[ms]    17.2    5.8     10.7    13.6    11.2    12.1
C50[dB]   -0.9    -0.4    -2.6    -4.4    -3.5    -3.2
C80[dB]   -0.9    0.2     -3.0    -4.4    -3.7    -3.6
STI       -0.04

Table 5.8. Example of average error for "Simple" model using an alternative
source configuration in Room 8 (sound system, SS4)

F[Hz]     125     250     500     1000    2000    4000
Ts[ms]    7.2     1.2     4.8     7.6     4.0     5.5
C50[dB]   0.8     0.7     -0.7    -1.9    -0.8    -1.2
C80[dB]   0.9     1.1     -1.3    -2.2    -1.3    -1.9
STI       -0.01

Table 5.9. Example of average error for "CAD" model using an alternative
source configuration in Room 8 (sound system, SS4)
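The per-band errors reported for the two models with the SS4 configuration can be condensed into a single figure per parameter to ease comparison. The Python sketch below computes the mean absolute error over the six octave bands from the tabulated values; the choice of the mean absolute error as the summary statistic, and the variable names, are illustrative assumptions rather than part of the original analysis.

```python
# Average errors per octave band (125 Hz to 4 kHz), transcribed from the
# tables for the "simple" and "CAD" models in Room 8 (sound system, SS4)
SIMPLE = {
    "Ts[ms]": [17.2, 5.8, 10.7, 13.6, 11.2, 12.1],
    "C50[dB]": [-0.9, -0.4, -2.6, -4.4, -3.5, -3.2],
    "C80[dB]": [-0.9, 0.2, -3.0, -4.4, -3.7, -3.6],
}
CAD = {
    "Ts[ms]": [7.2, 1.2, 4.8, 7.6, 4.0, 5.5],
    "C50[dB]": [0.8, 0.7, -0.7, -1.9, -0.8, -1.2],
    "C80[dB]": [0.9, 1.1, -1.3, -2.2, -1.3, -1.9],
}


def mean_abs(errors):
    """Mean absolute error over the octave bands."""
    return sum(abs(e) for e in errors) / len(errors)


summary = {
    name: (mean_abs(SIMPLE[name]), mean_abs(CAD[name]))
    for name in SIMPLE
}
for name, (simple_mae, cad_mae) in summary.items():
    print(f"{name:8s} simple: {simple_mae:5.2f}   CAD: {cad_mae:5.2f}")
```

On this measure the "CAD" model error is lower than that of the "simple" model for each of the three parameters, in line with the STI discrepancies of 0.01 and 0.04 respectively.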

A more detailed data examination revealed the extent of the discrepancies for individual
receiver positions, see figures 5.10 and 5.11. Significantly larger predicted STI errors of up
to 0.09 (typically exceeding the JND at all receiver positions) were found in the "simple" model,
while the "CAD" model predictions were more consistent with the measurements. In the
latter case, discrepancies of up to 0.05 (typically within the JND at the majority of receiver
positions) were found. Thus, given an assessment via "simple" type models, the user
would need to account for an increased error margin that is introduced due to the reduced
detail resolution.

Figure 5.10. Example STI error in multi source conditions (S4) for "Simple" model in Room 8

Figure 5.11. Example STI error in multi source conditions (S4) for "CAD" model in Room 8

Overall the session gave further confidence in using a more detailed model for alternative
experimental conditions.

5.3.2.4 Discussion
In the following paragraphs, a number of simulation efficiency aspects are discussed with
reference to the model conditions that enhance usability in the current context.

Reference for Model Performance (T30)
The validation procedure using T30 appeared to be a reliable way to adapt the models to the
actual conditions. Given the detail resolution in each case the process could calibrate the
simulation, nonetheless being limited to an extent by the accuracy potential of a "simple"
model. The resultant prediction accuracy could be described as adequate; however, for
finer model detail the calibration process enabled an efficient simulation in more complex
cases when examining alternative experimental conditions.

Earlier investigations on the topic [114] suggested that a more detailed approach in the
validation procedure e.g. using EDT and SPL as additional references in model
calibration, could allow a simple model to perform closer to the required accuracy for a
complicated task. This study demonstrated that for a more detailed model the accuracy
level is simultaneously increased for additional parameters (e.g. EDT, C50) without
further effort; thus a more elaborate approach would be unnecessary at this stage.

Model Optimization for Improving Run Time
Simulation efficiency is a complex topic; nonetheless, it often resolves into balancing run
time with prediction accuracy and model development time. Typical run times ranged
from 3-5 minutes for the "simple" version and 8-20+ minutes for the "CAD" version (on a
Pentium M 2.0 GHz computer). Concentrating on the latter case, it was established that the
increased and to some extent unnecessary detail resulted in an increased number of
surfaces within the model. This was a key factor for the additional time required to
complete the prediction. To address the problem, simple steps were taken via model
redesign to reduce the number of surfaces in use, see figure 5.12.

Figure 5.12. Model detail example, I) Full, II) Optimized

The main characteristic in the current example is an improved construction allowing the
model to retain the original design e.g. including individual desk and seating models,
while nonetheless having a significantly reduced number of surfaces. The end result in a
number of cases was a reduced run time, being on average ~3 times faster than the draft
CAD version (depending on the number of surfaces being excluded) and thus comparable
to the "simple" model.
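The arithmetic behind this comparison can be checked directly from the quoted figures. In the short Python sketch below the run time ranges and the factor of ~3 come from the text, while treating the open-ended "20+" as exactly 20 minutes and applying the average speed-up uniformly to both ends of the range are simplifying assumptions.

```python
simple_minutes = (3.0, 5.0)   # typical run time range, "simple" model
cad_minutes = (8.0, 20.0)     # draft "CAD" model; "20+" taken as 20 (assumption)
speedup = 3.0                 # optimized model is on average ~3 times faster

# Estimated run time range for the optimized CAD model
optimized_minutes = tuple(t / speedup for t in cad_minutes)

# The two ranges are comparable if they overlap
overlaps = (optimized_minutes[0] <= simple_minutes[1]
            and simple_minutes[0] <= optimized_minutes[1])

print(f"Optimized CAD: {optimized_minutes[0]:.1f}-{optimized_minutes[1]:.1f} min, "
      f"overlaps simple range: {overlaps}")
```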

5.3.2.6 Session conclusions
The experimental procedure suggested that using either simple or CAD models can
result in an acceptably accurate prediction. Nonetheless, enhanced performance was
observed for the more detailed geometry representation in this study.

The validation/calibration procedures using T30 values proved to be an efficient
methodology for the purpose of adapting the simulation to an actual environment without
the need for additional references in the process i.e. a more elaborate calibration. While
the latter option, although notably impractical to implement, could potentially enhance the
prediction accuracy of a simple model for parameters other than the reference RT,
it appeared unnecessary for more detailed models e.g. CAD, where a high degree of
consistency was demonstrated in this respect. The more detailed geometry also enabled a
more accurate prediction of acoustical environments for conditions differing from
the original model state that was used for validation.

Overall, it was established that only details that are essential for the prediction output
should be incorporated in a computer model, as the complexity of the latter has a direct
effect on the resulting efficiency. Among other benefits, a significantly reduced run time can
also be expected for models having a well balanced detail level.

5.4 Prediction results
In the following sections a comparison of predictions and measurements is presented for
the ten primary test rooms in the study. The simulation sessions were based on models
that were designed (see figures 5.13-5.22) in line with the experimental outcomes as
described in section 5.3. Results shown (tables 5.10-5.29) are the product of the
validation/calibration process (i), performed primarily in terms of T30. EDT was also used
in a number of cases, see section 5.5.2, to support the viability of the alternative approach.
The optimization of the prediction process and outcomes are later discussed, further
referencing the use of EDT for prediction calibration purposes.

(i) For comparison purposes and to confirm the suitability of the methodology, see section 5.2.1.3, Appendix 6.1 presents exemplar
prediction data for a generic/initial version of the models, i.e. based on a generic definition of scattering coefficients, and textbook
derived absorption coefficient input.

5.4.1 Room 1 data
Table 5.10. Comparison of prediction data to room acoustics measurements in Room 1 (omni source), averaged over
all receiver positions

Measurement and prediction data in Room 1 for Omni source (1 - 2)
F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
Omni source 1
EDT[s]    0.63   0.55   0.52   0.60   0.39   0.4    0.36   0.41   0.39   0.46   0.40   0.41
T30[s]    0.78   0.70   0.61   0.58   0.44   0.43   0.44   0.41   0.56   0.47   0.51   0.47
Ts[ms]    41.3   48.3   33.5   47.8   21.6   29.5   18.9   27.8   21.2   29.3   20.6   26.8
C50[dB]   3.9    4.4    5.4    3      8.4    7.1    9.1    7      8.5    6.3    8.5    7.2
C80[dB]   7.5    8.2    9.3    7.4    12.8   12.7   13.7   12.2   12.9   11     12.8   11.6
STI       0.79   0.77   Rating: Excellent   Excellent
Omni source 2
EDT[s]    0.63   0.57   0.52   0.57   0.38   0.38   0.36   0.41   0.37   0.44   0.39   0.42
T30[s]    0.76   0.69   0.68   0.58   0.48   0.45   0.49   0.42   0.5    0.48   0.54   0.47
Ts[ms]    38.9   47.5   32.3   40     20.5   27.8   18.8   25     19.4   27.5   20.5   26.8
C50[dB]   4.3    4.4    5.6    4.9    8.6    7.5    9.4    7.5    8.9    6.5    8.5    7
C80[dB]   7.8    8.7    9.5    9.2    13.1   12.4   13.5   13.2   13     11.4   13     11.7

Measurement and prediction data in Room 1 for Sound system (2 - 4 speakers)
F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
2 speakers
Ts[ms]    43.2   54.8   36.6   41.3   24.1   28.8   23.1   26.8   26.7   29     24.7   27.3
C50[dB]   3.6    3.4    4.9    4      8      7.8    8.3    7.8    7.3    6.2    7.8    6.8
C80[dB]   7      5.9    8.6    8      12.2   12.5   12.6   13     11.7   11.6   12.2   11.6
4 speakers
Ts[ms]    40.4   46.3   35.1   39     24.1   28.5   19.6   24.5   24.9   29     23.9   26
C50[dB]   4.1    4.8    5      5.2    8      6.8    9.2    8.2    7.7    5.7    8      6.8
C80[dB]   7.4    8.6    8.8    9.4    12.3   11.8   13.6   13.6   11.6   11.3   12     11.8

Table 5.11. Comparison of prediction data to room acoustics measurements in Room 1 (sound system), averaged over
all receiver positions

Figure 5.13. Geometry representation of Room 1 in simulation software
(A0, A1: Omni directional source positions; B1, B2, B3, B3: Sound system (directional) sources; 01, 02, 03, 04, (05, 06): Receiver positions)

5.4.2 Room 2 data
F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
Omni source 1
EDT[s]    0.64   0.62   0.53   0.59   0.39   0.42   0.37   0.40   0.40   0.47   0.39   0.45
T30[s]    0.76   0.89   0.64   0.61   0.46   0.47   0.46   0.45   0.52   0.54   0.53   0.55
Ts[ms]    42.2   54.0   34.8   46.5   21.4   32.8   18.9   29.5   20.1   34.0   19.8   32.3
C50[dB]   3.7    3.5    5.1    4.2    8.6    6.8    9.2    7.0    8.7    5.7    8.8    6.0
C80[dB]   7.3    7.1    9.1    7.6    13.1   10.6   14.0   11.5   13.1   9.9    13.1   10.2
STI       0.79   0.75   Rating: Excellent   Good
Omni source 2
EDT[s]    0.63   0.70   0.55   0.63   0.39   0.43   0.39   0.44   0.38   0.46   0.38   0.44
T30[s]    0.79   0.80   0.67   0.68   0.49   0.50   0.44   0.45   0.56   0.54   0.52   0.55
Ts[ms]    38.3   55.3   33.1   46.8   19.3   30.5   18.1   28.0   19.0   30.8   18.2   27.0
C50[dB]   4.3    2.5    5.4    3.6    8.9    7.0    9.2    6.8    9.2    6.1    9.3    7.1
C80[dB]   7.9    6.5    9.1    7.6    13.4   11.4   13.6   11.4   13.3   10.7   13.2   11.2

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
2 speakers
Ts[ms]    41.8   67.0   36.7   42.0   24.2   33.0   23.0   28.3   25.5   34.0   25.6   29.5
C50[dB]   3.8    1.6    4.7    4.3    8.0    6.0    8.4    7.6    7.5    5.2    7.4    6.0
C80[dB]   7.1    3.3    8.4    7.7    11.9   9.4    12.2   12.3   11.4   9.8    10.9   10.2
4 speakers
Ts[ms]    39.8   51.5   35.2   41.0   23.1   27.0   19.6   23.8   23.1   26.8   23.9   28.3
C50[dB]   4.1    4.3    4.9    4.1    8.0    7.5    8.9    7.9    7.8    7.0    7.7    6.6
C80[dB]   7.5    7.0    8.6    8.9    12.1   12.2   12.9   12.4   12.0   11.2   11.4   10.9


5.4.3 Room 3 data

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
Omni source 1
EDT[s]    0.45   0.56   0.39   0.35   0.51   0.45   0.37   0.38   0.42   0.44   0.41   0.37
T30[s]    0.56   0.54   0.50   0.53   0.71   0.42   0.44   0.42   0.49   0.53   0.44   0.45
Ts[ms]    27.4   47.5   42.8   24.6   21.0   37.5   22.5   33.8   27.6   36.8   25.8   33.3
C50[dB]   6.8    4.4    4.6    7.7    9.0    5.1    8.5    6.2    6.8    5.2    7.1    6.4
C80[dB]   11.2   8.0    8.9    12.2   13.9   9.6    13.5   11.6   11.5   10.1   12.1   11.8
STI       0.78   0.78   Rating: Excellent   Excellent
Omni source 2
EDT[s]    0.45   0.65   0.42   0.47   0.37   0.37   0.35   0.38   0.42   0.42   0.41   0.37
T30[s]    0.54   0.53   0.48   0.52   0.47   0.43   0.47   0.48   0.56   0.60   0.49   0.49
Ts[ms]    25.4   56.0   23.2   39.5   19.7   31.8   19.6   31.0   25.1   34.0   23.6   30.8
C50[dB]   7.1    1.2    7.7    5.2    9.2    6.7    9.5    7.3    7.5    6.1    7.8    7.2
C80[dB]   11.4   6.8    12.3   10.4   13.8   12.8   13.8   11.7   11.7   10.5   12.4   12.2

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
2 speakers
Ts[ms]    30.2   66.3   25.9   52.5   21.9   40.5   21.1   37.5   23.9   41.5   23.5   38.8
C50[dB]   6.3    1.1    7.1    3.2    8.0    5.4    8.8    6.1    7.5    4.9    7.6    5.4
C80[dB]   10.4   6.8    11.5   8.0    12.9   11.2   13.2   11.6   12.0   9.6    12.2   10.3
4 speakers
Ts[ms]    29.0   62.3   28.3   48.3   22.2   29.8   22.1   29.0   25.1   34.0   25.4   31.5
C50[dB]   6.5    1.8    6.7    2.7    8.0    7.7    8.3    8.1    7.2    5.6    7.2    6.6
C80[dB]   10.6   7.6    11.0   8.5    12.7   13.1   13.0   12.9   11.6   10.1   11.9   11.1

5.4.4 Room 4 data

Measurement and prediction data in Room 4 for Omni source (1)
F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
Omni source 1
EDT[s]    1.41   1.26   1.23   1.11   0.93   0.95   0.71   0.75   0.64   0.67   0.60   0.65
T30[s]    1.57   1.18   1.33   1.18   1.11   1.01   0.80   0.85   0.77   0.76   0.71   0.72
Ts[ms]    98.9   105.5  86.1   87.5   63.6   68.5   48.3   54.8   42.3   46.0   39.5   45.0
C50[dB]   -1.6   -2.8   -0.8   -1.7   1.2    0.2    2.7    2.1    3.7    2.9    4.1    2.8
C80[dB]   1.1    0.4    2.0    2.2    4.2    3.9    6.2    5.3    7.2    6.4    7.8    6.1

Table 5.17. Comparison of prediction data to room acoustics measurements in Room 4 (sound system), averaged
over all receiver positions
F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
2 speakers
Ts[ms]    98.1   67.5   85.7   82.3   63.8   66.3   45.6   50.5   39.2   41.3   38.8   43.3
C50[dB]   -1.5   2.4    -0.8   -0.1   1.1    0.8    3.2    2.7    4.1    3.6    4.1    2.9
C80[dB]   1.2    6.4    2.0    3.3    4.0    4.0    6.3    6.1    7.4    7.5    7.7    6.7
4 speakers
Ts[ms]    97.7   60.5   85.5   91.8   64.8   73.0   47.1   57.5   40.9   46.0   40.7   46.0
C50[dB]   -1.4   2.4    -0.7   -0.7   1.0    -0.1   2.9    1.6    3.9    3.1    3.9    2.8
C80[dB]   1.2    8.2    2.0    2.6    3.9    3.4    6.2    5.1    7.2    6.7    7.6    6.9

5.4.5 Room 5 data

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
Omni source 1
EDT[s]    1.21   1.27   0.91   0.92   0.63   0.58   0.52   0.54   0.51   0.47   0.47   0.50
T30[s]    1.46   1.70   1.36   1.13   1.07   0.67   0.75   0.62   0.72   0.66   0.68   0.72
Ts[ms]    83.7   96.2   60.2   64.8   35.8   42.2   28.3   35.3   27.3   33.0   27.2   32.8
C50[dB]   -0.5   -0.6   1.7    1.5    5.0    4.1    6.3    5.4    6.6    6.1    6.7    5.8
C80[dB]   2.3    2.0    4.7    4.0    8.2    7.8    9.9    8.8    10.2   9.7    10.7   9.7
Omni source 2
EDT[s]    1.20   1.38   0.92   1.00   0.65   0.65   0.53   0.53   0.51   0.53   0.48   0.52
T30[s]    1.44   1.77   1.36   1.08   1.10   0.66   0.73   0.62   0.67   0.66   0.67   0.72
Ts[ms]    82.6   103.7  61.5   69.7   36.1   39.8   29.4   35.5   27.9   35.5   26.7   34.2
C50[dB]   -0.6   -1.2   1.4    1.1    4.7    4.5    5.9    5.4    6.2    5.4    6.8    5.6
C80[dB]   2.3    1.5    4.5    3.5    8.2    7.4    9.8    8.9    10.1   9.1    10.7   9.3

F[Hz]     125           250           500           1000          2000          4000
          Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.  Pred.  Meas.
2 speakers
Ts[ms]    88.0   114.2  66.3   66.8   44.8   39.7   34.1   34.0   29.1   38.2   28.8   38.2
C50[dB]   -1.0   -0.6   1.0    0.8    3.8    4.6    5.4    5.3    6.4    4.8    6.1    4.5
C80[dB]   1.8    1.6    3.9    4.8    6.6    8.1    8.3    9.0    9.3    8.0    9.3    8.1
4 speakers
Ts[ms]    84.7   125.3  63.8   69.7   40.9   47.2   31.4   41.0   25.8   44.0   26.9   45.5
C50[dB]   -0.6   -2.4   1.3    0.5    4.2    3.5    5.8    4.6    6.9    3.8    6.7    3.3
C80[dB]   2.1    0.9    4.1    5.0    7.2    7.5    8.9    8.2    10.0   7.5    9.9    7.4

5.4.6 Room 6 data

F[Hz] 125 250 500 1000 2000 4000
EDT[s] 0.74 0.62 0.67 0.61 0.49 0.53 0.34 0.46 0.46 0.52 0.43 0.47
T30[s] 0.80 0.54 0.70 0.67 0.53 0.53 0.49 0.49 0.61 0.62 0.62 0.62
Ts[ms] 42.4 46.8 39.3 40.5 26.6 32.0 16.4 28.3 24.1 37.0 22.7 35.5
C50[dB] 3.8 4.8 4.1 4.4 6.6 5.9 9.8 6.6 7.4 4.8 7.9 5.0
C80[dB] 6.8 7.5 7.4 8.6 10.6 9.5 14.6 10.5 11.4 8.8 11.9 9.5
O
m
n
i

s
o
u
r
c
e

1

EDT[s] 0.81 0.64 0.70 0.57 0.51 0.53 0.35 0.43 0.48 0.53 0.45 0.48
T30[s] 0.80 0.58 0.70 0.54 0.53 0.51 0.52 0.51 0.61 0.62 0.62 0.64
Ts[ms] 57.3 59.0 49.2 45.0 35.0 37.3 22.6 31.0 32.8 39.0 30.5 36.0
C50[dB] 1.3 1.3 2.3 3.9 4.9 4.8 8.8 6.1 5.7 4.3 6.2 4.7
C80[dB] 4.7 6.8 6.0 8.7 9.2 8.8 13.3 10.4 9.9 8.2 10.6 8.3
Omni source 2 (rows above)


F[Hz] 125 250 500 1000 2000 4000
Ts[ms] 54.3 63.8 46.8 52.8 32.0 36.0 20.6 30.3 27.1 34.8 28.5 32.3
C50[dB] 1.9 1.5 2.8 2.5 5.5 5.2 8.7 6.4 6.7 5.1 6.3 5.7
C80[dB] 5.0 4.5 6.3 5.7 9.7 9.9 13.6 10.5 10.7 8.8 10.4 10.4
2 speakers (rows above)

Ts[ms] 53.0 63.3 46.4 51.5 31.0 38.0 20.5 28.8 27.4 33.5 29.8 39.0
C50[dB] 2.2 1.5 3.0 2.7 5.8 4.7 8.9 6.5 6.6 5.2 6.2 4.0
C80[dB] 5.3 5.5 6.4 5.9 9.8 9.2 13.2 9.9 10.4 9.2 10.1 7.8
4 speakers (rows above)

5.4.7 Room 7 data

F[Hz] 125 250 500 1000 2000 4000
EDT[s] 0.84 0.57 0.69 0.60 0.48 0.58 0.42 0.53 0.43 0.58 0.39 0.58
T30[s] 0.88 0.86 0.69 0.66 0.52 0.57 0.54 0.54 0.61 0.60 0.55 0.59
Ts[ms] 60.0 58.8 48.7 58.0 32.0 50.0 24.6 47.5 26.0 51.5 23.8 50.8
C50[dB] 1.1 3.7 2.5 1.5 5.7 2.8 7.8 3.1 7.4 2.5 8.2 2.3
C80[dB] 4.4 7.4 6.1 6.8 9.9 7.0 11.7 7.6 11.3 6.9 12.1 6.8
Omni source 1 (rows above)

EDT[s] 0.80 0.55 0.64 0.54 0.46 0.46 0.39 0.46 0.40 0.50 0.37 0.50
T30[s] 0.87 0.85 0.69 0.62 0.54 0.56 0.57 0.54 0.60 0.60 0.52 0.58
Ts[ms] 45.2 47.5 37.3 36.5 25.5 32.2 18.4 29.3 19.9 30.3 17.8 30.3
C50[dB] 3.5 5.2 4.5 5.9 7.1 6.7 9.1 6.9 8.7 6.3 9.6 6.3
C80[dB] 6.2 8.6 7.8 9.3 11.1 10.0 13.1 10.3 12.5 10.0 13.9 9.9
Omni source 2 (rows above)


F[Hz] 125 250 500 1000 2000 4000
Ts[ms] 59.2 77.7 46.3 63.0 31.7 54.5 24.1 47.8 23.4 49.7 23.3 53.8
C50[dB] 1.2 -0.3 2.8 0.4 5.7 2.0 7.5 3.5 7.8 3.1 7.7 2.0
C80[dB] 4.4 5.1 6.4 5.1 9.6 7.0 11.5 7.7 11.5 7.1 11.6 6.0
2 speakers (rows above)

Ts[ms] 57.3 69.0 46.3 53.8 31.9 50.0 25.0 43.3 24.7 42.5 24.4 46.5
C50[dB] 1.6 0.7 3.0 2.1 5.6 3.1 7.3 3.9 7.4 4.3 7.3 3.1
C80[dB] 4.7 6.4 6.5 6.4 9.7 7.6 11.3 8.6 11.1 8.1 11.5 7.4
4 speakers (rows above)

5.4.8 Room 8 data

F[Hz] 125 250 500 1000 2000 4000
EDT[s] 0.65 0.68 0.56 0.59 0.43 0.50 0.38 0.52 0.45 0.50 0.41 0.49
T30[s] 0.80 0.79 0.66 0.66 0.58 0.54 0.55 0.53 0.56 0.58 0.54 0.54
Ts[ms] 41.5 55.5 35.2 47.2 26.7 38.8 22.3 38.0 28.1 39.0 26.2 35.8
C50[dB] 4.2 3.3 5.1 3.8 7.3 4.6 8.3 4.7 6.9 4.6 7.3 5.3
C80[dB] 7.3 6.9 8.6 8.0 11.2 9.7 12.4 9.1 10.8 9.2 11.6 9.5
Omni source 1 (rows above)

EDT[s] 0.76 0.50 0.56 0.42 0.43 0.46 0.38 0.42 0.47 0.43 0.44 0.43
T30[s] 0.82 0.82 0.63 0.61 0.55 0.54 0.58 0.54 0.59 0.62 0.55 0.56
Ts[ms] 53.6 47.7 36.8 43.7 27.0 38.5 23.4 38.2 31.6 38.7 27.6 35.5
C50[dB] 1.7 4.7 4.6 5.7 6.9 5.4 8.3 5.2 5.8 5.4 6.8 6.0
C80[dB] 5.2 9.5 8.5 9.3 11.4 9.6 12.7 9.9 10.1 10.1 11.2 10.2
Omni source 2 (rows above)


F[Hz] 125 250 500 1000 2000 4000
Ts[ms] 43.4 71.2 36.5 59.3 24.9 46.8 17.7 42.2 20.6 40.7 21.3 41.2
C50[dB] 3.7 1.2 4.7 1.0 7.4 3.7 9.4 4.3 7.7 4.4 8.0 4.3
C80[dB] 7.0 4.8 8.4 6.1 11.6 7.9 13.7 9.7 12.1 9.0 12.3 8.7
2 speakers (rows above)

Ts[ms] 48.8 56.0 42.1 43.3 28.2 33.0 22.6 30.2 26.1 30.0 25.8 31.3
C50[dB] 2.9 3.7 3.7 4.4 6.7 6.0 8.4 6.5 6.8 6.0 7.0 5.9
C80[dB] 6.4 7.2 7.7 8.8 10.9 9.6 13.0 10.9 11.3 10.0 11.6 9.7
4 speakers (rows above)

5.4.9 Room 9 data

F[Hz] 125 250 500 1000 2000 4000
EDT[s] 1.07 1.11 0.84 0.88 0.48 0.61 0.51 0.57 0.67 0.63 0.59 0.57
T30[s] 1.10 1.10 0.87 0.93 0.86 0.64 0.76 0.81 1.11 1.06 0.96 1.05
Ts[ms] 68.5 77.7 53.0 55.2 25.1 39.0 26.1 36.8 40.0 38.2 35.7 33.8
C50[dB] 0.8 0.5 2.2 2.9 7.4 5.0 7.1 5.7 4.4 5.6 5.2 6.3
C80[dB] 3.5 3.3 5.3 5.4 11.2 8.0 10.6 8.6 7.6 8.4 8.6 9.1
Omni source 1 (rows above)

EDT[s] 1.07 1.01 0.82 0.82 0.44 0.64 0.52 0.56 0.65 0.60 0.57 0.55
T30[s] 1.11 1.08 0.88 0.95 0.56 0.66 0.87 0.84 1.01 1.07 0.87 1.06
Ts[ms] 70.4 86.0 53.9 62.3 24.0 42.7 29.4 40.3 39.3 41.7 34.6 37.3
C50[dB] 0.6 -0.8 2.1 1.0 7.6 4.3 6.6 4.9 4.5 4.9 5.4 5.9
C80[dB] 3.3 1.8 5.2 4.7 11.6 7.1 9.8 8.2 7.8 8.4 8.7 9.0
Omni source 2 (rows above)


F[Hz] 125 250 500 1000 2000 4000
Ts[ms] 72.0 80.3 53.6 66.5 26.2 43.5 28.8 49.0 35.6 51.0 35.4 58.0
C50[dB] 0.3 0.2 2.2 1.3 7.3 4.3 6.7 3.0 5.4 3.7 5.4 2.9
C80[dB] 3.1 3.7 5.1 4.4 10.8 7.8 9.8 6.9 8.1 6.4 8.1 5.3
2 speakers (rows above)

Ts[ms] 74.2 93.0 56.2 69.2 28.0 47.2 30.3 48.8 38.7 54.7 39.0 64.7
C50[dB] 0.1 -2.4 1.8 0.7 6.9 3.8 6.4 3.3 4.8 3.0 4.7 2.0
C80[dB] 2.9 1.9 4.9 4.3 10.3 7.8 9.5 7.1 7.7 6.0 7.6 4.7
4 speakers (rows above)

5.4.10 Room 10 data

F[Hz] 125 250 500 1000 2000 4000
EDT[s] 0.52 0.43 0.52 0.57 0.53 0.55 0.45 0.45 0.36 0.37 0.26 0.31
T30[s] 0.67 0.62 0.79 0.64 0.77 0.54 0.66 0.53 0.66 0.54 0.54 0.50
Ts[ms] 31.8 37.5 30.7 34.3 29.5 32.0 25.6 27.3 22.4 22.8 17.6 18.0
C50[dB] 6.2 7.4 6.5 5.8 6.6 6.2 7.7 7.3 8.7 8.8 11.1 10.5
C80[dB] 9.4 11.1 9.5 9.8 9.5 9.5 10.6 10.5 12.3 12.1 14.8 14.5
Omni source 1 (rows above)

EDT[s] 0.58 0.56 0.59 0.51 0.58 0.54 0.54 0.49 0.50 0.46 0.45 0.44
T30[s] 0.74 0.60 1.29 0.60 0.93 0.54 0.86 0.52 0.92 0.61 0.70 0.67
Ts[ms] 34.9 51.0 37.8 43.0 35.9 37.7 32.1 36.3 30.1 35.2 25.8 31.8
C50[dB] 4.8 4.0 4.7 5.0 5.1 4.5 5.5 5.0 6.2 5.4 7.2 6.1
C80[dB] 8.7 8.2 8.7 9.3 8.9 9.5 9.5 9.7 10.5 10.4 12.1 11.5
Omni source 2 (rows above)


F[Hz] 125 250 500 1000 2000 4000
Ts[ms] 34.9 45.7 34.4 35.8 29.7 32.5 28.0 28.3 23.9 22.5 18.0 18.2
C50[dB] 5.1 6.8 5.2 5.9 6.2 5.9 6.4 7.0 7.7 8.4 10.0 10.3
C80[dB] 8.7 10.0 8.7 8.8 9.7 9.5 10.2 10.4 11.3 12.2 13.5 14.5
2 speakers (rows above)

Ts[ms] 36.4 51.8 40.0 46.5 35.2 37.7 30.2 31.0 26.9 31.5 22.6 31.8
C50[dB] 4.3 4.9 4.0 3.9 4.8 4.8 5.6 6.3 6.5 5.9 7.8 6.3
C80[dB] 8.4 8.6 8.0 7.2 8.7 9.3 9.7 10.8 10.6 10.8 12.1 10.7
4 speakers (rows above)

STI 0.75 0.77 Rating: Good Excellent



Detailed prediction data can be found in Appendix 6.2.

5.5 Discussion
In the following sections different simulation aspects are discussed in view of the
numerical output for the ten test rooms. The use of EDT in the calibration process is
further assessed in terms of the resulting procedure's capacity, while a model design and
preparation guideline is defined. Simple steps to enhance the efficiency of the overall
prediction process are also considered.

5.5.1 Prediction results
The numerical output of the simulations, as presented in section 5.4, showed a high
degree of accuracy for the majority of experimental conditions over the terms considered,
i.e. EDT, T30, Ts, C50, C80 and STI. An error analysis in comparison to measured values
revealed differences normally in the range of a few ms for EDT and T30. A maximum
error of up to 26% for EDT and T30 was observed after excluding odd values, while value
differences typically below 10% were found.
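
The percentage differences quoted above follow from a simple relative error calculation. The Python sketch below is illustrative only: the helper name `percent_difference` and the octave-band values are assumptions made for this example, not data or code from the study.

```python
def percent_difference(measured, predicted):
    """Signed difference of a predicted value relative to the measured one, in percent."""
    return 100.0 * (predicted - measured) / measured


# Hypothetical octave-band EDT values in seconds (illustrative, not thesis data).
measured_edt = [1.20, 0.90, 0.64, 0.52, 0.50, 0.48]
predicted_edt = [1.26, 0.93, 0.66, 0.53, 0.52, 0.47]

errors = [percent_difference(m, p) for m, p in zip(measured_edt, predicted_edt)]
worst = max(abs(e) for e in errors)                  # largest absolute error in percent
within_10_percent = [abs(e) <= 10.0 for e in errors]  # per-band check against 10%
```

The same function applies to T30; parameters reported in dB or ms (C50, C80, Ts) are more naturally compared as absolute differences.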

With overall results deemed satisfactory, the partly problematic T30 output for omni
source 2 in Room 10 was attributed to the lack of a dedicated seating model for the room.
The source in this case was positioned in-between seating rows; consequently, the
geometry simplification for the audience area did not accurately simulate source emission
into the large volume of the room. A more complex geometry would significantly increase
computation time and could by definition invalidate the prediction; the effect could
instead potentially be avoided by placing the source further away from overly simplified
sections. This particular source should certainly not be used for calibration; however, it
is worth noting that similar configurations did not appear problematic within smaller
rooms in the study.

In terms of clarity (C ratios), good agreement was found over all configurations for the
majority of rooms, with differences being less than 3dB for both C indices. A partial
exception was Room 7, where differences of up to 6dB were found for omni source 1 and
the 2-loudspeaker sound system (SS) at higher frequencies. The prediction in this case
resulted in higher clarity values; however, this is potentially due to the influence of
the BGNL on the room acoustics measurements, which reduces confidence in the results at
these particular data points. The effect did not significantly stand out in terms of
complementary parameters, except centre time, with values nonetheless within 10ms in
most cases.

Differences in terms of STI were at most 0.04 in one room for one source configuration,
and normally below the JND or zero for the remaining source configurations in all rooms.
As such, the simulations appeared exceedingly efficient in the context of intelligibility
prediction (STI and in part C50 and Ts), where any errors in complementary parameters,
though limited, were in effect not reflected in STI in particular.
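
The comparison against the JND can be expressed as a simple check. In the Python sketch below, the JND of 0.03 is an assumed, commonly quoted value rather than one stated in this section, and `within_jnd` is a hypothetical helper name. The first call uses the two STI values listed at the end of section 5.4.10 (0.75 and 0.77); the second uses a hypothetical pair differing by 0.04.

```python
STI_JND = 0.03  # assumed just noticeable difference for STI (not taken from this section)


def within_jnd(value_a, value_b, jnd=STI_JND):
    """True when two STI values differ by no more than one JND."""
    # The small tolerance absorbs floating-point rounding in the subtraction.
    return abs(value_a - value_b) <= jnd + 1e-12


room10_ok = within_jnd(0.75, 0.77)      # the two STI values listed for Room 10
hypothetical_ok = within_jnd(0.60, 0.64)  # hypothetical pair, 0.04 apart
```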

Overall, the prediction sessions achieved a high degree of accuracy based on the T30 (or
EDT) calibration as performed for one source condition, for both the original and
alternative configurations (i.e. omni source 2 and two sound system configurations)
without alterations or further tuning post calibration.

5.5.2 Simulation calibration using reference EDT
Section 5.3.2.3 has discussed the prospective use of additional parameters, i.e. other than
T30, in the calibration process so as to achieve a higher level of prediction accuracy. It
was established that while a more elaborate procedure could potentially enhance the
accuracy level for a simple model, it is not essential for a model built with a finer detail
resolution. Chapter 4, however, has discussed the increased robustness of EDT over T30
at marginal conditions, i.e. low S/N during room acoustics measurements. In the context
of post evaluation, considering that the accuracy of a simulation relies on accurate room
acoustics measurements, it is evident that using EDT values in the calibration procedure
will enhance confidence in a prediction when the consistency of the measurement output
is questionable.

Reference EDT values were used for a number of randomly selected simulations in
section 5.4 (see Rooms 4, 5 and 10) to establish the potential effect on simulation
efficiency and accuracy of results.

Assuming that the model geometry is consistent, the predicted relation between T30 and
EDT is going to be somewhat fixed. Consequently, although EDT is not as broad a
descriptor of room acoustics as T30, its use in the calibration process instead of T30 was
not expected to result in significant accuracy differences in the prediction process. This
assumption was supported by the prediction outcomes, see sections 5.4 and 5.5.1. Given a
fixed relation of EDT with T30 based on the geometry influence, a further advantage of
using EDT is that it is monitored more closely during calibration, which is relevant given
its influence on the calculation of STI in particular.

During the calibration process, the prediction outcome is continuously compared to
measured values to determine the simulation precision at a given stage. Disagreement of
results will lead to alteration of the acoustic characteristics of particular surfaces so as to
improve the prediction quality. Thus an attempt to control the early part of sound decay
in this context would be more efficient. The calibration process for the test rooms using
reference EDTs suggested that the alternative process facilitates a simpler identification
of the surfaces and acoustic characteristics that need alteration/tuning, particularly when
EDT is short. As a result the calibration procedure is expedited.
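
The compare-and-adjust loop described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the calibration procedure or software used in the study: the Sabine formula stands in for the room acoustics simulation, only one surface is tuned, and the room dimensions, coefficients, tolerance and function names are all hypothetical.

```python
def sabine_rt(volume_m3, surfaces):
    """Sabine reverberation time in seconds; surfaces maps name -> (area_m2, alpha)."""
    absorption = sum(area * alpha for area, alpha in surfaces.values())
    return 0.161 * volume_m3 / absorption


def calibrate(volume_m3, surfaces, tunable, target_s, tol=0.01, max_iter=50):
    """Adjust the absorption of one surface until the prediction matches the target."""
    surfaces = dict(surfaces)  # work on a copy so the input model is left untouched
    for _ in range(max_iter):
        predicted = sabine_rt(volume_m3, surfaces)
        if abs(predicted - target_s) / target_s <= tol:
            break
        area, alpha = surfaces[tunable]
        # A decay that is too long calls for more absorption, too short for less.
        alpha = min(1.0, max(0.01, alpha * predicted / target_s))
        surfaces[tunable] = (area, alpha)
    return surfaces, sabine_rt(volume_m3, surfaces)


# Hypothetical room: name -> (area in m2, absorption coefficient).
room = {"walls": (150.0, 0.05), "floor": (70.0, 0.10), "ceiling": (70.0, 0.30)}
tuned, final_rt = calibrate(200.0, room, "ceiling", target_s=0.60)
```

In an actual calibration the stand-in function would be replaced by a full prediction run per octave band, and the choice of which surfaces to tune is the step that the text suggests becomes easier when reference EDTs are used.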

5.5.3 Model design and preparation
Methodologies for the design and preparation of computer models are often open for
interpretation by the user; as a result, prediction accuracy can suffer due to invalid model
input in terms of e.g. the defined geometry or sound source characteristics. Sections 5.2
and 5.3 addressed different aspects of the process to facilitate a better defined approach
for the prediction of acoustic conditions in rooms. Accordingly, a synopsis of the
outcomes was compiled, effectively formulating a modelling guideline, see section
5.5.3.2.

5.5.3.1 Detail resolution requirement
The level of detail incorporated in a model is directly related to the frequency range
considered. However, small entities that individually can be considered as acoustically
invisible, e.g. desks and chairs, can influence the resulting sound field when positioned in
close proximity to each other, e.g. stacked or in contiguous arrangements. This is not
directly accounted for in a simulation, though the combined influence of such smaller
surfaces can be incorporated by approximation, being dependent on the surface size, via
scattering effects. The simulation in this case is not entirely realistic; nonetheless, the
particular approach appears advantageous for small (or flat), as opposed to large, rooms
given the short propagation path, see section 5.3.2. Enhanced detail resolution is
accordingly preferred in particular cases, see section 5.5.3.2.

5.5.3.2 Room acoustics modelling guideline
The following process chain (steps 1-7) summarizes the recommended approach for
model design and preparation in the context of the present study:

1. Room appraisal to determine significance of architectural details in designing a
model. A detail resolution of ~0.5m should be used as a starting point [92],
however see 2.
2. Finer detail resolution should be included for smaller objects that are stacked or
positioned in close proximity e.g. chairs and desks, in line with section
5.3.2.4 for an efficient representation of different entities (for small or flat rooms).
3. Generic definition of scattering coefficients, see section 5.2.1.3.
4. Generic definition of absorption coefficients, see section 5.2.1.3.
5. Source characterization in terms of near field directivity and frequency response
(small rooms), see section 5.2.1.2.
6. Validation/calibration procedure for one source configuration using reference
EDT (recommended) or T30, see section 5.2.2.
7. Prediction of acoustic conditions for any source-receiver configuration, also see
table 7.1 for related uncertainty factors.

5.5.4 Comments on modelling sessions
A significant drawback in the process can be the lack of prediction repeatability. For
instance, two differing sets of data from the same unaltered model can be
misleading when investigated in detail, particularly when the effect is not detected at an
early stage. For more complex models, prediction times can be in the range of hours and
thus a significant error can be introduced simply by relying on a single prediction using
the final model version.

A pilot approach has established a close association between the number of materials
used to represent room surfaces and prediction repeatability issues, while prediction output
suggested that an increased number of materials adds to the uncertainty. Problematic
models appear to become more tolerant and error free in this respect when using only a
limited number of different materials. Ultimately, model design should account for
differentiated surface materials only when a simplification in this respect removes
essential model detail, see section 5.3.2 and section 5.5.3. An increased number of rays
traced will allow for enhanced consistency in the prediction results when necessary,
at the cost of increased computation time.
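
A repeatability check of the kind implied above can be sketched by running the same unaltered model several times and inspecting the spread of the output. The Python example below is a hedged illustration: the random draw merely stands in for a stochastic ray-tracing run, and the function name, nominal EDT, scatter and 5% threshold are assumptions rather than values from the study.

```python
import random
import statistics


def repeatability(predict, runs=10):
    """Run the same prediction several times; return the mean and relative spread."""
    values = [predict() for _ in range(runs)]
    mean = statistics.mean(values)
    spread = (max(values) - min(values)) / mean
    return mean, spread


# Stand-in for a stochastic prediction: a nominal EDT of 0.60 s with small
# run-to-run scatter. The seed only makes this sketch reproducible.
rng = random.Random(1)
mean_edt, spread = repeatability(lambda: 0.60 + rng.gauss(0.0, 0.005))
needs_more_rays = spread > 0.05  # hypothetical 5% acceptance threshold
```

A spread above the chosen threshold would point to increasing the number of rays or reducing the number of distinct materials before relying on a single prediction.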

Considering the time needed for model calibration, it was established that using as a
starting point surface acoustic characteristics as defined in rooms that are similar to a
given case study can lead to reasonably accurate prediction results. Given differences
between rooms some tuning is normally necessary, however the time needed to complete
the process is somewhat reduced.

5.6 Conclusions
This chapter has considered different aspects of computer simulation efficiency; the
outcomes were utilized in computer simulations for the primary test rooms in the study.

The definition of suitable scattering and absorption coefficient data was highlighted as a
critical element in reducing simulation uncertainties, thus enhancing confidence in the
result. Different approaches have been found in the literature; however, their applicability
is not universal and thus questionable in many cases. At the prediction input, scattering
functions can initially be defined generically. The estimation of absorption coefficient
data can be accomplished using model calibration, which appears most suitable for
predictions that are used in post evaluation applications. Source directivity (and
consequently aiming) is an additional parameter central to the process, requiring a near
field response for simulating acoustic conditions in small rooms.

Experimental results revealed that a more detailed room geometry is preferred for
modelling smaller rooms, as opposed to large spaces. In this sense, CAD platforms can
greatly assist given the specialist nature of the software for this purpose by simplifying i.e.
speeding up model generation. This has a direct effect on the resulting model, particularly
in terms of the processing time required for a prediction. The latter is normally longer for
more complex models but could be optimized to approximate simple model run times.

A balance between overly simplified rooms and complex inefficient geometries is
necessary i.e. only essential room characteristics should be included in the design.
Generally, simple steps can be taken to improve the simulation efficiency, such as
reducing the number of surfaces in the model while retaining the required detail
resolution.

The calibration of predictions was done in terms of a single parameter, typically T30. The
use of additional reference parameters appeared unnecessary for models with more
detailed geometry representations. The use of EDT was also examined in this context and
found to be a viable approach, resulting in similar accuracy levels. Considering the
conclusions from Chapter 4 relating to the increased robustness of EDT (when compared
to T30), an EDT based validation/calibration appeared to be more efficient in reducing the
related uncertainty factor in the prediction.

Pilot approaches have established prediction results marginally over the JND of STI for
simple models. The same parameter was within the JND for models with enhanced
detail resolution. Results in terms of C50, T30 and EDT could be described as acceptable
for both approaches, however with noticeably enhanced accuracy for the detailed models.
Predictions for alternative experimental conditions resulted in a similar accuracy level for
the two modelling approaches, without the need for any alterations (or further tuning) of
the simulation post calibration, i.e. based on the original T30 (or EDT) calibration for one
source configuration.

A design and preparation guideline was defined for simulating university classroom type
rooms and similar spaces.

Simulations for ten test rooms were accurate in most cases in terms of the parameters
considered, i.e. EDT, T30, Ts, C50, C80 and STI. The models emerged as particularly
efficient in terms of a speech intelligibility assessment.

CHAPTER 6
Auralization

6.1 Introduction
Auralization is typically the final stage in a predictive assessment. The end product
enables a number of different approaches, such as use for training or educational
purposes [112], most commonly to demonstrate a space's prospective performance to
both experts and non-experts in acoustics. Auralizations are considered a highly efficient
methodology for this purpose as they comprise a direct link to prospective acoustical
conditions, thus no specialized knowledge is required in assessing a given case.

The quality of auralized material, in this study based on hybrid algorithms, is dependent
on the computer simulation being able to accurately represent the acoustic conditions in a
room. Thus an auralization verification issue arises, as some means of confirming
consistency with the prediction's numerical output and consequently with the measured
acoustic parameters is necessary. It is reasonable to assume that a simulation is consistent
with acoustic measurements if a validation/calibration process has previously been used.
Based on this assumption an auralization is required to be verified against the prediction
output only, while accounting for related error margins at the prediction stage.

Verification is typically based on a subjective assessment where recorded room responses
are compared to auralized material via listening tests to assess realism. A suitable HRTF
and a frequency response filter to compensate for the source characteristics are normally
needed for consistency, given a typical binaural audio presentation approach through
headphones. This process provides a direct way for the assessment of audio realism but
no absolute information of the acoustic characteristics of a space can be extracted. For
speech intelligibility in particular, a plain estimation is unfeasible unless a listening test
involving human subjects (see section 2.4.1) is undertaken. This process, though viable
[139], is very time consuming and not cost effective. The prospect of an objective
validation thus appears more suitable for an efficient verification process. In this context,
Christensen [140] has presented a method based on post processing analysis of the
predicted impulse response. The extent to which an actual auralization can reproduce the
room acoustics characteristics as defined by the predicted impulse response filter is
however not clear.

Taking a different approach, a simulation can result in a reasonably accurate estimation
of acoustic parameters, even without producing an analogous result in terms of audio
realism. In the context of this study, the detail resolution of a model could be an
important factor not only in obtaining an objectively accurate auralization but a realistic
sounding auralization as well. Previous research using models that incorporated basic
architectural detail demonstrated that numerical output and realism in the auralization
part are not necessarily related [114]. In another publication by the author the advantage of
using a more detailed model for small enclosed spaces was shown, thus suggesting that
realism of the auralization is similarly influenced [121].

This chapter discusses primarily the objective validation of auralized material using an
improved hybrid approach for a monaural setup. The relation of auralization quality, i.e.
realism, to the model detail resolution is examined using subjective listening tests in
comparison to binaural recordings for different cases. In the following sections, the
methodologies used and results obtained are presented.

6.2 Objective validation of auralized responses
The product of an auralization process is based on the predicted (or otherwise obtained)
impulse response of a system. The validation methodology as used by Christensen [140]
was based on post processing of the actual impulse response, i.e. a direct deconvolution
of an impulse response with a Dirac signal. The process thus enabled the derivation of
acoustic parameters via an open loop measurement system (Dirac v3.0 [141]) to
objectively quantify the quality of the auralization filter. This assumed that the same level
of accuracy would be conveyed in the final auralization. On this basis the impulse
response filter is nonetheless an intermediate product.

An altered method is suggested as an improvement to enable the assessment of the end
product that is the auralization. By examining the auralization process at a final stage,
the overall accuracy level of a simulation system can be objectively quantified /
confirmed. At the same time it is clear that, depending on the deconvolution software
used, it might be more practical to use one method over the other, since both methods
dictate the use of explicitly either a reference Dirac pulse or sine sweep at the
deconvolution stage. However, alternative signals could potentially be used by the
proposed method. In the following section, the altered methodology is presented.

6.2.1 Objective validation of auralized responses using a swept sine
The new hybrid method utilizes a swept sine test signal within the deconvolution process
to replicate an assessment of the end product of an auralization.

An anechoic sample of the swept sine signal was used as the audio material, convolved
with the impulse response filter that was predicted by the preceding computer simulation
for the different source-receiver configurations in the test rooms. The duration of the
signal was dependent on the RT, while the source characteristics and the type of receiver
within the modelling software were matched to the prediction settings so as to enable a
consistent comparison of auralization to the numerical output. The characteristics of the
impulse response filter could thus be expressed as an end product i.e. an auralization.
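
The convolution of a sweep with an impulse response, and its later recovery by deconvolution, can be sketched in a few lines of Python. The example is a simplified illustration and not the software chain used in the study: the linear sweep, the synthetic impulse response, the hand-written radix-2 FFT and the regularized spectral division are all assumptions chosen to keep the sketch self-contained.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def linear_sweep(n):
    """Sine sweep rising linearly from 0 Hz to the Nyquist frequency in n samples."""
    return [math.sin(math.pi * i * i / (2.0 * n)) for i in range(n)]


def convolve(x, h):
    """Direct linear convolution, i.e. the 'auralized' sweep."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(y, x, size, eps=1e-12):
    """Estimate h from y = x * h by regularized spectral division.

    size must be a power of two and at least len(y) to avoid circular wrap-around.
    """
    X = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    Y = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    H = [yk * xk.conjugate() / (abs(xk) ** 2 + eps) for yk, xk in zip(Y, X)]
    return [v.real / size for v in fft(H, inverse=True)]


sweep = linear_sweep(256)
# Synthetic impulse response standing in for the predicted room filter.
ir = [math.exp(-n / 6.0) * math.cos(0.9 * n) for n in range(32)]
auralized = convolve(sweep, ir)
recovered = deconvolve(auralized, sweep, size=512)[:len(ir)]
max_error = max(abs(a - b) for a, b in zip(ir, recovered))
```

In this idealized, noise-free setting the recovered response matches the filter closely; any loss introduced by a real auralization engine would show up as a difference between `ir` and `recovered`.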

Results, being in effect raw room responses as would be measured by typical acoustical
measurement practices, were assigned at the input of an open loop measurement system
(see also section 3.5.1 for open loop procedure details) and post processed to derive a set
of room acoustics parameters. Figure 6.1 shows a schematic for the process.

[Figure 6.1 schematic; block labels: Computer prediction, Auralization, Open loop processing]

Figure 6.1. Objective validation of auralization schematic
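
Once a room response has been deconvolved, decay parameters such as EDT and T30 are conventionally evaluated from the backward-integrated (Schroeder) energy decay curve, with EDT fitted over the first 10 dB of decay and T30 over the range from -5 dB to -35 dB. The Python sketch below illustrates that general post-processing step on a synthetic exponential decay; it is not the Dirac implementation, and the sample rate, decay time and function names are assumptions.

```python
import math


def schroeder_db(ir):
    """Backward-integrated energy decay curve in dB, normalized to 0 dB at t = 0."""
    total = 0.0
    curve = [0.0] * len(ir)
    for i in range(len(ir) - 1, -1, -1):
        total += ir[i] * ir[i]
        curve[i] = total
    return [10.0 * math.log10(c / curve[0]) for c in curve]


def decay_time(edc_db, fs, upper_db, lower_db):
    """Least-squares slope between two decay levels, extrapolated to 60 dB."""
    pts = [(i / fs, d) for i, d in enumerate(edc_db) if lower_db <= d <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    num = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return -60.0 / (num / den)


fs = 8000         # assumed sample rate in Hz
rt_true = 0.60    # assumed decay time of the synthetic response in s
n_samples = 9600  # 1.2 s at 8 kHz, long enough for the -35 dB evaluation range
# Amplitude falls by a factor of 1000 (60 dB) over rt_true seconds.
ir = [math.exp(-math.log(1000.0) * (i / fs) / rt_true) for i in range(n_samples)]

edc = schroeder_db(ir)
edt = decay_time(edc, fs, 0.0, -10.0)   # early decay time
t30 = decay_time(edc, fs, -5.0, -35.0)  # reverberation time from a 30 dB range
```

For a purely exponential decay both values coincide with the nominal decay time; in measured or auralized responses they differ, which is what makes EDT and T30 separate descriptors.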

6.2.2 Evaluation of results
The accuracy of the predicted auralizations for ten test rooms was assessed using the
current and proposed method of objective validation. Ultimately, provided that the datasets
match the predicted and/or actual data, an auralization could be described as accurate in
the terms used for the assessment e.g. STI, T30, EDT etc. A monaural filter was used in
the experimentation as an indication of the auralization process efficiency.

The scope of the preliminary assessment included a comparison of the two validation
methods to determine the level of accuracy that is conveyed from the auralization filter
(assessed by the current method) to the end product i.e. the actual auralization (assessed
by the proposed method). Primarily, the proposed method was used for the general
verification of auralization accuracy over all of the different source/receiver
configurations in the ten test rooms. Data for the assessment objectives are shown in
section 6.4.2.

Given that data for both methods are derived using an open loop measurement system, it
is worth noting that the validity of the particular measurement method has been assessed
in earlier work by the author [116, 117], see also section 3.5.3. It was found that an open loop
methodology is reasonably consistent when compared to a closed loop system, while the
accuracy of the method is further improved given the controlled conditions in the current
experiment (processing is performed internally i.e. physical computer I/O is not used).

6.3 Subjective validation of auralized responses
The subjective validation of auralizations involved the recording of audio samples in
actual conditions, see figures 6.2-6.3. Using speech and music stimuli, room responses
were obtained within the ten test rooms to enable a direct comparison to the predicted
auralization via listening tests. The recording procedure is described in the following
sections.
Figure 6.2. Binaural recording setup
Figure 6.3. Head and torso simulator in measurement position

6.3.1 Recording room responses
For the purposes of the study, a binaural feature was necessary to account for the binaural
character of human hearing, thus enabling a more efficient comparison to subjective
impression. BS ISO 3382: 2000 [11] prescribes the use of a head and torso simulator
conforming to ITU recommendation P.58 [142] for the measurement of binaural
parameters e.g. IACC. Accordingly, binaural recordings were made using the procedure
described in the particular BS document. The session specific characteristics included the
use of a portable PC as a means to play back and record audio. A multi track recorder
with simultaneous I/O was used as the software platform. Four source configurations, as
described in Chapter 3 (2x omni directional loudspeaker, 2x sound system), were used in
turn in the ten test rooms and binaural recordings were obtained at each receiver position.

6.3.2 Equipment list
- Norsonic 140 sound level meter
- B&K Calibrator Type 4230
- B&K Head & Torso simulator Type 4100, with NEXUS pre-amp
- Dell Latitude PC D610
- Digigram VXpocket v2 sound card
- Dodecahedron omni directional loudspeaker
- Yamaha HS50M studio monitor loudspeaker on tripod stand (x4)
- Audio SR707 power amplifier
- Sony Vegas Pro 8, professional multi-track audio editing software

6.3.3 Comparison of recordings to predicted auralization
A listening test with a group of 10 listener subjects (6 Acoustics professionals, 4
untrained listeners) was undertaken to assess the quality of the predicted auralizations.
The latter were presented via headphones in negligible BGNL conditions, followed by
the equivalent binaural recording. For consistency, samples of the existing background
noise at the time of recording were extracted from silent sections, i.e. no signal playing,
of the binaural recordings within the test rooms and mixed with the associated
auralization prior to presentation. All audio samples were level calibrated using a pink
noise test signal preceding the test stimuli (speech, music). The test comprised
primarily a basic realism assessment requiring the subjects' input as to the level of
similarity between the presented auralization / recording pairs; source localization was
also assessed. Ultimately, the auralizations were categorized according to the realism
incorporated, as perceived by the group of listeners.

The assessment outcomes are presented in section 6.4.3.

6.4 Auralization study in ten test rooms
In the following sections, the current and proposed auralization validation methods are
compared to establish the advantage of using one method over the other. Auralizations
from ten test rooms are further assessed in both objective and subjective terms to
determine primarily the level of consistency to the related predictions in terms of
numerical output. The auralization quality is subjectively assessed considering the
resultant realism as compared to actual room recordings.

6.4.1 Comparison of objective validation methods
The current and proposed methods for an objective validation of auralization were
simultaneously used in the ten test rooms to derive a set of acoustic parameters i.e. EDT,
T30, Ts, C80 and STI, for the same conditions in the rooms. This enabled a comparison of
results so as to determine the consistency of the impulse response filter with the
auralization, based on the value differences observed.

The examination of room data for two source configurations (see Appendix 7) showed
overall fair agreement between the values derived from the two methods. Tables 6.1 and
6.2 show an example of the output variation for data averaged over all rooms. Output
consistency in this case suggested that for the majority of conditions the auralization filter
is capable of conveying its characteristics to the final auralization. Accordingly, if
individual data points are not the main focus, uncertainties in terms of the auralization
accuracy as related to the impulse response filter would not require additional
consideration.

Variation of results between two validation methods, S1
F[Hz] 125 250 500 1000 2000 4000
EDT (%) -0.9 1.4 0.9 0.8 0.2 0.2
T30 (%) -0.1 -1.6 -1.0 -0.8 -0.4 -0.3
Ts (ms) 0.1 0.0 -0.2 -0.2 -0.1 -0.2
C80 (dB) 0.0 0.0 0.1 0.0 0.0 0.0
STI 0.01

Table 6.1. Result variation example between the two auralization validation
methods (averaged over ten test rooms for omni source 1)

Variation of results between two validation methods, S2
F[Hz] 125 250 500 1000 2000 4000
EDT (%) 5.2 3.6 2.9 2.1 0.2 -1.9
T30 (%) -0.3 0.9 -0.9 -0.9 -0.6 -0.5
Ts (ms) 0.1 0.0 -0.3 -0.2 -0.2 -0.1
C80 (dB) 0.0 0.0 0.1 0.1 0.0 0.0
STI 0.00

Table 6.2. Result variation example between the two auralization validation
methods (averaged over ten test rooms for omni source 2)

Taking a more detailed approach, a number of discrepancies were observed in the
comparison. For the EDT measure in particular, value variations of up to 24% and 7% were
found, relating mainly to the 125Hz and 250Hz octave bands respectively; similarly,
C80 variations reached 7dB for lower frequency bands, as deduced from data averaged
over all the receiver positions, see Appendix 7. In these atypical conditions, the existing
validation method cannot account for the accuracy level that is lost in the process of
applying the auralization filter to the anechoic material. Considering the possibility of a
noticeable error margin at lower frequencies, as shown in the current results, the
proposed methodology is advantageous as it is able to detect any discrepancies in this
respect. The new method can thus reduce the assessment uncertainty relating to the
discrepancies at individual data points.

Considering the similarity of the rooms examined in this study, the reasons underlying
the observed discrepancies could not be fully determined. The outcomes of the proposed
method nevertheless suggest an improvement on the uncertainty factor.

6.4.2 Objective assessment of auralizations
The assessment in the ten test rooms for four source configurations was performed using
the proposed validation method to determine the consistency of auralization with the
prediction. Results from this comparison gave an indication of the level of accuracy that
could be expected from the auralization of a space, given the preceding simulation as
interpreted in terms of its numerical output.

The previous section has demonstrated differences between two validation methods for
two source configurations. The observed discrepancies arose primarily because the output
of the existing validation method, which post processes the impulse response, appeared
misleadingly comparable to the prediction. The effect was mainly reflected in limited
EDT errors at lower frequency octave bands, mainly 125Hz, suggesting in the context of
the assessment that low frequency characteristics are not always consistently conveyed in
an auralization using one source. In a comparison between auralizations and prediction
output, absolute differences up to 70% EDT and 9dB Clarity were found for the 125Hz
octave band in some instances, see Appendix 8. However, for remaining octave bands
and additional parameters no significant discrepancies were observed overall, with results
closely approximating the prediction. For clarity and STI in particular the absolute
differences when compared to the prediction output were typically below 2dB for clarity
and below 0.03 (=JND) for STI, see table 6.3. Thus, auralizations simulating one source
largely incorporated the room acoustics characteristics as suggested by the prediction's
numerical output, with the partial exception of the 125Hz octave band.

Prediction and Auralization data comparison for Omni source 1
F[Hz]     125            250            500            1000           2000           4000
          Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.
EDT[s]    0.64   0.4     0.53   0.55    0.39   0.44    0.37   0.42    0.4    0.42    0.39   0.48
T30[s]    0.76   0.6     0.64   0.65    0.46   0.52    0.46   0.48    0.52   0.51    0.53   0.5
Ts (ms)   42.2   30.3    34.8   40.5    21.4   31.5    18.9   27.3    20.1   27      19.8   30
C50[dB]   3.7    8.1     5.1    5       8.6    6.7     9.2    7.5     8.7    7.9     8.8    6.8
C80[dB]   7.3    12      9.1    9       13.1   10.7    14     12.5    13.1   11.6    13.1   11.7

Table 6.3. Prediction and auralization data comparison example (Room 2, S1)
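
The comparison against a fixed tolerance can be illustrated with the C80 values of Table
6.3. The short Python sketch below lists the octave bands in which the auralization
differs from the prediction by more than 2dB, the typical clarity difference quoted in
the text. The function name and the use of a strict "greater than" test are assumptions
made for illustration only.

```python
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000]

# C80 values in dB for Room 2, S1, copied from Table 6.3.
C80_PREDICTION = [7.3, 9.1, 13.1, 14.0, 13.1, 13.1]
C80_AURALIZATION = [12.0, 9.0, 10.7, 12.5, 11.6, 11.7]


def bands_exceeding(prediction, auralization, tolerance):
    """Return (band, absolute difference) pairs above the tolerance."""
    flagged = []
    for band, pred, aural in zip(BANDS_HZ, prediction, auralization):
        difference = abs(aural - pred)
        if difference > tolerance:
            flagged.append((band, round(difference, 1)))
    return flagged


# 2 dB is the typical clarity difference quoted in the text.
flagged_c80 = bands_exceeding(C80_PREDICTION, C80_AURALIZATION, 2.0)
print(flagged_c80)
```

For this example the 125Hz band (4.7dB) and the 500Hz band (2.4dB) are flagged, while the
remaining bands stay within 2dB.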

For multi source conditions the analyzed auralizations showed similar levels of accuracy,
with the discrepancies in the clarity values at the 125Hz octave band similarly repeated,
see table 6.4. A somewhat altered behavior was however introduced with a number of
clearly invalid clarity values at, arguably, random octave bands, see Appendix 8. The
altered behavior was further reflected in the STI where discrepancies up to 0.09 STI were
found. In such cases, the subjective difference of speech intelligibility in the auralization
would be clearly audible. However, with an overall STI divergence typically below the
JND, a screening of the source data as related to individual receiver positions could enable
the exclusion of flawed data from the analysis. Consequently, auralizations that might not
accurately represent a particular position can be excluded and the overall accuracy of an
assessment enhanced.

Prediction and Auralization data comparison for SS4
F[Hz]     125            250            500            1000           2000           4000
          Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.  Pred.  Aural.
Ts (ms)   39.8   28.5    35.2   37.8    23.1   30.0    19.6   25.0    23.1   25.5    23.9   28.0
C50[dB]   4.1    9.8     4.9    6.3     8      5.3     8.9    6.6     7.8    7.7     7.7    7.0
C80[dB]   7.5    13.5    8.6    8.8     12.1   10.0    12.9   12.2    12.0   11.4    11.4   11.0

Table 6.4. Prediction and auralization data comparison example (Room 2, SS4)

The comparison of the prediction output to data derived from the auralizations using the
proposed validation method showed that the majority of auralizations closely
approximated the measured conditions in terms of the acoustic parameters considered.
For speech intelligibility in particular, the limited number of discrepancies observed did
not have a significant impact in the auralizations simulating single source conditions. A
somewhat more critical effect was found for multi source auralizations. The possibility of
a misleading assessment due to individual receiver position data and their subsequent
effect on auralization could be easily detected if the latter are objectively validated. The
limited auralizations that do not correspond to the prediction's numerical output become
easy to identify, thus enabling exclusion to maintain consistency in the assessment.
Nonetheless, given that the number of significant discrepancies was limited, an
assessment based on presenting a group of auralizations, i.e. not a single one, could
give a fair indication of the potential acoustic performance of a space.

A subjective perception assessment of the auralization accuracy in comparison to room
binaural recordings is shown in the next section for the different source conditions.

6.4.3 Subjective assessment of auralizations
In a pilot listening test concerning the auralizations for ten test rooms, the audio examples
were scored by listeners (0-100%, equivalent to not accurate-accurate) at 76% and 75%
accuracy on average (STD=1.6% and 5%), for single source and multi source conditions
respectively (average STD of different listener results among test rooms was 11.9% and
15.3%). The outcomes thus indicate that a satisfactorily realistic result was produced
overall, with no significant differentiation between the source configurations.

Considering the additional element of source localization, the average score was 77% for
both source conditions (STD=6.9% and 4.7%), suggesting that a reasonably accurate
positioning of the sound source was incorporated in all auralizations (average STD of
different listener results among test rooms was 15.6% and 16.8%).
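
The listening test results above are summarized by three statistics: the average score,
the STD of the room averages, and the average STD of the different listener results
among test rooms. The sketch below shows one way these could be computed. The scores are
hypothetical, and the use of the sample standard deviation (n-1 denominator) is an
assumption, since the estimator is not stated in the text.

```python
from statistics import mean, stdev


def summarise_scores(scores_by_room):
    """Summarise listener scores (0-100%) collected per room."""
    room_means = [mean(scores) for scores in scores_by_room.values()]
    listener_spreads = [stdev(scores) for scores in scores_by_room.values()]
    return {
        "mean_score": mean(room_means),               # average accuracy
        "std_between_rooms": stdev(room_means),       # spread of room means
        "mean_listener_std": mean(listener_spreads),  # average listener spread
    }


# Hypothetical scores from three listeners in three rooms.
scores = {
    "Room A": [70, 80, 75],
    "Room B": [60, 90, 75],
    "Room C": [80, 80, 80],
}

summary = summarise_scores(scores)
print({name: round(value, 1) for name, value in summary.items()})
```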

6.5 Auralization accuracy and relation to model detail
The accuracy of the auralizations predicted from simple and CAD models was
assessed to establish the efficiency of the auralization process given a somewhat different
generation approach. Considering a more accurate prediction outcome in terms of
numerical output for the CAD model, as shown in section 5.3.2, an objective validation
was initially carried out to establish potential differences in the quality of convolution of
the impulse response filter characteristics with the anechoic material. The level of
accuracy conveyed at the end product was thus again the reference for performance. The
second part of the assessment involved a listening test to determine the level of
naturalness in the predicted auralizations in relation to the detail level involved in the
model construction.

6.5.1 Objective assessment of convolution quality from simple and CAD models
A close examination by objective means demonstrated that to a large extent the simple
and CAD models conveyed the predicted parameters to the auralization in a similar
manner. For a single source condition (S1), minor discrepancies were observed for the
125Hz octave band, having however an insignificant effect in terms of speech intelligibility
parameters, see tables 6.5 and 6.6. C50 values resulted in an error margin close to the JND,
while STI also produced discrepancies below the JND with good agreement for both
models (tables 6.5 and 6.6). Ts discrepancies were typically below 5ms. An example
comparison of the T30 and EDT parameters in Room 8 is shown in figures 6.4 - 6.5 and
6.6 - 6.7 for the simple and CAD models respectively, validating the initial
assumption of a consistent approach for either detail level in the model construction.

Figure 6.4. Comparison of T30 from prediction output and auralization validation (simple) in Room 8
Figure 6.5. Comparison of EDT from prediction output and auralization validation (simple) in Room 8
Figure 6.6. Comparison of T30 from prediction output and auralization validation (CAD) in Room 8
Figure 6.7. Comparison of EDT from prediction output and auralization validation (CAD) in Room 8

Considering a multi source configuration (SS4), the comparison outcome approximated
single source conditions, see tables 6.7 and 6.8. Again with minor exceptions in the
125Hz octave band, clarity values produced differences that were typically within the
JND range, while STI variation was also within the JND. Ts produced differences within
5ms in most cases.

Prediction-Auralization difference (Simple, S1)
F[Hz] 125 250 500 1000 2000 4000
Ts(ms) -0.2 -3.3 -7.8 -4.6 2.6 3.0
C50(dB) -0.7 -0.3 1.0 0.8 -1.9 -1.9
C80(dB) -1.8 -0.5 1.0 1.2 -1.7 -1.7
STI -0.01

Table 6.5. Example of prediction and auralization based acoustic parameter
differences for Simple model, Single source (S1) in Room 8

Prediction-Auralization difference (CAD, S1)
F[Hz] 125 250 500 1000 2000 4000
Ts(ms) 5.2 -1.3 -6.8 -3.0 4.3 4.4
C50(dB) -2.3 -1.6 0.3 0.9 -1.6 -1.7
C80(dB) -2.8 -0.6 0.3 0.8 -1.6 -1.9
STI -0.02

Table 6.6. Example of prediction and auralization based acoustic parameter differences
for CAD model, Single source (S1) in Room 8

Prediction-Auralization difference (Simple, SS4)
F[Hz] 125 250 500 1000 2000 4000
Ts(ms) -0.2 -5.6 -4.6 0.4 -0.2 -3.4
C50(dB) -2.1 0.5 0.8 0.6 0.2 0.9
C80(dB) -3.0 -0.5 0.4 0.5 -0.1 0.0
STI 0.01

Table 6.7. Example of prediction and auralization based acoustic parameter differences
for Simple model, Multi source (SS4) in Room 8

Prediction-Auralization difference (CAD, SS4)
F[Hz] 125 250 500 1000 2000 4000
Ts(ms) 13.0 4.0 -5.2 -1.1 2.1 -6.5
C50(dB) -3.9 -2.1 0.1 0.5 -0.7 1.0
C80(dB) -3.4 -1.5 0.8 1.0 -0.1 0.9
STI -0.01

Table 6.8. Example of prediction and auralization based acoustic parameter differences
for CAD model, Multi source (SS4) in Room 8
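
As a compact way of comparing the two modelling approaches, the band values in the four
Room 8 tables above can be reduced to a mean absolute difference per parameter. The
sketch below does this for the tabulated Ts, C50 and C80 values. The choice of the mean
absolute difference as a summary figure is an assumption made for illustration and was
not part of the analysis reported here.

```python
from statistics import mean

# Differences between prediction and auralization per octave band
# (125Hz to 4kHz), copied from the four Room 8 tables above.
DIFFERENCES = {
    ("simple", "S1"): {"Ts": [-0.2, -3.3, -7.8, -4.6, 2.6, 3.0],
                       "C50": [-0.7, -0.3, 1.0, 0.8, -1.9, -1.9],
                       "C80": [-1.8, -0.5, 1.0, 1.2, -1.7, -1.7]},
    ("CAD", "S1"): {"Ts": [5.2, -1.3, -6.8, -3.0, 4.3, 4.4],
                    "C50": [-2.3, -1.6, 0.3, 0.9, -1.6, -1.7],
                    "C80": [-2.8, -0.6, 0.3, 0.8, -1.6, -1.9]},
    ("simple", "SS4"): {"Ts": [-0.2, -5.6, -4.6, 0.4, -0.2, -3.4],
                        "C50": [-2.1, 0.5, 0.8, 0.6, 0.2, 0.9],
                        "C80": [-3.0, -0.5, 0.4, 0.5, -0.1, 0.0]},
    ("CAD", "SS4"): {"Ts": [13.0, 4.0, -5.2, -1.1, 2.1, -6.5],
                     "C50": [-3.9, -2.1, 0.1, 0.5, -0.7, 1.0],
                     "C80": [-3.4, -1.5, 0.8, 1.0, -0.1, 0.9]},
}


def mean_absolute_difference(values):
    """Average magnitude of the band differences, ignoring their sign."""
    return mean(abs(value) for value in values)


summary = {
    case: {name: round(mean_absolute_difference(values), 2)
           for name, values in parameters.items()}
    for case, parameters in DIFFERENCES.items()
}

for case, result in summary.items():
    print(case, result)
```

Since the figures relate to a single room, they serve only as an illustration of the
comparison and should not be read as a general ranking of the two model types.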

Overall, the data comparison based on the convolution process for simple and CAD
models did not produce significant differences between the two modelling approaches.
The influence of a model's detail resolution was minimal, thus not central in the context
of auralization validation process consistency. Consequently, the enhanced accuracy of
CAD models in terms of numerical output, as demonstrated in section 5.3.2, would
further suggest a better quality auralization.

6.5.2 Assessment of auralization realism by subjective means
The subjective assessment was based on the comparison of binaural recordings to the
predicted auralization for each case, while models were judged in terms of realism to
indicate differences in performance.

Results from the listening tests suggested that simple and CAD models both produced
a sufficiently realistic auralization. In the quality assessment scale used by the listeners
(0-100% range) simple based auralizations scored on average 71% for single source
conditions, compared to 75% for CAD based auralizations (STD=4.9% and 2.2%
respectively, average STD of different listener results among test rooms was 12.4% and
10.9% in the same order). The equivalent results for multi source conditions were 73%
and 78% for the two modelling approaches (STD=4.4% and 4.9% respectively, average
STD of different listener results among test rooms was 17.2% and 13.2%). The listening
tests therefore indicated that CAD auralizations were of marginally better quality than
simple auralizations, i.e. more realistic, for both single and multi source conditions in the rooms
considered. Considering the quality differences in terms of numerical output as described
in section 5.3.2, CAD auralizations support the advantage of CAD models for small
rooms.

The scores awarded for simple auralizations nonetheless support to an extent earlier
findings [114] suggesting that in terms of an auralization quality assessment, audio realism
and speech intelligibility performance can comprise two individual tasks, not necessarily
related with each other.

6.6 Conclusions
This chapter has presented a new methodology for an objective validation of auralization,
aimed as an improvement on the only existing method. The new method makes use of a
swept sine test signal as the anechoic material for the convolution process that produces
the auralization. Using a validated simulation and an open loop measurement system as a
basis, the method is able to objectively assess the end result of an auralization process i.e.
the auralized material. By comparison to the existing method, test simulations revealed
that in a limited number of cases the level of accuracy that is conveyed from the
simulation to the impulse response filter (measured by the existing method) can be
somewhat different from the qualities conveyed from the impulse response filter to the
auralization (measured by the proposed method). The observed variations related
primarily to the EDT and C80 for low frequencies, while lesser differences were found for
the additional measures considered i.e. T30, Ts and STI in all octave bands. Overall, the
new method allows for the latter discrepancies to be identified and taken into account.
For consistency purposes, related auralizations can thus be excluded prior to the
presentation of a given set.

At the same time it is clear that the proposed methodology increases the flexibility of an
objective auralization assessment by enabling the use of a swept sine at the deconvolution
stage; accordingly, a broader choice of software platforms for the open loop processing is
available.

Using simple and CAD models, the accuracy of the convolution process in obtaining
an auralization was examined in relation to the model detail. Trivial differences were
found between the two modelling approaches, thus suggesting that detail resolution is not
a critical factor in this respect. Consequently, considering that CAD models have an
advantage in terms of the consistency of numerical output with measurements, shown in
section 5.3.2, an analogous outcome can be expected in terms of auralization accuracy. A
comparative subjective assessment of auralizations derived from a combination of
simple and CAD models showed that both approaches produced satisfactory results,
however with CAD auralizations being of marginally better quality i.e. more realistic,
for both single and multi source conditions.

Auralizations for ten test rooms were further assessed in objective terms to determine the
accuracy of the auralizations for the primary test rooms. For single source conditions,
comparison to the prediction's numerical output revealed occasional discrepancies for
low frequencies in terms of EDT and clarity. For multi source conditions, a similar level
of accuracy was found in terms of clarity with some additional discrepancies at random
octave bands, suggesting that screening of results is essential.

Given an uncertainty factor in the level of auralization accuracy for either source
configuration, an assessment should generally allow for an error margin equal to the JND
for STI and 2dB for clarity, compared to the prediction's numerical output; if an
appropriate error margin is not considered, the resultant uncertainty could become
unacceptable for projects involving marginal intelligibility conditions.
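
The practical effect of such an error margin can be shown with a short sketch. A
predicted value is accepted only if it still meets the design target after the margin
has been subtracted. The target STI of 0.50 and the predicted values are hypothetical;
only the margins (0.03 for STI, 2dB for clarity) are taken from the text.

```python
STI_MARGIN = 0.03        # JND for STI, as quoted in the text
CLARITY_MARGIN_DB = 2.0  # clarity margin, as quoted in the text


def meets_target_with_margin(predicted, target, margin):
    """True only if the prediction clears the target in the worst case,
    i.e. after the error margin has been subtracted."""
    return predicted - margin >= target


# Hypothetical design target and predictions for two positions.
target_sti = 0.50
marginal_case = meets_target_with_margin(0.52, target_sti, STI_MARGIN)
safe_case = meets_target_with_margin(0.56, target_sti, STI_MARGIN)
print(marginal_case, safe_case)
```

A prediction of 0.52 appears to satisfy the hypothetical target, yet it cannot be relied
upon once the margin is considered, which is the situation referred to above as marginal
intelligibility conditions.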

A subjective assessment of auralization was undertaken for a broader view of the results.
The assessment considered realism and source localization, as compared to room
recordings, indicating that a satisfactorily realistic result was produced overall, with no
significant differentiation between the source configurations. The additional element of
source localization assessment suggested that a reasonably accurate positioning of the
sound source was incorporated in all auralizations.

CHAPTER 7
Summary and Conclusions

7.1 Overview
The aim of this PhD study was to identify an efficient approach, from room acoustics
measurements to computer simulations, for the prediction and assessment of speech
intelligibility and related acoustic parameters in enclosed spaces.

The work undertaken consisted of four main parts:
- Defining a measurement methodology that enables a consistent measurement of
the acoustic environment in rooms under different conditions.
- Examining a series of low level measurement data to facilitate the acquisition of
usable data from measurements performed under marginal conditions.
- The development of suitable computer models incorporating an appropriate
validation/calibration procedure for an accurate prediction of room acoustics
parameters within lecture rooms under different conditions.
- The development of a new hybrid methodology for an objective validation of
auralization accuracy and subsequent auralization assessment in relation to the
detail resolution of the associated models.

The conclusions of the study are summarized in the following sections.

7.2 Room acoustics measurement methodology
Room acoustics measurements in ten test rooms have been analyzed for four different
source configurations (closed loop system) to determine the consistency of the output
under the different conditions. The interrelationships between room acoustic parameters
were addressed for a better understanding of the acoustic conditions in lecture rooms,
while open loop resultant data were evaluated to determine the efficiency of the
methodology as a reference for acoustic performance.

The examination of measurement data established that any of the four source
configurations could be used in conjunction with an exponential sine sweep measurement
methodology for a consistent room assessment in terms of T30 (deviations 5%, 1 JND).
Given established measured differences among the source configurations for EDT, data
averaged over all the receiver positions are required for a reasonably consistent
assessment (deviations 10%, 2 JND). The smaller rooms in the study (rooms 1-8) gave
enhanced confidence in the consistency of measurements. Clarity variation appeared
dependent on the source-receiver arrangement; however, differences did not exceed 2-3dB
in most cases. STI variation was typically within the JND range (0.02).

The C50 and EDT values measured in the test rooms were found to be highly correlated in
small rooms. Good correlation was also established between Clarity and STI (0.91 and
0.96 for C50 and C80 respectively) for noiseless measurement conditions or conditions
with adequate S/N. STI can thus be predicted using clarity values with a high precision.
For high S/N in an actual case i.e. low BGNL, clarity can be used as a direct descriptor of
intelligibility.
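
A prediction of STI from clarity values could, for example, take the form of a least
squares line fitted to paired measurements. The sketch below assumes a linear
relationship, which is an assumption of this illustration rather than a model stated in
the study, and uses hypothetical C50 and STI pairs rather than the measured data.

```python
from math import sqrt


def linear_fit(x, y):
    """Least squares line y = slope * x + intercept, plus correlation r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / sqrt(sxx * syy)
    return slope, intercept, correlation


# Hypothetical paired values (not measured data).
c50_db = [0.0, 2.0, 4.0, 6.0, 8.0]
sti = [0.50, 0.56, 0.61, 0.67, 0.72]

slope, intercept, correlation = linear_fit(c50_db, sti)

# Estimated STI for a clarity value of 5 dB under the fitted line.
estimated_sti = slope * 5.0 + intercept
print(round(slope, 4), round(intercept, 3), round(estimated_sti, 2))
```

Such a fit would only be meaningful under the conditions stated above, i.e. noiseless
measurements or adequate S/N, since STI also depends on the background noise.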

The feasibility of using data resultant from an open loop measurement methodology as
reference values was examined by a series of open loop measurements in ten test rooms.
It was found that open loop methodologies could potentially be used as an alternative to
closed loop systems; however, the method would not be suitable for a comprehensive
assessment of room acoustics.

7.3 Low level measurements
A series of room acoustics measurements at a reducing S/N have been examined using an
experimental low level measurement methodology to establish the effect on the
measurement output i.e. T30, EDT and STI under marginal conditions. The relation of
room reverberance to S/N was also examined.

Measurement data for ten test rooms revealed that EDT is a more consistent parameter
when compared to T30, producing reduced fluctuations in the derived values for the series
of measurements at a continuously reducing S/N. Accordingly, at marginal conditions
EDT can be significantly more accurate than T30, while its use as the reference parameter
of choice for further post processing, e.g. for computer modelling purposes, would enhance
confidence in the consistency of later analysis outcomes. Overall, a 20dB and 12dB S/N
was necessary in this study for an accurate estimation of T30 and EDT respectively, over
all frequency bands with no signal averaging.
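
These thresholds lend themselves to a simple pre-check of measurement data. The sketch
below reports, per octave band, which of the two parameters could be estimated at the
available S/N. The thresholds are those found in this study; the S/N values and the
function itself are hypothetical and given for illustration only.

```python
# Minimum S/N in dB found necessary in this study (no signal averaging).
REQUIRED_SNR_DB = {"T30": 20.0, "EDT": 12.0}


def usable_parameters(snr_by_band):
    """Map each octave band to the parameters whose S/N requirement is met."""
    return {
        band: [name for name, required in REQUIRED_SNR_DB.items()
               if snr >= required]
        for band, snr in snr_by_band.items()
    }


# Hypothetical S/N per octave band in dB (not measured data).
measured_snr = {125: 10.0, 250: 14.0, 500: 22.0}

usable = usable_parameters(measured_snr)
print(usable)
```

In this hypothetical case the 250Hz band would support an EDT estimate only, which
illustrates why EDT is the more robust reference parameter under marginal conditions.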

The measured acoustic parameters revealed a trend in the relation of the threshold
efficient S/N to T30 and EDT, where for increasing values of reverberance a
higher S/N is generally required for an accurate measurement.

Higher signal levels are used to obtain adequate S/N, particularly at lower frequencies,
due to the effect of BGNL. Given that higher frequencies are more intrusive in terms of
annoyance, a significantly more tolerable level can be achieved by suitable signal
equalization to produce a constant S/N across the frequency spectrum while facilitating
measurement accuracy. For an absolute speech intelligibility assessment, this is only
valid if post processing of the measurements is used to suitably account for the influence
of speech level and BGNL.
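
The equalization principle can be illustrated by computing, for each octave band, the
signal level that yields the same S/N over the measured BGNL. The BGNL values below are
hypothetical and the 20dB target is taken from the T30 requirement found in this study;
the function is an illustrative sketch, not the equalization applied in the measurements.

```python
def equalized_signal_levels(bgnl_by_band, target_snr_db):
    """Signal level per band that gives a constant S/N over the BGNL."""
    return {band: bgnl + target_snr_db for band, bgnl in bgnl_by_band.items()}


# Hypothetical background noise levels per octave band in dB.
bgnl = {125: 45.0, 250: 40.0, 500: 35.0, 1000: 30.0, 2000: 27.0, 4000: 25.0}

levels = equalized_signal_levels(bgnl, 20.0)

# A spectrally flat signal would have to satisfy the noisiest band everywhere.
flat_level = max(bgnl.values()) + 20.0
reduction_at_4k = flat_level - levels[4000]
print(levels, reduction_at_4k)
```

For this hypothetical noise spectrum the equalized signal is 20dB lower at 4kHz than a
flat signal of equivalent S/N at 125Hz, which is the reduction in high frequency level
that makes the measurement more tolerable.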

7.4 Development of an optimized methodology for improved computer models
Computer simulations that can accurately predict the acoustic environment have been
developed. The simulations have been used to investigate the prediction efficiency within
university lecture rooms in terms of speech intelligibility parameters under different
conditions. The focus was primarily on two aspects of the simulation process i.e. the
validation/calibration methodology and the detail resolution that is required for a
consistent prediction outcome under different conditions. The optimization of processing
speed was also addressed in parallel with the resultant simulation precision. A near field
response was confirmed as essential in terms of source directivity for simulating acoustic
conditions for small rooms.

The proposed prediction validation/calibration process utilized a single acoustic
parameter in comparison to acoustic measurements. Given that an accurate definition of
scattering and absorption coefficients is a critical element in reducing simulation
uncertainties, the estimation principally of absorption coefficients via simulation
calibration appeared to be most suitable for post evaluation applications. The
methodology enabled an accurate prediction i.e. prediction results that are consistent with
measurements under different conditions.
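
The idea of calibrating absorption coefficients against a single measured parameter can
be sketched in a few lines. In the sketch below Sabine's formula stands in for the room
acoustics simulation, and all absorption coefficients are scaled by a common factor until
the predicted reverberation time matches the measured value. Both simplifications are
assumptions of this illustration; they do not reproduce the software or the calibration
steps used in this study, and the room data are hypothetical.

```python
SABINE_CONSTANT = 0.161  # s/m, Sabine's formula T = 0.161 V / A


def sabine_time(volume_m3, surfaces):
    """Reverberation time for surfaces given as (area_m2, alpha) pairs."""
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return SABINE_CONSTANT * volume_m3 / absorption_area


def calibrate(volume_m3, surfaces, measured_time, tolerance=0.01,
              max_iterations=20):
    """Scale absorption coefficients until prediction matches measurement.

    The loop mirrors an iterative calibration: predict, compare with the
    reference parameter, adjust, repeat. Coefficients are capped at 1.0.
    """
    calibrated = list(surfaces)
    for _ in range(max_iterations):
        predicted = sabine_time(volume_m3, calibrated)
        if abs(predicted - measured_time) / measured_time <= tolerance:
            break
        factor = predicted / measured_time
        calibrated = [(area, min(alpha * factor, 1.0))
                      for area, alpha in calibrated]
    return calibrated, sabine_time(volume_m3, calibrated)


# Hypothetical room: floor, ceiling and walls with initial estimates.
room_volume = 150.0
initial_surfaces = [(50.0, 0.10), (50.0, 0.30), (120.0, 0.05)]

calibrated_surfaces, final_time = calibrate(room_volume, initial_surfaces, 0.70)
print(round(sabine_time(room_volume, initial_surfaces), 2), round(final_time, 2))
```

With Sabine's formula a single adjustment is sufficient; with a geometrical acoustics
simulation the response to a change in absorption is not linear, which is why the sketch
is written as a loop.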

Enhanced detail in the representation of room geometry was found to be preferable for
smaller rooms. CAD platforms provide a useful tool for this purpose by speeding up
model generation, however typically resulting in an increase of the processing time
required for a prediction. Comparative pilot approaches for models of varying detail
resolution have established prediction results marginally over a JND STI for simple
models, while the same parameter was within the JND for models with enhanced detail
resolution. Results in terms of C50, T30 and EDT were found to be acceptable for both
approaches, however with noticeably enhanced accuracy for the detailed models. In the
same context, the use of alternative experimental conditions e.g. different source
configurations, resulted in similar levels of accuracy for the two modelling approaches
without the need for any alterations (or further tuning) of the simulation post calibration,
i.e. based on the original T30 (or EDT) calibration for one source configuration. Thus, in
small rooms, enhanced detail resolution has improved the prediction accuracy.

Simultaneous use of additional parameters during the validation/calibration process could
enable a more accurate prediction for simple models. However, this more elaborate
procedure appeared unnecessary for more detailed geometry representations. Considering
the potential advantages of using EDT as the reference parameter due to increased
robustness at marginal conditions, experimentation confirmed the approach as viable,
producing similar accuracy levels to a T30 based procedure. The use of EDT as a
reference parameter is thus recommended as a means to increase confidence in the
predictions.

Considering the necessary balance between overly simplified rooms and complex
inefficient geometries, the processing time of CAD models was optimized to
approximate simple model run times. Simple steps have been described to improve the
simulation efficiency by model redesign, while retaining the required detail resolution. A
model design/preparation guideline was defined to optimize the simulation methodology.

The analysis of prediction outcomes for ten test rooms and four source configurations
demonstrated high accuracy in terms of the parameters considered i.e. EDT, T30, Ts, C50,
C80 and STI, emerging as particularly efficient for a speech intelligibility assessment.

7.5 Auralization accuracy assessment
A new hybrid methodology has been proposed for an objective validation of auralization
accuracy. The method uses a swept sine test signal at the convolution stage and an open
loop measurement system to assess the end product of an auralization process.
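
The principle of the method can be sketched in Python as follows. An exponential sine
sweep takes the place of the anechoic material, is convolved with an impulse response
filter, and the result is deconvolved to recover a response from which acoustic
parameters could then be derived. The two-tap filter, the sweep settings and the function
names are hypothetical. For brevity the deconvolution uses a plain time reversed sweep
(a matched filter) in place of an amplitude compensated inverse filter, so the recovered
response is spectrally coloured; this is a simplification and not the processing chain
used in this study.

```python
import math


def exponential_sweep(n_samples, f_start, f_end, sample_rate):
    """Exponential sine sweep from f_start to f_end (Hz)."""
    duration = n_samples / sample_rate
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(scale * (math.exp(rate * (i / sample_rate) / duration) - 1.0))
            for i in range(n_samples)]


def convolve(a, b):
    """Direct time domain convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_value in enumerate(a):
        if a_value == 0.0:
            continue
        for j, b_value in enumerate(b):
            out[i + j] += a_value * b_value
    return out


N_SAMPLES = 512
REFLECTION_DELAY = 40  # samples

sweep = exponential_sweep(N_SAMPLES, 200.0, 3200.0, 8000.0)

# Hypothetical impulse response filter: direct sound plus one reflection.
impulse_response = [1.0] + [0.0] * (REFLECTION_DELAY - 1) + [0.5]

# Convolution stage: the sweep is auralized like any anechoic material.
auralized_sweep = convolve(sweep, impulse_response)

# Deconvolution stage: matched filter, normalised by the sweep energy.
energy = sum(value * value for value in sweep)
matched_filter = sweep[::-1]
recovered = [value / energy for value in convolve(auralized_sweep, matched_filter)]

direct_index = N_SAMPLES - 1
reflection_index = direct_index + REFLECTION_DELAY
print(round(recovered[direct_index], 2), round(recovered[reflection_index], 2))
```

The direct sound and the reflection reappear in the recovered response at their original
spacing, which is the property that allows the auralized material itself, rather than
the filter alone, to be assessed objectively.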

Comparisons to the only existing objective auralization validation method revealed
variations in the assessment results, predominantly for low frequencies. The proposed
method allows for the discrepancies to be identified and taken into account in an
objective auralization validation. For consistency purposes thus, related auralizations can
be excluded prior to the presentation of a given set.

An additional improvement over the old method is the increased flexibility of an
objective auralization assessment. By enabling the use of a swept sine at the
deconvolution stage, a broader choice of software platforms for the open loop processing
is available, since the process is not restricted to Dirac pulses.

Auralizations for ten test rooms were further assessed using the proposed method in
comparison to direct numerical output of the predictions to determine the accuracy of the
auralizations for the primary test rooms. Typically, minor variations were found for the
acoustic parameters considered while clarity and STI variations in particular were below
2dB and 0.03 respectively, for all source configurations. However, individual data points
exceeding the typical error suggested that for multi source conditions in particular a
screening of results is essential.

Given an uncertainty factor in the level of auralization accuracy for either source
configuration, an assessment should generally allow for an error margin equal to the JND
for STI and 2dB for clarity, compared to the prediction's numerical output, or the
resultant uncertainty could become unacceptable for projects involving marginal
intelligibility conditions.

A subjective assessment of auralization was undertaken considering realism and source
localization, compared to room recordings. The assessment indicated a satisfactory result
overall for both auralization aspects, with no significant differentiation between the
source configurations.

Using simple and CAD models, the quality of the convolution process that produces
the auralization was examined in relation to the model detail, showing no significant
differences between the modelling approaches. As the CAD model predictions were
more consistent with measurements, an analogous outcome can be expected for the
auralization products. A comparative subjective assessment of auralizations for simple
and CAD models showed that both approaches produced satisfactory results. However,
the CAD auralizations were of marginally higher quality.

7.6 Further work
The research undertaken could be furthered in a number of aspects:
- The low level measurement methodology needs to be further investigated for the
different source configurations in different environments to better understand the
measurement conditions and extract further propositions to improve the
measurement efficiency.
- Binaural STI measurements can be compared to standard single omni directional
based measurements to establish prospective differences. Model behaviour can be
analyzed in relation to binaural data to verify the prediction quality for different
conditions.
- A detailed model was preferred for computer simulations of smaller fitted rooms.
This is not desirable for large rooms, however; further work is needed to identify the
prospective limits past which an enhanced level of detail is no longer beneficial.
- The models could be improved by further considering prediction repeatability,
processing speed and overall efficiency.
- The models should be used to give suggestions for optimal acoustic conditions in
classrooms, particularly when sound systems are utilized, e.g. controlling
loudspeaker time delay, positioning, aiming, directivity pattern etc.

7.7 Overall conclusions
The different stages of a general room acoustics/speech intelligibility assessment in
university lecture rooms have been examined and analyzed. Computer models were
developed to accurately predict acoustic parameters while reducing the uncertainties
involved, facilitating a confident assessment in the rooms considered.

In Chapter 3, the room acoustics measurements in ten test rooms have been analyzed for
four source configurations and two measurement methodologies to identify data
consistency between different approaches. Measurement uncertainty was further analyzed
in terms of S/N at marginal conditions in Chapter 4, highlighting the advantages of using
EDT in terms of later use of data for a validation/calibration methodology. A
validation/calibration methodology and the preferred approach in terms of model detail
resolution have been examined in Chapter 5, producing simulations that enable accurate
prediction outcomes based on EDT. Finally, a flexible objective auralization assessment
methodology has been developed and validated in Chapter 6. The associated uncertainties
for each stage of the assessments were specified, see table 7.1, to account for the potential
error margins, therefore enhancing confidence in the assessment outcome for analogous
research.
Table 7.1. Assessment uncertainty via error margins, calculated by considering the data origin at the assessment stages

Error margins based on the origin of related data
Reference process                                                            T30   EDT   C50    STI
Four source configurations' data consistency ()                              5%    10%   3dB    0.02 *
Open loop data (either source configuration) compared to closed loop data    10%   10%   2dB    0.03 *
Prediction (simple, single source) compared to measurements                  25%   50%   7dB    0.06 *
Prediction (CAD, single source) compared to measurements                     10%   25%   3dB    0.03 *
Prediction (simple, multi source) compared to measurements                   N/A   N/A   5dB    0.06 *
Prediction (CAD, multi source) compared to measurements                      N/A   N/A   3dB    0.03 *
Prediction (numerical output) compared to auralization (single source)       5%    5%    2dB    0.03
Prediction (numerical output) compared to auralization (multi source)        5%    5%    2dB*   0.03*

* screening needed for extreme values
* averaged over receiver positions

This work has thus developed a suitable methodology for the assessment of primary
acoustic parameters and speech intelligibility using computer simulations and
auralization in university lecture rooms. Confidence in the assessment has been achieved,
specifying the potential error margins at the different stages of the assessment.

The outcomes can be used by acoustics researchers and consultants to study and improve
the acoustic conditions in university lecture rooms and other similar spaces.

References

1
Kuttruff H., Room Acoustics, 4th Edition, ISBN: 0-419-24580-4, Elsevier Science publishers, Taylor &
Francis Group

2
Schroeder M. R., New Method of Measuring Reverberation Time, J. Acoust. Soc. Am., 37, 409412
(1965)

3
Alrutz, H. & Schroeder, M.R., A fast Hadamard transform method for the evaluation of measurements
using pseudorandom test signals, Proc. of 11th International Congress on Acoustics, Paris (France 1983) 6
(1983) p.235-238

4
Vorlnder M., Kob M., Practical Aspects of MLS Measurements in Building Acoustics, Applied
Acoustics, Vol. 52, No. 314, pp. 239-258, 1997

5
Rife D. D., Vanderkooy J., Transfer-Function Measurement with Maximum-Length Sequences, J. Audio
Eng. Soc., Vol. 37, No. 6, June 1989

6
Morset L., Morset development, WinMLS 2004, Professional Measurement Software for PC and
Soundcard, Users Manual, www.winmls.com

7
Griesinger D., Beyond MLS - Occupied Hall Measurement with FFT Techniques, J. Audio Eng. Soc.,
preprint 4403 (Nov 1996)

8
Farina A., Simultaneous Measurement of Impulse Response and Distortion with a Swept-Sine Technique,
J. Audio Eng. Soc., preprint 5093 (Feb 2000)

9
Moriya N., Kaneda Y., Study of harmonic distortion on impulse response measurement with logarithmic
time stretched pulse, Acoust. Sci. & Tech. 26, 5 (2005)

10
Muller S., Massarani P., Transfer-Function Measurement with Sweeps, J Audio Eng Soc, Vol. 49, No 6,
June 2001

11
BS EN ISO 18233-2006, Acoustics - Application of new measurement methods in building and room
acoustics

12
Farina A., Advancements in Impulse Response Measurements by Sine Sweeps, Presented at AES 122nd
Convention, Vienna, Austria, 2007 May 58, Convention paper 7121

13
BS EN 60268-16;2003, Sound system equipment - Objective rating of speech intelligibility by speech
transmission index

14
Mapp P., Speech intelligibility measurement-The current state of the art, Proc. Institute of Acoustics, Vol.
25, Pt. 7 (2003)

15
Bjor O. H., STIPA-The golden mean between full STI and RASTI, Proc. Institute of Acoustics, Vol. 25,
Pt. 7 (2003)

16
Steeneken H. et al, Development of an Accurate, Handheld, Simple-to-use Meter for the Prediction of
Speech Intelligibility, Presented at Reproduced Sound 17, Stratford-on-Avon, November 16, 2001

17
Mapp P., Limitations of Current Sound System Intelligibility Verification Techniques, Presented at AES
113th Convention, Los Angeles, California, 2002 October 5 8, Convention paper 5668

191
References

192

18
Mapp P., Systematic & Common Errors in Sound System STI and Intelligibility Measurements,
Presented at AES 117th Convention, San Francisco, California, 2004 October 28-31, Convention paper
6271

19
Mapp P., Is STIPa a robust measure of speech intelligibility performance?, Presented at AES 118
th

Convention , Barcelona, Spain, 2005 May 28-31, Convention paper 6399

20
Skarlatos D., Applied Acoustics (in Greek), Second edition, ISBN: 960-87710-1-3

21
Fitzroy D., Reverberation Formula Which Seems to Be More Accurate with Nonuniform Distribution of
Absorption, J. Acoust. Soc. Am., 31 (7), 893-897 (1959)

22
Kang J., Neubauer R. O., Predicting reverberation time: Comparison between analytic formulae and
computer simulation, 17th International Congress on Acoustics (ICA), Rome, Italy, 2001

23
Jordan V. L., Acoustical Criteria for Auditoriums and Their Relation to Model Techniques, J. Acoust.
Soc. Am., 47, 408 (1970)

24
Damaske P., Ando Y., Interaural Crosscorrelation for Multichannel Loudspeaker Reproduction, Acustica,
27 (1972) 232-238

25
Thiele R., "Richtungsverteilung und Zeitfolge der Schallrckwrfe in Ramen,", Acustica 3 , 291-302
(1953).

26
Reichardt W. et al., Abhngigkeit der grenzen zwischen brauchbarer und unbrauchbarer durchsichtigkeit
von der art des musikmotives, der nachhallzeit und der nachhalleinsatzzeit, Appl. Acoustics, 7 (1974) 243-
264 (In German with English abstract)

27
Bradley J.S., A just noticeable difference in C
50
for speech, Applied Acoustics 58 (1999) p.99-108

28
Bradley J. S., Relationships among Measures of Speech Intelligibility in Rooms, J. Audio Eng. Soc., Vol.
46, No. 5, May 1998

29
Kurer R., Zur Gewinnung von Einzahlkriterien bei Impulsmessungen in der Raumakustik, Acustica, 21
(1969) 370 (in German)

30
French N. R., Steinberg J. C., Factors Governing the Intelligibility of Speech Sounds, J. Acoust. Soc.
Am., 19 (1), 90-119 (1947)

31
Kryter K. D., Methods for the Calculation and Use of the Articulation Index, J. Acoust. Soc. Am., 34
(11), 1689-1697 (1962)

32
ANSI S3.5-1969, Methods for the calculation of the articulation index

33
ANSI S3.5-1997, Methods for the calculation of the speech intelligibility index (SII)

34
Peutz V. M. A., Articulation Loss of Consonants as a Criterion for Speech Transmission in a Room, J.
Audio Eng. Soc., Vol. 19, p.915-919, Dec 1971

35
Steeneken H.J.M., Houtgast T., Phoneme group specific octave band weights in predicting speech
intelligibility, Speech Communication 38 (2002) 399411

36
BS EN ISO 9921: 2003, Ergonomics - Assessment of speech communication

References

193

37
Barron M., Auditorium Acoustics and Architectural Design, E & FN Spon, First edition,
ISBN:0419177108

38
ANSI S3.2-1989, Method for Measuring the Intelligibility of Speech over Communication Systems

39
House A.S. et al., Articulation-Testing Methods: Consonantal Differentiation with a Closed-Response
Set, J. Acoust Soc. Am. 37, 158-166 (1965)

40
Steeneken H., The measurement of speech intelligibility, TNO human factors, White paper

41
Katz J., Handbook of Clinical Audiology, Fifth edition, ISBN: 0-683-30765-7

42
Houtgast T., Steeneken H. J. M., The modulation transfer function in room acoustics as a predictor of
speech intelligibility, Acustica, Vol.28 (1973) 66-73

43
Houtgast T., Steeneken H. J. M., A review of the MTF concept in room acoustics and its use for
estimating speech intelligibility in auditoria, J. Acoust. Soc. Am., 77 (3), 1069-1077 (1985)

44
Houtgast T., Steeneken H. J. M., Evaluation of speech transmission channels by using artificial signals,
Acustica, Vol.25 (1971) 355-367

45
Steeneken H. J. M., Houtgast T., A physical method for measuring speech-transmission quality, J.
Acoust. Soc. Am., 67 (1), 318-326 (1980)

46
Houtgast T., Steeneken H. J. M., Plomp R., Predicting Speech Intelligibility in Rooms from the
Modulation Transfer Function. I. General Room Acoustics, Acustica, Vol.46 (1980) 60-72

47
Plomp R., Steeneken H. J. M., Houtgast T., Predicting Speech Intelligibility in Rooms from the
Modulation Transfer Function. II. Mirror Image Computer Model Applied to Rectangular Rooms,
Acustica, Vol.46 (1980) 73-81

48
van Rietschote H. F., Houtgast T., Steeneken H. J. M., Predicting Speech Intelligibility in Rooms from
the Modulation Transfer Function IV: A Ray-Tracing Computer Model, Acustica, Vol.49 (1981) 245-252

49
Schroeder M. R., Modulation transfer functions: Definition and Measurement, Acustica Vol. 49 (1981)
179-182

50
Steeneken H. J. M., Houtgast T., Mutual dependence of the octave-band weights in predicting speech
intelligibility, Speech Communication 28 (1999) 109-123

51
Anderson B. W., Kalb J. T., English verification of the STI method for estimating speech intelligibility of
a communications channel, J. Acoust. Soc. Am., 81 (6), 1982-1985 (1987)

52
Steeneken H. J. M., Houtgast T., Validation of the revised STI
r
method, Speech Communication 38
(2002) 413-425

53
van Wijngaarden S. J., Steeneken H. J. M., Houtgast T., Quantifying the intelligibility of speech in noise
for non-native talkers, J. Acoust. Soc. Am. 112 (6), 3004-3013 (2002)

54
van Wijngaarden S. J., Steeneken H. J. M., Houtgast T., Quantifying the intelligibility of speech in noise
for non-native listeners, J. Acoust. Soc. Am. 111 (4), 1906-1916 (2002)

References

194

55
van Wijngaarden S. J., Houtgast T., Bronkhorst A. W., Steeneken H. J. M., Using the Speech
Transmission Index for predicting non-native speech intelligibility, J. Acoust. Soc. Am. 115 (3), 1281-1291
(2004)

56
Mapp P., Private communication (2008)

57
van Wijngaarden S. J., Drullman R., Development of a binaural speech transmission index (A), J. Acoust.
Soc. Am., 119 (5), 3442 (2006), Abstract

58
van Wijngaarden S. J., Drullman R., Binaural intelligibility prediction based on the speech transmission
index, J. Acoust. Soc. Am., 123 (6), 4514-4523 (2008)

59
van Wijngaarden S. J., Verhave J. A., Recent advances in STI measuring techniques, Proc. of the
Institute of Acoustics, Vol. 28, Pt. 6, 2006

60
Drullman R., van Wijngaarden S. J., New directions for a speech-based speech transmission index (A), J.
Acoust. Soc. Am., 119 (5), 3442 (2006), Abstract

61
Mapp P., Speech intelligibility measurement-The current state of the art, Proc. of the Institute of
Acoustics, Vol. 25, Pt 7 (2003)

62
Colburn H. S., Binaural Hearing Mechanisms, Proc. of the 37
th
International Congress and exposition on
noise control engineering - Inter noise 2008, Shanghai, China, 26-29 October (2008)

63
David M. Howard, James Angus, Acoustics and Psychoacoustics, second edition, Focal
Press Music Technology Series, ISBN: 0 240 51609 5

64
Bronkhorst A. W., Plomp R., Binaural speech intelligibility in noise for hearing-impaired listeners, J.
Acoust. Soc. Am., 86 (4), 1374-1383 (1989)

65
Bronkhorst A. W., Plomp R., The effect of head-induced interairal time and level differences on speech
intelligibility in noise, J. Acoust. Soc. Am., 83 (4), 1508-1516 (1988)

66
Hawley M., Litovsky R. Y., Culling J. F., The benefit of binaural hearing in a cocktail party: Effect of
location and type of interferer
A)
, J. Acoust. Soc. Am. 115 (2), February 2004

67
Houtgast T., The effect of ambient noise on speech intelligibility in classrooms, Applied Acoustics 14
(1981) 15-25

68
Kuttruff H., Sound fields in small rooms, Presented at AES 15th International Conference: Audio,
Acoustics & Small Spaces (October 1998), paper number: 15-002

69
Haas H., The Influence of a Single Echo on the Audibility of Speech, J. Audio Eng. Soc., Vol. 20, No. 2,
p.146-159, March 1972

70
Bradley J. S., Reich R. D., and Norcross S. G., On the combined effects of signal-to-noise ratio and room
acoustics on speech intelligibility, J. Acoust. Soc. Am. 106 (4), Pt. 1, October 1999

71
Mapp P., Intelligibility Winning the acoustics battle, Proceedings of the AES 18
th
UK Conference, Live
sound -Technological advances to satisfy new audience expectations, London, UK, 2003 April

72
Eargle J., Foreman C., JBL, Audio engineering for sound reinforcement, Hal Leonard corp.,
ISBN: 0 634 04355 2

References

195

73
Mapp P., Online article, http://www.svconline.com (Cited 25/01/2005)

74
Sato H., Bradley J. S., Evaluation of acoustical conditions for speech communication in working
elementary school classrooms, J. Acoust. Soc. Am. 123 (4), April 2008

75
Hodgson M., Experimental investigation of the acoustical characteristics of university classrooms, J.
Acoust. Soc. Am. 106 (4), Pt. 1, October 1999

76
Bradley J. S., Sato H., The intelligibility of speech in elementary school classrooms, J. Acoust. Soc. Am.
123 (4), April 2008

77
DfES, Building Bulletin 93, Acoustic design for schools, http://www.teachernet.gov.uk/acoustics

78
ANSI S12.60-2002 American National Standard Acoustical Performance Criteria, Design Requirements,
and Guidelines for Schools

79
Bistafa S. R., Bradley J. S., Reverberation time and maximum background-noise level for classrooms
from a comparative study of speech intelligibility metrics, J. Acoust. Soc. Am. 107 (2), February 2000

80
Hodgson M., Measurement and prediction of typical speech and background-noise levels in university
classrooms during lectures, J. Acoust. Soc. Am. 105 (1), January 1999

81
Shield B., Dockrell J., External and internal noise surveys of London primary schools, J. Acoust. Soc.
Am. 115 (2), February 2004

82
Mapp P., Measuring speech intelligibility in classrooms with and without hearing assistance, Proc.
Institute of Acoustics, Vol. 25, Pt. 7 (2003)

83
Hodgson M., Nosal E.M., Effect of noise and occupancy on optimal reverberation times for speech
intelligibility in classrooms, J. Acoust. Soc. Am. 111 (2), February 2002

84
Yang W., Hodgson M., Optimal reverberation time for speech intelligibility for normal and hearing-
impaired listeners using auralization, Proc. of the 19
th
International Congress on Acoustics, Madrid, Spain

85
Bradley J. S., Sato H., Picard M., On the importance of early reflections for speech in rooms, J. Acoust.
Soc. Am. 113 (6), June 2003

86
Bistafa S. R., Bradley J. S., Reverberation time and maximum background-noise level for classrooms
from a comparative study of speech intelligibility metrics, J. Acoust. Soc. Am. 107 (2), February 2000

87
Mapp P., Relationships between Speech Intelligibility Measures for Sound Systems, Presented at AES
112
th
Convention , Munich, Germany, 2002 May 10-13, Convention paper 5604

88
Bradley J. S., Speech intelligibility studies in classrooms, J. Acoust. Soc. Am. 80 (3), September 1986

89
Onaga H., Furue Y., Ikeda T., The disagreement between speech transmission index (STI) and speech
intelligibility, Acoust. Sci. & Tech. 22, 4 (2001)

90
Mapp P., Modifying STI to Better Reflect Subjective Impression, Presented at AES 21
st
Conference , St
Petersburg, Russia, 2002 June 01-03

91
BS EN 60849:1998, IEC 60849:1998 - Sound systems for emergency purposes

92
Vorlnder M., Auralization, ISBN:978-3-540-48829-3, Springer-Verlag (2008) Berlin Heidelberg

References

196

93
Schroeder M.R., Die statistischen Parameter der Frequenzkurven von grossen Rumen, Acustica Vol. 4
(1954) 594-600

94
Schroeder M. R., Atal B. S., Bird C., Digital computation in room acoustics, 4
th
International Congress
on Acoustics, Copenhagen, 1962

95
Krokstad A., Strom S., Srsdal S., Calculating the acoustical room response by the use of a ray tracing
technique, J. of Sound and Vibration (1968), Vol. 8, 118-125

96
Dance S., Shield B., The effect on prediction accuracy reducing the number of rays in a ray-tracing
model, Proc. of Inter noise 1994, Yokohama, Japan, 29-31 August (1994)

97
Yang L., Computer modelling of speech intelligibility in underground stations, PhD thesis, London
South Bank University (1997)

98
Allen J. B., Berkley D. A., Image method for efficiently simulating small-room acoustics, J. Acoust. Soc.
Am. 65 (4), April 1979

99
Borish J., Extension of the image model to arbitrary polyhedra, J. Acoust. Soc. Am. 75 (6), June 1984

100
Long M., Architectural Acoustics, Elsevier academic press (2006), ISBN: 13: 978-0-12-455551-8

101
Vorlnder M., Simulation of the transient and steady-state sound propagation in rooms using a new
combined ray-tracing/image-source algorithm, J. Acoust. Soc. Am. 86 (1), July 1989

102
Dalenbck, B. I., CATT-Acoustic: Image source modelling augmented by ray tracing and diffuse
reflections, Applied Acoustics, Volume 38, Issues 2-4, 1993, 350 (Abstract)

103
van Maercke D., Martin J., The prediction of echograms and impulse responses within the Epidaure
software, Applied Acoustics 38 (1993) 93-114

104
Naylor G. M., ODEON-Another hybrid room acoustical model, Applied Acoustics 38 (1993) 131-143

105
Hodgson M., Evidence of diffuse surface reflections in rooms, J. Acoust. Soc. Am. 89 (2), February
1991

106
Howarth M. J., Lam Y. W., An assessment of the accuracy of a hybrid room acoustics model with
surface diffusion facility, Applied Acoustics 60 (2000) 237-251

107
Vorlnder M., International round robin on room acoustical computer simulations, 15
th
International
Congress on Acoustics, Trondheim, 26-30 Jun 1995

108
Bork I., A comparison of room simulation software the 2
nd
round robin on room acoustical computer
simulation, Acustica - Acta Acustica, Vol. 86 (2000) 943-956

109
Bork I., Report on the 3rd Round Robin on Room Acoustical Computer Simulation Part I:
Measurements, Acta Acustica united with Acustica, Vol. 91 (2005) 740 752

110
Bork I., Report on the 3rd Round Robin on Room Acoustical Computer Simulation Part II:
Calculations, Acta Acustica united with Acustica, Vol. 91 (2005) 753 763

111
Torres R.R., Kleiner M., Dalenback B.I., Audibility of 'Diffusion' in Room Acoustics Auralization: An
Initial Investigation, Acustica - Acta Acustica, Vol. 86 (2000) 919-927

References

197

112
Kleiner M., Dalenback B.I., Svensson P., Auralization An Overview, J. Audio Eng. Soc., Vol. 41, No.
11, November 1993

113
Dalenback B.I., Kleiner M., Svensson P., The Audibility of Changes in Geometric Shape, Source
Directivity, and Absorptive treatment: Experiments in Auralization, Presented at AES 91
st
Convention,
New York, U.S.A., 1991 October 4-8, Convention paper 3123

114
Nestoras C., Dance S., Computer model utilization for speech intelligibility assessment in enclosed
spaces using sound systems, Proc. Institute of Acoustics, Vol. 30, Pt. 2 (2008)

115
Hodgson M., Experimental investigation of the acoustical characteristics of university classrooms, J.
Acoust. Soc. Am. 106 (4), Pt. 1, October 1999

116
Nestoras C., Gomez L., Dance S., Murano S., Speech intelligibility measurements in a diffuse space
using open and closed loop systems, 19
th
International Congress on Acoustics (ICA), Madrid, 2-7 Sep
2007

117
Gomez L., Nestoras C., Dance S., Murano S., Speech intelligibility measurements in a non-diffuse space
using open and closed loop systems, 19
th
International Congress on Acoustics (ICA), Madrid, 2-7 Sep
2007

118
BS EN ISO 3382-2: 2008, Acoustics - Measurement of room acoustic parameters - Reverberation time
in ordinary rooms

119
Hodgson M. et al., Measurement and prediction of typical speech and background-noise levels in
university classrooms during lectures, J. Acoust. Soc. Am. 105 (1), January 1999

120
Hodgson M., Rating, ranking and understanding acoustical quality in university classrooms, J. Acoust.
Soc. Am. 112 (2), August 2002

121
Nestoras C., Dance S., Design and Validation of Computer Models for the Assessment of Speech
Intelligibility in Enclosed Spaces, Proc. of the 37
th
International Congress and exposition on noise control
engineering - Inter noise 2008, Shanghai, China, 26-29 October (2008)

122
Nestoras C., Dance S., Speech intelligibility measurements with low level output efficiency limitations,
Proc. Institute of Acoustics, Vol. 28, Pt. 6 (2006)

123
Nestoras C., Dance S., Measurement of Speech Intelligibility Using Low Level Output - Threshold
Efficient S/N Ratios, Acoustics 08, Paris, 29th June 4th July 2008

124
AES-4id-2001 (r2007): AES information document for room acoustics and sound reinforcement systems
-- Characterization and measurement of surface scattering uniformity, J. Audio Eng. Soc., Vol. 49, No. 3,
149-165, 2001

125
Nironen H., Diffuse Reflections in Room Acoustics Modelling, MSc dissertation, Helsinki University of
Technology (2004)

126
ISO 17497-1: 2004, Acoustics -- Sound-scattering properties of surfaces -- Part 1: Measurement of the
random-incidence scattering coefficient in a reverberation room

127
Vorlander M., Mommertz E., Definition and measurement of random-incidence scattering coefficients,
Applied Acoustics 60 (2000) 187-199

128
CATT-Acoustic v8.0f (build 2.01) Acoustics prediction software, User manual, (CATT 1988-2006)

References

198

129
Hodgson M., Scherebnyj K., Estimation of the absorption coefficients of the surfaces of classrooms,
Applied Acoustics 67 (2006) 936944

130
Zeng X., Christensen C. L., Rindel J. H., Practical methods to define scattering coefficients in a room
acoustics computer model, Applied Acoustics 67 (2006) 771786

131
Saher K., Nijs L., van der Voorden M., Definition of material properties in the acoustical model
calibration, Presented at AES 118th Convention, Barcelona, Spain, 2005 May 28-31, Convention paper
6496

132
Dalenbck B. I., Kleiner M., Svensson P., The Audibility of Changes in Geometric Shape, Source
Directivity, and Absorptive Treatment: Experiments in Auralization, Presented at AES 91st Convention,
New York, USA, 1991 October 4-8, Convention paper 3123

133
Wang L. M., Vigeant M. C., Evaluations of output from room acoustic computer modeling and
auralization due to different sound source directionalities, Applied Acoustics (2007),
doi:10.1016/j.apacoust.2007.09.004

134
Dalenbck B. I., CATT DLL Directivity Interface (DDI) v1.0, DDI White Paper, Rev. 0, 981021

135
BS EN 60268-5:2003, Sound system equipment-Loudspeakers

136
Wang L. M., Rathsam J., The influence of absorption factors on the sensitivity of a virtual
rooms sound field to scattering coefficients, Applied Acoustics (2007),
doi:10.1016/j.apacoust.2007.09.004

137
Google SketchUp v6 Pro, 3D CAD software, User manual

138
http://www.rahe-kraft.de/cms/su2catt/index.htm

139
Peng J., Feasibility of speech intelligibility assessment based on auralization, Applied Acoustics 66
(2005) 591601

140
Christensen C. L., Weitze C. A., Rindel J. H. , Gade A. C., Validation of an auralization system
(Abstract), J. Acoust. Soc. Am. 111 (5), May 2002

141
B&K Dirac 3.0, Room Acoustics Software Type 7841, http://www.bksv.com

142
ITU-T Recommendation P.58, Telephone transmission quality - Head and torso simulator for
telephonometry - August 1996
