Assessor selection process for efficient multisensory evaluation


Audio Engineering Society
Convention Paper 7788

Presented at the 126th Convention
2009 May 7–10 Munich, Germany
The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have
been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from
the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes
no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Assessor selection process for multisensory applications
Søren Vase Legarth¹ and Nick Zacharov¹
¹ DELTA SenseLab, Venlighedsvej 4, DK-2970 Hørsholm, Denmark
Correspondence should be addressed to Søren Vase Legarth (SVL@delta.dk)
ABSTRACT
Assessor panels are used to perform perceptual evaluation tasks in the form of listening and viewing tests. In
order to ensure the quality of collected data it is vital that the selected assessors have the desired qualities in
terms of discrimination aptitude as well as consistent rating ability. This work extends existing procedures in
this field to provide a statistically robust and efficient manner for assessing and evaluating the performance
of assessors for listening and viewing tasks.
1. INTRODUCTION

Listening tests and viewing tests continue to play an important role in the evaluation of the perceived quality of different systems, e.g. speech codecs, audio codecs, display quality, etc. Depending on the specific field, different requirements are imposed regarding assessor expertise and experience. In the field of telecommunication speech it is often proposed that so-called naive assessors are used, as discussed in [21]. However, in other areas, such as audio coding, the use of expert assessors is required, e.g. in relation to recommendations such as ITU-R BS.1116-1 [19] or ITU-R BS.1534-1 [20]. Expert assessors are also required when performing sensory evaluation studies [17, 15], where a high degree of sensory acuity is required as well as verbal fluency skills.

The use of expert assessor panels is thus widespread and well accepted, and quite a number of procedures have been developed over the years to screen assessors from a pool of naive assessors towards becoming experts. Much of the panel selection work has been based upon the work of Hansen [9], which was later expanded upon by Bech [2, 3] and then by Zacharov [30], Mattila [26] and Isherwood et al. [10]. All of these studies used a three-stage approach to screening assessor suitability, comprising

• a questionnaire,
• audiometry, and
• screening experiments.

In each of these cases, the panel screening was intended for audio-only applications, without taking into consideration acuity in other modalities. In the present research environment, many hi-tech devices and services are being developed that employ multi-modal stimuli, for example audio-visual codecs, multi-modal GUIs, etc. In all cases, the characteristics and qualities of these are important and should be optimized for end-user experience. So whether the audio, visual and haptic modalities are studied individually or together, there is a need for expert assessors with skills in all three of these areas. One of the key aims of the screening procedure defined here is to effectively evaluate the potential of assessors for multimodal assessment tasks.

Another approach, considered by Ghani et al. [7], is to examine whether the auditory capabilities of assessors can be measured using a range of psychometric tests, for cases such as frequency discrimination, gap detection, etc. It was found that even when studying assessors with a large battery of tests, it was difficult to predict their expertise for sound quality related listening tests from this data. This approach is thus not beneficial for screening assessors for sound quality listening test applications or other associated test types.

One additional requirement for our screening is to establish the suitability of assessors for the development of descriptive vocabularies and attribute sets, as used in sensory profiling. Assessors should also be able to express the characteristics of what they experience and perceive. Two earlier examples of verbal fluency tests have been considered and applied by Koivuniemi and Zacharov [24] and Wickelmaier and Choisel [29]. Using these methods it is possible to establish how well assessors can describe their experiences of what they perceive. In the procedure described here a more traditional verbal elicitation process is employed to study the assessors' verbal fluency and this will be described in detail. General guidance on the selection and training of assessors for sensory evaluation is provided in the ISO recommendation 8586-1 [13].

The terms naive, selected and expert assessor are used extensively throughout this paper and are referred to in a number of standards. However, for the most part there is no unique definition of the level of expertise of assessors in each of these standards. For the purpose of consistency the terminology employed in this paper will be that defined in ISO 8586-2 [14] and referred to in Table 1. The progression of an assessor from naive towards expert status is well described in ISO 8586-2 [14] and illustrated in Figure 1. This topic is discussed in greater detail in section 1.1.

The procedure described in this paper aims to improve upon the existing screening procedures through application of the triangle test methodology [16] for screening of assessors. This approach has been demonstrated by Lorho for the assessment of assessor performance and is discussed in [25]. Additionally, the method provides a means for rapidly evaluating assessor performance for both audio and visual tests, both within and between assessors, and also provides a degree of absoluteness that allows results to be compared between different rounds of screening.

Assessors having passed through this process can be categorised as selected assessors according to ISO 8586-2 [14] and are ready for further training and assessment for them to be categorized as expert assessors for any given application domain (speech coding, noise suppression, etc.). This selection procedure established their clear potential to be trained toward expert status in both audio and visual evaluations.

1.1. Assessor categorisation

Whilst the terms untrained, naive, experienced and expert are often employed in the audio literature to describe the nature of an assessor's performance, a review of a number of the standards and recommendations (e.g. [5, 19, 22, 23, 21]) will yield inconsistent definitions of these terms. This topic is discussed in depth by Bech and Zacharov in [4] for the interested reader. However, a well formulated summary of assessor categorisation has been developed in the field of agricultural food products and reported in several ISO standards discussed in the following section.

1.1.1. ISO assessor categorisation

The ISO has standardised a set of terms which are employed in the agricultural food industry to describe different kinds of assessors also commonly referred
to in audio as assessors. This terminology is defined predominantly for application in descriptive analysis tasks, where objective assessment of attributes is the primary purpose.

Two key standards on this topic exist, namely ISO standards 8586-1 [13] and 8586-2 [14]. The former focuses upon the selection, training and monitoring of selected assessors whilst the latter considers these aspects for experts. In order to clarify these meanings and terms, please refer to Table 1. Figure 1 also provides some information regarding how assessors can progress from being untrained assessors through to being expert assessors.

[Figure 1, flow chart: Naive assessor -> recruitment, preliminary screening and instruction -> Initiated assessor -> training in methods and general principles -> Selected assessor -> selection for training, evaluation of potential, monitoring of performance and/or testing -> Expert assessor]
Fig. 1: The process of sensory assessor development according to ISO 8586-2 [14].

It is considered by the authors that this terminology is very clearly defined and can be applied to any field of sensory perception/evaluation irrespective of the specific nature of that field. It is suggested that this unambiguous terminology be adopted in the field of audio in order to clarify communication regarding assessor categorisation and the associated performance.

2. ASSESSOR SELECTION PROCEDURE

The assessor selection process defined here aims to screen, from a pool of naive candidates, the assessors that show the best potential for being trained toward the status of expert assessors. At the end of this selection process, assessors can be considered as selected assessors according to ISO 8586-2 [14] and will still require training and qualification before they can be considered expert in any given category of evaluation.

The three-stage screening process is defined in the following sections and comprises

Pre selection: Basic questionnaire for evaluation of overall suitability.
Qualification stage I: Auditory / visual acuity tests, verbal fluency test and personal interview.
Qualification stage II: Four screening tests.

2.1. Pre selection

According to ISO 8586-1 it is recommended to pre-select 4 times as many assessors as needed for the final panel. This rule of thumb is intended to ensure that the pool of assessors to pick from is large enough to contain good assessors, and it is consistent with earlier screenings [26].

In this case a pool of 80 persons signed up for the panel and the target panel size was 15. The persons had filled in a web questionnaire which contained questions similar to those described by Mattila and Zacharov [26], with additional questions in the visual domain. The questionnaire is found in Table 2.

The web questionnaire was divided into subgroups:

Assessor category | Definition
Assessor | Any person taking part in a sensory test
Naive assessor | A person who does not meet any particular criterion
Initiated assessor | A person who has already participated in a sensory test
Selected assessor | Assessor chosen for his/her ability to carry out a sensory test
Expert | In the general sense, a person who through knowledge or experience has competence to give an opinion in the field about which he/she is consulted. (Please note that the term expert does not provide any indication regarding the qualification or suitability of the individual to perform listening tests.)
Expert assessor | Selected assessor with a high degree of sensory sensitivity and experience in sensory methodology, who is able to make consistent and repeatable sensory assessments of various products
Specialised expert assessor | Expert assessor who has additional experience as a specialist in the product and/or process and/or marketing, and who is able to perform sensory analysis of the product and evaluate or predict effects of variations relating to raw materials, recipes, processing, storage, ageing, and so on.
Table 1: Summary of assessor categories employed in sensory analysis, as defined in ISO standard 8586-2 [14], applied to the food industry and recommended for adoption in the field of audio.
• Personal data
• Availability
• Health
• Previous test experience
• Interest and experience with sound
• Interest and experience with vision

The purpose of the questionnaire was mainly to gain knowledge about age, experience and interest in listening and viewing tests, availability and native language.

Several pre-selection criteria had been formulated to identify potential candidates for the panel.

1. No known history of hearing damage
2. No known history of colour blindness
3. Danish as mother tongue
4. Age between 18 and 50 years
5. Availability for tests during daytime hours (Mon - Fri 9-18)
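As an illustration only, the five pre-selection criteria and the ISO 8586-1 rule of thumb from section 2.1 can be expressed as a simple filter. The following is a minimal sketch, not the authors' tooling: the `Candidate` record and its field names are hypothetical, and the age limits are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical questionnaire record; field names are illustrative only."""
    name: str
    age: int
    native_language: str
    hearing_damage: bool           # known history of hearing damage
    colour_blind: bool             # known history of colour blindness
    available_weekdays_9_18: bool  # available Mon - Fri, 9-18


def passes_pre_selection(c: Candidate) -> bool:
    """Apply the five pre-selection criteria (age limits assumed inclusive)."""
    return (
        not c.hearing_damage
        and not c.colour_blind
        and c.native_language.strip().lower() == "danish"
        and 18 <= c.age <= 50
        and c.available_weekdays_9_18
    )


def recommended_pool_size(target_panel_size: int, factor: int = 4) -> int:
    """Rule of thumb cited from ISO 8586-1: pre-select about 4x the final panel size."""
    return factor * target_panel_size
```

For the target panel size of 15 reported in section 2.1, `recommended_pool_size(15)` returns 60, which the reported pool of 80 persons exceeds.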

Table 2: Summary of the web questionnaire.
43 persons from the original pool complied with all five selection criteria and were selected for the first qualification test session.

2.2. Qualification stage I

The purpose of the qualification tests was to screen the persons for hearing and visual impairments and to get an impression of their motivation and personality. The persons were invited in groups of 3 - 5 to perform the tests.

The first qualification stage contained the following tasks/tests

• Verbal fluency
• Audiometry
• Loudness test
• Visual tests

• Personal interview

To evaluate the verbal fluency of the persons, a short chocolate tasting task was performed. This procedure followed a traditional verbal elicitation procedure as normally used in sensory evaluation to define attribute scales (see ISO 13299 [15]). The task was to taste three different chocolates and write down the experience (taste, texture, look etc.). It was an individual task where the persons had 5 minutes to write down as many words as they found relevant to describe each chocolate. The results were discussed briefly in the group for the purpose of observing individual behavior and group interaction. Assessors were evaluated with regards to their skill at describing what they perceived and experienced as well as their interaction in the group with regards to developing attribute scales.

2.2.1. Audiometry

Standard pure tone audiometry was performed according to ISO 8253-1 [12] (ascending method, 5 dB step size, 2 out of 3 identical levels) for the frequencies 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz and 8000 Hz. The test was performed manually using calibrated Sennheiser HDA 200 headphones and an Interacoustics AD 229e audiometer.

2.2.2. Loudness test

The test was performed on a PC with headphone playback. The triangle test method described in ISO 4120 [16] was implemented in Labview code and used for the test. The sound stimuli were a reference pink noise sample adjusted in loudness levels: 0 dB, 1 dB and 3 dB. The acoustic output level from the headphones was adjusted to a most comfortable level by the test person.

Prior to the test, the panel leader instructed the persons in the method and supervised the first six familiarization triads in the test. The presentation order of the triads in the real test was randomized in a double blinded way.

2.2.3. Visual tests

Three tests were chosen to test the visual capabilities of the person. The tests were designed by an optometrist and are common tests chosen for their relevance in testing the visual physiology.

1. Visual acuity test was performed standing 4.5 meters away from a Snellen letter board (developed by Hermann Snellen, 1834-1908). The letter board was placed on a wall just below eye height in a room with normal daylight and a spotlight on the board. The number of reading errors in the last line (visual acuity 1.0) was counted by the test leader for each eye separately and for both eyes.

2. The stereopsis test was performed to check the person's depth vision. The test was performed sitting at a table in normal daylight using the RANDOT® test method [28]. The random dot diagrams were not shown to the person before he or she was wearing the Polaroid glasses. The person should wear glasses or contact lenses if needed behind the Polaroid glasses. The person was asked to hold the random dot diagram at arm's length and tell the test leader what he/she could see. The most critical readable diagram was reported by its related seconds of arc value.

3. A color blindness test was performed using the Ishihara color test [27] for testing red-green color deficiencies. A number of pictures (Ishihara plates) containing color dots were displayed to the person and the task was to read the picture correctly. The pictures show numbers or paths and for a color blind person these pictures don't show anything structured. This test was performed under the same conditions as the stereopsis test.

Finally, a short personal interview was held with each person. The primary aim of the interview was to establish an impression of motivation, experience and area of interest (visual, audio or both). The interview was kept in an informal atmosphere and also gave the person the opportunity to ask further about the work as an assessor.

2.3. Stage I selection criteria

The selection criteria were not purely based on the physiological test results. Due to the importance of appropriate personality in group work (consensus language development), one of the main selection criteria was the impression of personality and motivation/enthusiasm.

However the persons should preferably have normal hearing, vision and tactile sensitivity defined as:

Vision

1. Maximum of two errors per individually tested eye when reading the last line in the 1.0 visual acuity test. If more errors are made it can potentially just be a matter of corrective glasses.
2. Stereopsis better than 250 seconds of arc. Preferably better than 50 seconds of arc.
3. No colour deficiency

Hearing

1. The person should have an audiogram showing equal or less than 15 dB HL for all frequencies. However for one frequency per ear 20 dB HL was allowed.
2. Loudness test should be 100 % correct.

It was not a criterion that the persons should comply with both the vision and hearing criteria, but it was highly desirable.

The visual criteria were specified by the optometrist designing the tests. The audiogram criterion is generally accepted as normal hearing threshold deviations.

Twenty-two persons qualified from Qualification stage I (Qs1) and were invited to Qualification stage II (Qs2). The selected persons gave the impression of being suited for panel work and group discussions and showed a high level of motivation and enthusiasm. 17 persons complied with the criteria for normal vision, and 19 persons complied with the criteria for normal hearing. Thirteen persons complied with the criteria for both normal vision and hearing.

Due to two persons dropping out prior to Qs2, the final selected group consisted of 20 persons.

2.4. Qualification stage II

The purpose of Qualification stage II was to test the persons' ability to perform perceptual tests within the sound and visual domains. The persons were invited in groups of 3 - 5 to perform the tests. The perceptual tests were performed individually.

The second qualification stage contained the following tasks/tests

• Speech codec test
• Audio codec test
• Picture compression test
• Picture brightness test

The session started with a follow up group discussion on the chocolate tasting task performed in Qs1. The intention was to introduce the potential assessors to the work process of word elicitation and language development methods. The task was to subgroup a list of 20 descriptive words on chocolate collected from Qs1 into three groups. The three subgroups should be well defined by the persons and there should be consensus about the meaning of each of the descriptors. One subgroup of words was picked out and the panel would define one scale including two anchor point labels to rate chocolate. This work had two purposes: to introduce the potential assessor to panel work and to get an impression of group interaction and social skills.

The four perceptual tests were performed at different computer work stations in the listening room. To minimize disturbance between test persons, separation walls were put up. The persons rotated between the work stations. Hearing protectors were provided for persons performing the visual tests to eliminate the audible sound transmission from the open headphones used by persons performing the listening tests.

2.4.1. Perceptual screening tests

The four perceptual tests are explained in this section. Each test had its own dedicated computer to eliminate any variations of test material across test persons. The tests were performed sitting at a table with a screen distance of approximately 50 cm from the eye. The ambient light measured perpendicular to the computer screens was 110 Lux. The acoustic output levels from the Sennheiser HD 650 headphones used for the two listening tests were adjusted to a calibrated most comfortable level. The listening room complied with NR 10 background noise level.

For the purpose of these tests the triangle test was chosen to overcome problems associated with the pair comparison method, as employed in [26], where a specific attribute should be assessed by the assessor. Considering that assessors in this screening

experiment are not familiar with complex attributes such as speech quality or spatial quality, the triangle test offers a more understandable task, i.e. to identify which of the three samples is different. Additionally, the statistical analysis of the triangle test is robust and absolute, reported in terms of percentage correct.

Each test was designed to start with two easily discriminable samples presented in six balanced triads to work as an introduction. According to the ISO 4120 [16] triangle test standard method there should be six triads for each pair of stimuli. The triads have the following balanced stimuli order: ABB, AAB, ABA, BAA, BBA, BAB. The presentation sequence of stimuli pairs was the same for all persons and the reason for having the same presentation order with increasing task difficulty was to ensure the same learning effect for each person. The presentation order of the triads within each stimulus pair was randomized double blinded by the test software. In the following description of test material the task difficulty is indicated by number of stars (*).

Speech codec test: Stimuli were constructed using Nokia Multimedia Converter and QuickTime Pro. The AMR (adaptive multirate) narrowband codec was used. The file bit rate in kbps listed by the Nokia software is noted as the sample name. The original speech sample was extracted from the original Danish speech intelligibility test CD known as Dantale II [8]. See Table 3 for details of the stimulus parameters.

Group  Difficulty  Sample A  Sample B
H1     Intro       10 kbps   PCM
H2     *           17 kbps   PCM
H3     **          10 kbps   17 kbps
H4     ***         12 kbps   17 kbps
Table 3: Stimulus description of the AMR (adaptive multirate) narrowband codec stimuli employed in the speech codec screening test.

Audio codec test: Stimuli were constructed using Adobe Audition 1.5 with the Fraunhofer MP3 codec. Constant bit rate was used. The original music sample was extracted from the commercially available AES CD 'Perceptual Audio Coders: What to Listen For' [1], Track 92 by Brian Gilmour. In all 4 groups, one sample was always the original PCM wave file. The sampling frequency and compression rate are noted as the sample name. See Table 4 for details of the stimulus parameters.

Group  Difficulty  Sample A          Sample B
H1     Intro       24 kHz 64 kbps    PCM
H2     *           32 kHz 80 kbps    PCM
H3     **          32 kHz 96 kbps    PCM
H4     ***         32 kHz 112 kbps   PCM
Table 4: Stimulus description of the MP3 codec stimuli employed in the audio codec screening test.

Picture compression test: Picture compressions of original ITU-R BT.802-1 [18] test material. The test images were JPG compressed using IrfanView version 4.20. The quality is noted as "xxx 50" = 50 %, using a JPEG standard 4:2:2 format. See Table 5 for details of the stimulus parameters.

Picture brightness test: Images with the size of 800 x 600 pixels were constructed in Corel Photo Paint X3, as illustrated in Figures 13 and 14. In the center of the images a square of 101 x 101 pixels of different brightness was placed (see examples in Figures 13 - 14). In this test brightness is expressed by changes in the RGB (Red, Green, Blue) constants. For example: 100 105 has a background grey color of R = 100, G = 100, B = 100 and the centred smaller square is R = 105, G = 105, B = 105, which is perceived as brighter than the surrounding outer grey color. See Table 6 for details of the stimulus parameters.

2.5. Software

The triangle test software was built in Labview 7.1. Sound stimuli were all in PCM wave format. Picture stimuli were in JPG format. The cross fade time between sound switching was 50 ms. The cross fade processing when switching between stimuli within each triad was done linearly and calculated by the software in real time. Direct switching was used when changing between pictures in the visual tests.
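To make the 50 ms linear cross fade described in section 2.5 concrete, the following is a minimal offline sketch in Python. It is not the Labview implementation used in the study, which computed the fade in real time. The function name, the list-based signal representation and the assumption that both stimuli are time-aligned and of equal length are all illustrative.

```python
def linear_crossfade(outgoing, incoming, sample_rate_hz, fade_ms=50.0):
    """Mix two equal-length sample sequences with a linear fade of fade_ms.

    The gain of `incoming` rises linearly from 0 to 1 over the fade while the
    gain of `outgoing` falls from 1 to 0; after the fade only `incoming` remains.
    """
    if len(outgoing) != len(incoming):
        raise ValueError("both signals must have the same length")
    # Number of samples covered by the fade, limited to the signal length.
    fade_len = min(len(outgoing), int(round(sample_rate_hz * fade_ms / 1000.0)))
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        gain_in = i / fade_len if i < fade_len else 1.0
        mixed.append((1.0 - gain_in) * a + gain_in * b)
    return mixed
```

At a sample rate of 48 kHz (an assumed value, not stated in the paper) the 50 ms fade spans 2400 samples, so the two stimuli are mixed with equal gain at sample 1200.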

Group  Difficulty  Sample A      Sample B
V1     Intro       Clown 20      Clown 100
V2     *           Clown 50      Clown 100
V3     **          Clown 65      Clown 100
V4     ***         Clown 90      Clown 100
V5     **          Lily Pond 50  Lily Pond 100
V6     ***         Lily Pond 65  Lily Pond 100
V7     ****        Lily Pond 90  Lily Pond 100
Table 5: Stimulus description of the JPEG image quality levels in the picture compression screening test.
Group  Difficulty  Sample A  Sample B
V1     Intro       100 105   100 100
V2     *           100 103   100 100
V3     **          100 102   100 100
V4     ***         100 101   100 100
V5     *           200 197   200 200
V6     **          200 198   200 200
V7     ***         200 199   200 200
Table 6: Stimulus description of the RGB brightness levels (background value followed by centre square value) in the picture brightness screening test.

Fig. 2: Example graphical user interface for the audio screening test.
Randomization of triad presentation order within each stimulus group was performed double blinded by the software using Labview's built-in random number generator. Examples of the graphical user interfaces employed for both the audio and picture tests are illustrated in Figures 2 and 3 respectively.
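The randomisation of the six balanced triads within one stimulus group can be sketched as follows. This is a hedged illustration in Python rather than the Labview code used in the study; the function names are hypothetical, and a seeded `random.Random` stands in for Labview's random number generator.

```python
import random

# The six balanced triad orders listed in section 2.4.1.
BALANCED_TRIADS = ["ABB", "AAB", "ABA", "BAA", "BBA", "BAB"]


def odd_one_out(triad):
    """Return the 0-based position of the sample that differs from the other two."""
    for position, sample in enumerate(triad):
        if triad.count(sample) == 1:
            return position
    raise ValueError("not a valid triad: %r" % (triad,))


def randomised_triads(rng):
    """Return the six balanced triads in a random presentation order."""
    order = list(BALANCED_TRIADS)
    rng.shuffle(order)
    return order
```

Because the order is drawn by the software at run time, neither the test leader nor the assessor knows the position of the odd sample in advance, which is the sense in which the presentation is double blinded.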
Fig. 3: Example graphical user interface for the visual screening test.

The individual output data file contained information on:
• Date and time
• Project ID
• Test person name and ID number
• Experimental parameters: soundgroup, triad ID, sound assigned to button, correct stimulus identification indication
• Performance parameters: amount of switching between stimuli for each triad, response time in seconds, global time.
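The correct-identification flags stored per triad are what the percentage-correct scores are derived from. The sketch below shows that calculation together with a one-sided binomial probability of reaching a given score by guessing alone, using the chance level of 1/3 inherent to a triangle test. The binomial calculation is a standard way of analysing triangle tests and is an assumption here; the paper itself reports only percentage correct. Function names are illustrative.

```python
from math import comb


def percent_correct(responses):
    """responses: iterable of booleans, True where the odd sample was identified."""
    responses = list(responses)
    if not responses:
        raise ValueError("no responses")
    return 100.0 * sum(responses) / len(responses)


def p_value_by_guessing(n_correct, n_triads, p_chance=1.0 / 3.0):
    """Probability of at least n_correct hits in n_triads by pure guessing."""
    return sum(
        comb(n_triads, k) * p_chance ** k * (1.0 - p_chance) ** (n_triads - k)
        for k in range(n_correct, n_triads + 1)
    )
```

With six triads per stimulus pair, five correct answers give about 83 % correct, and the probability of scoring that well or better by guessing is 13/729, roughly 0.018.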

2.6. Stage II selection criteria

The selection criteria for the assessor panel were purely based on performance in the perceptual tests. Based on feedback from the persons during Qs2 and by inspecting the results it was clear that the speech and audio codec tests were more challenging than the visual tests. This led to the following selection criteria:

• Picture compression and brightness tests should be performed with at least 90 % correct responses in each test to qualify for visual panel work.
• Audio and speech codec tests should be performed with at least 75 % correct responses in each test for audio panel work.

The final selection was thus based on the average score for each subject in each of the screening tests.

The performance of the 20 individual assessors evaluated in Qs2 is illustrated in Figures 4 - 7.

For the sound screening tests a wider spread of ratings can be seen in the audio coding task compared to the speech coding task. However, in the audio coding test, a few assessors were able to obtain 100 % correct scores (assessors 2, 4, 7, 16), compared to the speech coding task, where the highest score for the most difficult stimulus set is 83 % (assessors 1, 5, 13, 16, 20, 21, 22).

Of the picture quality tasks the picture brightness was the easier of the two, with only the most difficult sample pairs causing most of the misidentification and 15 assessors obtaining greater than 80 % correct.

3. FINAL SELECTION CRITERIA

To qualify for the selected assessor panel the criteria defined from qualification sessions I and II should be fulfilled.

This means that all selected persons should have Danish as mother tongue, should be between 18 and 50 years old, and should be motivated and available for tests during daytime hours. Further the selected persons should have normal visual acuity, stereopsis better than 250 seconds of arc and no color blindness. Persons 6, 9, 13 and 14 had more than 2 errors per eye in the 1.0 visual acuity test, but corrective glasses are considered to compensate.

The selected assessors' audiograms showed equal or less than 15 dB HL for all frequencies. Persons 3, 4, 5 and 18 exceeded this level, but after age correcting the results according to ISO 7029 [11] only person number 4 fails, on his left ear, with a hearing threshold of 40 dB HL, and this might have been due to poor fitting of the headphones.

All selected assessors except persons 6 and 8 had a score of minimum 90 % correct for the perceptual visual tests. All but persons 3 and 18 had a score of minimum 75 % correct for the sound tests in Qs2. Persons 1, 2, 4, 5, 7, 9, 10, 13, 14, 16 and 20 fulfilled the criteria for both visual and sound tests based on Qs2 criteria (see Figure 8).

A summary of the average selected assessor performance in each test is presented in Figures 9 and 10.

4. CONCLUSION

The assessor selection procedure defined here allows for a rapid screening of assessor suitability for listening and viewing quality evaluation tests. The procedure improves upon previous methods through use of the triangle test method, which can provide robust statistical analysis and allows for measures of an assessor's repeatability as well as for comparison to other assessors. Additionally, the triangle test eliminates the problems associated with use of line scales and focuses assessors on the simple task of finding the different sample of three. This latter aspect is beneficial for naive assessors who have potentially never performed a listening or viewing test before. The screening can be performed with 6 assessors in approximately four hours, which means that this type of screening can be readily applied when new assessors are to be found.

Once assessors have passed through this process, they can be categorised as selected assessors according to ISO 8586-2 [14] and are ready for further training and assessment for them to be categorized as expert assessors for any given application domain (speech coding, noise suppression, etc.).

5. FURTHER WORK

From this study it can be seen that the performance of assessors is strongly determined by the stimuli

Fig. 4: Individual assessor (N = 20) performance for speech codec tests.
Fig. 5: Individual assessor (N = 20) performance for audio codec tests.

Fig. 6: Individual assessor (N = 20) performance for picture compression tests.
Fig. 7: Individual assessor (N = 20) performance for picture brightness tests.

Fig. 8: Average assessor performance for each screening test.
Fig. 9: Average assessor performance for all (15) selected assessors in audio screening tests as a function of the sound group (H1 - H4), as defined in Tables 3 and 4.

Fig. 10: Average assessor performance for all (15) selected assessors in visual screening tests as a function of the visual group (V1 - V7), as defined in Tables 5 and 6.
(a) JPEG compression quality = 20 (b) JPEG compression quality = 100
Fig. 11: Picture compression examples: Clown [6, 18]
selected and their relation with respect to the just noticeable difference. In this study it was apparent that the visual stimuli were less critical than the audio stimuli, potentially due to the lack of high-level expertise of the experimenter in the visual quality domain. This finding leads to the view that a more systematic approach to the selection of stimuli would be desirable.
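
One way such a systematic check of stimuli could be approached is sketched below. It assumes that screening results come from triangle (triad) tests, where the probability of a correct answer by guessing is 1/3, and uses an exact one-sided binomial test to flag stimulus pairs that are either indistinguishable from guessing or detected by nearly everyone. The function names, the 5 % significance level, the 95 % ceiling and the example counts are illustrative assumptions and are not taken from this study.

```python
from math import comb


def triangle_p_value(correct: int, trials: int, p_guess: float = 1.0 / 3.0) -> float:
    """One-sided exact binomial p-value: P(X >= correct) under pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * p_guess**k * (1.0 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def classify_stimulus(correct: int, trials: int,
                      alpha: float = 0.05, ceiling: float = 0.95) -> str:
    """Label a stimulus pair as 'too hard', 'too easy' or 'informative'.

    alpha and ceiling are assumed thresholds, chosen only for illustration.
    """
    if triangle_p_value(correct, trials) > alpha:
        return "too hard"      # not distinguishable from guessing
    if correct / trials >= ceiling:
        return "too easy"      # detected by nearly all assessors
    return "informative"


# Hypothetical pooled counts (correct answers, trials) per stimulus group.
# These numbers are invented for the example and are NOT results of the paper.
example_counts = {"H1": (58, 60), "H2": (45, 60), "H3": (24, 60)}
labels = {group: classify_stimulus(c, n) for group, (c, n) in example_counts.items()}

for group, label in labels.items():
    print(group, label)
```

Stimulus pairs labelled "informative" under such a rule lie between guessing and ceiling performance, which is the region where differences between assessors can be observed.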

(a) JPEG compression quality = 50 (b) JPEG compression quality = 100
Fig. 12: Picture compression examples: Lily pond [6, 18]
(a) Foreground colour RGB 100:100:100 (b) Foreground colour RGB 105:105:105
Fig. 13: Brightness test: Background colour RGB 100:100:100. Yellow arrow is introduced to indicate the
region of difference for both images.
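
To relate the brightness stimuli of Figs. 13 and 14 to a perceptual threshold, the code-value differences can be expressed as a relative luminance difference (Weber fraction). The following is a minimal sketch, assuming that the images are sRGB encoded and shown on a display that follows the sRGB transfer function; neither assumption is stated in this paper, and the function names are hypothetical.

```python
def srgb_to_linear(code: int) -> float:
    """Convert an 8-bit sRGB code value (0-255) to linear light (0.0-1.0).

    Assumes the standard sRGB transfer function; the display used in the
    study may have behaved differently.
    """
    c = code / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def weber_fraction(background: int, foreground: int) -> float:
    """Relative luminance difference |L_fg - L_bg| / L_bg for grey patches.

    For grey values (R = G = B) the relative luminance equals the linear
    value of any one channel, so a single code value is sufficient.
    """
    l_bg = srgb_to_linear(background)
    l_fg = srgb_to_linear(foreground)
    return abs(l_fg - l_bg) / l_bg


# Background/foreground grey levels taken from the captions of Figs. 13 and 14.
for bg, fg in [(100, 105), (200, 195)]:
    print(f"background {bg}, foreground {fg}: "
          f"Weber fraction = {weber_fraction(bg, fg):.3f}")
```

Under this assumption the same code-value step of 5 corresponds to a larger relative change on the darker background than on the lighter one, which illustrates why stimulus difficulty should be specified relative to the threshold of perception and not in code values alone.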
Additionally, it is observed that whilst expertise is defined in both ITU-R BS.1116-1 [19] and, more extensively, in ISO 8586-1 [13], an absolute measure of assessor expertise is lacking within the industry. The performance of assessors is determined not only by their sensory acuity, experience and training, but also by the stimuli with respect to the threshold of perception. In order to sharpen the definition of assessor expertise, it is suggested that we work towards a more unique definition of stimuli for assessor screening, leading to a more absolute measure of assessor competence and expertise.

6. ACKNOWLEDGEMENTS
The activities and results reported in this paper have been co-funded by the Danish Agency for Science, Technology and Innovation. The assessors involved in this work are thanked for their time and effort in

participating in this study.

(a) Foreground colour RGB 200:200:200 (b) Foreground colour RGB 195:195:195
Fig. 14: Brightness test: Background colour RGB 200:200:200. Yellow arrow is introduced to indicate the region of difference for both images.

7. REFERENCES
[1] Audio Engineering Society. Perceptual audio coders: What to listen for. Compact Disc, 2001.
[2] Bech, S. Selection and training of subjects for listening tests on sound-reproducing equipment. Journal of the Audio Engineering Society 40, 7/8 (1992), 590–610.
[3] Bech, S. Training of subjects for auditory experiments. Acta Acustica 1 (1993), 89–99.
[4] Bech, S., and Zacharov, N. Perceptual Audio Evaluation - Theory, method and application. Wiley, Chichester, England, 2006.
[5] CCITT. Handbook of telephonometry. International Telecommunications Union, 1992.
[6] Fenimore, C. Mastering and archiving uncompressed digital video test materials. SMPTE Journal (October 2001).
[7] Ghani, J., Ellermeier, W., and Zimmer, K. A test battery measuring auditory capabilities of listening panels. In Proceedings of the Forum Acusticum 2005 Congress (Budapest, Hungary, 2005).
[8] Hansen, M., and Ludvigsen, C. Dantale II - Danish Hagermann sentences, Danish speech audiometry materials. Tech. rep., Værløse, Denmark, 2001.
[9] Hansen, V. Establishing a panel of listeners at Bang & Olufsen: A report. Proceedings of the Symposium on Perception of Reproduced Sound (1987), 89–90.
[10] Isherwood, D., Lorho, G., Mattila, V.-V., and Zacharov, N. Augmentation, application and verification of the generalized listener selection procedure. In Proceedings of the 115th Convention of the Audio Engineering Society (New York, USA, 2003).
[11] ISO. 7029. Acoustics – Threshold of hearing by air conduction as a function of age and sex for otologically normal persons. International Organization for Standards, 1984.
[12] ISO. 8253-1. Acoustics – Audiometric methods – Part 1: Basic pure tone air and bone conduction threshold audiometry. International Organization for Standards, 1989.
[13] ISO. 8586-1. Sensory analysis – General guidance for the selection, training and monitoring of assessors – Part 1: Selected assessors. International Organization for Standards, 1993.

[14] ISO. 8586-2. Sensory analysis – General guidance for the selection, training and monitoring of assessors – Part 2: Experts. International Organization for Standards, 1994.
[15] ISO. 13299. Sensory analysis – Methodology – General guidance for establishing a sensory profile. International Organization for Standards, 2003.
[16] ISO. 4120. Sensory analysis – Methodology – Triangle test. International Organization for Standards, 2004.
[17] ISO. 6658. Sensory analysis – Methodology – General guidance. International Organization for Standards, 2005.
[18] ITU-R. Test Pictures and Sequences for Subjective Assessments of Digital Codecs Conveying Signals Produced According to Rec. ITU-R BT.601. International Telecommunications Union Radiocommunication Assembly, 1994.
[19] ITU-R. Recommendation BS.1116-1, Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems. International Telecommunications Union Radiocommunication Assembly, 1997.
[20] ITU-R. Recommendation BS.1534-1, Method for the subjective assessment of intermediate quality level of coding systems. International Telecommunications Union Radiocommunication Assembly, 2003.
[21] ITU-T. Recommendation P.800, Methods for subjective determination of transmission quality. International Telecommunications Union, Telecommunications Standardization Sector, 1996.
[22] ITU-T. Recommendation P.831, Subjective performance evaluation of network echo cancellers. International Telecommunications Union, Telecommunications Standardization Sector, 1998.
[23] ITU-T. Recommendation P.832, Subjective performance evaluation of handsfree terminals. International Telecommunications Union, Telecommunications Standardization Sector, 2000.
[24] Koivuniemi, K., and Zacharov, N. Unraveling the perception of spatial sound reproduction: Language development, verbal protocol analysis and listener training. In Proceedings of the Audio Engineering Society 111th International Convention (2001), Audio Engineering Society.
[25] Lorho, G. Individual vocabulary profiling of spatial enhancement systems for stereo headphone reproduction. In Proceedings of the 119th Convention of the Audio Engineering Society (New York, USA, October 2005).
[26] Mattila, V.-V., and Zacharov, N. Generalized listener selection (GLS) procedure. In Proceedings of the 110th Convention of the Audio Engineering Society (Amsterdam, Holland, 2001).
[27] Kanehara Trading Inc. Ishihara's tests for colour deficiency - 38 plates edition. Tokyo, Japan, 2005.
[28] Stereo Optical Co., Inc. RANDOT® Stereotests. Chicago, USA, 1995.
[29] Wickelmaier, F., and Choisel, S. Selecting participants for listening tests of multichannel reproduced sound. In Proceedings of the 118th Convention of the Audio Engineering Society (Barcelona, Spain, May 28–31 2005).
[30] Zacharov, N. Subjective testing of loudspeaker directivity for multichannel audio. Master's thesis, Helsinki University of Technology, Laboratory of acoustics and audio signal processing, 1997.
