
Board of Assessment Report 2013: Document 110c - Annex B

EFPA REVIEW MODEL FOR


THE DESCRIPTION AND EVALUATION OF
PSYCHOLOGICAL AND EDUCATIONAL TESTS

TEST REVIEW FORM AND NOTES FOR REVIEWERS


VERSION 4.2.6

Version 4.2.6 is a major revision of Version 3.42 (2008) by a task force of the Board of Assessment of EFPA consisting of:
Arne Evers (chair, the Netherlands)
Carmen Hagemeister (Germany)
Andreas Høstmælingen (Norway)
Patricia Lindley (UK)
José Muñiz (Spain)
Anders Sjöberg (Sweden)

EFPA
Users of this document and its contents are required by EFPA to acknowledge this source with the following text:
"The EFPA Test Review Criteria were largely modelled on the form and content of the British Psychological Society's (BPS) test review criteria and criteria developed by the Dutch Committee on Tests and Testing (COTAN) of the Dutch Association of Psychologists (NIP). EFPA is grateful to the BPS and the NIP for permission to build on their criteria in developing the European model. All intellectual property rights in the original BPS and NIP criteria are acknowledged and remain with those bodies."

Test Review Form Version 4.2.6

09-04-2013

CONTENTS

1 Introduction

PART 1 DESCRIPTION OF THE INSTRUMENT
2 General description
3 Classification
4 Measurement and scoring
5 Computer generated reports
6 Supply conditions and costs

PART 2 EVALUATION OF THE INSTRUMENT
7 Quality of the explanation of the rationale, the presentation and the information provided
7.1 Quality of the explanation of the rationale
7.2 Adequacy of documentation available to the user
7.3 Quality of procedural instructions provided for the user
8 Quality of the test materials
8.1 Quality of the test materials of paper-and-pencil tests
8.2 Quality of the test materials of Computer Based Tests (CBT) or Web Based Tests (WBT)
9 Norms
9.1 Norm-referenced interpretation
9.2 Criterion-referenced interpretation
10 Reliability
11 Validity
11.1 Construct validity
11.2 Criterion validity
11.3 Overall validity
12 Quality of computer generated reports
13 Final evaluation

PART 3 BIBLIOGRAPHY

APPENDIX An aide-mémoire of critical points for comment when an instrument has been translated and/or adapted from a non-local context

1 Introduction
The main goal of the EFPA Test Review Model is to provide a description and a detailed and rigorous assessment of the psychological assessment tests, scales and questionnaires used in the fields of Work, Education, Health and other contexts. This information will be made available to test users and professionals in order to improve tests and testing and to help them make the right assessment decisions. The EFPA Test Review Model is part of the information strategy of EFPA, which aims to provide evaluations of all necessary technical information about tests in order to enhance their use (Evers et al., 2012; Muñiz & Bartram, 2007). Following the Standards for Educational and Psychological Testing, the label "test" is used for any "... evaluative device or procedure in which a sample of an examinee's behaviour in a specified domain is obtained and subsequently evaluated and scored using a standardized process" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999, p. 3). Therefore, this review model applies to all instruments that are covered under this definition, whether called a scale, questionnaire, projective technique, or whatever.

The original version of the EFPA test review model was produced from a number of sources,
including the BPS Test Review Evaluation Form (developed by Newland Park Associates Limited, NPAL,
and later adopted by the BPS Steering Committee on Test Standards); the Spanish Questionnaire for the
Evaluation of Psychometric Tests (developed by the Spanish Psychological Association) and the Rating
System for Test Quality (developed by the Dutch Committee on Tests and Testing of the Dutch Association
of Psychologists). Much of the content was adapted with permission from the review proforma originally
developed in 1989 by Newland Park Associates Ltd for a review of tests used by training agents in the UK
(see Bartram, Lindley & Foster, 1990). This was subsequently used and further developed for a series of
BPS reviews of instruments for use in occupational assessment (e.g., Bartram, Lindley, & Foster, 1992;
Lindley et al., 2001).


The first version of the EFPA review model was compiled and edited by Dave Bartram (Bartram, 2002a, 2002b) following an initial EFPA workshop in March 2000 and subsequent rounds of consultation. A major update and revision was carried out by Patricia Lindley, Dave Bartram, and Natalie Kennedy for use in the BPS review system (Lindley et al., 2004). This was subsequently adopted by EFPA in 2005 (Lindley et al., 2005) with minor revisions in 2008 (Lindley et al., 2008). The current version of the model has been prepared by a Task Force of the EFPA Board of Assessment, whose members are Arne Evers (Chair, the Netherlands), Carmen Hagemeister (Germany), Andreas Høstmælingen (Norway), Patricia Lindley (UK), José Muñiz (Spain), and Anders Sjöberg (Sweden). In this version the notes and checklist for translated and adapted tests produced by Pat Lindley and the Consultant Editors of the UK test reviews have been integrated (Lindley, 2009). The texts of some major updated passages are based on the revised Dutch rating system for test quality (Evers, Lucassen, Meijer, & Sijtsma, 2010; Evers, Sijtsma, Lucassen, & Meijer, 2010).


The EFPA test review model is divided into three main parts. In the first part (Description of the instrument) all the features of the test under evaluation are described in detail. In the second part (Evaluation of the instrument) the fundamental properties of the test are evaluated: test materials, norms, reliability, validity, and computer generated reports, followed by a global final evaluation. In the third part (Bibliography), the references used in the review are listed.


As important as the model itself is its proper implementation. The current version of the model is intended for use by two independent reviewers, in a peer review process similar to the usual evaluation of scientific papers and projects. A consulting editor will oversee the reviews and may call in a third reviewer if significant discrepancies between the two reviews are found. Some variations in the procedure are possible, provided that the competence and independence of the reviewers, as well as of the consulting editor, are ensured. EFPA recommends that the evaluations in these reviews be directed towards qualified practising test users, though they should also be of interest to academics, test authors, and specialists in psychometrics and psychological testing.


Another key issue is the publication of the results of a test's evaluation. The results should be available to all professionals and users (either for a fee or free of charge). A good option is to make the results available on the website of the National Psychological Association, although they could also be published by third parties or in other media such as journals or books.
The intention of making this model widely available is to encourage the harmonisation of review procedures and criteria across Europe. Although harmonisation is one of the objectives of the model, another objective is to offer a system for test reviews to countries which do not have their own review procedures. It is recognised that local issues may necessitate changes in the EFPA Test Review Model or in the review procedures when countries start to use the Model. Therefore, the Model is called a "Model" to stress that local adaptations are possible to guarantee a better fit with local needs.
Comments on the EFPA test review model are welcomed in the hope that the experiences of users will be instrumental in improving and clarifying the processes.


PART 1

DESCRIPTION OF THE INSTRUMENT

2 General description
This section of the form should provide the basic information needed to identify the instrument and where
to obtain it. It should give the title of the instrument, the publisher and/or distributor, the author(s),
the date of original publication and the date of the version that is being reviewed.
Questions 2.1.1 through 2.7.3 should be straightforward. They ask for factual information, although some judgment will be needed to complete the information regarding content domains.


Reviewer¹
Date of current review
Date of previous review (if applicable)²

2.1.1 Instrument name (local version)
2.1.2 Short name of the test (if applicable)
2.2 Original test name (if the local version is an adaptation)
2.3 Authors of the original test
2.4 Authors of the local adaptation
2.5 Local test distributor/publisher
2.6 Publisher of the original version of the test (if different to current distributor/publisher)
2.7.1 Date of publication of current revision/edition
2.7.2 Date of publication of adaptation for local use
2.7.3 Date of publication of original test

¹ Each country can decide either to publish the reviewers' names when the integrated review is published or to opt for anonymous reviewing.
² This information should be filled in by the editor or the administration.
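Several of the fields above apply only when the local version is an adaptation (2.2, 2.4, 2.7.2) or when the original publisher differs from the current one (2.6). The following is a minimal sketch of the identification fields as a record, with a check that the adaptation-specific fields have been completed. The class name, the attribute names and the `is_adaptation` flag are hypothetical and are not part of the EFPA form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneralDescription:
    """Hypothetical record of identification fields 2.1.1 to 2.7.3."""
    instrument_name: str                             # 2.1.1 (local version)
    authors_original: List[str]                      # 2.3
    local_publisher: str                             # 2.5
    date_current_edition: str                        # 2.7.1
    date_original: str                               # 2.7.3
    short_name: Optional[str] = None                 # 2.1.2, if applicable
    original_name: Optional[str] = None              # 2.2, adaptations only
    authors_adaptation: Optional[List[str]] = None   # 2.4, assumed adaptations only
    original_publisher: Optional[str] = None         # 2.6, if different
    date_adaptation: Optional[str] = None            # 2.7.2, assumed adaptations only
    is_adaptation: bool = False                      # hypothetical flag

    def missing_fields(self) -> List[str]:
        """Return item numbers that should be completed for an adaptation but are empty."""
        missing: List[str] = []
        if self.is_adaptation:
            if not self.original_name:
                missing.append("2.2")
            if not self.authors_adaptation:
                missing.append("2.4")
            if not self.date_adaptation:
                missing.append("2.7.2")
        return missing


if __name__ == "__main__":
    record = GeneralDescription(
        "Test X", ["A. Author"], "Publisher Y", "2012", "2005",
        is_adaptation=True, original_name="Test X (original)",
    )
    print(record.missing_fields())  # ['2.4', '2.7.2']
```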

General description of the instrument
Short stand-alone non-evaluative description (200-600 words)
A concise non-evaluative description of the instrument should be given here. The description should provide the reader with a clear idea of what the instrument claims to be: what it contains, the scales it purports to measure, etc. It should be as neutral as possible in tone. It should describe what the instrument is, the scales it measures, its intended use, the availability and type of norm groups, general points of interest or unusual features, and any relevant historical background. This description may be quite short (200-300 words). However, for some of the more complex multi-scale instruments, it will need to be longer (300-600 words). It should be written so that it can stand alone as a description of the instrument. As a consequence, it may repeat some of the more specific information provided in response to sections 2 to 6. It should outline all versions of the instrument that are available and referred to on subsequent pages.
This item should be answered from information provided by the publisher and checked for accuracy by the reviewer.
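The length guidance can be checked while drafting. The following is a minimal sketch that assumes words are counted as whitespace-separated tokens. It also assumes the reviewer decides whether the instrument is a "complex multi-scale" one. The function names are hypothetical. Because the notes say the description "may be" short, the bounds are best treated as soft.

```python
from typing import Tuple


def recommended_range(complex_multiscale: bool) -> Tuple[int, int]:
    """Word range from the notes: 200-300 normally, 300-600 for complex multi-scale instruments."""
    return (300, 600) if complex_multiscale else (200, 300)


def check_description_length(text: str, complex_multiscale: bool = False) -> str:
    """Return 'ok', 'too short' or 'too long' for a draft description."""
    n_words = len(text.split())  # whitespace tokenisation is an assumption
    low, high = recommended_range(complex_multiscale)
    if n_words < low:
        return "too short"
    if n_words > high:
        return "too long"
    return "ok"


if __name__ == "__main__":
    draft = "word " * 250
    print(check_description_length(draft))                           # ok
    print(check_description_length(draft, complex_multiscale=True))  # too short
```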


3 Classification
3.1 Content domains (select all that apply)
You should identify the content domains specified by the publisher. Where these are not clear, this should be indicated and you should judge from the information provided in the manual (standardisation samples, applications, validation etc.) what the most appropriate answers are for 3.1.
Options:
Ability - General
Ability - Manual skills/dexterity
Ability - Mechanical
Ability - Learning/memory
Ability - Non-verbal/abstract/inductive
Ability - Numerical
Ability - Perceptual speed/checking
Ability - Sensorimotor
Ability - Spatial/visual
Ability - Verbal
Attention/concentration
Beliefs
Cognitive styles
Disorder and pathology
Family function
Group function
Interests
Motivation
Organisational function, aggregated measures, climate etc.
Personality - Trait
Personality - Type
Personality - State
Quality of life
Scholastic achievement (educational test)
School or educational function
Situational judgment
Stress/burnout
Therapy outcome
Values
Well-being
Other (please describe):

3.2 Intended or main area(s) of use (please select those that apply)
You should identify the intended areas of use specified by the publisher. Where these are not clear, this should be indicated and you should judge from the information provided in the manual (standardisation samples, applications, validation etc.) what the most appropriate answers are for 3.2.
Options:
Clinical
Advice, guidance and career choice
Educational
Forensic
General health, life and well-being
Neurological
Sports and Leisure
Work and Occupational
Other (please describe):
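Items 3.1 and 3.2 are "select all that apply" checklists in which "Other" needs a description. The following is a minimal sketch of how a hypothetical online review system might validate a 3.2 answer. The option labels come from the list above, with "Other" standing for "Other (please describe)". The function name and its messages are assumptions for illustration only.

```python
from typing import AbstractSet, Iterable, List, Optional

# Option labels copied from item 3.2.
AREAS_OF_USE = frozenset({
    "Clinical",
    "Advice, guidance and career choice",
    "Educational",
    "Forensic",
    "General health, life and well-being",
    "Neurological",
    "Sports and Leisure",
    "Work and Occupational",
    "Other",
})


def validate_selection(
    selected: Iterable[str],
    allowed: AbstractSet[str] = AREAS_OF_USE,
    other_description: Optional[str] = None,
) -> List[str]:
    """Return problems with a multi-select answer; an empty list means the answer is valid."""
    chosen = set(selected)
    problems: List[str] = []
    if not chosen:
        problems.append("at least one option must be selected")
    unknown = sorted(chosen - allowed)
    if unknown:
        problems.append("unknown options: " + ", ".join(unknown))
    if "Other" in chosen and not (other_description and other_description.strip()):
        problems.append("'Other' requires a description")
    return problems


if __name__ == "__main__":
    print(validate_selection(["Clinical", "Forensic"]))  # []
    print(validate_selection(["Other"]))                 # ["'Other' requires a description"]
```

The same function could check a 3.1 answer by passing the 3.1 option list as `allowed`.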

3.3 Description of the populations for which the test is intended
This item should be answered from information provided by the publisher. For some tests this may be very general (e.g. adults), for others it may be more specific (e.g. manual workers, or boys aged 10 to 14). Only the stated populations should be mentioned here. Where these may seem inappropriate, this should be commented on in the Evaluation part of the review.

3.4 Number of scales and brief description of the variable(s) measured by the instrument
This item should be answered from information provided by the publisher. Please indicate the number of scales (if more than one) and provide a brief description of each scale if its meaning is not clear from its name. Reviews of the instrument should include discussion of other derived scores where these are commonly used with the instrument and are described in the standard documentation, e.g. primary trait scores as well as Big Five secondary trait scores for a multi-trait personality test, or subtest, factor and total scores on an intelligence test.

3.5 Response mode
This item should be answered from information provided by the publisher. If any special pieces of equipment (other than those indicated in the list of options, e.g. a digital recorder) are required, they should be described here. In addition, any special testing conditions should be described. "Standard testing conditions" are assumed to be available for proctored/supervised assessment. These would include a quiet, well-lit and well-ventilated room with adequate desk space and seating for the necessary administrator(s) and candidate(s).
Options:
Oral interview
Paper & pencil
Manual (physical) operations
Direct observation
Computerised
Other (indicate):

3.6 Demands on the test taker
This item should be answered from information provided by the publisher. Which capabilities and skills are necessary for the test taker to work on the test as intended and to allow for a fair interpretation of the test score? It is usually clear if a total lack of some prerequisite impairs the ability to complete the test (such as a blind person being given a normal paper-and-pencil test), but the requirements listed should be classified as follows:
"Irrelevant / not necessary" means that this skill is not necessary at all, such as manual capabilities when oral questions are answered verbally.
"Necessary information given" means that the possible amount of limitation is stated.
"Information missing" means that there might be limitations for test takers without the specific capability or skill (known from theory or empirical results), but this is not clear from the information provided by the test publisher, e.g. if the test uses a language that is not the test taker's first language.
For each of the following, select one of: irrelevant / not necessary; necessary information given; information missing.
Manual capabilities
Handedness
Vision
Hearing
Command of test language (understanding and speaking)
Reading
Writing

3.7 Items format (select one)
This item should be answered from information provided by the publisher. Two types of multiple choice formats are differentiated. The first type concerns tests in which the respondent has to select the right answer from a number of alternatives, as in ability testing (e.g., a figural reasoning test). The second type deals with questionnaires in which there is no clear right answer. This format requires test takers to make choices between sets of two or more items drawn from different scales (e.g., scales in a vocational interest inventory or a personality questionnaire). This format is also called "multidimensional", because the alternatives belong to different scales or dimensions. In this case the statements may have to be ranked, or the "most like me" and "least like me" options may have to be selected. This format may result in ipsative scales (see question 3.8).
In Likert scale ratings the test taker also has to choose from a number of alternatives, but the essential difference from the multiple choice format is that the scales used are unidimensional (e.g., ranging from "never" to "always" or from "very unlikely" to "very likely") and that the test taker does not have to choose between alternatives from different dimensions. A scale should also be marked as a Likert scale when there are only two alternatives on one dimension (e.g., yes/no or always/never).
Options:
Multiple choice (ability testing, or right/wrong). Number of alternatives: ....
Multiple choice (mixed scale alternatives). Number of alternatives: ....
Likert scale ratings. Number of alternatives: ....
Open
Other (please describe)

greit dreapta /) Numrul de alternative: ....


Variante multiple (alternative la scar mixte)
Numr de alternative: .... evaluri la scara
Likert Numrul de alternative: .... deschis
Altele (v rugm s descriei)

Format articole (selectai unul) Acest


produs trebuie s se rspund la inforinformaii furnizate de ctre editor.
Exist dou tipuri de formate multiple
variante sunt difereniate. Primul tip se
refer la testele n care prtul trebuie
s se rspund-lect chiar de la un
numr de alternative ca n testarea
capacitii (de exemplu, un test de
raionament figural). Al doilea tip cu
chestionare n care nu exist nici un
rspuns corect clar. Acest format
necesit factorii de ncercare de a face
alegeri ntre seturi de dou sau mai
multe elemente extrase din diferite
scri (de exemplu, cntare ntr-un
profesional nventar interes sau un
chestionar de personalitate). Acest
format este, de asemenea, numit
"multidimensionale", deoarece
Alterna-tive aparin diferitelor scale
sau distanele-SION. n acest caz, este
posibil ca declaraiile trebuie s fie
clasate sau most- i cel mai puin
asemntoare pe mine opiuni s fie
se-tate. Acest format poate avea ca
rezultat scale ipsative (a se vedea
Test Review Form Version 4.2.6

09-04-2013

Page 18

Board of Assessment Report 2013: Document 110c - Annex B

ntrebarea 3.8). n Likert evaluri la


scar taker test are, de asemenea, de a
alege dintr-un numr de Alterna-tani
dar diferena esenial cu formatul de
alegere multipl este c scalele
utilizate sunt unidimensional (de
exemplu, variind de la "niciodat" la
"ntotdeauna" sau de la " foarte neprobabil "la" foarte probabil ") i c
testul tak-er nu trebuie s aleag ntre
Al-ternative din dimensiuni diferite. O
scal ar trebui s fie, de asemenea,
marcat ca fiind o scal Likert, atunci
cnd exist doar dou alternative pe o
singur dimensiune (de exemplu, da /
nu sau al-cai / niciodat).
Afiai originalul

3.8

Ipsativity

As mentioned in 3.7, multiple choice mixed scale alternatives may result in ipsative scores. Ipsative scores are distinctive because the score on each scale or dimension is constrained by the scores on the other scales or dimensions. In fully ipsative instruments the sum of the scale scores is constant for each person. Other scoring procedures can also result in ipsativity (e.g., subtracting each person's overall mean from each of their scale scores).

Yes, multiple choice mixed scale alternatives resulting in partially or fully ipsative scores
Yes, other item formats with scoring procedures resulting in partially or fully ipsative scores
No, multiple choice mixed scale alternatives NOT resulting in ipsative scores
Not relevant
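The following is a minimal sketch of both routes to ipsativity described above. It assumes a hypothetical ranked forced-choice format in which each block contains one statement per scale. A statement ranked r out of n earns n - r points. Under that assumed rule, every respondent's scale scores add up to the same constant, whatever they answer, so the scores are fully ipsative. The second function shows the other route: subtracting a person's own mean, which forces the scores to sum to zero. The scale labels and the scoring rule are illustrative only.

```python
from statistics import mean

SCALES = ("A", "B", "C")  # hypothetical scale labels

def score_ranked_blocks(blocks):
    """Score ranked forced-choice blocks.

    Each block maps scale -> rank given to that scale's statement
    (1 = most like me). Hypothetical rule: with n statements, rank r earns n - r points.
    """
    totals = {scale: 0 for scale in SCALES}
    for block in blocks:
        n = len(block)
        for scale, rank in block.items():
            totals[scale] += n - rank
    return totals

def ipsatize(scores):
    """Subtract the person's own mean from each of their scale scores."""
    person_mean = mean(scores.values())
    return {scale: value - person_mean for scale, value in scores.items()}

# Two respondents ranking the same four blocks differently.
resp_1 = [{"A": 1, "B": 2, "C": 3}] * 4
resp_2 = [{"A": 3, "B": 1, "C": 2}, {"A": 2, "B": 3, "C": 1},
          {"A": 1, "B": 2, "C": 3}, {"A": 3, "B": 2, "C": 1}]

for resp in (resp_1, resp_2):
    totals = score_ranked_blocks(resp)
    print(totals, "sum =", sum(totals.values()))  # sum = 12 for both

print(ipsatize({"A": 30, "B": 22, "C": 14}))
```

Because the totals are fixed, a high score on one scale can only be obtained at the expense of another. This is why ipsative scores need special care when scales are compared between people.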
3.9

Total number of test items and number of items per scale or subtest

This item should be answered from information provided by the publisher.
If the instrument has several scales or subtests, indicate the total number of items and the number of items for each scale or subtest. Where items load on more than one scale or subtest, this should be documented.
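When items load on more than one scale, the per-scale counts add up to more than the total number of items. This is one reason the loading should be documented. The sketch below is minimal and assumes a hypothetical scoring key stored as item number -> list of scales. The scale names and item assignments are invented for illustration.

```python
from collections import Counter

# Hypothetical scoring key: item number -> scale(s) the item is scored on.
SCORING_KEY = {
    1: ["Anxiety"],
    2: ["Anxiety", "Depression"],
    3: ["Depression"],
    4: ["Sociability"],
    5: ["Sociability", "Depression"],
}

def item_counts(key):
    """Return total items, items per scale, and items scored on more than one scale."""
    total_items = len(key)
    per_scale = Counter(scale for scales in key.values() for scale in scales)
    shared_items = sorted(item for item, scales in key.items() if len(scales) > 1)
    return total_items, dict(per_scale), shared_items

total, per_scale, shared = item_counts(SCORING_KEY)
print("Total number of items:", total)
print("Items per scale:", per_scale)
print("Items scored on more than one scale:", shared)
```

Here the five items yield seven scale memberships, and the list of shared items is what the review should report.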

3.10

Intended mode of use (conditions under which the instrument was developed and validated) (select all that apply)

This item is important because it identifies whether the instrument has been designed to be used in unsupervised or uncontrolled administration conditions. Note that usage modes may vary across versions of a tool. This item should be answered from information provided by the publisher and checked for accuracy.
Note. The four modes are defined in the International Guidelines on Computer-Based and Internet Delivered Testing (International Test Commission, 2005, pp. 5-6).

Open mode: There is no direct human supervision of the assessment session, and hence no means of authenticating the identity of the test taker. Internet-based tests without any requirement for registration can be considered an example of this mode of administration.
Controlled mode: No direct human supervision of the assessment session is involved, but the test is made available only to known test takers. Internet tests will require test takers to obtain a logon username and password. These are often designed to operate on a one-time-only basis.
Supervised (proctored) mode: There is a level of direct human supervision over test-taking conditions. In this mode test-taker identity can be authenticated. For Internet testing this would require an administrator to log in a candidate and confirm that the test had been properly administered and completed.
Managed mode: There is a high level of human supervision and control over the test-taking environment. In CBT testing this is normally achieved by the use of dedicated testing centres, where there is a high level of control over access, security, the qualification of test administration staff, and the quality and technical specifications of the test equipment.
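When checking a publisher's claims for accuracy, it can help to record which features of the administration setup are actually in place. The sketch below simplifies the four definitions into three yes/no attributes. These attributes, and the class and function names, are assumptions of this example and are not part of the ITC guidelines, which remain the authority.

```python
from dataclasses import dataclass

@dataclass
class AdministrationSetup:
    direct_supervision: bool       # a person oversees the session and can authenticate identity
    known_test_takers_only: bool   # e.g. access via an issued username and password
    controlled_environment: bool   # e.g. a dedicated testing centre

def itc_mode(setup: AdministrationSetup) -> str:
    """Map a described setup onto the four modes defined above (simplified sketch)."""
    if setup.direct_supervision and setup.controlled_environment:
        return "Managed"
    if setup.direct_supervision:
        return "Supervised (proctored)"
    if setup.known_test_takers_only:
        return "Controlled"
    return "Open"

print(itc_mode(AdministrationSetup(False, True, False)))  # login required, no supervision
```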
3.11

Administration mode(s) (select all that apply)

This item should be answered from information provided by the publisher.
If any special pieces of equipment (other than those indicated in the list of options, e.g. a digital recorder) are required, they should be described here. In addition, any special testing conditions should be described. 'Standard testing conditions' are assumed to be available for proctored/supervised assessment. These would include a quiet, well-lit and well-ventilated room with adequate desk space and seating for the necessary administrator(s) and candidate(s).

Interactive individual administration
Supervised group administration
Computerised locally-installed application - supervised/proctored
Computerised web-based application - supervised/proctored
Computerised locally-installed application - unsupervised/self-assessment
Computerised web-based application - unsupervised/self-assessment
Other (indicate):

3.12

Time required for administering the instrument (please specify for each administration mode)

This item should be answered from information provided by the publisher.
The response to this item can be broken down into a number of components. In most cases, it will only be possible to provide general estimates of these rather than precise figures. The aim is to give the potential user a good idea of the time investment associated with using this instrument. Do NOT include the time needed to become familiar with the instrument itself. Assume the user is experienced and qualified.
Preparation time: the time it takes the administrator to prepare and set out the materials for an assessment session (access and login time for an online administration).
Administration time per session: this includes the time taken to complete all the items and an estimate of the time required to give instructions, work through example items and deal with any debriefing comments at the end of the session.
Scoring: the time taken to obtain the raw scores. In many cases this may be automated.
Analysis: the time taken to carry out further work on the raw scores to derive other measures and to produce a reasonably comprehensive interpretation (assuming you are familiar with the instrument). Again, this may be automated.
Feedback: the time required to prepare and provide feedback to a test taker and other stakeholders.
It is recognised that the time for the last two components could vary enormously depending on the context in which the instrument is being used. However, some indication or comments will be helpful.

Preparation:
Administration:
Scoring:
Analysis:
Feedback:
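Item 3.12 asks for estimates per administration mode, and the five components add up to the total time a potential user should expect to invest. The sketch below only illustrates that bookkeeping. The class name and every minute value are hypothetical, not figures for any real instrument.

```python
from dataclasses import dataclass

@dataclass
class TimeEstimate:
    """Time estimate in minutes for one administration mode (hypothetical figures)."""
    mode: str
    preparation: float
    administration: float
    scoring: float
    analysis: float
    feedback: float

    def total(self) -> float:
        return (self.preparation + self.administration + self.scoring
                + self.analysis + self.feedback)

estimates = [
    TimeEstimate("Supervised group administration", 15, 50, 10, 20, 30),
    # Scoring and analysis automated, so recorded as zero.
    TimeEstimate("Computerised web-based application - unsupervised", 2, 40, 0, 0, 30),
]

for e in estimates:
    print(f"{e.mode}: about {e.total():.0f} minutes in total")
```

Reporting the components separately, rather than only the total, shows the reader which parts automation removes and which (such as feedback) remain.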

3.13

Indicate whether different forms of the instrument are available and which form(s) is (are) the subject of this review

Report whether or not there are alternative versions of the instrument available (genuine or pseudo-parallel forms, short versions, computerised versions, etc.) and describe the applicability of each form for different groups of people. In some cases, different forms of an instrument are meant to be equivalent to each other, i.e. alternative forms. In other cases, various forms may exist for quite different groups (e.g. a children's form and an adult's form). Where more than one form exists, indicate whether these are equivalent/alternate forms, or whether they are designed to serve different functions (e.g. short and long version; ipsative and normative version). Also describe whether or not parts of the test can be used instead of the whole instrument. If computerised versions exist, describe briefly the software and hardware requirements. Note that standalone computer-based tests (CBT) and online packages, if available, should be indicated.

4 Measurement and scoring



4.1

Scoring procedure for the test (select all that apply)

This item should be completed by reference to the publisher's information and the manuals and documentation.
Bureau services are services provided by the supplier, or some agent of the supplier, for scoring and interpretation. In general these are optional services. If scoring and/or interpretation can be carried out ONLY through a bureau service, then this should be stated in the review, and the costs included in the recurrent costs item.

Computer scoring with direct entry of responses by test taker
Computer scoring by Optical Mark Reader entry of responses from the paper response form
Computer scoring with manual entry of responses from the paper response form
Simple manual scoring key - clerical skills only required
Complex manual scoring - requiring training in the scoring of the instrument
Bureau-service - e.g. scoring by the company selling the instrument
Other (please describe):

4.2

Scores

This item should be completed by reference to the publisher's information and the manuals and documentation.
Give a brief description of the scoring system used to obtain global and partial scores, correction for guessing, qualitative interpretation aids, etc.

4.3

Scales used (select all that apply)

This item should be completed by reference to the publisher's information and the manuals and documentation.

Percentile Based Scores
Centiles
5-grade classification: 10:20:40:20:10 centile splits
Deciles
Other (please describe):

Standard Scores
Z-scores
IQ deviation quotients etc. (e.g. mean = 100, SD = 15 for Wechsler or 16 for Stanford-Binet)
College Entrance Examination Board (e.g. SAT mean = 500, SD = 100)
Stens
Stanines, C-scores
T-scores
Other (please describe):

Critical scores, expectancy tables or other specific decision-oriented indices

Raw score use only

Other (please describe):
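The standard-score scales listed under 4.3 are all linear rescalings of a z-score: score = mean + SD × z. The minimal sketch below assumes the conventional means and SDs: T 50/10, sten 5.5/2, stanine 5/2, deviation IQ 100/15, CEEB/SAT 500/100. It also assumes the usual rounding and clipping of stens to 1-10 and stanines to 1-9. The 5-grade function applies the 10:20:40:20:10 centile splits. Which grade a percentile exactly on a cut-off (e.g. 10) belongs to is an assumption of this sketch. The function names are illustrative, and an instrument's own manual governs its actual conversion tables.

```python
import math

# Conventional (mean, SD) of common standard-score scales.
SCALES = {
    "T": (50, 10),
    "sten": (5.5, 2),
    "stanine": (5, 2),
    "IQ (SD 15)": (100, 15),
    "CEEB/SAT": (500, 100),
}
# Stens and stanines are whole numbers within a fixed range.
RANGES = {"sten": (1, 10), "stanine": (1, 9)}

def from_z(z, scale):
    """Linear transformation: score = mean + SD * z."""
    mean, sd = SCALES[scale]
    value = mean + sd * z
    if scale in RANGES:
        low, high = RANGES[scale]
        # Round half up, then clip to the permitted range.
        value = min(high, max(low, math.floor(value + 0.5)))
    return value

def five_grade(percentile):
    """5-grade classification from 10:20:40:20:10 centile splits."""
    for grade, upper in enumerate((10, 30, 70, 90), start=1):
        if percentile < upper:
            return grade
    return 5

for z in (-1.5, 0.0, 1.0):
    print(z, {scale: from_z(z, scale) for scale in SCALES})
```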

4.4

Score transformation for standard scores

Normalised - standard scores obtained by use of a normalisation look-up table
Not-normalised - standard scores obtained by linear transformation
Not applicable
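The two options under 4.4 differ in what happens to the shape of the raw-score distribution. A linear transformation preserves it, so skewed raw scores give skewed standard scores. Normalisation maps each raw score's percentile rank in the norm sample to the corresponding point of the normal curve, and a normalisation look-up table stores that mapping. The minimal sketch below assumes T-scores as the target scale and the mid-point percentile-rank convention. Both are illustrative choices, and the norm sample is hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def linear_t(raw_scores):
    """Not-normalised T-scores: T = 50 + 10 * (x - mean) / SD."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [50 + 10 * (x - m) / sd for x in raw_scores]

def normalised_lookup(norm_sample):
    """Raw score -> normalised T look-up table built from a norm sample.

    Each raw score's mid-point percentile rank (proportion below plus half
    the proportion at that score) is mapped to the matching z of the normal curve.
    """
    n = len(norm_sample)
    std_normal = NormalDist()
    table = {}
    for raw in sorted(set(norm_sample)):
        below = sum(1 for x in norm_sample if x < raw)
        at = norm_sample.count(raw)
        proportion = (below + 0.5 * at) / n  # always strictly between 0 and 1
        table[raw] = round(50 + 10 * std_normal.inv_cdf(proportion), 1)
    return table

# Hypothetical, positively skewed norm sample.
norm_sample = [1, 2, 2, 3, 3, 3, 4, 4, 6, 9]
print({x: round(t, 1) for x, t in zip(norm_sample, linear_t(norm_sample))})
print(normalised_lookup(norm_sample))
```

With a skewed sample such as this one, the two sets of T-scores diverge most in the tails. A review may therefore need to note which transformation the norms use.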


5 Computer generated reports


Note that this section is purely descriptive. Evaluations of the reports should be given in the Evaluation part of the review.
Where multiple generated reports are available, please complete items 5.2-5.13 for each report or substantive report section (copy pages as necessary). This classification system could be used to describe two reports provided by a system. For example, Report 1 may be intended for the test taker or other untrained users, and Report 2 for a trained user who is competent in the use of the instrument and understands how to interpret it.

5.1

Are computer generated reports available with the instrument?

If the answer to 5.1 is 'YES', then the following classification should be used to classify the types of reports available. For many instruments, there will be a range of reports available. Please complete a separate form for each report.

Yes (complete items below)
No (move to item 6.1)
5.2

Name or description of report (see introduction to this section)

5.3

Media (select all that apply)

Reports may consist wholly of text or contain text together with graphical or tabular representations of scores (e.g. sten profiles). Where both text and data are presented, these may simply be presented in parallel or may be linked, so that the relationship between text statements and scores is made explicit.

Text only
Unrelated text and graphics
Integrated text and graphics
Graphics only
5.4

Complexity (select one)

Some reports are very simple, for example just substituting a text unit for a sten score in a scale-by-scale description. Others are more complex, involving text units which relate to patterns or configurations of scale scores and which consider scale interaction effects.

Simple (for example, a list of paragraphs giving scale descriptions)
Medium (a mixture of simple descriptions and some configural descriptions)
Complex (contains descriptions of patterns and configurations of scale scores, and scale interactions)

5.5

Report structure (select one)

Structure is related to complexity.

Scale based - where the report is built around the individual scales.
Factor based - where the report is constructed around higher order factors, such as the 'Big Five' for personality measures.
Construct based - where the report is built around one or more sets of constructs which are linked to the original scale scores (e.g. in a work setting these could be team types, leadership styles, or tolerance to stress; in a clinical setting these could be different kinds of psychopathology; etc.).
Criterion based - where the report focuses on links with empirical outcomes (e.g. school performance, therapy outcome, job performance, absenteeism, etc.).
Other (please describe):
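The sketch below is a minimal report engine that makes the difference between simple and configural text concrete. Everything in it is hypothetical: the two scale names, the sten bands (1-3 low, 4-7 mid, 8-10 high) and the wording of the text units. The simple part substitutes one text unit per scale, giving a simple, scale-based report. The configural rule fires only for a particular combination of scores. Printing the sten next to each text unit keeps the link between score and text explicit, which is the kind of openness item 5.10 asks about.

```python
# Hypothetical text units, keyed by scale and sten band.
TEXT_UNITS = {
    "Sociability": {
        "low": "tends to prefer working alone",
        "mid": "is comfortable working both alone and with others",
        "high": "actively seeks out the company of others",
    },
    "Conscientiousness": {
        "low": "tends to work in a flexible, unplanned way",
        "mid": "balances planning with flexibility",
        "high": "prefers to plan work carefully and in detail",
    },
}

def band(sten):
    """Hypothetical bands: 1-3 low, 4-7 mid, 8-10 high."""
    if sten <= 3:
        return "low"
    if sten >= 8:
        return "high"
    return "mid"

def simple_report(stens):
    """Scale-by-scale report: one text unit per scale, with the sten shown alongside."""
    return [f"{scale} (sten {sten}): The respondent {TEXT_UNITS[scale][band(sten)]}."
            for scale, sten in stens.items()]

def configural_statements(stens):
    """Configural rule: text triggered only by a combination of scale scores."""
    statements = []
    s, c = stens["Sociability"], stens["Conscientiousness"]
    if band(s) == "high" and band(c) == "low":
        statements.append(f"Sociability (sten {s}) with Conscientiousness (sten {c}): "
                          "the respondent may enjoy group activity more than "
                          "the detailed planning it can require.")
    return statements

profile = {"Sociability": 9, "Conscientiousness": 2}
for line in simple_report(profile) + configural_statements(profile):
    print(line)
```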

5.6

Sensitivity to context (select one)

When people write reports, they tailor the language, form and content of the report to the person who will be reading it, and take account of the purpose of the assessment and the context in which it takes place. In a work and organizational context, a report produced for selection purposes will be different from one written for guidance or development. Likewise, a report for a middle-aged manager will differ from one written for a young person starting out on a training scheme, and so on. In an educational context, a report produced to evaluate a student's global ability to learn and function in a learning environment will be different from a report produced to assess whether or not a student has a specific learning disorder. A report directed to other professionals suggesting learning goals and interventions will differ from reports directed to parents informing them of their child's strengths and weaknesses. In a clinical context, a report produced for diagnostic purposes will be different from a report evaluating a patient's potential for risk-taking behaviour. A report produced to provide feedback to patients will be different from a report produced to inform authorities whether or not it is safe to release a patient from involuntary treatment.

One version for all contexts
Pre-defined context-related versions; number of contexts: ....
User definable contexts and editable reports
5.7

Clinical-actuarial (select all that apply)

Most report systems are based on clinical judgment. That is, one or more people who are 'expert users' of the instrument in question will have written the text units. The reports will, therefore, embody their particular interpretations of the scales. Some systems include actuarial reports, where the statements are based on empirical validation studies linking scale scores to, for example, job performance measures, clinical classification, etc.

Based on clinical judgment of one expert
Based on clinical judgment of group of experts
Based on empirical/actuarial relationships
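Actuarial statements rest on empirical links between scores and outcomes. These links are often summarised as an expectancy table, one of the decision-oriented indices listed under 4.3. The minimal sketch below assumes a hypothetical validation sample of (sten, met criterion?) pairs and hypothetical sten bands. A real system would base these proportions on published validation studies with adequate sample sizes.

```python
from collections import defaultdict

# Hypothetical sten bands and validation records: (sten, met criterion?).
BANDS = [("1-3", 1, 3), ("4-7", 4, 7), ("8-10", 8, 10)]
RECORDS = [(2, False), (3, False), (3, True), (5, True), (6, False),
           (6, True), (7, True), (8, True), (9, True), (9, True)]

def expectancy_table(records, bands):
    """Return band label -> (number of people, proportion meeting the criterion)."""
    counts = defaultdict(lambda: [0, 0])  # label -> [n, successes]
    for sten, success in records:
        for label, low, high in bands:
            if low <= sten <= high:
                counts[label][0] += 1
                counts[label][1] += int(success)
                break
    return {label: (counts[label][0], counts[label][1] / counts[label][0])
            for label, _, _ in bands if label in counts}

def actuarial_statement(sten, table, bands):
    """Turn the expectancy table into a report sentence for one sten."""
    for label, low, high in bands:
        if low <= sten <= high and label in table:
            n, p = table[label]
            return (f"Of the {n} people in the validation sample with stens {label}, "
                    f"{p:.0%} met the criterion.")
    return "No validation data for this score band."

table = expectancy_table(RECORDS, BANDS)
for label, (n, p) in table.items():
    print(f"stens {label}: n = {n}, {p:.0%} met the criterion")
print(actuarial_statement(9, table, BANDS))
```

With only a handful of cases per band, as here, the proportions are very unstable. A review would normally look for the sample sizes behind any actuarial statement.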
5.8

Modifiability (select one)

The report output is often fixed. However, some systems will produce output in the form of a file that can be processed by the user. Others may provide online interactive access to both the end user and the test taker.

Not modifiable (fixed print-only output)
Limited modification (limited to certain areas, e.g. biodata fields)
Unlimited modification (e.g. through access to a word processor document file)
Interactive report which provides the test taker with an opportunity to insert comments or provide ratings of the accuracy of the content (e.g. through shared online access to an interactive report engine)

5.9

Degree of finish (select one)

Extent to which the system is designed to generate integrated text (in the form of a ready-to-use report) or a set of 'notes', comments, hypotheses, etc.

Publication quality
Draft quality
5.10 Transparency (select one)
Systems differ in their openness or transparency to the user. An open system is one where the link between a scale score and the text is clear and unambiguous. Such openness is only possible if both text and scores are presented and the links between them are made explicit. Other systems operate as 'black boxes', making it difficult for the user to relate scale scores to text.
Clear linkage between constructs, scores and text
Concealed link between constructs, scores and text
Mixture of clear/concealed linkage between constructs, scores and text
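A minimal sketch of what an open system might look like internally, offered only as an illustration: each text unit is stored together with the scale and score band that trigger it, and the report prints the score and band next to the statement, so the link between score and text is explicit. The `TextUnit` structure, the scale name "Sociability", the sten bands and the wording are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextUnit:
    scale: str   # construct the statement is about
    low: int     # inclusive lower bound of the triggering score band
    high: int    # inclusive upper bound of the triggering score band
    text: str    # statement shown to the reader


def open_report(scores: dict[str, int], units: list[TextUnit]) -> list[str]:
    """Return report lines that show each statement with the score that triggered it."""
    lines = []
    for unit in units:
        score = scores.get(unit.scale)
        if score is not None and unit.low <= score <= unit.high:
            lines.append(f"[{unit.scale} = {score}; band {unit.low}-{unit.high}] {unit.text}")
    return lines


# Hypothetical text units for a sten-scored scale.
UNITS = [
    TextUnit("Sociability", 1, 3, "Appears more reserved than the reference group."),
    TextUnit("Sociability", 4, 7, "Appears about as sociable as the reference group."),
    TextUnit("Sociability", 8, 10, "Appears more outgoing than the reference group."),
]

for line in open_report({"Sociability": 2}, UNITS):
    print(line)
```

A 'black box' system could apply exactly the same rules but print only the `text` field, leaving the reader unable to relate a statement back to a scale score.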

5.11 Style and tone (select one)
Systems also differ in the extent to which they offer the report reader guidance or direction. In a work and organizational context, a statement such as "Mr X is very shy and will not make a good salesman..." is stipulative, whereas other statements are designed to suggest hypotheses or raise questions, such as "From his scores on scale Y, Mr X appears to be very shy compared to a reference group of salespersons. If this is the case, he could find it difficult working in a sales environment. This needs to be explored further with him." In an educational context, a stipulative statement might be: "The results show that X's mathematical skills are two years below the average of his peers", whereas a statement designed to suggest hypotheses might be: "The results indicate X is easily distracted by external stimuli while performing tasks. Behavioural observations during testing support this. This should be taken into consideration when designing an optimal learning environment for X." In a clinical context, a stipulative statement might be: "Test scores indicate the patient has severe visual neglect, and is not able to safely operate a motor vehicle", whereas a statement designed to suggest hypotheses might be: "Mrs X's test scores indicate she may have problems establishing stable emotional relationships. This should be explored further before a conclusion regarding diagnosis is drawn."
Directive/stipulative
Guidance/suggests hypotheses
Other (please describe):
5.12 Intended recipients (select all that apply)
Reports are generally designed to address the needs of one or more categories of users. Users can be divided into four main groups:
a) Qualified test users. These are people who are sufficiently knowledgeable and skilled to be able to produce their own reports based on scale scores. They should be able to make use of reports that use technical psychometric terminology and make explicit linkages between scales and descriptions. They should also be able to customize and modify reports.
b) Qualified system users. While not competent to generate their own reports from a set of scale scores, people in this group are competent to use the outputs generated by the system. The level of training required to attain this competence will vary considerably, depending on the nature of the computer reports (e.g. trait-based versus competency-based, simple or complex) and the uses to which its reports are to be put (low stakes or high stakes).
c) Test takers. The person who takes the instrument will generally have no prior knowledge of either the instrument or the type of report produced by the system. Reports for them will need to be in language that makes no assumptions about psychometric or instrument knowledge.
d) Third parties. These include people (other than the candidate) who will be privy to the information presented in the report or who may receive a copy of the report. They may include potential employers, a person's manager or supervisor, or the parent of a young person receiving careers advice. The level of language required for people in this category would be similar to that required for reports intended for test takers.
Qualified test users
Qualified system users
Test takers
Third parties


5.13 Do distributors offer a service to modify and/or develop customised computerised reports? (select one)
Yes
No

6 Supply conditions and costs

This defines what the publisher will provide, to whom, under what conditions and at what costs. It defines the conditions imposed by the supplier on who may or may not obtain the instrument materials. If one of the options does not fit the supply conditions, provide a description of the relevant conditions.

6.1 Documentation provided by the distributor as part of the test package (select all that apply)
User manual
Technical (psychometric) manual
Supplementary technical information and updates (e.g. local norms, local validation studies etc.)
Books and articles of related interest
6.2 Methods of publication (select all that apply)
For example, technical manuals may be kept up to date and available for downloading from the Internet, while user manuals are provided in paper form or on a CD/DVD.
Paper
CD or DVD
Internet download
Other (specify):
Items 6.3 to 6.5 cover costs. This information is likely to go out of date most quickly. It is recommended that the supplier or publisher be contacted as close as possible to the time of publication of the review, to provide current information for these items.


6.3.1 Start-up costs
Price of a complete set of materials (all manuals and other material sufficient for at least one sample administration). Specify how many test takers could be assessed with the materials obtained for start-up costs, and whether these costs include materials for recurrent assessment.
This item should try to identify the 'set-up' cost, that is, the costs involved in obtaining a full reference set of materials, scoring keys and so on. It only includes training costs if the instrument is a 'closed' one, where there will be an unavoidable specific training cost regardless of the prior qualification level of the user. In such cases, the training element in the cost should be made explicit. The initial costs do NOT include costs of general-purpose equipment (such as computers, DVD players and so on). However, the need for these should be mentioned. In general, define: any special training costs; costs of administrator's manual; technical manual(s); specimen or reference set of materials; initial software costs, etc.

6.3.2 Recurrent costs
Specify, where appropriate, recurrent costs of administration and scoring separately from costs of interpretation (see 6.4.1 to 6.5).
This item is concerned with the ongoing cost of using the instrument. It should give the cost of the instrument materials (answer sheets, non-reusable or reusable question booklets, profile sheets, computer usage release codes or dongle units, etc.) per person per administration. Note that in most cases, for paper-based administration, such materials are not available singly but tend to be supplied in packs of 10, 25 or 50.
Itemise any annual or per capita licence fees (including software release codes where relevant), costs of purchasing or leasing re-usable materials, and per candidate costs of non-reusable materials.
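Because materials are usually bought in whole packs, the per-person figure asked for here involves a little arithmetic. The sketch below is one way to compute it; it assumes that any annual licence fee is spread equally over the candidates assessed in that year, and the prices and pack sizes in the example are hypothetical (no currency is implied).

```python
import math


def packs_needed(candidates: int, pack_size: int) -> int:
    """Paper materials are sold only in whole packs, so round up."""
    return math.ceil(candidates / pack_size)


def recurrent_cost_per_candidate(candidates: int, pack_size: int, pack_price: float,
                                 annual_licence_fee: float = 0.0) -> float:
    """Consumable cost per candidate, plus an (assumed) annual licence fee
    shared equally across the candidates assessed in the year."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    materials = packs_needed(candidates, pack_size) * pack_price
    return (materials + annual_licence_fee) / candidates


# Hypothetical example: answer sheets in packs of 25 at 50.00 per pack,
# 30 candidates per year, annual licence fee of 200.00.
print(packs_needed(30, 25))                               # 2 packs
print(recurrent_cost_per_candidate(30, 25, 50.0, 200.0))  # (100 + 200) / 30 = 10.0
```

In this example, assessing 30 candidates means paying for 50 answer sheets, so the cost per candidate is higher than the nominal price per sheet. This is why the pack size is worth reporting alongside the price.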
6.4.1 Prices for reports generated by user-installed software

6.4.2 Prices for reports generated by postal/fax bureau service

6.4.3 Prices for reports by Internet service

6.5 Prices for other bureau services: correcting or developing automatic reports

6.6 Test-related qualifications required by the supplier of the test (select all that apply)
This item concerns the user qualifications required by the supplier. For this item, where the publisher has provided user qualification information, this should be noted against the categories given. Where the qualification requirements are not clear, this should be stated under 'Other', not under 'None'. 'None' means that there is an explicit statement regarding the lack of need for qualification.
For details of the EFPA Level 2 standard, consult its latest version on the EFPA website.
None
Test-specific accreditation
Accreditation in general achievement testing: measures of maximum performance in attainment (equivalent to EFPA Level 2)
Accreditation in general ability and aptitude testing: measures of maximum performance in relation to potential for attainment (equivalent to EFPA Level 2)
Accreditation in general personality and assessment: measures of typical behaviour, attitudes and preferences (equivalent to EFPA Level 2)
Other (specify):
6.7 Professional qualifications required for use of the instrument (select all that apply)
This item concerns the user qualifications required by the supplier. For this section, where the publisher has provided user qualification information, this should be noted against the categories given. Where the qualification requirements are not clear, this should be stated under 'Other', not under 'None'. 'None' means that there is an explicit statement regarding the lack of need for qualification.
For details of the EFPA user standards, consult the latest version of these on the EFPA website.
None
Practitioner psychologist with qualification in the relevant area of application
Practitioner psychologist
Research psychologist
Non-psychologist academic researcher
Practitioner in relevant related professions (therapy, medicine, counselling, education, human resources etc.). Specify:
EFPA Test User Qualification Level 1 or national equivalent
EFPA Test User Qualification Level 2 or national equivalent
Specialist qualification equivalent to EFPA Test User Standard Level 3
Other (indicate):

PART 2

EVALUATION OF THE INSTRUMENT


Sources of information
Potentially there are four sources of information that might be consulted in carrying out this evaluation:

1. The manual and/or reports that are supplied by the publisher for the user:
These are always supplied by the publisher/distributor before the instrument is accepted by the reviewing organisation and form the core materials for the review.
2. Open information that is available in the academic or other literature:
This is generally sourced by the reviewer, who may make use of this information in the review. The instrument may be evaluated as having (or not having) made reference to this information in its manual.
3. Information held by the publisher that is not formally published or distributed:
The distributor/publisher may make this available at the outset or may send it when the review is sent back to the publisher to check for factual accuracy. The reviewer should make use of this information but note very clearly at the beginning of the comments on the technical information that "the starred rating in this review refers to materials held by the publisher/distributor that are not [normally] supplied to test users". If these contain valuable information, the overall evaluation should recommend that the publisher publish these reports and/or make them available to test purchasers.
4. Information that is commercial in confidence:
In some instances, publishers may have technically important material that they are unwilling to make public for commercial reasons. In practice there is very little protection available for the intellectual property of test developers (copyright law being about the only recourse). Such information could include reports that cover the development of particular scoring algorithms, test or item generation procedures and report generation technology. Where the content of such reports might be important in making a judgment in a review, the association or organization responsible for the review could offer to enter into a non-disclosure agreement with the publisher. This agreement would be binding on the reviewers and editor. The reviewer could then evaluate the information and comment, in the technical aspects and the overall evaluation, to the effect that "the starred rating in this review refers to materials held by the publisher/distributor that have been examined by the reviewers on a commercial in confidence basis. These are not supplied to end users."

Explanation of ratings
All sections are scored using the following rating system (see the table below). Detailed descriptions giving anchor points for each rating are provided.
Where a [ 0 ] or [ 1 ] rating is given on an attribute that is regarded as critical to the safe use of an instrument, the review will recommend that the instrument should only be used in exceptional circumstances by highly skilled experts or in research.
The instrument review needs to indicate which technical qualities are critical, given the nature of the instrument and its intended use. It is suggested that, by convention, ratings of these critical qualities be shown in bold print.
In the following sections, overall ratings of the adequacy of information relating to validity, reliability and norms are shown, by default, in bold.

Any instrument with one or more [ 0 ] or [ 1 ] ratings regarding attributes that are regarded as critical to the safe use of that instrument shall not be deemed to have met the minimum standard.
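Once the reviewer has decided which attributes are critical, the minimum-standard rule above is mechanical, so it can be written as a short check. In this sketch the attribute names are hypothetical, ratings are the integers 0 to 4 or the string "n/a", and treating an unrated critical attribute as an error is an assumption rather than part of the EFPA text.

```python
from __future__ import annotations

from typing import Union

Rating = Union[int, str]  # 0-4, or "n/a"


def failed_critical_attributes(ratings: dict[str, Rating], critical: set[str]) -> list[str]:
    """Return the critical attributes that were rated [0] or [1]."""
    missing = critical - set(ratings)
    if missing:
        # Assumption: every critical attribute must carry a rating.
        raise KeyError(f"critical attributes without a rating: {sorted(missing)}")
    return sorted(attr for attr in critical if ratings[attr] in (0, 1))


def meets_minimum_standard(ratings: dict[str, Rating], critical: set[str]) -> bool:
    """True unless a critical attribute received a [0] or [1] rating."""
    return not failed_critical_attributes(ratings, critical)


# Hypothetical review: reliability is critical and was rated inadequate.
ratings = {"norms": 3, "reliability": 1, "validity": 2, "computer reports": "n/a"}
critical = {"reliability", "validity"}
print(failed_critical_attributes(ratings, critical))  # ['reliability']
print(meets_minimum_standard(ratings, critical))      # False
```

The check only flags the outcome. Deciding which attributes are critical for a given instrument and intended use remains the reviewer's judgment.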
Rating    Explanation*
[n/a]     This attribute is not applicable to this instrument
[0]       Not possible to rate, as no or insufficient information is provided
[1]       Inadequate
[2]       Adequate
[3]       Good
[4]       Excellent

* A five-point scale is defined by EFPA, but each user can merge points on the scale (for example, combining points 3 and 4 into a single point). The only constraint is that there must be a distinction between inadequate (or worse) on the one hand and adequate (or better) on the other. Descriptive terms or symbols such as stars or smiley faces may be used in place of numbers. Where the five-point scale is replaced or customized, the user should provide a key that links the points and the nomenclature to the five-point scale of EFPA.
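A customised scale and its key can be checked against the one constraint stated in the footnote. In the sketch below, a key maps each custom label (here, hypothetical star ratings) to the EFPA points it stands for. The check rejects any label that merges inadequate-or-worse points (0, 1) with adequate-or-better points (2 to 4). Requiring the labels to cover every EFPA point exactly once is an added assumption, not an EFPA rule.

```python
from __future__ import annotations

INADEQUATE_OR_WORSE = frozenset({0, 1})
ADEQUATE_OR_BETTER = frozenset({2, 3, 4})
EFPA_POINTS = INADEQUATE_OR_WORSE | ADEQUATE_OR_BETTER


def check_custom_key(key: dict[str, set[int]]) -> None:
    """Raise ValueError if a custom scale key breaks the constraints."""
    seen: set[int] = set()
    for label, points in key.items():
        if not points or not points <= EFPA_POINTS:
            raise ValueError(f"{label!r} must map to EFPA points 0-4")
        # EFPA constraint: inadequate and adequate must stay distinguishable.
        if points & INADEQUATE_OR_WORSE and points & ADEQUATE_OR_BETTER:
            raise ValueError(f"{label!r} merges inadequate and adequate points")
        # Assumption: each EFPA point belongs to exactly one custom label.
        if points & seen:
            raise ValueError(f"{label!r} overlaps another label")
        seen |= points
    if seen != EFPA_POINTS:
        raise ValueError(f"EFPA points without a label: {sorted(EFPA_POINTS - seen)}")


# Hypothetical key with points 3 and 4 combined, as in the footnote's example.
STARS = {"no star": {0}, "*": {1}, "**": {2}, "***": {3, 4}}
check_custom_key(STARS)  # passes silently

try:
    check_custom_key({"low": {0, 1, 2}, "high": {3, 4}})
except ValueError as err:
    print(err)  # 'low' merges inadequate and adequate points
```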


7 Quality of the explanation of the rationale, the presentation and the information provided

In this section a number of ratings need to be given to various aspects or attributes of the documentation supplied with the instrument (or package). The term 'documentation' is taken to cover all those materials supplied or readily available to the qualified user: e.g. the administrator's manual; technical handbooks; booklets of norms; manual supplements; updates from publishers/suppliers and so on.
Suppliers are asked to provide a complete set of such materials for each reviewer. If you think users are supplied with something that is not contained in the information sent to you for review, please contact your review editor.

7.1 Quality of the explanation of the rationale


If the instrument is a computer-adaptive test, particular attention should be paid to items 7.1.1 to 7.1.6.


Items to be rated n/a or 0 to 4

7.1.1  Theoretical foundations of the constructs


7.1.2  Test development (and/or translation or adaptation) procedure
7.1.3  Thoroughness of the item analyses and item analysis model



7.1.4  Presentation of content validity


7.1.5  Summary of relevant research


7.1.6  Overall rating of the quality of the explanation of the rationale
This overall rating is obtained by using judgment based on the ratings given for items 7.1.1 to 7.1.5.

7.2 Adequacy of documentation available to the user (user and technical manuals, norm supplements, etc.)
The focus here is on the quality of coverage provided in the documentation accessible to qualified users.
Note that sub-section 7.2 is about the comprehensiveness and clarity of the documentation available to
the user (user and technical manuals, norm supplements, etc.) in terms of its coverage and explanation. In
terms of the quality of the instrument as evidenced by the documentation, the areas covered in this part
are elaborated on under sections 7.1, 7.3, 9, 10 and 11.

Items to be rated n/a or 0 to 4; benchmarks are provided for an excellent (4) rating.

7.2.1  Rationale (see rating 7.1.6)
Excellent: Logical and clearly presented description of what the instrument is designed to measure and
why it was constructed as it was.



7.2.2.1  Development
Excellent: Full details of item sources, development of stimulus material according to accepted
guidelines (e.g. Haladyna, Downing, & Rodriguez, 2002; Moreno, Martínez, & Muñiz, 2006), piloting,
item analyses, comparison studies and changes made during development trials.
7.2.2.2  Development of the test through translation/adaptation
Excellent: Information in the manual showing that the translation/adaptation process was done according
to international guidelines (ITC, 2000) and included:
Input from native speakers of the new language
Multiple reviews by both language and content (of test) experts
Back translation from the new language into the original language
Consideration of cultural and linguistic differences.
7.2.3  Standardisation
Excellent: Clear and detailed information provided about sizes and sources of the standardisation sample
and the standardisation procedure.



7.2.4  Norms
Excellent: Clear and detailed information provided about sizes and sources of norm groups,
representativeness, conditions of assessment, etc.
7.2.5  Reliability
Excellent: Excellent explanation of reliability and standard error of measurement (SEM), and a
comprehensive range of internal consistency, temporal stability and/or inter-scorer and inter-judge
reliability measures and the resulting SEMs, provided with explanations of their relevance and of the
generalisability of the assessment instrument.
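The SEM in this benchmark follows from the classical test theory relation SEM = SD × √(1 − r), where r is a reliability estimate. The sketch below computes the SEM and a band around an observed score. It also computes Cronbach's alpha, chosen here as one common internal-consistency estimate. The function names, the choice of alpha and the observed-score ±1.96 SEM band are illustrative assumptions, not requirements of the EFPA model.

```python
import math
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of scores per item (same persons, same order)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # total score per person
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band (observed +/- z * SEM) around an observed score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin


# On the T-score scale (SD = 10), a reliability of 0.91 gives an SEM of 3 points.
print(round(sem(10, 0.91), 2))  # 3.0
print(score_band(50, 10, 0.91))
```

Some manuals centre the band on an estimated true score rather than on the observed score. The observed-score version is used here only because it is the simplest.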
7.2.6  Construct validity
Excellent: Excellent explanation of construct validity with a wide range of studies clearly and fairly
described.
7.2.7  Criterion validity
Excellent: Excellent explanation of criterion validity with a wide range of studies clearly and fairly
described.
7.2.8  Computer generated reports
Excellent: Clear and detailed information provided about the format, scope, reliability and validity of
computer generated reports.


7.2.9  Adequacy of documentation available to the user (user and technical manuals, norm supplements, etc.)
This rating is obtained by using judgment based on the ratings given for items 7.2.1 to 7.2.8.

7.3 Quality of the procedural instructions provided for the user

Items to be rated n/a or 0 to 4; benchmarks are provided for an excellent (4) rating.

7.3.1  For test administration
Excellent: Clear and detailed explanations and step-by-step procedural guides provided, with good
detailed advice on dealing with candidates' questions and problem situations.
7.3.2  For test scoring
Excellent: Clear and detailed information provided, with checks described to deal with possible errors in
scoring. If scoring is done by computer, is there evidence that the scoring is done correctly?
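One kind of check this benchmark has in mind is independent rescoring: every answer sheet is rescored against the scoring key and compared with the reported score, whether that score came from hand scoring or from software. The sketch below assumes a simple number-correct score and dictionary-based data. All names and data are hypothetical.

```python
def number_correct(responses: dict[str, str], key: dict[str, str]) -> int:
    """Number-correct score: count items whose response matches the key."""
    return sum(1 for item, answer in responses.items() if key.get(item) == answer)


def scoring_discrepancies(
    cases: dict[str, dict[str, str]],
    key: dict[str, str],
    reported: dict[str, int],
) -> list[str]:
    """Return the ids of test takers whose reported score differs from a rescoring."""
    return [
        case_id
        for case_id, responses in cases.items()
        if number_correct(responses, key) != reported.get(case_id)
    ]


key = {"q1": "A", "q2": "C", "q3": "B"}
cases = {
    "p1": {"q1": "A", "q2": "C", "q3": "D"},  # 2 correct
    "p2": {"q1": "B", "q2": "C", "q3": "B"},  # 2 correct
}
reported = {"p1": 2, "p2": 3}  # p2 was mis-scored
print(scoring_discrepancies(cases, key, reported))  # ['p2']
```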



7.3.3  For norming
Excellent: Clear and detailed information provided, with checks described to deal with possible errors in
norming. If norming is done by computer, is there evidence that score transformation is correct and the
right norm group is applied?
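A comparable check for computer norming re-derives each normed score from the published norm table of the group that should have been used. It flags both wrong-group assignments and transformation errors. The norm table below gives upper raw-score bounds for stanines 1 to 9; the table, the group names and the record layout are all invented for illustration.

```python
from bisect import bisect_left

# Hypothetical norm table: upper raw-score bound for stanines 1..9, ascending.
NORM_TABLES = {
    "adults_general": [5, 8, 11, 14, 17, 20, 23, 26, 30],
}


def to_stanine(raw: int, upper_bounds: list[int]) -> int:
    """Look up the stanine whose raw-score band contains `raw`."""
    if raw > upper_bounds[-1]:
        raise ValueError("raw score is outside the norm table")
    return bisect_left(upper_bounds, raw) + 1


def norming_problems(records: list[dict], tables: dict[str, list[int]]) -> list[tuple[str, str]]:
    """Flag records with a wrong norm group or a wrongly transformed score."""
    problems = []
    for rec in records:
        if rec["group_applied"] != rec["group_expected"]:
            problems.append((rec["id"], "wrong norm group"))
        elif to_stanine(rec["raw"], tables[rec["group_expected"]]) != rec["reported"]:
            problems.append((rec["id"], "transformation error"))
    return problems


records = [
    {"id": "c1", "raw": 12, "group_applied": "adults_general", "group_expected": "adults_general", "reported": 4},
    {"id": "c2", "raw": 12, "group_applied": "students", "group_expected": "adults_general", "reported": 5},
    {"id": "c3", "raw": 12, "group_applied": "adults_general", "group_expected": "adults_general", "reported": 5},
]
print(norming_problems(records, NORM_TABLES))
```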
7.3.4  For interpretation and reporting
Excellent: Detailed advice on interpreting different scores, understanding normative measures and dealing
with relationships between different scales, with illustrative examples and case studies; also advice on
how to deal with the possible influence of inconsistency in answering, response styles, faking, etc.
7.3.5  For providing feedback and debriefing test takers and others
Excellent: Detailed advice on how to present feedback to candidates, including the use of computer
generated reports (if available).



7.3.6  For providing good practice issues on fairness and bias
Excellent: Detailed information reported about gender and ethnic bias studies, with relevant warnings
about use and generalisation of validities.
7.3.7  Restrictions on use
Excellent: Clear descriptions of who should and who should not be assessed, with well-explained
justifications for restrictions (e.g. types of disability, literacy levels required, etc.).
7.3.8  Software and technical support
Excellent: In the case of Computer Based Testing (CBT) or Web Based Testing (WBT), the information with
respect to browser requirements, the installation of any required computer software and the operation of
the software is complete (also covering possible errors and different systems), and the availability of
technical support is clearly described.
7.3.9  References and supporting materials
Excellent: Detailed references to the relevant supporting academic literature and cross-references to
other related assessment instrument materials.



7.3.10  Quality of the procedural instructions provided for the user
This overall rating is obtained by using judgment based on the ratings given for items 7.3.1 to 7.3.9.
7.4  Overall adequacy
This overall rating for section 7 is obtained by using judgment based on the overall ratings given for
sub-sections 7.1, 7.2 and 7.3.


Reviewers' comments on the documentation (comment on rationale, presentation and information provided):


8 Quality of the test materials

8.1 Quality of the test materials of paper-and-pencil tests
(This sub-section can be skipped if not applicable.)
Items to be rated n/a or 0 to 4
8.1.1  General quality of test materials (test booklets, answer sheets, test objects, etc.)
8.1.2  Ease with which the test taker can understand the task
8.1.3  Clarity and comprehensiveness of the instructions (including sample items and practice trials) for
the test taker
8.1.4  Ease with which responses or answers can be made by the test taker
8.1.5  Quality of the formulation of the items and clarity of graphical content in the case of non-verbal
items
8.1.6  Quality of the materials of paper-and-pencil tests
This overall rating is obtained by using judgment based on the ratings given for items 8.1.1 to 8.1.5.
8.2 Quality of the test materials of CBT and WBT
(This sub-section can be skipped if not applicable.)

Items to be rated n/a or 0 to 4
8.2.1  Quality of the design of the software (e.g. robustness in relation to operation when incorrect keys
are pressed, internet connections fail, etc.)
8.2.2  Ease with which the test taker can understand the task
8.2.3  Clarity and comprehensiveness of the instructions (including sample items and practice trials) for
the test taker, the operation of the software and how to respond if the test is administered by computer
8.2.4  Ease with which responses or answers can be made by the test taker
8.2.5  Quality of the design of the user interface

8.2.6  Security of the test against unauthorized access to items or to answers
8.2.7  Quality of the formulation of the items and clarity of graphical content in the case of non-verbal
items



8.2.8  Quality of the materials of CBT and WBT
This overall rating is obtained by using judgment based on the ratings given for items 8.2.1 to 8.2.7.

Reviewers' comments on the quality of the materials


9 Norms

General guidance on assigning ratings for this section
It is difficult to set clear criteria for rating the technical qualities of an instrument. These notes provide
some guidance on the sorts of values to associate with inadequate, adequate, good and excellent ratings.
However, these are intended to act as guides only. The nature of the instrument, its area of application,
the quality of the data on which norms are based, and the types of decisions for which it will be used
should all affect the way in which ratings are awarded.

To give meaning to a raw test score, two ways of scaling or categorizing raw scores can be distinguished
(American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education, 1999). First, a set of scaled scores or norms may be derived from the
distribution of raw scores of a reference group. This is called norm-referenced interpretation (see
sub-section 9.1). Second, standards may be derived from a domain of skills or subject matter to be
mastered (domain-referenced interpretation), or cut scores may be derived from the results of empirical
validity research (criterion-referenced interpretation) (see sub-section 9.2). With the latter two
approaches, raw scores are categorized into two (for example, pass or fail) or more score ranges, e.g. to
assign patients in different score ranges to different treatment programs, to assign pupils scoring below
a critical score to remedial teaching, or to accept or reject applicants in personnel selection.
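The two approaches can be contrasted in a short sketch. The norm-referenced function converts a raw score into a linear T-score (mean 50, SD 10) using the mean and standard deviation of a reference group. The cut-score function assigns a raw score to one of several ranges. Several choices here are assumptions made for the example: linear T-scores, the sample standard deviation, and placing a score equal to a cut score in the higher range. Manuals may instead use normalised scores, percentiles or stanines, and the reference scores are invented.

```python
from bisect import bisect_right
from statistics import mean, stdev


def linear_t_score(raw: float, reference_scores: list[float]) -> float:
    """Norm-referenced: express `raw` relative to a reference group (mean 50, SD 10)."""
    z = (raw - mean(reference_scores)) / stdev(reference_scores)
    return 50 + 10 * z


def score_range(raw: float, cut_scores: list[float], labels: list[str]) -> str:
    """Domain- or criterion-referenced: place `raw` in a range bounded by cut scores.

    `cut_scores` must be ascending and `labels` one longer than `cut_scores`;
    a score equal to a cut score goes into the higher range.
    """
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more label than cut scores")
    return labels[bisect_right(cut_scores, raw)]


reference = [10, 12, 14, 16, 18]                # invented reference-group raw scores
print(linear_t_score(14, reference))            # 50.0
print(score_range(19, [20], ["fail", "pass"]))  # fail
print(score_range(20, [20], ["fail", "pass"]))  # pass
print(score_range(25, [20, 30], ["remedial", "regular", "advanced"]))  # regular
```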

9.1 Norm-referenced interpretation
(This sub-section can be skipped if not applicable.)



Notes on international norms
Careful consideration needs to be given to the suitability of international (same-language) norms. Where
these have been carefully established from samples drawn from a group of countries, they should be rated
on the same basis as nationally based (single-language) norm groups. Where a non-local norm is provided,
strong evidence of the equivalence of both the test versions and the samples should be supplied to
justify its use. Generally, such evidence would require studies demonstrating scalar equivalence between
the source and target language versions. Where this has not been reported, it should be commented upon
in the Reviewers' comments at the end of section 9.
An international norm may be the most appropriate for international usage (i.e. comparing people who
have taken the test in different languages), but the issues listed below should be considered in
determining its appropriateness. In general, use of an international norm requires the demonstration of
at least measurement equivalence between the source and target language versions of the test.

The nature of the sample:
The balance of sources of the sample (e.g. a sample that is 95% German, 2% Italian and 3% British is not a
genuinely international sample). A sample could be weighted to better reflect its different constituents.
The equivalence of the background (employment, education, circumstances of testing, etc.) of the
different parts of the sample. Norm samples which do not allow this to be evaluated are insufficient.
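The remark that a sample could be weighted can be illustrated with simple post-stratification weights. Each country's cases are weighted by its target share divided by its share in the sample. The equal target shares used here are an assumption that a test developer would have to justify. The function names and group means are hypothetical; the counts reproduce the 95% / 2% / 3% example.

```python
def poststratification_weights(sample_counts: dict[str, int], target_shares: dict[str, float]) -> dict[str, float]:
    """Weight per case in each group so that weighted group shares equal the targets."""
    if abs(sum(target_shares.values()) - 1) > 1e-9:
        raise ValueError("target shares must sum to 1")
    n = sum(sample_counts.values())
    return {group: target_shares[group] / (count / n) for group, count in sample_counts.items()}


def weighted_mean(group_means: dict[str, float], sample_counts: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted norm-sample mean computed from per-group means."""
    total_weight = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return sum(weights[g] * sample_counts[g] * group_means[g] for g in sample_counts) / total_weight


counts = {"Germany": 950, "Italy": 20, "UK": 30}           # the 95% / 2% / 3% example
targets = {"Germany": 1 / 3, "Italy": 1 / 3, "UK": 1 / 3}  # assumed equal contribution
group_means = {"Germany": 20.0, "Italy": 23.0, "UK": 26.0}  # invented scale means

weights = poststratification_weights(counts, targets)
unweighted = sum(counts[g] * group_means[g] for g in counts) / sum(counts.values())
print(unweighted)                                            # 20.24
print(round(weighted_mean(group_means, counts, weights), 2))  # 23.0
```

Weighting changes only how much each group counts. It does not remove the need to show that the test versions and the backgrounds of the sub-samples are equivalent.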

The type of measure:
Measures with little or no verbal content will be less affected by translation. This applies to
performance tests and, to some extent, to abstract and diagrammatic reasoning tests, where the impact on
scores should be smaller.


The equivalence of the test version used with the different language samples:
There should be evidence that all the language versions are well translated/adapted.
Is there any evidence that any of the groups have completed the test in a non-primary language?

Similarities of scores in different samples:
Evidence should be provided about the relative score patterns of the sample sections from different
countries. Where there are large differences, these should be accounted for and the implications for use
discussed. For example, if a Spanish sample scores higher on a scale than a Dutch sample, is there an
explanation of what it means to compare members of either group, or of a third group, against the
average? Is there an interpretation of the difference?
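The size of a difference between country samples can be expressed as a standardised mean difference. The sketch uses Cohen's d with a pooled standard deviation. This is one common choice, not something the EFPA model prescribes, and the score lists are invented.

```python
import math
from statistics import mean, variance


def cohens_d(sample_a: list[float], sample_b: list[float]) -> float:
    """Standardised mean difference (a - b) using the pooled sample standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


spanish = [2, 4, 6]  # invented scale scores
dutch = [1, 3, 5]
print(cohens_d(spanish, dutch))  # 0.5
```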



The absence of these sources of evidence needs to be commented upon in the Reviewers' comments at the
end of the section.
Guidance about generalising the norms beyond the groups included in the international norms should be
given in the manual for the instrument. For example, if a norm is made up of 20% German, 20% French,
20% Italian, 20% British and 20% Dutch test takers, it might be appropriate to use it as a comparison
group for Swiss or Belgian candidates, but it may not be appropriate to use it as a comparison for a group
of Chinese applicants.

9.1 Norm-referenced interpretation
Where an instrument is designed for use without recourse to norms or reference groups (e.g. ipsative
tests designed for intra-individual comparisons only), the "not applicable" category should be used
rather than "no information given". However, the reviewer should evaluate whether the reasons for
providing no norms are justified; otherwise, the category "no information given" must be used.

9.1.1  Appropriateness for local use, whether local or international norms
Note that for adapted tests only local (nationally based) or genuinely international norms are eligible
for the ratings 2, 3 or 4, even if construct equivalence across cultures is found. Where measurement
invariance issues arise, separate norms should be provided for (sub)groups and any issues encountered
should be explained.

n/a  Not applicable
0    No information given
1    Not locally relevant (e.g. inappropriate foreign samples)
2    Local sample(s) that do(es) not fit well with the relevant application domain but could be used with
     caution
3    Local country samples or relevant international samples with good relevance for intended application
4    Local country samples or relevant international samples drawn from well-defined populations from the
     relevant application domain
9.1.2

Appropriateness for intended applications

Not applicable (n/a)

No information given

Norm or norms not adequate for intended applications

Adequate general population norms and/or range of norm tables, or adequate norms for some but not all intended applications

Good range of norm tables

Excellent range of sample-relevant, age-related and sex-related norms with information about other differences within groups (e.g. ethnic group mix)

9.1.3

Sample sizes (classical norming)


For most purposes, samples of fewer than 200 test takers will be too small, as the resolution provided in the tails of the distribution will be very poor. The SEmean for a z-score with N = 200 is 0.071 of the SD, i.e. about 0.7 of a T-score point, a precision just better than one T-score point. Although this degree of inaccuracy may have only minor consequences in the centre of the distribution, the impact at the tails of the distribution can be quite large (and these may be the score ranges that are most relevant for decisions taken on the basis of the test scores). If there are international norms, these generally need to be larger than the typical requirements for local samples, because of their heterogeneity.
Different guideline figures are given for low-stakes and high-stakes use. Generally, use is high-stakes when a non-trivial decision is based at least in part on the test score(s).

                              Low-stakes use    High-stakes decisions

Not applicable (n/a)

No information given

Inadequate sample size        e.g. < 200        e.g. 200-299

Adequate sample size          e.g. 200-299      e.g. 300-399

Good sample size              e.g. 300-999      e.g. 400-999

Excellent sample size         e.g. ≥ 1000       e.g. ≥ 1000
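The SEmean figure above is the standard error of a mean, SD/√N: for z-scores (SD = 1) and N = 200 it is 1/√200 ≈ 0.071, i.e. about 0.71 points on the T-score scale (SD = 10). As a sketch only, the Python below reproduces that arithmetic and maps a norm-group size onto the example bands of the table. The function names are hypothetical, the bands are the table's "e.g." figures rather than hard cut-offs, and treating every high-stakes sample below 300 as inadequate is an assumption (the table lists only 200-299 in that cell).

```python
import math


def se_mean(n: int, sd: float = 1.0) -> float:
    """Standard error of a norm-group mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def classical_sample_size_rating(n: int, high_stakes: bool = False) -> str:
    """Map a classical norm-group size onto the example bands of item 9.1.3.

    Hypothetical helper. For high-stakes use every N below 300 is treated as
    inadequate (an assumption; the table only lists 200-299 in that cell).
    """
    if high_stakes:
        bands = [(1000, "excellent"), (400, "good"), (300, "adequate")]
    else:
        bands = [(1000, "excellent"), (300, "good"), (200, "adequate")]
    for lower_bound, label in bands:  # checked from the largest bound downwards
        if n >= lower_bound:
            return label
    return "inadequate"


se_z = se_mean(200)           # z-score scale, SD = 1
se_t = se_mean(200, sd=10.0)  # T-score scale, SD = 10
print(f"N = 200: SEmean = {se_z:.3f} SD = {se_t:.2f} T-score points")
print(classical_sample_size_rating(250), classical_sample_size_rating(250, high_stakes=True))
```

Because the standard error shrinks only with √N, quadrupling a norm group from 200 to 800 merely halves it, which is one reason the bands widen towards the top.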
9.1.4

Sample sizes (continuous norming)


Continuous norming procedures have become increasingly popular. They are used particularly for tests intended for use in schools (e.g. groups 1 to 8 in primary education) or for a specific age range (e.g. an intelligence test for 6- to 16-year-olds). Continuous norming is more efficient, as fewer respondents are required to obtain the same accuracy of the norms. Bechger, Hemker, and Maris (2009) computed the sizes of continuous norm groups that give accuracy equal to that of classical norming. When eight subgroups are used, N = 70 per subgroup (8 × 70) gives the same accuracy as N = 200 per subgroup (8 × 200) with the classical approach; likewise, N = 100 (× 8) corresponds to N = 300 (× 8), and N = 150 (× 8) to N = 400 (× 8). In these cases the accuracy of the continuous norming approach is even better in the middle groups, but somewhat worse in the outer groups. Apart from the greater efficiency, another advantage is that norms for intermediate groups can be computed from the regression line. However, the approach rests on rather strict statistical assumptions. The test author has to show that these assumptions have been met, or that deviations from them do not have serious consequences for the accuracy of the norms.
Note that when the number of groups is higher, the number of respondents in each group may be lower, and vice versa. For high-stakes decisions, such as school admission, the required number shifts one step upwards.

Not applicable (n/a)

No information given

Inadequate sample size (e.g. fewer than 8 subgroups with a maximum of 69 respondents each)

Adequate sample size (e.g. 8 subgroups with 70-99 respondents each)

Good sample size (e.g. 8 subgroups with 100-149 respondents each)

Excellent sample size (e.g. 8 subgroups with at least 150 respondents each)
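A comparable sketch for continuous norming, assuming the eight-subgroup reference design and judging a design by its smallest subgroup. The function name is hypothetical, and lowering the rating by one step for high-stakes use is only one possible reading of "the required number shifts one step upwards". The sketch does not model designs with more or fewer than eight subgroups. The dictionary records the equal-accuracy figures reported by Bechger, Hemker and Maris (2009).

```python
SCALE = ["inadequate", "adequate", "good", "excellent"]

# Bechger, Hemker and Maris (2009), eight subgroups: continuous-norming N per
# subgroup -> classical-norming N per subgroup giving the same accuracy.
CLASSICAL_EQUIVALENT = {70: 200, 100: 300, 150: 400}


def continuous_sample_size_rating(smallest_subgroup_n: int, high_stakes: bool = False) -> str:
    """Rate an eight-subgroup continuous-norming design (hypothetical helper).

    For high-stakes use the rating is lowered one step, an interpretation of
    'the required number shifts one step upwards'.
    """
    if smallest_subgroup_n >= 150:
        step = 3
    elif smallest_subgroup_n >= 100:
        step = 2
    elif smallest_subgroup_n >= 70:
        step = 1
    else:
        step = 0
    if high_stakes:
        step = max(step - 1, 0)
    return SCALE[step]


for n in (60, 85, 120, 160):
    print(n, continuous_sample_size_rating(n), continuous_sample_size_rating(n, high_stakes=True))
```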

9.1.5

Procedures used in sample selection (select one)


A norm group must be representative of the referred group. A sample can be considered representative of the intended population if the composition of the sample with respect to a number of variables (e.g. age, gender, education) is similar to that of the population, and if the sample is gathered with a probability sampling model. In such a model the chance of being included in the sample is equal for each element in the population. Different methods can be used in both probability and non-probability sampling.
In probability sampling, when an individual person is the unit of selection, three methods can be differentiated: purely random, systematic (e.g. every tenth member of the population) and stratified (for some important variables, e.g. gender, the numbers to be selected are fixed to guarantee representativeness on these variables). However, groups of persons can also be sampled (e.g. school classes), for instance for the sake of efficiency, or a combination of group and individual sampling can be used. In non-probability sampling four methods are differentiated: pure convenience sampling (simply add every tested person to the norm group, as is done in most samples for personnel selection; post hoc, the data may be classified into meaningful subgroups based on biographical and situational information), quota sampling (as in convenience sampling, but the number of respondents needed in each subgroup is specified beforehand, as is done in survey research), snowball sampling (ask your friends to participate, and ask them to ask their friends, etc.) and purposive sampling (e.g. select extreme groups to participate).


No information is supplied [ ]

Probability sample: random [ ]

Probability sample: systematic [ ]

Probability sample: stratified [ ]

Probability sample: cluster [ ]

Probability sample: multiphase (e.g. first cluster, then random within clusters) [ ]

Non-probability sample: convenience [ ]

Non-probability sample: quota [ ]

Non-probability sample: snowball [ ]

Non-probability sample: purposive [ ]

Other, describe: ............................................... [ ]
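The probability-sampling designs in the checklist can be illustrated with Python's standard `random` module. This is a sketch under stated assumptions: the population, the gender strata and the classes of 25 are hypothetical, the function names are illustrative, and each function shows only the selection rule, not the weighting or non-response handling a real norming study would need.

```python
import random


def simple_random(population, k, rng):
    """Purely random: every member has the same chance of being drawn."""
    return rng.sample(population, k)


def systematic(population, step, rng):
    """Systematic: random start, then every `step`-th member (e.g. every tenth)."""
    start = rng.randrange(step)
    return population[start::step]


def stratified(population, stratum_of, quotas, rng):
    """Stratified: a fixed number per stratum, drawn at random within the stratum."""
    sample = []
    for stratum, k in quotas.items():
        members = [p for p in population if stratum_of(p) == stratum]
        sample.extend(rng.sample(members, k))
    return sample


def cluster(clusters, n_clusters, rng):
    """Cluster: draw whole groups (e.g. school classes) and include all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in clusters[c]]


def multiphase(clusters, n_clusters, k_per_cluster, rng):
    """Multiphase: first draw clusters, then draw persons at random within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in rng.sample(clusters[c], k_per_cluster)]


rng = random.Random(2013)  # fixed seed so the illustration is reproducible
# Hypothetical population: 1,000 persons in 40 classes of 25, alternating gender codes.
population = [{"id": i, "gender": "F" if i % 2 else "M", "class": i // 25} for i in range(1000)]
classes = {}
for person in population:
    classes.setdefault(person["class"], []).append(person)

print(len(simple_random(population, 100, rng)))                                     # 100
print(len(systematic(population, 10, rng)))                                         # 100
print(len(stratified(population, lambda p: p["gender"], {"F": 50, "M": 50}, rng)))  # 100
print(len(cluster(classes, 4, rng)))                                                # 100
print(len(multiphase(classes, 10, 10, rng)))                                        # 100
```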

9.1.6

Representativeness of the norm sample(s)

Not applicable (n/a)

No information given

Inadequate representativeness for the intended application domain, or representativeness cannot be adequately established with the information provided

Adequate

Good

Excellent: data are gathered by means of a random sampling model; a thorough description of the composition of the sample(s) and the population(s) with respect to relevant background variables (such as gender, age, education, cultural background, occupation) is provided; good representativeness with regard to these variables is established
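Whether the composition of a norm sample is "similar" to that of the population remains a judgment the reviewer makes from the reported background variables. As one possible aid, not a procedure prescribed by this model, the sketch below compares sample shares with population shares on a single variable using Duncan's index of dissimilarity and the Pearson chi-square statistic. The function name, the education categories and all figures are hypothetical.

```python
def compare_composition(sample_counts, population_shares):
    """Compare a norm sample's composition on one variable with population shares.

    Returns per-category differences (sample share minus population share),
    Duncan's index of dissimilarity and the Pearson chi-square statistic
    (degrees of freedom = number of categories - 1). Population shares are
    assumed to sum to 1.
    """
    n = sum(sample_counts.values())
    differences = {}
    chi_square = 0.0
    for category, share in population_shares.items():
        observed = sample_counts.get(category, 0)
        expected = n * share
        differences[category] = observed / n - share
        chi_square += (observed - expected) ** 2 / expected
    dissimilarity = 0.5 * sum(abs(d) for d in differences.values())
    return differences, dissimilarity, chi_square


# Hypothetical education levels: norm sample counts versus census shares.
sample = {"low": 150, "medium": 500, "high": 350}
census = {"low": 0.25, "medium": 0.45, "high": 0.30}
diffs, d_index, chi2 = compare_composition(sample, census)
for category, diff in diffs.items():
    print(f"{category:>6}: {diff:+.0%}")
print(f"dissimilarity = {d_index:.2f}, chi-square = {chi2:.1f} (df = 2)")
```

Here the index is 0.10 (10% of the sample would have to be in another category to match the population) and the chi-square statistic of about 53.9 far exceeds the 5% critical value of 5.99 for df = 2. With large norm groups the chi-square test flags even trivial differences, so the size of the discrepancies is usually more informative than its significance.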

9.1.7

Quality of information provided about minority/protected group differences, effects of age, gender, etc.
Not applicable (n/a)

No information given

Inadequate information

Adequate general information, with minimal analysis

Good descriptions and analyses of groups and differences

Excellent range of analyses and discussion of relevant issues relating to use and interpretation

9.1.8

How old are the normative studies?

Not applicable (n/a)

No information given

Inadequate, 20 years or older

Adequate, norms between 15 and 19 years old

Good, norms between 10 and 14 years old

Excellent, norms less than 10 years old

9.1.9

Practice effects (only relevant for performance tests)

Not applicable (n/a)

No information given, although practice effects can be expected [ ]

General information given [ ]

Norms for second test application after typical test-retest interval [ ]

9.2

Criterion-referenced interpretation
(This sub-section can be skipped if not applicable)

To determine the critical score(s) one can differentiate between procedures that make use of the judgment
of experts (these methods are also referred to as domain-referenced norming, see sub-category 9.2.1)
and procedures that make use of actual data with respect to the relation between the test score and an
external criterion (referred to as criterion-referenced in the restricted sense, see sub-category 9.2.2).
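Angoff is one of the judgment-based (domain-referenced) standard-setting procedures listed under 9.2.1.3. To illustrate what such a procedure asks of experts, here is a minimal sketch of the basic Angoff computation. The function name and the panel's estimates are hypothetical, and operational variants (iterative rounds, impact data, the modified Angoff) are not modelled.

```python
from statistics import mean, stdev


def angoff_cut_score(ratings):
    """Basic (unmodified) Angoff cut score on the raw-score scale.

    ratings[j][i] is judge j's estimate of the probability that a minimally
    competent test taker answers item i correctly. Each judge's implied cut
    score is the sum of his/her estimates; the panel cut score is their mean.
    """
    judge_cuts = [sum(item_estimates) for item_estimates in ratings]
    return mean(judge_cuts), judge_cuts


# Hypothetical panel: three judges, five-item test.
ratings = [
    [0.9, 0.7, 0.6, 0.5, 0.8],
    [0.8, 0.6, 0.6, 0.4, 0.7],
    [0.9, 0.8, 0.5, 0.5, 0.9],
]
cut, per_judge = angoff_cut_score(ratings)
print(f"cut score = {cut:.2f} of 5 items; judges: {[round(c, 2) for c in per_judge]}; "
      f"SD between judges = {stdev(per_judge):.2f}")
```

The spread of the judges' implied cut scores is a first indication of how consistent the panel is, which items 9.2.1.4 and 9.2.1.5 address formally.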

9.2.1

Domain-referenced norming

9.2.1.1

If the judgment of experts is used to determine the critical score, are the judges appropriately
selected and trained?
Judges should have knowledge of the content domain of the test, and they should be appropriately trained in judging (the work of) test takers and in the use of the standard-setting procedure applied. The procedure for selecting judges and the training offered must be described.


Not applicable (n/a)

No information given

Inadequate

Adequate

Good

Excellent

9.2.1.2

If the judgment of experts is used to determine the critical score, is the number of judges used adequate?
The required number of judges depends on the tasks and the contexts. The numbers
suggested should be considered as an absolute minimum.


Not applicable (n/a)

No information given

Inadequate (fewer than two judges)

Adequate (two judges)

Good (three judges)

Excellent (four judges or more)

9.2.1.3

If the judgment of experts is used to determine the critical score, which standard-setting procedure is reported? (select one)

Nedelsky [ ]

Angoff [ ]

Ebel [ ]

Zieky and Livingston (limit group) [ ]

Berk (contrast groups) [ ]

Beuk [ ]

Hofstee [ ]

Other, describe: [ ]

9.2.1.4

If the judgment of experts is used to determine the critical score, which method for computing inter-rater agreement is reported? (select one)
Coefficient p0 [ ]

Coefficient Kappa [ ]

Coefficient Livingston [ ]

Coefficient Brennan and Kane [ ]

Intraclass correlation coefficient (ICC) [ ]

Other, describe: [ ]

9.2.1.5

If the judgment of experts is used to determine the critical score, what is the size of the inter-rater agreement coefficients (e.g. Kappa or ICC)?
The scientific literature offers no unequivocal standards for interpreting these kinds of coefficients, although values below .60 are generally considered insufficient. The classification below follows Shrout (1998). It should be used with some caution, because the prevalence or base rate may affect the value of Kappa.

Not applicable (n/a)

No information given

Inadequate (e.g. r < 0.60)

Adequate (e.g. 0.60 ≤ r < 0.70)

Good (e.g. 0.70 ≤ r < 0.80)

Excellent (e.g. r ≥ 0.80)

9.2.1.6

How old are the normative studies?


Not applicable (n/a)

No information given

Inadequate, 20 years or older

Adequate, norms between 15 and 19 years old

Good, norms between 10 and 14 years old

Excellent, norms less than 10 years old

9.2.1.7

Practice effects (only relevant for performance tests)

No information given, although practice effects can be expected [ ]

General information given [ ]

Norms for second test application after typical test-retest interval [ ]

9.2.2

Criterion-referenced norming

9.2.2.1

If the critical score is based on empirical research, what are the results and the quality of this
research?
No explicit guidelines can be given as to which level of relationship is acceptable, not only because what is considered "high" or "low" may differ for each criterion to be predicted, but also because prediction results will be influenced by other variables such as the base rate or prevalence. Therefore, reviewers have to rely on their own expertise in making this judgment. The composition of the sample used for this research (is it similar to the group for which the test is intended, more heterogeneous, or more homogeneous?) and the size of this group must also be taken into account.
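Because no fixed standard can be given, a reviewer may find it useful to check how a reported critical score classifies people against the criterion and how strongly that depends on the base rate. A sketch with hypothetical data and illustrative function names: the sensitivity and specificity of a cut score, and the positive predictive value (PPV) that a given sensitivity and specificity yield at different base rates (via Bayes' theorem).

```python
def classification_stats(scores, criterion, cut):
    """Sensitivity and specificity of 'score >= cut' against a dichotomous criterion."""
    pairs = list(zip(scores, criterion))
    tp = sum(1 for s, c in pairs if s >= cut and c)
    fn = sum(1 for s, c in pairs if s < cut and c)
    tn = sum(1 for s, c in pairs if s < cut and not c)
    fp = sum(1 for s, c in pairs if s >= cut and not c)
    return tp / (tp + fn), tn / (tn + fp)


def positive_predictive_value(sensitivity, specificity, base_rate):
    """Share of people at or above the cut who meet the criterion."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical validation data: test scores and whether the criterion was met.
scores = [12, 15, 18, 20, 22, 25, 9, 11, 14, 19]
met = [False, True, True, True, True, True, False, False, False, False]
sens, spec = classification_stats(scores, met, cut=15)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")

# The same sensitivity and specificity give very different PPVs at different base rates.
for base_rate in (0.50, 0.10):
    ppv = positive_predictive_value(0.80, 0.80, base_rate)
    print(f"base rate {base_rate:.0%}: PPV = {ppv:.2f}")
```

With sensitivity and specificity both at 0.80, the PPV falls from 0.80 at a 50% base rate to about 0.31 at 10%, which illustrates why one level of relationship cannot serve as a universal standard.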


Not applicable (n/a)

No information given

Inadequate

Adequate

Good

Excellent

9.2.2.2

How old are the normative studies?

Not applicable (n/a)

No information given

Inadequate, 20 years or older

Adequate, norms between 15 and 19 years old

Good, norms between 10 and 14 years old

Excellent, norms less than 10 years old

9.2.2.3

Practice effects (only relevant for performance tests)


No information given, although practice effects can be expected [ ]

General information given [ ]

Norms for second test application after typical test-retest interval [ ]


9.3

Overall adequacy
This overall rating is obtained by using judgment based on the ratings given for items 9.1 to 9.2.2.3.

The overall rating for norm-referenced interpretation can never be higher than the rating for the sample-size item, but it can be lower depending on the other information provided. Of this other information, the information about the representativeness and the ageing of the norms is especially relevant. If non-probability norm groups are used, the quality of the norms can at most be qualified as "adequate", and only when the description of the norm group shows that the distribution on relevant variables is similar to that of the target or referred group. The overall rating should reflect the characteristics of the largest and most meaningful norms rather than an "average" across all published norms.
When judges are used to determine the critical score, the overall rating for criterion-referenced interpretation can never be higher than the rating for the size of the inter-rater agreement, but it can be lower depending on the other information provided. Of this other information, the correct application of the method concerned and the quality, training and number of judges are especially important. If the critical score is based on empirical research, the rating can never be higher than the rating for item 9.2.2.1, but it can be lower when the studies are too old.


Not applicable (n/a)

No information given

Inadequate

Adequate

Good

Excellent
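The capping rules for norm-referenced interpretation are easy to misapply when several norm tables are published. A sketch, assuming ratings are coded on the ordered scale used throughout this form, of the ceiling these rules place on the overall rating, together with the norm-age bands used in items 9.1.8, 9.2.1.6 and 9.2.2.2. The function names are hypothetical; the sketch returns only an upper bound, and the final rating remains the reviewer's judgment. Capping a non-probability sample whose composition does not match the target group at "inadequate" is an interpretation, not something the text states explicitly.

```python
RATINGS = ["no information given", "inadequate", "adequate", "good", "excellent"]


def norm_age_rating(years_old: float) -> str:
    """Norm-age bands used in items 9.1.8, 9.2.1.6 and 9.2.2.2."""
    if years_old >= 20:
        return "inadequate"
    if years_old >= 15:
        return "adequate"
    if years_old >= 10:
        return "good"
    return "excellent"


def norm_referenced_ceiling(sample_size_rating: str, probability_sample: bool,
                            composition_matches_target: bool) -> str:
    """Upper bound on the overall norm-referenced rating (item 9.3).

    sample_size_rating should be the rating for the largest and most meaningful
    norms. The reviewer may still rate lower, e.g. for poor representativeness
    or old norms.
    """
    ceiling = RATINGS.index(sample_size_rating)
    if not probability_sample:
        # At most 'adequate', and only if the norm group's distribution on relevant
        # variables resembles the target group; otherwise capped at 'inadequate'
        # (an interpretation).
        cap = "adequate" if composition_matches_target else "inadequate"
        ceiling = min(ceiling, RATINGS.index(cap))
    return RATINGS[ceiling]


print(norm_referenced_ceiling("excellent", probability_sample=False, composition_matches_target=True))
print(norm_referenced_ceiling("good", probability_sample=True, composition_matches_target=True))
print(norm_age_rating(12))
```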

Reviewer's comments on the norms: Brief report about the norms and their history, including information on provisions made by the publisher/author for updating norms on a regular basis. Comments pertaining to non-local norms should be made here.


10

Reliability

General guidance on assigning ratings for this section


Reliability refers to the degree to which scores are free from measurement error variance (i.e. a range of expected measurement error). For reliability, the guidelines are based on the need for reliability estimates to have a small standard error. Guideline criteria for reliability are given in relation to two distinct contexts: the use of instruments to make decisions about groups of people (e.g. organizational diagnosis) and their use for making individual assessments. Reliability requirements are higher for the latter than for the former. Other factors can also affect reliability requirements, such as the kind of decisions made and whether scales are interpreted on their own or aggregated with other scales into a composite scale. In the latter case, the rating should focus on the reliability of the composite rather than on the reliabilities of its components.
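When scales are aggregated into a composite, the composite's reliability can be higher than that of its components if these are positively correlated, so rating the components alone could understate it. One classical estimate for an unweighted sum of scales, assuming uncorrelated measurement errors, is Mosier's formula: one minus the summed component error variances divided by the variance of the composite. A minimal sketch with hypothetical figures and an illustrative function name:

```python
def composite_reliability(sds, reliabilities, corr):
    """Reliability of the unweighted sum of several scales (Mosier's formula).

    sds: standard deviations of the component scales
    reliabilities: reliability estimate of each component
    corr: full correlation matrix of the observed component scores
    Assumes measurement errors are uncorrelated across components.
    """
    k = len(sds)
    total_variance = sum(sds[i] * sds[j] * corr[i][j] for i in range(k) for j in range(k))
    error_variance = sum(sd ** 2 * (1 - r) for sd, r in zip(sds, reliabilities))
    return 1 - error_variance / total_variance


# Hypothetical example: two scales, SD = 10, reliability 0.70 each, correlated 0.50.
rel = composite_reliability([10, 10], [0.70, 0.70], [[1.0, 0.5], [0.5, 1.0]])
print(f"composite reliability = {rel:.2f}")
```

In this example two components of 0.70 combine into a composite of 0.80, which could move the rating up a category.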


When an instrument has been translated and/or adapted from a non-local context, reliability evidence for the original version may be used to support the quality of the translated/adapted version. In this case, evidence should be provided that the measure in the new language is equivalent to the original. Without this, findings for one country/language version cannot be generalised to another. For internal consistency, however, reliability evidence based on local groups is preferable, as such evidence is more accurate and usually easy to obtain. For guidelines on establishing equivalence, see the introduction to the section on Validity. An aide-memoire of critical points for comment when an instrument has been translated and/or adapted from a non-local context is included in the Appendix.

It is difficult to set clear criteria for rating the technical qualities of an instrument. These notes provide some guidance on the values to be associated with inadequate, adequate, good and excellent ratings. However, they are intended as guides only. The nature of the instrument, its area of application, the quality of the data on which reliability estimates are based, and the types of decisions for which it will be used should all affect the way in which ratings are awarded. Under some conditions a reliability of 0.70 is fine; under others it would be inadequate. For these reasons, summary ratings should be based on your judgment and expertise as a reviewer and not simply derived by averaging sets of ratings.

To give some idea of the range and distribution of values associated with the various scales that make
up an instrument, enter the number of scales in each category. For example, if an instrument used for
group-level decisions had 15 scales, of which five had retest reliabilities lower than 0.60, six between 0.60
and 0.70 and the other four in the 0.70 to 0.80 range, the median stability could be judged as Adequate
(this being the category in which the median of the 15 values falls). If more than one study is available,
first compute the median value per scale, taking the sample sizes into account; where results from a
meta-analysis are available, these can be judged in the same way. The 15-scale example would be
entered as:



Stability                           Number of scales (if applicable)   M*
No information given                [-]                                0
Inadequate (e.g. r < 0.60)          [5]                                1
Adequate (e.g. 0.60 ≤ r < 0.70)     [6]                                2
Good (e.g. 0.70 ≤ r < 0.80)         [4]                                3
Excellent (e.g. r ≥ 0.80)           [0]                                4
* M = median stability
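The procedure described above (a sample-size-weighted median per scale when several studies are available, then the rating category that contains the median across scales) can be sketched in Python as follows. This is a minimal illustration, not part of the EFPA form: the function names are hypothetical, the cut-offs are the example stability values from the table, the high-stakes option simply raises every cut-off by 0.10 as described in the next paragraph, and for an even number of scales the lower of the two middle scales is used.

```python
# Example cut-offs for stability (test-retest) coefficients; guides only.
STABILITY_BANDS = [
    (0.60, "Inadequate"),
    (0.70, "Adequate"),
    (0.80, "Good"),
    (float("inf"), "Excellent"),
]


def rate(r, bands=STABILITY_BANDS, high_stakes=False):
    """Return the label of the band containing r (lower bounds inclusive)."""
    offset = 0.10 if high_stakes else 0.0  # hypothetical switch for high-stakes use
    for upper, label in bands:
        if r < upper + offset:
            return label
    return bands[-1][1]


def weighted_median(values, weights):
    """Median of one scale's coefficients across studies, weighted by sample size."""
    if not values or sum(weights) <= 0:
        raise ValueError("need at least one study with a positive sample size")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value


def median_category(counts):
    """counts: (label, number_of_scales) pairs ordered from worst to best rating."""
    total = sum(n for _, n in counts)
    if total == 0:
        return None
    position = (total + 1) // 2  # rank of the (lower) median scale
    cumulative = 0
    for label, n in counts:
        cumulative += n
        if cumulative >= position:
            return label


# The 15-scale example from the text: 5 + 6 + 4 + 0 scales.
example = [("Inadequate", 5), ("Adequate", 6), ("Good", 4), ("Excellent", 0)]
print(median_category(example))  # Adequate

# One scale studied three times (hypothetical coefficients and sample sizes).
print(weighted_median([0.62, 0.71, 0.68], [120, 300, 90]))  # 0.71
```

Working from counts rather than raw coefficients mirrors what the form records; when the individual coefficients are available, classifying their median directly gives the same category for an odd number of scales.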

For each of the possible ratings, example values are given for guidance only, especially for the distinctions
between Adequate, Good and Excellent. For high-stakes decisions, such as personnel selection, these
example values should be raised by 0.10. However, it should be noted that decisions are often based on
aggregate scale scores, which may have much higher reliabilities than their component primary
scales. For example, primary scales in a multi-scale instrument may have reliabilities around 0.70, while
Big Five secondary aggregate scales based on them can have reliabilities in the 0.90s. Good test
manuals will report the reliabilities of secondary as well as primary scales.
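Why an aggregate can be markedly more reliable than its components can be illustrated with the classical formula for the reliability of an unweighted sum of k standardized scales, assuming uncorrelated measurement errors: r_C = 1 - Σ(1 - r_i) / (k + k(k - 1)ρ̄), where r_i are the scale reliabilities and ρ̄ is their mean intercorrelation. The Python sketch below is illustrative only; the function name is hypothetical, and the five scales with reliabilities of 0.70 and a mean intercorrelation of 0.50 are invented values, not taken from any instrument.

```python
def composite_reliability(reliabilities, mean_intercorrelation):
    """Reliability of an unweighted sum of standardized scales.

    Assumes unit variances, uncorrelated errors and a common (mean) correlation
    between the component scales.
    """
    k = len(reliabilities)
    if k < 2:
        raise ValueError("an aggregate needs at least two component scales")
    # Error variance of the sum: the component error variances simply add up.
    error_variance = sum(1.0 - r for r in reliabilities)
    # Observed variance of the sum: k unit variances plus all pairwise covariances.
    composite_variance = k + k * (k - 1) * mean_intercorrelation
    return 1.0 - error_variance / composite_variance


# Hypothetical: five primary scales, each r = 0.70, intercorrelating 0.50 on average.
print(round(composite_reliability([0.70] * 5, 0.50), 2))  # 0.9
```

With a mean intercorrelation of zero the composite is no more reliable than its average component; the gain comes entirely from the shared variance between the primary scales.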
It is recognised that in many cases it may be impossible to calculate actual median figures. What is required
is your best estimate, given the information provided in the documentation. There is space to add
comments, where you can note any concerns about the accuracy of your estimates. For example,
in some cases a very high level of internal consistency might be commented on as indicating a "bloated
specific".


10 Reliability

10.1 Data provided about reliability (select two if applicable)

No information given
Only one reliability coefficient given (for each scale or subscale)
Only one estimate of standard error of measurement given (for each scale or subscale)
Reliability coefficients for a number of different groups (for each scale or subscale)
Standard error of measurement given for a number of different groups (for each scale or subscale)
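Item 10.1 refers to the standard error of measurement (SEM) without defining it. In classical test theory the SEM follows from a scale's standard deviation and its reliability as SEM = SD × √(1 - r), which is why the SEM can differ between groups even for the same scale. The Python sketch below is an illustration only; the function names and the T-score example (SD = 10, r = 0.84) are hypothetical, and the band shown is the simple symmetric one around the observed score.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - r)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric band around an observed score (ignores regression to the mean)."""
    return observed - z * sem, observed + z * sem


# Hypothetical T-score scale (SD = 10) with a reliability of 0.84 in one group.
sem = standard_error_of_measurement(10.0, 0.84)
low, high = confidence_band(50.0, sem)
print(f"SEM = {sem:.2f}; 95% band for T = 50: {low:.2f} to {high:.2f}")
```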
10.2 Internal consistency
The use of internal consistency coefficients is not appropriate for assessing the reliability of speed
tests, heterogeneous scales (also called empirical or criterion-keyed scales; Cronbach,
1970), effect indicators (Nunnally & Bernstein, 1994) and emergent traits (Schneider & Hough,
1995). In these cases all items concerning internal consistency should be marked "not
applicable". Internal consistency is also a biased method for estimating the reliability of ipsative
scales. Alternate-form or retest measures are more appropriate for these scale types.
Internal consistency coefficients give a better estimate of reliability than split-half coefficients
corrected with the Spearman-Brown formula. Therefore, the use of split-halves is only justified
if, for some reason, information about the answers to individual items is not available. Split-half
coefficients can be reported in item 10.7 (Other methods).


10.2.1 Sample size

Not applicable (n/a)
No information given
One inadequate study (e.g. sample size less than 100)
One adequate study (e.g. sample size of 100-200)
One large (e.g. sample size more than 200) or more than one adequately sized study
Good range of adequate to large studies

10.2.2 Kind of coefficients reported (select as many as applicable)

Not applicable (n/a)
Coefficient alpha or KR-20
Lambda-2
Greatest lower bound
Omega (factor analysis)
Theta (factor analysis)
Other, describe: ..

10.2.3 Size of coefficients
                                    Number of scales (if applicable)   M*
Not applicable                                                         n/a
No information given                [ ]
Inadequate (e.g. r < 0.70)          [ ]
Adequate (e.g. 0.70 ≤ r < 0.80)     [ ]
Good (e.g. 0.80 ≤ r < 0.90)         [ ]
Excellent (e.g. r ≥ 0.90)           [ ]

10.2.4 Reliability coefficients are reported with samples which ... (select one)

... do not match the intended test takers, leading to more favourable coefficients (e.g. inflation by artificial heterogeneity)
... do not match the intended test takers, but the effect on the size of the coefficients is unclear
... do not match the intended test takers, leading to less favourable coefficients (e.g. reduction by restriction of range)
... match the intended test takers
Not applicable (n/a)
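The effect of sample composition referred to in these options can be illustrated under the classical assumption that a scale's error variance (and hence its SEM) is the same in the reporting sample and in the intended population. The reliability then changes with the observed-score variance as r' = 1 - (SD² / SD'²)(1 - r). The Python sketch below, its function name and its numbers are hypothetical; it only shows the direction and rough size of the effect, and the equal-error-variance assumption may not hold for a given test.

```python
def reliability_in_other_group(r_reported, sd_reported, sd_target):
    """Re-express a reliability coefficient for a group with a different score SD.

    Assumes the error variance, SD^2 * (1 - r), is identical in both groups.
    A negative result signals that this assumption is untenable for the groups compared.
    """
    error_variance = sd_reported ** 2 * (1.0 - r_reported)
    return 1.0 - error_variance / sd_target ** 2


# Hypothetical: r = 0.90 reported in a heterogeneous mixed sample (SD = 15),
# while the intended test takers have SD = 10 (inflation by artificial heterogeneity).
print(round(reliability_in_other_group(0.90, 15.0, 10.0), 3))  # 0.775
# Hypothetical: r = 0.75 in a range-restricted sample (SD = 8) vs SD = 10 in the target group.
print(round(reliability_in_other_group(0.75, 8.0, 10.0), 3))  # 0.84
```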

10.3 Test-retest reliability and temporal stability

Test-retest reliability refers to relatively short time intervals, whereas temporal stability refers to longer
intervals over which more change is acceptable. Particularly for tests to be used for predictions
over longer periods, both aspects are relevant. Assessing temporal stability may require more than one
retest.
The use of a test-retest design is not appropriate for assessing the reliability of state measures
(in fact, a high test-retest coefficient would invalidate the state character of a test). In this case
all items concerning test-retest reliability should be marked "not applicable".
10.3.1 Sample size

Not applicable (n/a)
No information given
One inadequate study (e.g. sample size less than 100)
One adequate study (e.g. sample size of 100-200)
One large (e.g. sample size more than 200) or more than one adequately sized study
Good range of adequate to large studies

10.3.2 Size of coefficients
                                    Number of scales (if applicable)   M*
Not applicable                                                         n/a
No information given                [ ]
Inadequate (e.g. r < 0.60)          [ ]
Adequate (e.g. 0.60 ≤ r < 0.70)     [ ]
Good (e.g. 0.70 ≤ r < 0.80)         [ ]
Excellent (e.g. r ≥ 0.80)           [ ]

10.3.3 Data provided about the test-retest interval (select or fill in test-retest interval)

Not applicable (n/a)
No information given
The interval is:

10.3.4 Reliability coefficients are reported with samples which ... (select one)

... do not match the intended test takers, leading to more favourable coefficients (e.g. inflation by artificial heterogeneity)
... do not match the intended test takers, but the effect on the size of the coefficients is unclear
... do not match the intended test takers, leading to less favourable coefficients (e.g. reduction by restriction of range)
... match the intended test takers
Not applicable (n/a)

10.4 Equivalence reliability (parallel or alternate forms)

10.4.1 Sample size

Not applicable (n/a)
No information given
One inadequate study (e.g. sample size less than 100)
One adequate study (e.g. sample size of 100-200)
One large (e.g. sample size more than 200) or more than one adequately sized study
Good range of adequate to large studies

10.4.2 Are the assumptions for parallelism* met for the different versions of the test for which
equivalence reliability is investigated?
*Note that tests can be considered parallel if, in the same group, their mean scores,
variances and correlations with other tests are the same.

Not applicable (n/a)
No information given
Inadequate
Adequate
Good
Excellent

10.4.3 Size of coefficients
                                    Number of scales (if applicable)   M*
Not applicable                                                         n/a
No information given                [ ]
Inadequate (e.g. r < 0.70)          [ ]
Adequate (e.g. 0.70 ≤ r < 0.80)     [ ]
Good (e.g. 0.80 ≤ r < 0.90)         [ ]
Excellent (e.g. r ≥ 0.90)           [ ]

10.4.4 Reliability coefficients are reported with samples which ... (select one)

... do not match the intended test takers, leading to more favourable coefficients (e.g. inflation by artificial heterogeneity)
... do not match the intended test takers, but the effect on the size of the coefficients is unclear
... do not match the intended test takers, leading to less favourable coefficients (e.g. reduction by restriction of range)
... match the intended test takers
Not applicable (n/a)

10.5 IRT-based method

10.5.1 Sample size
It is difficult to give uniform guidelines for the adequacy of sample sizes when IRT methods are used
to estimate reliability, because the requirements differ depending on the item response format and
the item response model used. Depending on the item response model, the minimum values for
"adequate" sample sizes are: 200 for 1-parameter studies, 400 for 2-parameter studies, and 700
for 3-parameter studies (based on Parshall, Davey, Spray, & Kalohn, 2001). These values apply to
dichotomous models, but can give the reviewer some guidance when polytomous models are used,
for which the sample sizes may be smaller.

Not applicable (n/a)
No information given
One inadequate study
One adequate study
One large or more than one adequately sized study
Good range of adequate to large studies

10.5.2 Kind of coefficients reported (select as many as applicable)
The first method gives the reliability of the estimated latent trait, which in IRT replaces the
estimated true score, i.e. the test score (see Embretson & Reise, 2000). The second method (rho) is
based on information about the individual items and gives an estimate of the reliability when the
requirements typical for IRT are met (Mokken, 1971). The third method (the information function)
gives an estimate of the accuracy of measurement as a function of the position on the latent trait.

Not applicable                                   n/a
Reliability of the estimated latent trait        [ ]
Rho                                              [ ]
Information function                             [ ]
Others, describe:                                [ ]

10.5.3 Size of coefficients (based on the final test length)
Guidelines are given both for reliability coefficients (including rho) and for the information function.
The guidelines for the information function are based on those for reliability coefficients, since
Information = 1/SE² and, given some commonly made assumptions (e.g. a latent trait scaled to unit
variance), r = 1 - SE². Note that SE and information values depend on the value of the latent trait and
that each test has a range within which the information value is optimal. The rating should not a priori
be based on this optimal value, but on the information value of the score or range of scores that are of
specific importance (e.g. critical scores). For these scores the information value may be optimal, but not
necessarily so. If there are no such scores, the rating should be based on the mean information value
(see also Reise & Haviland, 2005). Because there is little experience with these rules of thumb, we advise
raters to use them with care.
                                                              Number of scales (if applicable)   M*
Not applicable                                                                                   n/a
No information given                                          [ ]
Inadequate (e.g. r < 0.70; information < 3.33)                [ ]
Adequate (e.g. 0.70 ≤ r < 0.80; 3.33 ≤ information < 5.00)    [ ]
Good (e.g. 0.80 ≤ r < 0.90; 5.00 ≤ information < 10.00)       [ ]
Excellent (e.g. r ≥ 0.90; information ≥ 10.00)                [ ]

10.6 Inter-rater reliability
If the scoring of a test involves no judgmental processes (e.g. simply summing the scores of
multiple-choice items), this type of reliability is not required and all items concerning inter-rater
reliability should be marked "not applicable". Note that although inter-rater reliability may not
apply to the test as a whole, it may apply to one or more subtests (e.g. some subtests of an
intelligence test).
10.6.1 Sample size

Not applicable (n/a)
No information given
One inadequate study (e.g. sample size less than 100)
One adequate study (e.g. sample size of 100-200)
One large (e.g. sample size more than 200) or more than one adequately sized study
Good range of adequate to large studies

10.6.2 Kind of coefficients reported (select as many as applicable)

Not applicable                      n/a
Percentage agreement                [ ]
Coefficient kappa                   [ ]
Intraclass correlation              [ ]
Coefficient iota                    [ ]
Other, describe:                    [ ]

10.6.3 Size of coefficients
For some methods mentioned in 10.6.2 the guide values may not apply, as no r is computed.
                                    Number of scales (if applicable)   M*
Not applicable                                                         n/a
No information given                [ ]
Inadequate (e.g. r < 0.60)          [ ]
Adequate (e.g. 0.60 ≤ r < 0.70)     [ ]
Good (e.g. 0.70 ≤ r < 0.80)         [ ]
Excellent (e.g. r ≥ 0.80)           [ ]
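As the note to 10.6.3 says, the guide values for r do not straightforwardly apply to every index in 10.6.2. For two of those indices, the Python sketch below shows how percentage agreement and Cohen's kappa (the two-rater form of coefficient kappa) are computed for categorical scores; the function names and ratings are hypothetical. Kappa corrects the observed agreement for the agreement expected by chance from each rater's marginal distribution, so it is usually lower than the raw percentage.

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Percentage of cases on which two raters assign the same category."""
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_observed - p_chance) / (1 - p_chance)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same, non-empty set of cases")
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from the product of the two raters' marginal proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical scores (0, 1 or 2 points) given by two raters to ten protocols.
rater_a = [2, 1, 0, 2, 1, 1, 0, 2, 1, 0]
rater_b = [2, 1, 0, 1, 1, 1, 0, 2, 2, 0]
print(f"agreement = {percentage_agreement(rater_a, rater_b):.0f}%")  # 80%
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")  # 0.70
```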

10.7 Other methods of reliability estimation

10.7.1 Sample size

Not applicable (n/a)
No information given
One inadequate study (e.g. sample size less than 100)
One adequate study (e.g. sample size of 100-200)
One large (e.g. sample size more than 200) or more than one adequately sized study
Good range of adequate to large studies

10.7.2 Describe method:

10.7.3 Results
                                    Number of scales (if applicable)   M*
Not applicable                                                         n/a
No information given                [ ]
Inadequate                          [ ]
Adequate                            [ ]
Good                                [ ]
Excellent                           [ ]

10.8 Overall adequacy
This overall rating is obtained by using judgment based on the ratings given for items 10.1 to
10.7.3. Do not simply average numbers to obtain an overall rating.
For some instruments, internal consistency may be inappropriate (broad traits or scale
aggregates), in which case more emphasis should be placed on the retest data. In other cases
(state measures), retest reliabilities would be inappropriate, so emphasis should be placed on
internal consistencies. For your final judgment you should also take into account:
- whether the test is used for individual assessment or to make decisions on groups of people
- the nature of the decision (high-stakes vs. low-stakes)
- whether one or more (types of) reliability studies are reported
- whether standard errors of measurement are also provided
- procedural issues, e.g. group size, number of reliability studies, heterogeneity of the
  group(s) on which the coefficients are computed, number of raters if inter-rater agreement is
  computed, length of the test-retest interval, etc.
- the comprehensiveness of the reporting on the reliability studies.

No information given
Inadequate
Adequate
Good
Excellent


Reviewers' comments on Reliability: Highlight the strong and weak aspects of the available evidence of
reliability. Comments pertaining to equivalence/reliability generalisation should also be made
here (if applicable).


11 Validity
General guidance on assigning ratings for this section
Validity is the extent to which a test serves its purpose: can the conclusions one has in mind be drawn
from the test scores? The literature distinguishes many types of validity; Drenth and Sijtsma
(2006, pp. 334-340), for example, mention eight different types. The distinctions may relate to the purpose
of validation or to the process of validation using specific data-analytic techniques. In the last decades of
the past century there was a growing consensus that validity should be considered a unitary concept
and that the different types of validity should be regarded only as different ways of gathering evidence
(American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education, 1999). Borsboom, Mellenbergh, and Van Heerden (2004) state
that a test is valid for measuring an attribute if variation in the attribute causally produces variation in the
measured outcomes. Although this is a different approach, these authors also consider a distinction
between types of validity irrelevant.

However, whichever approach to validity one prefers, a standardised judgment requires the concept of
validity to be structured to some extent. For this reason, separate sub-sections on construct and criterion
validity are distinguished. Depending on the purpose of the test, one of these aspects of validity may be
more relevant than the other. However, it is recognised that construct validity is the more fundamental
concept and that evidence on criterion validity may contribute to establishing the construct validity of a test.
It is also recognised that a test may have different validities depending on the type of decisions made with
the test, the type of samples used, etc. However, it is inherent in a test review system that a single quality
judgment is made about the (construct or criterion) validity of a test. This judgment should reflect
the quality of the evidence supporting the claim that the test can be used for the interpretations
stated in the manual. The broader the intended applications, the more validity evidence the
author/publisher should deliver. Note that the final rating for construct and criterion validity will be a kind of
average of this evidence, and that there may be situations or groups for which the test has higher or
lower validity (or for which validity may not have been studied at all).

When an instrument has been translated and/or adapted from a non-local context, evidence of the
equivalence of the measure in the new language to the original should be provided. Without this it is not
possible to generalise findings from one country/language version to another. Examples of evidence of
equivalence:
Invariance in construct structure, e.g. via factor structure or correlations with standard measures.
Similar criterion-related validity, e.g. a similar profile of correlations of a multi-scale instrument with
independent external criteria such as ratings of job competencies.
Similar patterns of item loadings, e.g. items correlate in the same pattern with other scales, and the
strongest/weakest loading items are similar in the original and new languages.
Similar profiles for bilingual candidates tested in both languages (cf. alternate-form reliability).
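One widely used index for the "similar patterns of loadings" check is Tucker's congruence coefficient, computed per factor between the loading vectors of the original and the translated version. The form does not prescribe this index; the sketch below is illustrative only, assumes that both vectors list the same items in the same order, and uses hypothetical loadings. Values close to 1 indicate near-identical loading patterns; cut-offs in the region of .90-.95 are often quoted in the literature, but they are conventions rather than EFPA criteria.

```python
import math
from typing import Sequence


def tucker_phi(loadings_a: Sequence[float], loadings_b: Sequence[float]) -> float:
    """Tucker's congruence coefficient between two factor-loading vectors.

    Both sequences must hold the loadings of the same items, in the same
    order, on the factor being compared across language versions.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm_a = math.sqrt(sum(a * a for a in loadings_a))
    norm_b = math.sqrt(sum(b * b for b in loadings_b))
    return cross / (norm_a * norm_b)


# Hypothetical loadings of five items on one factor in two language versions.
original = [0.72, 0.65, 0.58, 0.70, 0.41]
translated = [0.69, 0.61, 0.60, 0.66, 0.38]
print(f"phi = {tucker_phi(original, translated):.3f}")
```

The coefficient is unchanged if all loadings in one version are multiplied by a constant, so it compares the pattern of loadings rather than their absolute size.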

Validity generalisation requires stronger evidence when tests are translated across language families (e.g.
from an Indo-European to a Semitic language). In such cases equivalence is under greater threat because
of differences in language structure and culture. However, when a test has been translated into multiple
languages, validity generalisation may be inferred from evidence of validity invariance in the previous
translations; for instance, when a Swedish test has already been translated into French, German and
Italian and has been shown to be equivalent in these languages.
In considering the whole issue of equivalence, it may be useful to follow Van de Vijver and Poortinga's
(2005) classification:

Structural / functional equivalence
There is evidence that the source and target language versions measure the same psychological
constructs across groups. This is generally demonstrated by showing that patterns of correlations
between variables are the same across groups.
Measurement unit equivalence
There is evidence that the measurement units are the same, but there are different origins across
groups (i.e. individual differences found in group A can be compared with differences found in group
B, but the absolute raw scores for A and B are not directly comparable without some form of
rescaling).
Scalar / Full score equivalence
There is evidence of the same measurement unit and the same origin (i.e. raw scores have the same
meaning and can be compared across groups).

The benchmarks and the notes in sub-sections 11.1 and 11.2 provide some guidance on the values to
be associated with inadequate, adequate, good and excellent ratings. However, these are intended to act
as guides only. The nature of the instrument, its area of application, the quality of the data on which
validity estimates are based, and the types of decisions it will be used for should all affect the way in
which ratings are awarded. For validity, the guidelines on sample sizes are based on power analyses of the
sample sizes needed to detect moderate-sized validities if they exist.
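A power analysis of this kind can be sketched with Fisher's z approximation for testing a correlation against zero. The inputs are assumptions chosen for illustration (a "moderate" validity of r = 0.30, a two-sided alpha of .05 and a power of .80); the form does not state which values underlie its guidelines. Under these assumptions about 85 respondents are needed, and the required N grows quickly as the expected validity drops.

```python
import math
from statistics import NormalDist


def n_for_correlation(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect a population correlation `rho`
    with a two-sided test at level `alpha` (Fisher z approximation)."""
    std_normal = NormalDist()  # mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    n = ((z_alpha + z_beta) / math.atanh(rho)) ** 2 + 3
    return math.ceil(n)


for rho in (0.20, 0.30, 0.40):
    print(f"r = {rho:.2f}: N >= {n_for_correlation(rho)}")
```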


11.1

Construct validity

The purpose of construct validation is to answer the question of whether the test actually measures
the intended construct or, partly or mainly, something else. Common methods for investigating
construct validity are exploratory or confirmatory factor analysis, item-test correlations, comparison of
mean scores of groups for which score differences may be expected, testing for invariance of factor
structure and for item bias (DIF) across groups, correlations with other instruments intended to
measure the same constructs (convergent validity) or different constructs (discriminant validity),
Multi-Trait-Multi-Method research (MTMM), IRT methodology and (quasi-)experimental designs.


11.1

Construct validity

11.1.1

Designs used (select as many as are applicable)


No information is supplied

[ ]

Exploratory Factor Analysis

[ ]

Confirmatory Factor Analysis

[ ]

(Corrected) item-test correlations

[ ]

Testing for invariance of structure and differential item functioning across groups

[ ]

Differences between groups

[ ]

Correlations with other instruments and performance criteria

[ ]

MTMM correlations

[ ]

IRT methodology

[ ]

(Quasi-)Experimental Designs

[ ]

Other, describe:

[ ]



11.1.2

Do the results of (exploratory or confirmatory) factor analysis support the structure of the
test?


No information given

Inadequate

Adequate

Good

Excellent

11.1.3

Do the items correlate sufficiently well with the (sub)test score?


Note that very high correlations may mean that the items are more or less synonymous and that
the concept measured may be very narrow (a so-called 'bloated specific').


No information given

Inadequate

Adequate

Good

Excellent

11.1.4

Is the factor structure invariant across groups and/or is the test free of item bias (DIF)?
This kind of research can be carried out on the basis of models within classical test theory or the
IRT framework. If item bias is found, its effect on the total score should be estimated (small
effects are acceptable).
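For dichotomous items, one widely used observed-score approach to DIF is the Mantel-Haenszel procedure: respondents are matched on total score, and the odds of answering the item correctly are compared between a reference and a focal group at each score level. The form does not prescribe this method; the sketch below assumes the 2 x 2 counts have already been tabulated per score level, omits the accompanying significance test, and uses hypothetical class and function names and data.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoreLevel:
    """2 x 2 counts for one item among respondents with the same total score."""
    ref_correct: int    # reference group, item answered correctly
    ref_wrong: int      # reference group, item answered incorrectly
    focal_correct: int  # focal group, item answered correctly
    focal_wrong: int    # focal group, item answered incorrectly


def mantel_haenszel(levels: list[ScoreLevel]) -> tuple[float, float]:
    """Return the Mantel-Haenszel common odds ratio and its ETS delta-scale
    transform (MH D-DIF = -2.35 * ln(alpha)) for one item."""
    num = 0.0
    den = 0.0
    for s in levels:
        n = s.ref_correct + s.ref_wrong + s.focal_correct + s.focal_wrong
        if n == 0:
            continue  # an empty score level carries no information
        num += s.ref_correct * s.focal_wrong / n
        den += s.ref_wrong * s.focal_correct / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at three total-score levels.
levels = [
    ScoreLevel(ref_correct=20, ref_wrong=30, focal_correct=12, focal_wrong=28),
    ScoreLevel(ref_correct=45, ref_wrong=15, focal_correct=30, focal_wrong=15),
    ScoreLevel(ref_correct=50, ref_wrong=5, focal_correct=35, focal_wrong=6),
]
alpha_mh, delta_mh = mantel_haenszel(levels)
print(f"alpha_MH = {alpha_mh:.2f}, MH D-DIF = {delta_mh:.2f}")
```

An odds ratio near 1 (D-DIF near 0) indicates little DIF; a negative D-DIF means the item is relatively harder for the focal group. The ETS classification treats |D-DIF| below 1 as negligible, which is one way of making "small effects" concrete.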


No information given

Inadequate

Adequate

Good

Excellent

11.1.5

Are differences in mean scores between relevant groups as expected?


E.g. pupils in group 8 are expected to score higher than pupils in group 6 on a test of
numerical proficiency; children diagnosed with ADHD should score higher on a test of
hyperactivity than children without that diagnosis; salespersons should score higher on a
test of commercial knowledge than the average working population. Even when the
results are in the expected direction, this kind of research is usually inconclusive with
respect to the construct validity of the test. Its value, however, is that if the expected
differences are not found, this raises strong doubts about the construct validity of the test.
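Beyond the direction of a difference, its size can be expressed as Cohen's d. A minimal sketch, assuming two independent groups and a pooled standard deviation; the form asks only whether differences are "as expected", so reporting d is a suggestion, and the scores below are hypothetical:

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardised mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical hyperactivity scale scores.
adhd = [24, 27, 22, 30, 26, 25]
no_adhd = [18, 21, 17, 23, 20, 19]
d = cohens_d(adhd, no_adhd)
print(f"d = {d:.2f} ({'expected' if d > 0 else 'unexpected'} direction)")
```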


No information given

Inadequate

Adequate

Good

Excellent



11.1.6

Median and range of the correlations between the test and tests measuring similar
constructs
An essential element of the process of construct validation is correlating the test score(s)
with scales from similar instruments, the so-called congruent or convergent validity. The
guidelines on congruent validity coefficients need to be interpreted flexibly. Where two very
similar instruments have been correlated (with data obtained concurrently), we would expect
to find correlations of 0.60 or more for 'adequate'. Where the instruments are less similar, or the
administration sessions are separated by some time interval, lower values may be
adequate. When evaluating congruent validity, care should be taken in interpreting very
high correlations. When correlations are above 0.90, the scales in question are most likely
measuring exactly the same construct. This is not a problem if the scales in question are a
new scale and an established marker. It would be a problem, though, if the scale(s) in
question were meant to add useful variance to what other scales already measure. The
guidelines given concern correlations that are not adjusted for common-method variance or
attenuation. Therefore, the reliabilities of both instruments should also be taken into account
when judging congruent validity coefficients: the observed correlation cannot exceed the
square root of the product of the two reliabilities. E.g., when both instruments have a
reliability of .75, the maximum correlation between the instruments is √(.75 × .75) = .75.
If reliabilities are higher, higher correlations are to be expected.
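In classical test theory the observed correlation equals the true-score correlation multiplied by √(rxx · ryy), so it can never exceed √(rxx · ryy). A minimal sketch, with hypothetical function names, that computes this ceiling and Spearman's correction for attenuation. The ratings for this item apply to the uncorrected coefficient; the corrected value is only an aid to interpretation.

```python
import math


def max_observed_r(rel_x: float, rel_y: float) -> float:
    """Largest observed correlation two measures can show, given their
    reliabilities (reached only if their true scores correlate perfectly)."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: estimated true-score correlation."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical: a new scale and a marker, both with reliability .75,
# correlate .60 in a concurrent study.
print(f"ceiling on observed r: {max_observed_r(0.75, 0.75):.2f}")
print(f"estimated true-score r: {disattenuated_r(0.60, 0.75, 0.75):.2f}")
```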


No information given

Inadequate (r < 0.55)

Adequate (0.55 ≤ r < 0.65)

Good (0.65 ≤ r < 0.75)

Excellent (r ≥ 0.75)

11.1.7

Do the correlations with other instruments show good discriminant validity with respect to
constructs that the test is not supposed to measure?


No information given

Inadequate

Adequate

Good

Excellent

11.1.8

If a Multi-Trait-Multi-Method design is used, do the results support the construct validity of
the test (does it really measure what it is supposed to measure and not something else)?
Note that if an MTMM design is used, research as mentioned in 11.1.6 and 11.1.7 may no longer
be required.
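A minimal sketch of the classic Campbell and Fiske comparisons for a two-method MTMM matrix: for each trait, the convergent (monotrait-heteromethod) correlation should be larger than the heterotrait correlations involving that trait, both across methods and within the same method. The sketch assumes a symmetric correlation matrix with the variables ordered as all traits under method 1 followed by the same traits under method 2. It compares magnitudes only, without significance tests or the confirmatory factor models often fitted to MTMM data; the function name and matrix are hypothetical.

```python
def mtmm_check(r: list[list[float]], n_traits: int) -> dict[int, bool]:
    """For each trait, test whether its convergent correlation exceeds every
    heterotrait correlation involving that trait (two methods only)."""
    t = n_traits
    if t < 2:
        raise ValueError("need at least two traits")
    results = {}
    for i in range(t):
        convergent = r[i][t + i]  # trait i, method 1 vs. method 2
        rivals = []
        for j in range(t):
            if j == i:
                continue
            rivals.append(r[i][t + j])      # heterotrait-heteromethod
            rivals.append(r[j][t + i])      # heterotrait-heteromethod
            rivals.append(r[i][j])          # heterotrait-monomethod, method 1
            rivals.append(r[t + i][t + j])  # heterotrait-monomethod, method 2
        results[i] = convergent > max(rivals)
    return results


# Hypothetical matrix: traits A and B, each measured by self-report (m1)
# and by observer ratings (m2). Order: A-m1, B-m1, A-m2, B-m2.
matrix = [
    [1.00, 0.30, 0.65, 0.20],
    [0.30, 1.00, 0.25, 0.60],
    [0.65, 0.25, 1.00, 0.35],
    [0.20, 0.60, 0.35, 1.00],
]
print(mtmm_check(matrix, n_traits=2))
```

A trait that fails because a heterotrait-monomethod value is too high points to strong method variance rather than to a lack of convergence.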


No information given

Inadequate

Adequate

Good

Excellent

11.1.9

Other, e.g. IRT-methodology, (quasi-)experimental designs (describe):


No information given

Inadequate

Adequate

Good

Excellent



11.1.10

Sample sizes
The guidelines below concern studies within the classical test theory framework. For the
estimation of item parameters within IRT methodology, 'adequate' sample sizes are: more
than 200 for 1-parameter studies, more than 400 for 2-parameter studies and more than 700
for 3-parameter studies (based on Parshall, Davey, Spray, & Kalohn, 2001).


No information given

One inadequate study (e.g. sample size less than 100)

One adequate study (e.g. sample size of 100-200)

One large (e.g. sample size more than 200) or more than one adequate sized study

Good range of adequate to large studies

11.1.11

Quality of instruments as criteria or markers



No information given

Inadequate quality

Adequate quality

Good quality

Excellent quality with wide range of relevant markers for convergent and divergent
validation



11.1.12

How old are the validity studies?


It is difficult to formulate a general rule for taking the age of the research into account. For
tests intended to measure constructs in an area in which important theoretical
developments have taken place, 15-year-old research may be almost useless, whereas for
other tests 20-year-old (or even older) research may still be relevant.


Number of years


11.1.13

Construct validity - Overall adequacy


This overall rating is obtained by using judgment based on the ratings given for items 11.1.1-11.1.12.
Do not simply average numbers to obtain an overall rating.
In addition to the outcomes of the construct validity research, your final judgment should also
take into account whether analysis techniques are used correctly (e.g. whether the significance
level is corrected when the instrument is correlated with other instruments without clear
hypotheses, so-called 'fishing'), whether the research samples are similar to the group(s) for
which the test is intended (e.g. more heterogeneity will inflate correlations, and results from
samples of students may not generalise), the size of the research sample(s), the quality of the
other instruments used (e.g. in convergent and discriminant validity research), and the age of
the studies.
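Where many correlations are inspected without prior hypotheses, a step-down correction such as Holm's keeps the family-wise error rate at the chosen level. The form does not name a particular correction, so Holm's procedure is only one reasonable choice; the function name and p-values below are hypothetical:

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure. Returns, in input order, whether each
    null hypothesis is rejected while controlling the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


# Hypothetical p-values for correlations with four marker instruments.
p = [0.010, 0.040, 0.030, 0.005]
print(holm_reject(p))
```

With these values only the two smallest p-values survive, although all four are below .05 when judged one at a time.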


No information given

Inadequate

Adequate

Good

Excellent


11.2

Criterion-related validity

Criterion-related evidence of validity (concurrent and predictive validity) refers to studies in which real-world
criterion measures (i.e. not scores on other instruments) have been correlated with scales. Predictive studies
generally refer to situations where the assessment was carried out at a 'qualitatively' different point in time
from the criterion measurement (e.g. for a work-related selection measure intended to predict job success, the
instrument would have been administered at the time of selection), rather than just being a matter of how
long the interval between instrument and criterion measurement was. Studies can also be 'post-dictive', for
example, where scores on a potential selection test are correlated with job incumbents' earlier line-manager
ratings of performance. In principle, evidence of criterion validity is required for all kinds of tests. However,
when the manual explicitly states that the test does not serve prediction purposes (such as educational tests
that measure progress), criterion validity can be considered 'not applicable'.


11.2

Criterion-related validity

11.2.1

Type of criterion study or studies (select as many as are applicable)

Predictive

Concurrent

Post-dictive
11.2.2

Sample sizes
No information given

One inadequate study (e.g. sample size less than 100)

One adequate study (e.g. sample size of 100-200)

One large (e.g. sample size more than 200) or more than one adequate sized study

Good range of adequate to large studies

11.2.3

Quality of criterion measures


No information given

Inadequate quality

Adequate quality

Good quality

Excellent quality with respect to reliability and representation of the criterion construct
11.2.4

Strength of the relation between the test and criteria


It is difficult to set clear criteria for rating the size of the criterion validity coefficients of an
instrument. A criterion-related validity of 0.20 can have considerable utility in some situations,
while one of 0.40 might be of little value in others. A coefficient of .30 may be considered good
in personnel selection, whereas in educational situations higher coefficients are common. For
these reasons, ratings should be based on your judgment and expertise as a reviewer and
not simply derived by averaging sets of correlation coefficients. The guidelines given are
based on Hemphill (2003; see also Meyer et al., 2001) and concern correlations that are not
corrected for attenuation in either the predictor or the criterion. However, coefficients may be
corrected for restriction of range.
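One standard way to correct for restriction of range is Thorndike's Case II formula, which assumes direct selection on the test itself and a known standard deviation of the test in the unrestricted (e.g. applicant) population. The form does not specify a formula; the sketch below is illustrative, and the function name and figures are hypothetical:

```python
import math


def correct_range_restriction(r: float, sd_unrestricted: float, sd_restricted: float) -> float:
    """Thorndike Case II correction for direct range restriction on the test.

    r               -- validity coefficient observed in the selected group
    sd_unrestricted -- test SD in the full (e.g. applicant) population
    sd_restricted   -- test SD in the selected group
    """
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 - r**2 + (r**2) * (u**2))


# Hypothetical: r = .25 among hired applicants, test SD 10 among those hired
# and 15 in the full applicant pool.
print(f"corrected r = {correct_range_restriction(0.25, 15, 10):.2f}")
```

Because the result depends heavily on the assumed unrestricted SD, it helps to report the observed coefficient alongside the corrected one.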



The ranges given below concern validity coefficients, because correlations between tests and
criteria are the most common way of representing criterion validity. However, particularly for use in
clinical situations, data on the sensitivity and specificity of a test may give more useful
information on the relation between a test and a criterion. ROC curves are a popular way of
quantifying sensitivity and specificity. Swets (1988) presents an overview of the areas under
ROC curves found in different fields. For certain types of medical diagnosis the values are
between .81 and .97, for lie detection between .70 and .95, and for educational achievement
(pass/fail) between .71 and .94. These values may be used as guidelines, but it is left to the
expertise of the reviewer to decide to what extent the test can make a useful contribution to
the decision concerned. The same applies when other indices are reported, such as the positive and
negative predictive values of a test, the likelihood ratio, etc.
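For a decision based on a cut-off, these indices can be computed directly from the scores of people with and without the target condition. A minimal sketch, assuming that higher scores indicate the condition and that "positive" means scoring at or above the cut-off; the function names and data are hypothetical. The AUC is computed as the probability that a randomly chosen case outscores a randomly chosen non-case (ties counting half), which equals the area under the empirical ROC curve.

```python
import math


def auc(case_scores: list[float], noncase_scores: list[float]) -> float:
    """Area under the empirical ROC curve: the probability that a randomly
    chosen case scores higher than a randomly chosen non-case (ties = 0.5)."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def _ratio(num: float, den: float) -> float:
    return num / den if den else math.nan  # undefined when the denominator is 0


def cutoff_indices(case_scores: list[float], noncase_scores: list[float],
                   cutoff: float) -> dict[str, float]:
    """Classification indices for the decision 'positive if score >= cutoff'."""
    tp = sum(s >= cutoff for s in case_scores)
    fn = len(case_scores) - tp
    fp = sum(s >= cutoff for s in noncase_scores)
    tn = len(noncase_scores) - fp
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),           # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
        "lr_plus": _ratio(sens, 1 - spec),    # positive likelihood ratio
        "lr_minus": _ratio(1 - sens, spec),   # negative likelihood ratio
    }


# Hypothetical scores of five diagnosed and five non-diagnosed clients.
cases = [14, 18, 22, 25, 30]
noncases = [8, 10, 12, 15, 20]
print(f"AUC = {auc(cases, noncases):.2f}")
for name, value in cutoff_indices(cases, noncases, cutoff=16).items():
    print(f"{name}: {value:.2f}")
```

Predictive values depend on the base rate of the condition in the sample, so they carry over to another setting only if the prevalence there is similar.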


No information given

Inadequate (r < 0.20)

Adequate (0.20 ≤ r < 0.35)

Good (0.35 ≤ r < 0.50)

Excellent (r ≥ 0.50)
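The inequality signs in these bands are easily lost or misread, so the sketch below makes the boundaries explicit for both sets of bands in this section (convergent validity in 11.1.6 and criterion validity in this item): each lower bound is inclusive. Taking the absolute value of r is an assumption for criteria keyed in the opposite direction to the test (e.g. absenteeism). The sketch only checks boundaries; as the notes stress, the final rating rests on the reviewer's judgment of the context, not on the coefficient alone.

```python
LABELS = ("Inadequate", "Adequate", "Good", "Excellent")
CONVERGENT_CUTS = (0.55, 0.65, 0.75)  # lower bounds used in item 11.1.6
CRITERION_CUTS = (0.20, 0.35, 0.50)   # lower bounds used in item 11.2.4


def band(r: float, cuts: tuple[float, float, float]) -> str:
    """Return the rating band for a coefficient not corrected for attenuation.

    Each cut is an inclusive lower bound, so r == 0.35 with CRITERION_CUTS
    is 'Good'. abs() assumes the sign only reflects how test and criterion
    are keyed.
    """
    level = sum(abs(r) >= c for c in cuts)  # number of lower bounds reached
    return LABELS[level]


print(band(0.30, CRITERION_CUTS), band(0.35, CRITERION_CUTS), band(0.62, CONVERGENT_CUTS))
```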



11.2.5

How old are the validity studies?


It is difficult to formulate a general rule for taking the age of the research into account. For
tests intended to predict behaviour in rapidly changing environments, 15-year-old research
may be almost useless, whereas for other tests 20-year-old (or even older) research may still
be relevant.

Number of years
11.2.6

Criterion-related validity - Overall adequacy


This overall rating is obtained by using judgment based on the ratings given for items 11.2.1-11.2.5.
Do not simply average numbers to obtain an overall rating.
Apart from the outcomes of the criterion validity research, your final judgment should
also take into account whether the right procedures and analysis techniques are used (e.g. is
there criterion contamination, correction for attenuation, cross-validation), whether the
research samples are similar to the group(s) for which the test is intended (e.g. correction for
restriction of range), the size of the research sample(s), the quality of the criterion instruments
used (e.g. is there criterion deficiency), and the age of the studies.


No information given

Inadequate

Adequate

Good

Excellent


11.3 Overall validity


When judging overall validity, it is important to bear in mind the weight placed on construct validity as
the best indicator of whether a test measures what it claims to measure. In some cases, the main
evidence for this may come from criterion-related studies. Such a test might have an adequate or
better rating for criterion-related validity and a less than adequate one for construct validity. In general, the
rating for Overall Validity will equal the higher of the Construct Validity and Criterion-related Validity ratings.
However, depending on the purpose of the test, one of these types of evidence
may be considered more relevant than the other. The rating for Overall Validity should not be regarded as
an average or as the lowest common denominator.
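The default rule for Overall Validity is to take the higher of the two ratings, never their average and never the lower one. That rule can be written as a one-line function. This sketch assumes the form's five rating labels map in order to 0 (no information given) through 4 (excellent), with None standing for n/a. The function name is illustrative. The result is only a starting point, and the reviewer may override it when one type of evidence is more relevant to the purpose of the test.

```python
from typing import Optional

# Assumed coding: 0 = no information given, 1 = inadequate, 2 = adequate,
# 3 = good, 4 = excellent; None = n/a.
Rating = Optional[int]


def default_overall_validity(construct: Rating, criterion: Rating) -> Rating:
    """Suggested starting value for Overall Validity: the higher of the two ratings.

    Deliberately not an average and not the lower rating; the reviewer's
    judgment may still move the final rating away from this default.
    """
    rated = [r for r in (construct, criterion) if r is not None]
    return max(rated) if rated else None


# Weak construct evidence but good criterion-related evidence:
print(default_overall_validity(construct=1, criterion=3))  # 3 (an average would give 2)
```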


11.3

Validity - Overall adequacy

This overall rating is obtained by using judgment based on the ratings given for items 11.1.1-11.2.6.
Do not simply average numbers to obtain an overall rating.


No information given

Inadequate

Adequate

Good

Excellent



Reviewers' comments on validity (including all the evidence of validity). Comments pertaining to
equivalence/validity generalisation should also be made here (if applicable).


12

Quality of computer generated reports

Judging computer-based reports is made difficult by the fact that many suppliers will, understandably, wish
to protect their intellectual property in the algorithms and scoring rules. In practice, sufficient information
for review purposes should be available from a technical manual describing the development of the
reporting process and its rationale, and from running a sample of test cases covering different score
configurations. Ideally the documentation should also describe the procedures that were used to test the
report generation for accuracy, consistency and relevance. For review purposes, at least three
reports based on different score profiles, including the actual scores, should be provided, even if the
algorithms for generating the reports are confidential.
For each of the following attributes, some questions are given to help you make a judgment, and
a definition of an excellent (4) rating is provided.


Items are rated n/a or 0 to 4; benchmarks are provided for an excellent (4) rating.
12.1

Scope or coverage
Reports can be seen as varying in both their breadth and their specificity. Reports may also vary
in the range of people for whom they are suitable. In some cases it may be that separate tailored
reports are provided for different groups of recipients.

Does the report cover the range of attributes measured by the instrument?

Does it do so at a level of specificity justifiable in terms of the level of detail obtainable from
the instrument scores?

Can the 'granularity' of the report (i.e. the number of distinct score bands on a scale that are
used to map onto different text units in the report) be justified in terms of the scale's
measurement error?

Is the report designed for the same populations of people for whom the instrument was
developed? (e.g. groups for whom the norm groups are relevant, or for whom there is
relevant criterion data etc.).
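One way to make the granularity question concrete is to compare the number of score bands a report uses with the number of non-overlapping confidence intervals that fit into the score range. The sketch below assumes the classical standard error of measurement, SEM = SD * sqrt(1 - reliability), and a 95% interval (z = 1.96). Both the heuristic and the function names are illustrative assumptions, not a criterion stated in this form.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement."""
    return sd * math.sqrt(1.0 - reliability)


def max_supported_bands(score_range: float, sd: float, reliability: float,
                        z: float = 1.96) -> int:
    """Heuristic upper bound on distinct score bands: how many
    non-overlapping +/- z*SEM intervals fit into the score range."""
    interval_width = 2.0 * z * sem(sd, reliability)
    return max(1, math.floor(score_range / interval_width))


# T-scores (SD 10) reported over 20-80, scale reliability .80:
print(max_supported_bands(60, sd=10, reliability=0.80))  # 3
```

Under this heuristic, a report that maps ten distinct text units onto such a scale would need a justification. With a reliability of .95, the same range supports six bands.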

No information given

Inadequate

Adequate

Good

Excellent: Excellent fit between the scope of the instrument and the scope of the report,
with the level of specificity in the report being matched to the level of detail measured by
the scales. Good use made of all the scores reported from the instrument.


12.2

Reliability
How consistent are the reports in their interpretation of similar sets of score data?
If report content is varied (e.g. by random selection from equivalent text units), is this done in
a satisfactory manner?
Can the interpretation of scores, and of the differences between scores, be justified in terms of the
scales' measurement errors?
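The question about differences between scores can be checked with the standard error of the difference between two scale scores on the same metric: SE_diff = SD * sqrt(2 - r_aa - r_bb). The sketch below treats a difference as interpretable only if it exceeds z * SE_diff. It assumes both scales share the same SD (e.g. T-scores), and the function names are illustrative.

```python
import math


def se_difference(sd: float, rel_a: float, rel_b: float) -> float:
    """Standard error of the difference between two scores on a common metric."""
    return sd * math.sqrt(2.0 - rel_a - rel_b)


def difference_is_interpretable(score_a: float, score_b: float, sd: float,
                                rel_a: float, rel_b: float, z: float = 1.96) -> bool:
    """True if the observed difference exceeds the critical difference z * SE_diff."""
    return abs(score_a - score_b) > z * se_difference(sd, rel_a, rel_b)


# T-scores (SD 10) on scales with reliabilities .80 and .70:
# SE_diff is about 7.07, so the critical difference is about 13.9 points.
print(difference_is_interpretable(60, 52, sd=10, rel_a=0.80, rel_b=0.70))  # False
print(difference_is_interpretable(65, 50, sd=10, rel_a=0.80, rel_b=0.70))  # True
```

By this check, a report that presents an 8-point difference between these two scales as a meaningful contrast would be over-interpreting the scores.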


No information given

Inadequate

Adequate

Good



Excellent: Excellent consistency in interpretation and appropriate warnings provided for
statements, interpretation and recommendations regarding their underlying errors of
measurement.

12.3

Relevance or validity
The linkage between the instrument and the content of the report may be explained either within
the report or be separately documented. Where reports are based on clinical judgment, the
process by which the expert(s) produced the content and the rules relating scores to content
should be documented.

How strong is the relationship between the content of the report and the scores on the
instrument? To what degree does the report go beyond or diverge from the information
provided by the instrument scores?

Does the report content relate clearly to the characteristics measured by the instrument?

Does it provide reasonable inferences about criteria to which we might expect such
characteristics to be related?

What empirical evidence is provided to show that these relationships actually exist?

It is relevant to consider both the construct validity of a report (i.e. the extent to which it provides
an interpretation that is in line with the definition of the underlying constructs) and its criterion validity
(i.e. the extent to which statements are made that can be linked back to empirical data).


No information given

Inadequate

Adequate

Good

Excellent: Relationship between the scales and the report content, with clear
justifications provided.

12.4

Fairness, or freedom from systematic bias

Is the content of the report and the language used likely to create impressions of
inappropriateness for certain groups?

Does the report make clear any areas of possible bias in the results of the instrument?

Are alternate language forms available? If so, have adequate steps been taken to ensure
their equivalence?


No information given

Inadequate

Adequate

Good

Excellent: Clear warnings and explanations of possible bias, available in all relevant user
languages.

12.5

Acceptability
This will depend substantially on the complexity of the language used in the report, the
complexity of the constructs being described and the purpose for which it is intended.

Are the form and content of the report likely to be acceptable to the intended recipients?

Is the report written in a language that is appropriate for the likely levels of numeracy and
literacy of the intended reader?


No information given

Inadequate

Adequate



Good

Excellent: Very high acceptability, well-designed and well-suited to the intended audience.


12.6

Length
This is also an aspect of Practicality and should be reflected in the rating given for that attribute, but overly
long reports may also indicate over-interpretation of scores. Therefore the length of
reports is also rated separately. Generally, reports that on average take more than one page per
scale (excluding title pages, copyright notices etc.) may be overlong and over-interpreted.
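The one-page-per-scale guideline is easy to apply once front matter has been excluded. The sketch below assumes the reviewer counts total pages and front-matter pages (title pages, copyright notices etc.) by hand. The function names and parameters are illustrative. A flag only signals that a report may be overlong, not that it is.

```python
def pages_per_scale(total_pages: int, front_matter_pages: int, n_scales: int) -> float:
    """Average report length per scale, excluding title pages, copyright notices etc."""
    if n_scales <= 0:
        raise ValueError("n_scales must be positive")
    return (total_pages - front_matter_pages) / n_scales


def may_be_overlong(total_pages: int, front_matter_pages: int, n_scales: int) -> bool:
    """Flag reports that average more than one page per scale."""
    return pages_per_scale(total_pages, front_matter_pages, n_scales) > 1.0


# A 16-scale instrument:
print(may_be_overlong(18, 2, 16))  # False: exactly one page per scale
print(may_be_overlong(30, 2, 16))  # True: 1.75 pages per scale
```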

No information given

Inadequate

Adequate

Good

Excellent

12.7

Overall adequacy of computer generated reports


This overall rating is obtained by using judgment based on the ratings given for items 12.1-12.6.
Do not simply average numbers to obtain an overall rating.

No information given

Inadequate

Adequate

Good

Excellent


Reviewers' comments on computer generated reports


The evaluation can consider additional matters such as whether the reports take into account any checks
of consistency of responding, response bias measures (e.g. measures of central tendency in ratings) and
other indicators of the confidence with which the person's scores can be interpreted.
Comments on the complexity of the algorithms can be included, e.g. whether multiple scales are
considered simultaneously, how scale profiles are dealt with etc. Such complexity should, of course, be
supported by a clear rationale in the manual.


13

Final evaluation

Evaluative report of the test


This section should contain a concise, clearly argued judgment about the test. It should describe its pros
and cons, and give some general recommendations about how and when it might be used - together with
warnings (where necessary) about when it should not be used.
Any positive or negative points raised in connection with adapted and translated tests
should be summarised here. A checklist of the important considerations for such instruments is given in
the Appendix as a reminder of the notes in the relevant sections. Comment on these only where
appropriate.
The evaluation should cover topics such as the appropriateness of the instrument for various
assessment functions or areas of application; any special training needs or special skills required;
whether training requirements are set at the right level; ease of use; the quality and quantity of
information provided by the supplier, and whether important information is not supplied to
users; and any issues arising from the instrument being translated or adapted (see
Appendix).
Include comments on any research that is known to be under way, and the supplier's plans for future
developments and refinements etc.



Conclusions

Recommendations (select one)

The relevant recommendation, from the list given, should be indicated. Normally this will
require some comment, justification or qualification. A short statement should be
added relating to the situations and ways in which the instrument might be used, and
warnings about possible areas of misuse.

All the characteristics listed below should have ratings of either n/a, 2, 3, or 4 if an
instrument is to be recommended for general use (box 4 or 5).

9 Norms
10 Reliability - overall
11 Validity - overall
12 Computer generated reports

If any of these ratings are 0 or 1, the instrument will normally be classified under
Recommendation 1, 2, or 3, or it will be classified under "Other" with a suitable
explanation given.

1 Requires further development. Only suitable for use in research, not for use in practice

2 Only suitable for use by an expert user (exceeding EFPA User Qualification Level 2) under
carefully controlled conditions or in very limited areas of application

3 Suitable for supervised use in the area(s) of application defined by the distributor by any user
with general competence in test use and test administration (exceeding EFPA User Qualification
Level 2)

4 Suitable for use in the area(s) of application defined by the distributor, by test users who meet
the distributor's specific qualifications requirements (at least EFPA User Qualification Level 2)

5 Suitable for unsupervised self-assessment in the area(s) of application defined by the distributor

6 Other
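The screening rule for recommendations can be sketched as a function that returns the recommendations still normally open: ratings of 0 or 1 on any of sections 9 to 12 normally rule out Recommendations 4 and 5. The sketch assumes the same 0-4 coding of the rating labels, with None for n/a, and the names are illustrative. The rule is only a necessary condition. Adequate ratings make Recommendation 4 or 5 possible but not automatic, and the text allows exceptions ("normally").

```python
from typing import Mapping, Optional, Set

# Sections whose overall ratings gate a recommendation for general use:
# 9 Norms, 10 Reliability - overall, 11 Validity - overall,
# 12 Computer generated reports.
KEY_SECTIONS = (9, 10, 11, 12)
ALL_RECOMMENDATIONS = {1, 2, 3, 4, 5, 6}


def open_recommendations(ratings: Mapping[int, Optional[int]]) -> Set[int]:
    """Recommendations normally available, given ratings coded 0-4 (None = n/a)."""
    missing = [s for s in KEY_SECTIONS if s not in ratings]
    if missing:
        raise KeyError(f"no rating recorded for section(s) {missing}")
    if any(ratings[s] in (0, 1) for s in KEY_SECTIONS):
        return {1, 2, 3, 6}  # general use (4 or 5) is normally ruled out
    return set(ALL_RECOMMENDATIONS)


print(open_recommendations({9: 3, 10: 2, 11: 1, 12: None}))  # {1, 2, 3, 6}
print(open_recommendations({9: 3, 10: 4, 11: 2, 12: None}))  # {1, 2, 3, 4, 5, 6}
```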

PART 3

BIBLIOGRAPHY


American Educational Research Association, American Psychological Association, and National Council
on Measurement in Education. (1999). Standards for educational and psychological testing.
Washington, DC: American Psychological Association.
Bartram, D. (1996). Test qualifications and test use in the UK: The competence approach. European
Journal of Psychological Assessment, 12, 62-71.
Bartram, D. (2002a). EFPA Review Model for the description and evaluation of psychological instruments:
Version 3.2. Evaluation Form. Brussels: EFPA Standing Committee on Tests and Testing (September,
2002).
Bartram, D. (2002b). EFPA Review Model for the description and evaluation of psychological instruments:
Version 3.2. Notes for Reviewers. Brussels: EFPA Standing Committee on Tests and Testing
(September, 2002).
Bartram, D., & Hambleton, R. K. (Eds.) (2006). Computer-based testing and the Internet. Chichester, UK:
Wiley and Sons.
Bartram, D., Lindley, P. A., & Foster, J. M. (1990). A review of psychometric tests for assessment in
vocational training. Sheffield, UK: The Training Agency.
Bartram, D., Lindley, P. A., & Foster, J. M. (1992). Review of psychometric tests for assessment in
vocational training. Leicester, UK: BPS Books.
Bechger, T., Hemker, B., & Maris, G. (2009). Over het gebruik van continue normering [On the use of
continuous norming]. Arnhem, The Netherlands: Cito.
Bennett, R. E. (2006). Inexorable and inevitable: The continuing story of technology and assessment. In
D. Bartram & R. K. Hambleton (Eds.), Computer-based testing and the Internet (pp. 201-217).
Chichester, UK: Wiley and Sons.
Brennan, R. L. (Ed.) (2006). Educational measurement. Westport, CT: ACE/Praeger.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Downing, S. M., & Haladyna, T. M. (Eds.) (2006). Handbook of test development. Hillsdale, NJ: Erlbaum.
Drasgow, F., Luecht, R. M., & Bennett, R. E. (2006). Technology and testing. In R. L. Brennan (Ed.),
Educational measurement (pp. 471-515). Westport, CT: ACE/Praeger.
Drenth, P. J. D., & Sijtsma, K. (2006). Testtheorie. Inleiding in de theorie van de psychologische test en
zijn toepassingen (4e herziene druk) [Test theory. Introduction in the theory and application of
psychological tests (4th revised ed.)]. Houten, The Netherlands: Bohn Stafleu van Loghum.
Embretson, S. E. (Ed.) (2010). Measuring psychological constructs: Advances in model-based
approaches. Washington, DC: American Psychological Association.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Evers, A. (2001a). Improving test quality in the Netherlands: Results of 18 years of test ratings.
International Journal of Testing, 1, 137-153.
Evers, A. (2001b). The revised Dutch rating system for test quality. International Journal of Testing, 1,
155-182.
Evers, A., Braak, M., Frima, R., & van Vliet-Mulder, J. C. (2009-2012). Documentatie van Tests en
Testresearch in Nederland [Documentation of Tests and Testresearch in The Netherlands].
Amsterdam: Boom test uitgevers.
Evers, A., Lucassen, W., Meijer, R., & Sijtsma, K. (2010). COTAN Beoordelingssysteem voor de Kwaliteit
van Tests (geheel herziene versie; gewijzigde herdruk) [COTAN Rating system for test quality
(completely revised edition; revised reprint)]. Amsterdam: NIP.
Evers, A., Muñiz, J., Bartram, D., Boben, D., Egeland, J., Fernández-Hermida, J. R., et al. (2012). Testing
practices in the 21st Century: Developments and European psychologists' opinions. European
Psychologist, in press.
Evers, A., Sijtsma, K., Lucassen, W., & Meijer, R. R. (2010). The Dutch review process for evaluating the
quality of psychological tests: History, procedure and results. International Journal of Testing, 10,
295-317.


Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing
guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334.
Hambleton, R. K., Jaeger, R. M., Plake, B. S., & Mills, C. (2000). Setting performance standards on
complex educational assessments. Applied Psychological Measurement, 24, 355-366.
Hambleton, R. K., Merenda, P. F., & Spielberger, C. D. (Eds.) (2005). Adapting educational and
psychological tests for cross-cultural assessment. Mahwah, NJ: Erlbaum.
Hemphill, J. F. (2003). Interpreting the magnitudes of correlation coefficients. American Psychologist, 58,
78-80.
International Test Commission. (2005). International Guidelines on Computer-Based and Internet
Delivered Testing. Bruxelles, Belgium: Author.
Kersting, M. (2008). DIN Screen, Version 2. Leitfaden zur Kontrolle und Optimierung der Qualität von
Verfahren und deren Einsatz bei beruflichen Eignungsbeurteilungen [DIN Screen, Version 2. Guideline
for monitoring and optimizing the quality of instruments and their application in proficiency
assessment procedures]. In M. Kersting, Qualitätssicherung in der Diagnostik und Personalauswahl: Der DIN Ansatz (S. 141-210) [Guaranteeing quality in diagnostics and personnel selection (pp. 141-210)]. Göttingen: Hogrefe.
Lindley, P. A. (2009). Reviewing translated and adapted tests: Notes and checklist for reviewers: 5 May
2009. Leicester, UK: British Psychological Society. Retrieved from http://www.efpa.eu/professional-development/tests-and-testing.
Lindley, P. A. (2009, July). Using EFPA Criteria as a common standard to review tests and instruments in
different countries. In D. Bartram (Chair), National approaches to test quality assurance. Symposium
conducted at The 11th European Congress of Psychology, Oslo, Norway.
Lindley, P., Bartram, D., & Kennedy, N. (2004). EFPA Review Model for the description and evaluation of
psychological tests: test review form and notes for reviewers: Version 3.3. Leicester, UK: British
Psychological Society (November, 2004).
Lindley, P., Bartram, D., & Kennedy, N. (2005). EFPA Review Model for the description and evaluation of
psychological tests: test review form and notes for reviewers: Version 3.41. Brussels: EFPA Standing
Committee on Tests and Testing (August, 2005).
Lindley, P., Bartram, D., & Kennedy, N. (2008). EFPA Review Model for the description and evaluation of
psychological tests: test review form and notes for reviewers: Version 3.42. Brussels: EFPA Standing
Committee on Tests and Testing (September, 2008).
Lindley, P. A. (Senior Editor), Cooper, J., Robertson, I., Smith, M., & Waters, S. (Consulting Editors).
(2001). Review of personality assessment instruments (Level B) for use in occupational settings (2nd
ed.). Leicester, UK: BPS Books.
Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., et al. (2001). Psychological
testing and psychological assessment: A review of evidence and issues. American Psychologist, 56,
128-165.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Moosbrugger, H., Kelava, A., Hagemeister, C., Kersting, M., Lang, F., Reimann, G., et al. (2009, July). The
German Test Review System (TBS-TK) and first experiences. In D. Bartram (Chair), National
approaches to test quality assurance. Symposium conducted at The 11th European Congress of
Psychology, Oslo, Norway.
Moreno, R., Martínez, R. J., & Muñiz, J. (2006). New guidelines for developing multiple-choice items.
Methodology, 2, 65-72.
Muñiz, J., & Bartram, D. (2007). Improving international tests and testing. European Psychologist, 12, 206-219.
Nielsen, S. L. (2009, July). Test certification through DNV in Norway. In D. Bartram (Chair), National
approaches to test quality assurance. Symposium conducted at The 11th European Congress of
Psychology, Oslo, Norway.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.


Parshall, C. G., Spray, J. A., Davey, T., & Kalohn, J. (2001). Practical Considerations in Computer-based Testing. New York: Springer Verlag.
Prieto, G., & Muñiz, J. (2000). Un modelo para evaluar la calidad de los tests utilizados en España [A
model for the evaluation of test quality in Spain]. Papeles del Psicólogo, 77, 65-71.
Reise, S. P., & Haviland, M. G. (2005). Item response theory and the measurement of clinical change.
Journal of Personality Assessment, 84, 228-238.
Schneider, R. J., & Hough, L. M. (1995). Personality and industrial/organizational psychology. In C. L.
Cooper & I. T. Robertson (Eds.), International Review of Industrial and Organizational Psychology,
10, 75-129.
Shrout, P. E. (1998). Measurement reliability and agreement in psychiatry. Statistical Methods in Medical
Research, 7, 301-317.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1293.
Testkuratorium. (2009). TBS-TK. Testbeurteilungssystem des Testkuratoriums der Föderation Deutscher
Psychologenvereinigungen. Revidierte Fassung vom 09. September 2009 [TBS-TK. Test review
system of the board of testing of the Federation of German psychologists' associations]. Report
Psychologie, 34, 470-478.
Tideman, E. (2007). Psychological tests and testing in Sweden. Testing International, 17(June), 57.
Van de Vijver, F. J. R., & Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests.
In R. K. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), Adapting educational and
psychological tests for cross-cultural assessment. Mahwah, NJ: Erlbaum.
Van der Linden, W. J., & Glas, C. A. W. (Eds.) (2010). Elements of adaptive testing. London: Springer.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Erlbaum.
Ziegler, M., MacCann, C., & Roberts, R. (Eds.) (2011). New perspectives on faking in personality
assessment. Oxford, UK: Oxford University Press.

APPENDIX
An aide memoire of critical points for comment when an instrument has been translated and/or adapted from a non-local context

Evidence or discussion of:

Development
  Input from native speakers of the new language
  Multiple review by both language and content (of test) experts
  Back translation from the new language into the original language

Basic psychometric properties
  Item performance
  Reliability

Norms
  A local norm is provided
  Non-local norm: strong evidence of equivalence for both test versions and samples
  International norms:
    The nature of the sample: larger than the typical requirements of local samples; balance of sources of the sample; equivalence of the background of the different parts of the sample
    The type of measure: little or no verbal content
    The equivalence of the test version: all the language versions are well translated/adapted; some groups have completed the test in a non-primary language
    Similarities of scores in different samples: where there are large differences, these should be accounted for and the implications for use discussed
    Guidance about generalising the norms

Equivalence/Reliability/Validity
  Invariance in construct structure: via factor structure, equivalence of correlation matrices or similarity of patterns of correlation with standard measures
  Similar criterion-related validity: strongest correlation with similar competencies
  Similar patterns of scale loadings: items correlate in the same pattern with other scales; the strongest/weakest loading items are similar in the original and new languages
  Alternate form reliability: bilingual candidates have similar profiles in the two languages

Validity generalisation
  Validity generalisation needs strong evidence: when translating tests across linguistic families (e.g. from an Indo-European to a Semitic language)
  Validity generalisation can be inferred: where a test has been translated into multiple languages, some validity generalisation can be inferred from evidence of validity invariance in previous translations (e.g. a Swedish test that has already been translated into French, German and Italian and has been shown to be equivalent in these languages)
