The time spent doing assessments has decreased over the years, probably due to the widening role
of psychologists (not only assessing anymore), criticism of the reliability and validity of many
assessment devices, and the growth of activities beyond the administration and interpretation of
traditional tests (e.g. interviewing, observing). There has also been a decline in projective
techniques (e.g. the Rorschach).
The first clinical interviews were unstructured (free association etc.); criticism of these led to
objective tests, and later to structured interviewing. Another trend was neuropsychological
assessment, with two traditions: (1) the pathognomonic sign approach, which interprets behaviours
as indicative of organic impairments and bases interview design/tests on a flexible method of
testing possible hypotheses for different types of impairment, and (2) the psychometric approach,
a more quantitative approach that relies on critical cut-off scores to distinguish between normal
persons and those with brain damage. In practice, mostly a combination is used. Then came
behaviour therapy. Currently, a psychologist doing assessment might use techniques such as
interviewing, administering and interpreting traditional psychological tests, naturalistic
observations, neuropsychological assessment, and behavioural assessment. Future: the influence of
technology.
1. Theoretical orientation: research the construct that the test is supposed to measure.
1. Do you adequately understand the theoretical construct the test is supposed to measure?
2. Do the test items correspond to the theoretical description of the construct?
2. Practical considerations:
1. If reading is required of the examinee, does their ability match the level required by the
test?
2. How appropriate is the length of the test?
3. Standardization: adequacy of norms and of administration.
1. Is the population to be tested similar to the population the test was standardized on?
2. Was the size of the standardization sample adequate?
3. Have specialized subgroup norms been established?
4. How adequately do the instructions permit standardized administration? (e.g. same rooms,
same amount of time, etc.)
4. Reliability: degree of stability, consistency, and predictability.
1. Are reliability estimates sufficiently high (generally around .90 for clinical decision making
and around .70 for research purposes)?
2. What implications do the relative stability of the trait, the method of estimating the
reliability, and the test format have on reliability?
5. Validity:
1. What criteria and procedures were used to validate the test?
2. Will the test produce accurate measurements in the context and for the purpose for which
you would like to use it?
RELIABILITY
Reliability: the extent to which scores obtained by a person are/would be the same if the
person is re-examined with the same test on different occasions.
-Purpose: estimate the degree of test variance caused by error. Four methods for obtaining
reliability are (1) the extent to which the test produces consistent results upon retesting (test-
retest, time to time), (2) the relative accuracy of a test at a given time (alternate forms, form to
form), (3) the internal consistency of the items (split-half and coefficient alpha, item to item),
and (4) the degree of agreement between two examiners (interscorer, scorer to scorer).
Underlying reliability is:
Error of measurement: an estimate of the range of possible random fluctuation that can be
expected in an individual’s score (e.g. misreading of items, change in mood). Error is always
present in the current system of measuring psychological constructs. If there is a large degree of error,
you can’t place much confidence in the scores. Reducing measurement error gives you greater
confidence that the difference between one score and another is more likely to result from
some true difference than chance.
Test-retest reliability
Test-retest reliability: determined by administering the test and then repeating it on a second
occasion. The reliability coefficient is calculated by correlating the scores obtained; the degree of
correlation between the two scores indicates the extent to which the test scores can be generalized
from one situation to the next. If the correlation is high, the results are less likely to be caused
by random error and more likely to reflect actual differences in the trait being measured.
-Preferred only if the variable being measured is relatively stable (so not for anxiety for example).
-Consideration factors: practice effects (some tasks improve with practice), the interval between
administrations, and life changes (e.g. intelligence is likely to be stable over months, but
changes from high school to college).
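The test-retest logic above can be sketched in code. This is a minimal illustration with made-up scores (not data from the text): the reliability coefficient is simply the Pearson correlation between two administrations of the same test.

```python
# Illustrative sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test. Scores are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [98, 105, 110, 121, 90, 130, 102, 115]   # first administration
time2 = [100, 103, 112, 119, 94, 127, 101, 118]  # same people, retested later

r_tt = pearson(time1, time2)
print(round(r_tt, 2))  # a high value suggests scores generalize across occasions
```

A high coefficient here does not rule out practice effects or life changes; it only shows that the rank ordering of examinees stayed stable between the two occasions.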
Alternate forms
Alternate forms: measuring a trait several times in the same individual using parallel forms of
the test; the different forms should produce similar results. The reliability coefficient is the degree
of similarity between the scores.
-Correlations determined by tests given with a wide time interval show not only a measure of
relation between forms but also temporal stability!
-Less practice effect than with test-retest.
-Difficulty: are the forms actually equivalent to each other? Otherwise, you’re not measuring the
reliability of the test itself but actual differences in performance!
Internal consistency: split-half reliability and coefficient alpha
Measures of the internal consistency of the test items rather than the temporal stability of
different administrations. Best techniques for determining reliability for a trait with a high degree
of fluctuation.
-Split-half method: the test is split in half and the two halves are correlated. Often split into
odd/even items, because a first-half/second-half split has cumulative problems (e.g. effects of
warming up, fatigue, boredom).
-Coefficient alpha: correlates all items with each other to determine their consistency.
-Limitations: split-half gives fewer items on each half, which results in wider variability because
the individual responses cannot stabilize as easily around a mean.
-General principle: the more items, the higher the reliability, because items compensate for minor
alterations!
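The two internal-consistency estimates above can be sketched with a small, made-up item-by-person score matrix (rows = examinees, columns = items). The split-half correlation is stepped up with the Spearman-Brown formula, since each half has only half the items; coefficient alpha uses the standard item-variance formula.

```python
# Illustrative sketch with hypothetical data: split-half reliability
# (odd/even split + Spearman-Brown correction) and coefficient (Cronbach's) alpha.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half(scores):
    """Odd/even split-half reliability, stepped up to full test length."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)   # Spearman-Brown correction

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores = [  # 6 examinees x 4 items, hypothetical ratings
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
]
print(round(split_half(scores), 2), round(cronbach_alpha(scores), 2))
```

Note how both estimates look only at a single administration: they say nothing about temporal stability, which is exactly why they suit fluctuating traits.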
Interscorer reliability
Interscorer reliability: obtain a series of responses from a single client and have these
responses scored by two different individuals; or have two different examiners test the same client
using the same test and then determine how close their scores or ratings of the person are.
The interscorer coefficient can be calculated using percentage agreement, a correlation, or
coefficient kappa.
The best form is dependent on the nature of the variable (e.g. stable or not) and the purposes
for which the test is used (e.g. measuring a state).
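Percentage agreement and coefficient kappa can be sketched as follows, using hypothetical categorical ratings from two raters. Kappa corrects the observed agreement for the agreement expected by chance alone.

```python
# Illustrative sketch (hypothetical ratings): two interscorer reliability
# indices for categorical ratings - percentage agreement and Cohen's kappa.
from collections import Counter

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # chance agreement: probability both raters pick the same category at random
    p_chance = sum(ca[c] / n * cb[c] / n for c in set(a) | set(b))
    return (p_obs - p_chance) / (1 - p_chance)

rater1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

print(percent_agreement(rater1, rater2))          # 0.75
print(round(cohens_kappa(rater1, rater2), 2))     # 0.5
```

The gap between the two numbers (.75 vs .50) shows why kappa is often preferred: raw agreement looks flattering when chance agreement is already high.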
Standard error of measurement (SEM): the amount of error that can be expected in test
scores, which consist of a true component and an error component (the SEM is usually included
in the test manual). The higher the reliability, the lower the error. The SEM is a standard
deviation score: SEM = SD × √(1 − r), where SD is the test’s standard deviation and r its
reliability coefficient.
Confidence interval: the range of error within which a score is expected to fall. E.g. a SEM of 3
on an intelligence test would indicate that an individual’s score has a 68% chance of being
within 3 IQ points of the estimated true score.
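The SEM and confidence-interval arithmetic can be sketched directly. The SD of 15 and reliability of .96 below are hypothetical values chosen to mimic an IQ-style test; z = 1.0 gives the roughly 68% band, z = 1.96 the roughly 95% band.

```python
# Illustrative sketch: SEM = SD * sqrt(1 - reliability), and an approximate
# confidence interval around an observed score. SD and reliability are
# hypothetical example values, not figures from any manual.

def sem(sd, reliability):
    return sd * (1 - reliability) ** 0.5

def confidence_interval(score, sd, reliability, z=1.0):
    """z = 1.0 -> ~68% interval; z = 1.96 -> ~95% interval."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

s = sem(15, 0.96)                           # 15 * sqrt(0.04) = 3 IQ points
print(round(s, 2))                          # 3.0
lo, hi = confidence_interval(110, 15, 0.96)
print(round(lo, 1), round(hi, 1))           # 107.0 113.0
```

This reproduces the note's example: with a SEM of 3, an observed IQ of 110 has about a 68% chance of lying within 3 points of the estimated true score.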
VALIDITY
Validity: whether a test truly measures the trait it is supposed to measure.
Content validity
Content validity: the extent to which a measure covers all aspects of a construct
(e.g. the affective AND the behavioural dimension of depression). Often considered
subjective because it relies on judgement by experts.
-Related: face validity: the degree to which a test seems to measure what it is
supposed to measure, as judged by the test users.
Criterion validity
Criterion validity: the extent to which a measure is related to an outside measure, e.g.
correlating an intelligence test score with grade point average. Divided into:
Concurrent validity: measurements taken at (approx.) the same time as the test, e.g. an
intelligence test at the same time as an academic achievement assessment.
Predictive validity: outside measurements taken some time after the test scores were
derived; e.g. predictive validity might be evaluated by correlating intelligence test
scores with measures of academic achievement a year after the initial testing.
Which one to use depends on purpose of test: predictive validity for predicting some future
outcome (e.g. for screening individuals who might develop emotional disorders), concurrent
validity for assessment of client’s current state.
The strength of criterion validity depends on the type of variable; e.g. intellectual tests yield
relatively higher validity coefficients than personality tests, because the latter are affected by
a larger number of influencing variables.
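A validity coefficient of the predictive kind can be sketched the same way as a reliability coefficient: correlate test scores with a criterion measured later. All numbers below are fabricated for illustration only (test scores at time of testing, GPA one year later).

```python
# Illustrative sketch (hypothetical data): a predictive validity coefficient,
# i.e. the correlation between test scores and a criterion measured later.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_scores = [95, 110, 100, 125, 105, 130, 90, 115]      # at time of testing
gpa_next_year = [2.4, 3.1, 2.9, 3.8, 2.8, 3.6, 2.2, 3.3]  # criterion, 1 year later

r = pearson(test_scores, gpa_next_year)
print(round(r, 2))  # the validity coefficient
```

Computing a concurrent validity coefficient is identical; only the timing of the criterion measurement differs.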
Criterion contamination: when the criterion measure is biased because knowledge of the
test results influences an individual’s later performance.
Construct validity
INCREMENTAL VALIDITY
Incremental validity (gradual, progressive): the extent to which a measurement produces
information above what is already known, i.e. whether it adds information to what can be
obtained with simpler, already existing methods.
-If a test can produce additional information to what you already know about a client/group
from other tests.
CONCEPTUAL VALIDITY
Conceptual validity: a means of evaluating and integrating test data so that the clinician’s
conclusions make accurate statements about the examinee. Concerned with testing constructs
(like construct validity), but in this case the constructs relate to the individual rather than the test
itself. Hypotheses can be considered to represent valid constructs regarding a person if they are
confirmed by e.g. observation, test data, history, etc.
CLINICAL JUDGEMENT
Clinical judgement: a special instance of perception in which the clinician attempts to use
whatever sources are available to create accurate descriptions of the client. Sources include test
data, case history, medical records, personal journals, verbal & nonverbal observations, etc.
Phase 1, evaluating the referral question: one of the most important general requirements is
that clinicians understand the vocabulary, conceptual model, dynamics, and expectations of
the referral setting in which they will be working. Further, clinicians must evaluate whether
the referral questions are appropriate for psychological assessment and whether they have the
level of competence necessary to conduct an assessment that answers the specific questions. So
clarify the referral question!
Phase 3: regardless of theoretical orientation, the hypotheses must make sense within a
specific theoretical framework (e.g. low self-esteem may revolve around negative self-talk,
based on a cognitive behavioural perspective).
Phase 8: recommendations cannot be vague or broad, e.g. do not recommend "therapy" to a
client.
CHAPTER 2: CONTEXT OF CLINICAL
ASSESSMENT
TYPES OF REFERRAL SETTINGS
Referral requests often do not state a specific question that must be answered (e.g. "could you
evaluate Jimmy because he is having difficulties in school?") or a decision that must be made,
although many times this is the position that the referral source is in; e.g. a teacher may want to
prove to parents that their child has a serious problem, or a school administrator may need testing
to support a placement decision. Greater clarification is necessary to provide useful problem-solving
info! Responsibility for exploring and clarifying the referral question lies with the clinician.
To help clarify the referral question, clinicians should be familiar with the types of environments
in which they will be working:
PSYCHIATRIC SETTING
Psychiatrists could have the role of administrator, therapist, or physician.
LEGAL CONTEXT
Psychologists might be called in at any stage of legal decision making. Must become familiar with
specialized legal terms and evaluate possible malingering and deception. The practice of forensic
psychology includes training/consultation with legal practitioners, evaluation of populations
likely to encounter the legal system, and the translation of relevant technical psychological
knowledge into usable information.
ACADEMIC/EDUCATIONAL CONTEXT
Assessing children who are having difficulty, e.g. in evaluating the nature and extent of a child’s
learning difficulties, measuring intellectual weaknesses AND strengths, assessing behavioural
difficulties, etc. Individual assessment conducted, but wider context very important! (e.g. child’s
dysfunction might be caused by marital problems).
PSYCHOLOGICAL CLINIC
In contrast to the medical, legal, and educational institutions where the psychologist serves as a
consultant, the psychologist working in a psychological clinic often is the decision maker.
Clients are mostly self-referred, or children referred by their parents, or referred by a GP.
Ethical guidelines reflect values that professional psychology endorses (e.g. client safety,
confidentiality, fairness, etc.).
Test data: raw and scaled scores, such as subscale scores and test profiles.
Test materials: manuals, instruments, protocols, and test questions or stimuli.
Test materials turn into test data as soon as a psychologist places the client’s name on the
materials!