GENERAL CONCEPTS
Uses:
1. Measure differences between individuals or between reactions of the same individual under
different circumstances
2. Detection of intellectual difficulties, severe emotional problems, and behavioral disorders
3. Classification of students according to type of instruction, slow and fast learners,
educational and occupational counseling, selection of applicants for professional schools
4. Individual counseling – educational and vocational plans, emotional well-being, effective
interpersonal relations, enhance understanding and personal development, aid in decision-
making
5. Basic research – nature and extent of individual differences, psychological traits, group
differences, identification of biological and cultural factors
6. Investigating problems such as developmental changes in the lifespan, effectiveness of
educational interventions, psychotherapy outcomes, community program impact
assessment, influence of environment on performance
Tests range from measures of broad aptitudes to measures of specific skills
REMEMBER: In the absence of additional interpretative data, a raw score on any psychological test is
meaningless.
STATISTICAL CONCEPTS
Statistics – used to organize and summarize quantitative data to facilitate an understanding of it
Frequency distribution – tabulating scores into class intervals and counting how often scores falling
in each class interval appear within the data
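As a minimal sketch, tabulating a frequency distribution can be done in a few lines of Python (the scores and interval width below are made up for illustration):

```python
from collections import Counter

def frequency_distribution(scores, interval_width):
    """Tabulate scores into class intervals and count how many fall in each."""
    counts = Counter((score // interval_width) * interval_width for score in scores)
    return {(lo, lo + interval_width - 1): counts[lo] for lo in sorted(counts)}

scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 75]
print(frequency_distribution(scores, 5))
# {(50, 54): 1, (55, 59): 2, (60, 64): 3, (65, 69): 1, (70, 74): 2, (75, 79): 1}
```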
Normal curve features:
o Largest number of cases cluster in the center of the range
o Number drops gradually in both directions as extremes are approached
o Bilaterally symmetrical – 50% of cases fall to the left and 50% to the right of the center
o Single peak in the center
Central tendency – single, most typical or representative scores to characterize the performance of
an entire group
o Mean – average; add all scores and divide by total number of cases
o Mode – most frequent score; midpoint of the class interval with the highest frequency;
highest point on the distribution curve
o Median – middlemost score when all scores have been arranged from smallest to largest
Variability – extent of individual differences around the central tendency
Range – difference between the highest and lowest scores
Deviation – difference between an individual’s score and the mean of the group (x = X - M)
Standard deviation – square root of the variance; compares the variability of different groups
o higher standard deviation means more individual differences (variation)
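The central-tendency and variability measures above can be computed directly; a small Python sketch with made-up scores:

```python
import statistics

scores = [10, 12, 12, 14, 17]

mean = statistics.mean(scores)      # average: add all scores, divide by N
median = statistics.median(scores)  # middlemost score
mode = statistics.mode(scores)      # most frequent score

deviations = [x - mean for x in scores]                    # x = X - M
variance = sum(d ** 2 for d in deviations) / len(scores)   # population variance
sd = variance ** 0.5                                       # standard deviation

print(mean, median, mode, round(sd, 2))  # 13 12 12 2.37
```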
DEVELOPMENTAL NORMS
Basal age – highest age at and below which all tests were passed
Mental age – basal age + partial credits in months for tests passed above basal age-level tests
o Mental age unit shrinks correspondingly with age
Grade equivalent – mean raw score obtained by children in each grade
o Disadvantages:
Appropriate only for common subjects taught across grade levels (e.g. not
applicable for high school level)
Emphasis on different subjects may vary from grade to grade
Grade norms are not performance standards
Ordinal scales – sequential patterning of early behavior development
o Developmental stages follow a constant order; each stage presupposes mastery of an earlier
stage
WITHIN-GROUP NORMS
Percentile – percentage of persons who fall below a given raw score
o Indicates person’s relative position in the standardization sample
o The lower the percentile, the lower the standing
o Advantages:
Easy to compute
Can be easily understood
Universally applicable
o Disadvantage: inequality of units
Shows only the relative position but not the amount of difference between the
scores
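A percentile rank, as defined above, is easy to compute; the sample below is hypothetical:

```python
def percentile_rank(raw_score, sample):
    """Percentage of persons in the sample who fall below the given raw score."""
    below = sum(1 for s in sample if s < raw_score)
    return 100 * below / len(sample)

sample = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(60, sample))  # 4 of 10 scores fall below 60 -> 40.0
```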
Standard score – individual’s distance from the mean in terms of standard deviation units
o Linear transformation – retain exact numerical relations of original raw scores
Subtract constant, divide by constant
Also called z-score
z = (X − μ) / σ
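The z-score formula as a one-line Python function (the mean of 100 and SD of 15 below are illustrative values):

```python
def z_score(x, mean, sd):
    """Distance from the mean in standard deviation units: z = (X - mu) / sigma."""
    return (x - mean) / sd

# A raw score of 115 on a test with mean 100 and SD 15:
print(z_score(115, 100, 15))  # 1.0
```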
Relativity of Norms
IQ should always be accompanied by the name of the test
Individual’s standing may be misrepresented if inappropriate norms are used
Sources of variation across tests:
o Test content
o Scale units of mean and standard deviation
o Composition of standardization sample
Normative sample – ideally, a representative cross-section of the population for which the test is
designed
o Sample – group of persons actually tested
o Population – larger but similarly constituted group from which the sample is drawn
o Should be large enough to provide stable values
o Should be representative of the population under consideration
Else, restrict the population to fit the sample (redefine population)
o Should consider specific influences affecting the normative sample
Specific norms – tests are standardized to more specific populations to suit the purpose of the test
(a.k.a. subgroup, local norms)
OR
Gives general idea about the validity of the test in predicting a criterion
RELIABILITY
Reliability
Consistency of scores obtained by the same person across time, items, or other test conditions
Extent to which individual differences in test scores represent “true” differences or chance errors
Estimate what proportion of test score variance is error variance
o Error variance – difference in scores resulting from conditions that are irrelevant to the
purpose of the test
No test is a perfectly reliable instrument
CORRELATION COEFFICIENT
Expresses the degree of relationship between two scores
Zero correlation indicates the total absence of a relationship
Pearson Product-Moment Correlation Coefficient – accounts for individual’s position in the group
and the amount of deviation from the mean
r_xy = Σxy / (N · SD_x · SD_y)
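A sketch of the deviation-cross-product formula above in Python (the two score lists are invented for illustration):

```python
def pearson_r(xs, ys):
    """Pearson r from deviation cross-products: r = sum(xy) / (N * SDx * SDy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sdy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (n * sdx * sdy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))  # perfectly linear -> 1.0
```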
TYPES OF RELIABILITY
Test-Retest Reliability
Repeat same test on the same person on another occasion
Test for correlation between scores on the two separate testing occasions
Source of error variance – fluctuations in performance between the two testing occasions
Shows the extent to which test scores can be generalized across occasions
Higher reliability, lower susceptibility to random changes
Need to specify length of interval
Interval rarely exceeds 6 months
Disadvantage: practice effect
Can only be applied to tests in which performance is not affected by repetition (e.g.
sensory discrimination and motor tests)
Alternate-Form Reliability
Same person is tested with one form on one occasion and an alternate, equivalent form on
another occasion
Test for correlation of scores on the two forms
Measure of both temporal stability and consistency of responses to two different item samples
Source of error variance: content sampling (to what extent does performance depend on
specific items or arrangement of the test?)
Parallel forms must:
o Be independently constructed;
o Items should be expressed in the same form;
o Same type of content;
o Equivalent range and level of difficulty;
o Instructions, time limits, and sample items must be equivalent
Disadvantage: reduce but does not completely eliminate practice effect
Questionable: degree of change in the test due to repetition (e.g. insight tasks)
Split-Half Reliability
Two scores are obtained by dividing the test into equivalent halves
Source of error variance: content sampling
Test for coefficient of internal consistency
Single administration of a single form
Longer test = more reliable
Spearman-Brown formula – for estimating the effect of shortening or lengthening the test
o Used because this type of reliability only technically computes for the reliability of half
the test
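The Spearman-Brown correction can be sketched in Python; for split-half reliability, the half-test correlation is stepped up with n = 2 (the 0.70 half-test correlation below is a made-up value):

```python
def spearman_brown(r, n):
    """Estimated reliability when the test is lengthened by a factor of n.

    For split-half reliability, n = 2: the correlation between the two
    halves is stepped up to estimate full-length reliability.
    """
    return (n * r) / (1 + (n - 1) * r)

r_half = 0.70  # hypothetical correlation between the two halves
print(round(spearman_brown(r_half, 2), 3))  # 0.824 -- full-length estimate
```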
Scorer Reliability
Factors excluded from error variance:
o True variance (remains in scores)
o Irrelevant factors that can be controlled experimentally
Correlate results obtained by two separate scorers
VALIDITY
Validity
What the test measures and how well it measures it
What can be inferred from the test scores
Correlation coefficient between a test score and a direct and independent measure of criterion
Criterion-related Validity
Indicate test’s effectiveness in predicting performance in specified activities
Not about time, but about objective of testing
Concurrent – used to diagnose existing status (Does the person qualify for the job?)
o Criterion data is already available
Predictive – used to predict future performance (Does person have the pre-requisites to do well in a
job?)
Avoid criterion contamination (e.g. rater’s knowledge of test contaminates criterion ratings)
Criterion measure examples: academic achievement, performance in training, actual job
performance, contrasted groups (extremes of distribution of criterion measures); psychiatric
diagnoses, ratings by authority, correlation between new test and previously-available test
Construct Validity
Extent to which a test measures a theoretical construct or trait
Evidence includes research on nature of the trait and the conditions affecting development and
manifestation
Age differentiation – used in traditional intelligence tests
Correlation with other tests – new test measures approximately the same behavior as the previous
test
o Moderate correlation is desirable
Factorial validity – identification of factors and determining factors that impact the scores
Internal consistency – measure of homogeneity
o Upper criterion group vs lower criterion group – items that do not show higher scores on
upper criterion group are eliminated
Convergent – test correlates highly with other tests that it should theoretically correlate with
Discriminant – test does not correlate with variables it should be theoretically different from
Pre-test and post-test scores = training is valid if, after training, failed items in pre-test were passed
during post-test
Structural Equation Modeling (SEM) – explores relationships among constructs and the path that a
construct uses to affect criterion performance
The same test can have different purposes, leading to different evidences of validity.
Purpose of testing Illustrative question Type of validity
Achievement test How much did you learn? Content
For future performance How well will you learn in the Criterion – predictive
future?
Learning difficulty Is your performance indicative Criterion – concurrent
of a learning disorder?
Measure of ability How well does your score relate Construct
to other indicators of ability?
Validity is built in from the outset and extends even after test dissemination.
Test bias
Slope bias – significantly different validity coefficients in the two groups (differential validity)
Intercept bias – systematically under- or over-predicts criterion performance for a particular group
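A toy illustration of intercept bias, with two hypothetical groups whose regression lines share a slope but differ in intercept: a single regression fitted to the pooled data then systematically under-predicts one group and over-predicts the other. All numbers below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Two hypothetical groups with the same slope but different intercepts:
xs_a, ys_a = [1, 2, 3, 4], [3, 5, 7, 9]  # group A: y = 1 + 2x
xs_b, ys_b = [1, 2, 3, 4], [1, 3, 5, 7]  # group B: y = -1 + 2x

a_pooled, b_pooled = fit_line(xs_a + xs_b, ys_a + ys_b)
pred = a_pooled + b_pooled * 2  # pooled prediction at x = 2
print(pred)  # 4.0 -- under-predicts group A (actual 5), over-predicts group B (actual 3)
```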
ITEM ANALYSIS
Item Analysis
Used to shorten test and increase its reliability and validity
Item difficulty – percentage of people passing the item
o Items are usually arranged in increasing difficulty
o The higher the inter-item correlations, the wider the spread of difficulty should be
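Item difficulty as defined above is a simple proportion; a Python one-liner with made-up counts:

```python
def item_difficulty(passing, total):
    """Item difficulty p = percentage of examinees who pass the item."""
    return 100 * passing / total

# 45 of 60 examinees answer the item correctly:
print(item_difficulty(45, 60))  # 75.0
```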
Thurstone Absolute Scaling
o Find scale values of items separately within each group by converting percentage passing
into z-values
o Translate all these scale values into corresponding values for the group chosen as the
reference group
Test score distribution must approximate the normal curve
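The first Thurstone step, converting percentage passing into z-values, can be sketched with the standard library's inverse normal CDF (the sign convention below, where harder items get more positive z-values, is one common choice):

```python
from statistics import NormalDist

def difficulty_to_z(pct_passing):
    """Convert percentage passing into a z-value on the normal curve.

    A harder item (fewer examinees passing) lies further toward the
    positive tail under this sign convention.
    """
    return NormalDist().inv_cdf(1 - pct_passing / 100)

for pct in (84, 50, 16):
    print(pct, round(difficulty_to_z(pct), 2))  # 84 -> -0.99, 50 -> 0.0, 16 -> 0.99
```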
Item discrimination – degree to which an item differentiates correctly among test-takers in the
measured behavior
In contrasting groups, upper and lower 27% are used
Purpose: identify deficiencies in the test or in the teaching
Index of discrimination (D) – difference in percentage passing of upper scorers and lower scorers
(convert number of persons passing into percentages)
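The index of discrimination in Python, using hypothetical group sizes and pass counts:

```python
def discrimination_index(upper_passing, upper_n, lower_passing, lower_n):
    """D = percentage passing in upper group minus percentage passing in lower group."""
    return 100 * upper_passing / upper_n - 100 * lower_passing / lower_n

# 24 of 30 upper scorers pass the item; 9 of 30 lower scorers pass it:
print(discrimination_index(24, 30, 9, 30))  # 80.0 - 30.0 = 50.0
```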
Phi coefficient – relationship between item and criterion
o What items will significantly correlate with the criterion?
φ_0.05 = 1.96 / √N
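The significance threshold for phi follows directly from the formula above; a quick Python check with N = 100 as an example:

```python
from math import sqrt

def phi_threshold(n, z=1.96):
    """Minimum phi coefficient significant at the .05 level: phi = 1.96 / sqrt(N)."""
    return z / sqrt(n)

print(round(phi_threshold(100), 3))  # 0.196 -- items whose phi with the criterion
                                     # exceeds this are significant at N = 100
```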
INTELLIGENCE
Intelligence
Ability level at a given point in time
Score is not indicative of the reasons behind performance
o Should be descriptive rather than explanatory
Should not be used to label individuals but help in understanding them
Start where they are, assess strengths and weaknesses, make interventions
Contribute to self-understanding and personal development
Not a single entity but a composite of several functions
o Combination of abilities required for survival and advancement within a culture
Measures of scholastic aptitude or academic achievement
o Reflective of prior educational achievement
o Indicator of future performance
o Effective predictor of performance in various occupations and daily life activities
Should not be the only basis for making decisions
Motivation
Personality is not independent from aptitude
Aptitudes cannot be investigated independent from affect
o Prediction of subsequent performance can be enhanced by combining it with information
about motivation and attitudes
Achievement elsewhere can help shape cognitive performance (self-concept)
Hierarchical Theories
Organized factors in a hierarchy
Reconciles single-factor model with multiple-factor models
Broader factors have more loadings on more variables
ABILITY TESTS
Multicultural Testing
Language, speed removed as a parameter
Varying test content
Raven’s Advanced Progressive Matrices
o Measure of ‘g’
o Requires eduction of relations among abstract items
Culture-Fair Intelligence Test
o Cattell
o Test of fluid reasoning
o Inductive reasoning – make broad generalizations based on available data
o Think logically and solve problems in novel situations, regardless of previously acquired knowledge
Series - Choose which best completes the series
Classification - Identify two figures which are in some way different from others
Matrices - Complete the design or matrix presented
Conditions - Select the one that duplicates the conditions given
Goodenough-Harris Drawing Test
o Accuracy of observation and development of conceptual thinking
o Test may measure different functions at different ages
Approaches to cross-cultural testing
1. Choose items that are common across cultures; validate against local criteria
2. Develop a test within one culture and administer it to persons with different cultural
backgrounds
3. Different tests are developed for each culture, validated, and used only within that culture
Group Tests
Used in the educational system, government service, industry, and the military
Typically employs multiple-choice format for uniformity and objectivity in scoring
Increasing difficulty arranged in separately timed subtests
Spiral-omnibus format – single long time limit, mixed items of increasing difficulty
Advantages
o Can be administered simultaneously
o Greatly simplifies examiner’s role
o Provides more uniform testing conditions
o Scoring is more objective
o Provide better established norms
Disadvantages
o Less opportunity for rapport, maintaining cooperation and interest
o Less likely to detect extraneous interfering variables
o Examinees have restricted responses – penalized original thinkers
o Little to no opportunity for direct observations
o Lack of flexibility
Multi-level batteries
Sample major intellectual skills found to be pre-requisite for schoolwork
Suitable for schools for comparability across levels
Youngest age suitable for group testing: Kindergarten / 1st grade
Cognitive Abilities Test
o Verbal – verbal classification, sentence completion, verbal analogies
o Quantitative – quantitative relations, number series, equation building
o Nonverbal – figure classification, figure analogies, figure analysis
Test of Cognitive Skills
o Sequences – understanding and applying rules of arrangement in patterns of figures, letters,
or numbers
o Analogies – identifying the relationship and applying the principle to select a second pair
exhibiting the same relationship
o Verbal Reasoning – identification of essential elements in objects or things, inferring
relationships between sets of words, drawing logical conclusions from verbal passages
o Memory – definitions of a set of artificial words are presented and recall is tested after
other tests have been given
PERSONALITY
Personality Theories
o Biopsychosocial
Source of reinforcement (detached, discordant, dependent, independent,
ambivalent)
Pattern of coping behavior (active vs. passive)
Not a general personality instrument
Help in differential diagnoses
Example: Millon Clinical Multiaxial Inventory
o Manifest Needs System (Henry Murray)
Results in ipsative scores = strength of need is expressed in relation to other needs
within the individual
Normative comparisons questionable
Example: Edwards Personal Preference Schedule
Achievement - need to accomplish tasks well
Deference - need to conform to customs and defer to others
Order - need to plan well and be organized
Exhibition - need to be the center of attention in a group
Autonomy - need to be free of responsibilities and obligations
Affiliation - need to form strong friendships and attachments
Intraception - need to analyze behaviors and feelings of others
Succorance - need to receive support and attention from others
Dominance - need to be a leader and influence others
Abasement - need to accept blame for problems and confess errors to
others
Nurturance - need to be of assistance to others
Change - need to seek new experiences and avoid routine
Endurance - need to follow through on tasks and complete assignments
Heterosexuality - need to be associated with and attractive to members of
the opposite sex
Aggression - need to express one's opinion and be critical of others
Example: Personality Research Form and Other Jackson Inventories
Behaviorally-oriented and mutually-exclusive definitions of 20 personality
constructs
For prediction of behavior of individuals in normal contexts
Interest Inventories
Interest testing – used for educational and career assessment
o Also stimulated by occupational selection and classification
Opinions and attitudes
o For social psychology research
o Consumer research and employee relations
Has exploration validity – interest inventories increase behaviors needed for career exploration
o Used to introduce individual to careers that he or she has not previously considered
Issue: Sex fairness
o Because tests are validated against existing groups, they perpetuate group differences
Example: Strong Interest Inventory
o “Like,” “Indifferent,” or “Dislike” responses across 5 categories:
Occupations
School subjects
Activities
Leisure activities
Day-to-day contact with various people
o Levels of Scores
6 general occupation themes (Realistic, Investigative, Artistic, Conventional,
Enterprising, Social) – RIASEC
25 Basic Interest Scales
211 Occupational Scales
o Personal Style
Work Style
Learning Environment
Leadership Style
Risk-taking
o Validity: Criterion
Example: Jackson Vocational Interest Survey
o Work roles – what a person does on the job
o Work styles – preference for situations or environments
o 34 basic interest scales, 26 work roles, and 8 work styles
o Equally applicable to both sexes
o Validity: Construct
Example: Kuder Occupational Interest Survey
o Uses forced-choice triad (liked most to liked least)
o 10 Broad interest areas: Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic,
Literary, Musical, Social Service, Clerical
o Grouped based on content validity
o Scores expressed as correlation between respondent scores and the interest pattern of a
particular group
Example: Self-Directed Search
o Self-administered, self-scored, self-interpreted
o Holland – occupational preferences as a choice of a way of life
Individuals seek environments that are congruent with their personality types
Vocational choices are implementations of self-concepts
Trends
o Expansion of occupational levels
o Effect of inventory on test-taker
o RIASEC model not a good fit for minority and other cultures
Situational Tests
Placing individual in a situation closely resembling a “real-life” criterion situation
Character Education Inquiry – makes use of familiar, natural situations in one’s routine
o Measures honesty, self-control, altruism
Situational Stress Test – sample individual’s behavior in a stressful, frustrating, or emotionally
disruptive environment
Leaderless Group Discussion – group is assigned a topic for discussion. Measures verbal
communication, verbal problem-solving, and acceptance by peers
Role-playing
Observer Reports
Naturalistic observation – direct observation of spontaneous behavior in natural settings (e.g. diary
method, time sampling)
o No control is exerted over the stimulus situation
Interview – elicit life-history data
o Can range from highly structured to unstructured
o Affords direct observation
Ratings – evaluation of the individual based on cumulative, uncontrolled observations
o Disadvantages:
Ambiguity
Amount of relevant contact
Halo effect
Error of central tendency
Leniency error
Nominating technique – choose one person with whom individual would like to study, work, eat
lunch
o Can identify potential leaders, isolates
o Good concurrent and predictive validity because of high number of raters, raters are in a
good position to observe, and observer’s opinions influence the observed’s action
Biodata
Interview and questionnaires to elicit life-history data
Consistently good predictors of performance
Developed through
o Criterion keying and cross-validation
o Identification of constructs through job analyses and surveys
APPLICATIONS OF TESTING
Educational Testing
Prediction and classification within a specific educational setting
Uses educational achievement tests
Achievement tests
Measures effects of specific program of instruction or training
Measure effects of relatively standardized experiences (controlled, known)
Aptitude – cumulative influence of different learning experiences in daily living
o Measure effect of learning under relatively uncontrolled or unknown conditions
Ability – any measure of cognitive behavior
o Sample of what individual knows at the time of testing
o Level of development in one or more abilities
o Includes both aptitude and achievement
No two tests correlate perfectly with one another
o Difference in achievement and ability could be about overprediction or underprediction
Is objective, uniform, and efficient
Functions:
o Reveal weaknesses in learning
o Give direction to subsequent learning
o Motivate learner
o Provide means of adapting to individual results
o Aid in evaluating teaching
o Aid in formulating educational goals (analyze educational objectives, critical examination of
content of instruction methods)
Item format
o Multiple choice is often used
Disadvantages:
Promote rote memorization
Learning of isolated facts vs. development of problem-solving and
conceptual understanding
o Constructed-response / open-ended = requires examinee to generate an answer
o Portfolio assessment – cumulative record of a sample of a student’s work in various areas
over a period of time
General Achievement Batteries
o Provide profiles of scores on individual subtests or in major academic areas
o Horizontal and vertical comparisons
o Large majority have overlapping items for different levels
o Some are concurrently normed with aptitude tests
Using same normative sample to enable direct comparison of scores
WISC & WIAT
Tests of Minimum Competency
o Ascertain mastery of basic skills
o Teacher-Made Classroom Tests
o Tests for College Level
Used for placement and admissions
o Graduate School Admission
For admission and placement
For scholarships, fellowships, and special appointments
o Diagnostic and Prognostic Testing
Diagnosis of learning disabilities
Prognostic – predict usual performance in a course
Teach-test-teach – how well s/he can learn during one-to-one instruction
o Assessment in Early Childhood Education
Measure outcomes of early childhood education
School readiness – attainment of pre-requisite skills, knowledge, attitudes,
motivation, and behavior to profit from school instruction
Emphasis on abilities required for learning to read
Occupational Testing
Selection and classification of personnel
Individuals should be placed in a job where they are most qualified
Traits irrelevant to job requirements should not affect selection decisions
Selection tests should be validated against job performance
Intelligence Tests
Explore patterns of test scores for strengths and weaknesses
Profile analysis:
o Amount of scatter or variation among scores
o Base rate data – frequency of such features in normative population
o Score patterns that are typical of special populations / clinical syndromes
Irregularities in performance suggest avenues for exploration
Observing general behavior in the context of testing
Integrate test’s statistical info with human development, personality theory, etc.
Consider both skills and extraneous conditions
Need for supplementary information
Calls for individualized interpretation of test performance rather than uniform application of any
type of pattern analysis
Neuropsychological Assessment
Apply what is known about the brain-body relationship for diagnosis and treatment of brain-
damaged individuals
E.g. left-hemisphere lesion = V < P on the Wechsler; the opposite pattern appears in right-
hemisphere lesions and diffuse brain damage
Age affects behavioral symptoms caused by brain damage
o Amount of learning, intellectual development
o The younger the age, the greater the effect of brain damage on intellectual functioning
Chronicity – amount of time elapsed since injury will affect physiological changes and behavioral
recovery through learning / compensation
Intellectual impairment may be an indirect result of brain damage
Same behavior may be due to organic, emotional, or mixed causes
Need premorbid ability level to examine the extent of damage
Instruments: Perception of spatial relations and memory for newly learned material (example:
Bender Visual Motor Gestalt Test)
Difficult to interpret results in terms of score patterns
Batteries can measure all significant neuropsychological skills:
o Detect brain damage
o Help identify and localize damaged area
o Differentiate among syndromes
o Planning rehabilitation through identifying type and extent of behavioral deficits
Behavioral Assessment
Define problem through functional analysis of behavior
Select appropriate treatments
Assess behavior change resulting from treatment
Procedures:
o Self-report
o Direct observation
o Physiological measures (for anxiety, sex, and sleep disorders)
Career Assessment
Integrate information from expressed interests, preferences, and value system
Career maturity – mastery of vocational tasks appropriate to age level
Clinical Judgment
Influenced by cultural stereotypes, fallacious prediction principles
Used when satisfactory tests are unavailable
Suited for cases that are rare and idiosyncratic, where frequency is too low for the development of
statistical strategies
Psychologists with low levels of cognitive complexity are more likely to form biased clinical judgments
ETHICAL CONSIDERATIONS
1. Do no harm.
a. Provide services and use techniques in which they have been trained
b. Choose tests that are appropriate for the purpose and for the examinee
c. Recognize boundaries of competencies and limitations of expertise
2. Be sufficiently knowledgeable about the science of human behavior to guard against
unwarranted inferences in interpretation.
3. Protect safety and security of test materials.
4. Protect safety and security of examinees
a. Persons should not be subjected to testing programs under false pretenses
b. Protect the individual’s privacy
i. Information to be asked must be relevant to purpose
ii. Informed consent should include the purpose of testing, data needed, and use
of scores
c. Test-taker should have the opportunity to comment on the report
i. The report should be readily understandable, free from technical jargon and
labels, and oriented towards the immediate objective of testing
d. Records should not be released without the knowledge or consent of the examinee,
unless mandated by law
Anastasi, A. & Urbina, S. (1997). Psychological Testing (7th Ed.). New Jersey: Prentice Hall
Psychological Association of the Philippines. (2008). Code of Ethics for Philippine Psychologists. Retrieved
from http://www.pap.org.ph/includes/view/default/uploads/code_of_ethics_pdf.pdf