

Educational Multimedia 5th Colloquium

A pilot study:
is a mini-version of a full-scale study or a trial run
done in preparation for the complete study
is also called a feasibility study
is a specific pre-testing of research instruments, including
questionnaires or interview schedules (Polit et al. &
Baker, in Nursing Standard, 2002:33-44)
It is "reassessment without tears" (Blaxter, Hughes &
Tight, 1996:121): trying out all the research techniques
and methods which the researcher has in mind, to
see how well they will work in practice.

1. A pilot study is a small experiment designed to
test logistics
2. Gather information prior to a large study
3. Improve the actual study's quality
4. Reveal deficiencies in the design of a proposed
experiment or procedure, so that these can be
addressed ahead of time
5. A good research strategy requires careful
planning, and a pilot study will often be a part of
this strategy
The Value of a Pilot Study
Welman and Kruger (1999:146) also listed the following
three values of a pilot study:
i. to detect possible flaws in measurement
procedures and in the operationalisation of
independent variables.
ii. to identify unclear or ambiguous items in a
questionnaire.
iii. to give important information about any
embarrassment or discomfort experienced
concerning the content or wording of items in a
questionnaire (the non-verbal behaviour of participants).
Quotes concerning the value and goal of pilot studies
i. "to see if the beast will fly"
(De Vos, 2002:410)
ii. "reassessment without tears"
(Blaxter, Hughes & Tight, 1996:121)
iii. "Do not take the risk. Pilot test first."
(Van Teijlingen & Hundley, 2001:2).
Pilot Study Quotes
Pilot studies should have a well-defined set of aims and objectives to
ensure methodological rigour and scientific validity.

Participants in an external pilot should not later be included in the
main study to make savings in recruitment, because then the decision
to proceed with the main study would not be made independently of
the results of the pilot study.

The analysis of a pilot study should be mainly descriptive or should
focus on confidence interval estimation.
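
As an illustration of confidence interval estimation in a pilot, the sketch below computes a Wilson score interval for a feasibility proportion such as a questionnaire completion rate. The function name wilson_interval and the figures (18 completions out of 30 invitations) are hypothetical, and the Wilson method is only one of several reasonable interval choices; it is not prescribed by the guideline above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score confidence interval for a proportion (illustrative helper)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical pilot data: 18 of 30 invited participants completed the questionnaire.
low, high = wilson_interval(18, 30)
print(f"completion rate {18 / 30:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A wide interval here is expected: it reflects the small pilot sample rather than a flaw in the instrument.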

Results from hypothesis testing should be treated as preliminary and
interpreted with caution, as no formal power calculations have been
carried out.

The temptation not to proceed with the main study when significant
differences are found should be avoided.
Pilot Study guidelines

The number of respondents is not fixed precisely;
at least 25 people are suggested, preferably
between 50 and 75.
For a new study, run the test twice.
Focus groups must not be used.
Sample size calculations may not be required for some
pilot or exploratory studies.
A pilot study may be used to generate
information for the sample size calculation of the main study.
The sample for a pilot needs to be
representative of the target population.
It should be sufficient to address the key
feasibility objectives.
It should also be based on the same
inclusion/exclusion criteria as the main study.
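
One way a pilot can feed a sample size calculation is by supplying an estimate of the standard deviation. The sketch below assumes a main study comparing the means of two equal groups and uses the usual normal-approximation formula n = 2((z_alpha + z_beta) * sd / d)^2 per group. The function name n_per_group, the pilot scores, and the minimum difference of 5 points are all hypothetical.

```python
from math import ceil
from statistics import NormalDist, stdev


def n_per_group(sd: float, min_difference: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / min_difference) ** 2)


# Hypothetical pilot scores used only to estimate the standard deviation.
pilot_scores = [52, 61, 47, 58, 66, 49, 55, 63, 50, 59]
pilot_sd = stdev(pilot_scores)

required = n_per_group(pilot_sd, min_difference=5)
print(f"pilot SD = {pilot_sd:.2f}, about {required} participants per group")
```

Because a standard deviation estimated from a small pilot is itself imprecise, the result is best read as a rough planning figure.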
Sample Sizes for Pilot
1. A pilot study is done on a smaller scale.
Thus, the actual results of the study may vary
from the results of the pilot study.
2. A pilot study is usually carried out on
members of the relevant population, but not
on those who will form part of the final sample.
3. A pilot study is normally small in
comparison with the main experiment and
can therefore provide only limited
information on the sources and magnitude of
variation of response measures.

Validity is arguably the most important criterion for the quality of a test.

The term validity refers to whether or not a test measures what it intends
to measure.

There are several ways to estimate the validity of a test, including content
validity, construct validity, criterion-related validity (concurrent &
predictive) and face validity.

The question of validity is raised in the context of three points:
1. The form of the test
2. The purpose of the test
3. The population for whom it is intended.

Therefore, we cannot ask the general question "Is this a valid test?". The
question to ask is "How valid is this test for the decision that I need to
make?" or "How valid is the interpretation I propose for the test?"
Validity is thus a requirement for both quantitative and qualitative research.
Content: related to objectives and
their sampling.

Construct: referring to the theory
underlying the target.

Criterion: related to concrete criteria
in the real world. It can be
concurrent or predictive.
Concurrent: correlating highly with
another, already existing measure.
Predictive: capable of anticipating
some later measure.

A correlation coefficient is a statistical summary of the relation between two
variables. It is the most common way of reporting the answer to such questions
as the following: Does this test predict performance on the job? Do these two
tests measure the same thing? Do the ranks of these people today agree with
their ranks a year ago?
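
The questions above can be explored with a short calculation. The sketch below, using only the Python standard library, computes the Pearson product-moment coefficient and, for the rank question, the Spearman coefficient (Pearson applied to ranks, with tied values given their average rank). The function names and the test and job scores are hypothetical illustrations.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(sx * sy)


def ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: a selection test score and a later job performance rating.
test_scores = [55, 62, 70, 48, 81, 66]
job_ratings = [3.1, 3.4, 4.0, 2.9, 4.6, 3.5]
print(f"Pearson r    = {pearson_r(test_scores, job_ratings):.2f}")
print(f"Spearman rho = {spearman_rho(test_scores, job_ratings):.2f}")
```

Pearson's coefficient reflects linear agreement between the raw scores, while Spearman's reflects agreement in rank order only, so the two can differ on the same data.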

According to Cronbach, to the question "What is a good validity coefficient?" the
only sensible answer is "the best you can get", and it is unusual for a validity
coefficient to rise above 0.60, though that is far from perfect prediction.

All in all we need to always keep in mind the contextual questions:
what is the test going to be used for?
how expensive is it in terms of time, energy and money?
what implications are we intending to draw from test scores?
Reliability
How representative is the measurement?
Research requires dependable measurement.
Measurements are reliable to the extent that they are repeatable
and that any random influence which tends to make
measurements different from occasion to occasion or circumstance
to circumstance is a source of measurement error. (Nunnally)
Reliability is the degree to which a test consistently measures
whatever it measures. (Gay, 1987)
Errors of measurement that affect reliability are random errors and
errors of measurement that affect validity are systematic or
constant errors.
There are three major categories of reliability for most instruments:
test-retest, equivalent forms, and internal consistency.
Test-retest: the degree to which scores are consistent over
time. It indicates score variation that occurs from
testing session to testing session as a result of
errors of measurement.
Equivalent forms: used when it is likely that test takers will recall
responses made during the first session and
when alternate forms are available. Correlate
the two scores. The obtained coefficient is called
the coefficient of stability or coefficient of equivalence.
Internal consistency: determining how all items on the test relate to all
other items. The Kuder-Richardson formula gives an estimate
of reliability that is essentially equivalent to the
average of the split-half reliabilities computed for
all possible halves.
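
A minimal sketch of a Kuder-Richardson calculation follows, assuming the estimate referred to above is the KR-20 formula for items scored 0 (wrong) or 1 (right): KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores). The function name kr20 and the response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for dichotomous items.

    responses: one list per respondent, each holding 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2 or any(len(person) != k for person in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # p = proportion answering each item correctly, q = 1 - p.
    p = [sum(person[i] for person in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores, matching the p*q item variances.
    total_variance = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical answers of five respondents to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.2f}")
```

The calculation is undefined when every respondent has the same total score, because the total variance is then zero.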
Two indexes of internal consistency:
Split half reliability
Coefficient alpha

Inter-rater reliability: the extent to which two or more
individuals (coders or raters) agree. Inter-rater reliability
assesses the consistency of how a measuring system is
implemented. It is dependent upon
the ability of two or more individuals to be consistent.
Training, education and monitoring skills can
enhance inter-rater reliability.

For example, two or more teachers may use a
rating scale to rate students'
oral responses in an interview (1 being most
negative, 5 being most positive). If one rater
gives a "1" to a student response while another
rater gives a "5", the inter-rater
reliability is clearly low.
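
Agreement between two raters can be quantified in several ways. The sketch below shows simple percent agreement and Cohen's kappa, a commonly used chance-corrected statistic that the slides themselves do not name. The function names and the two teachers' ratings are hypothetical, and kappa as written treats the 1 to 5 scale as unordered categories.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected if both raters assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight student responses on the 1-5 scale.
teacher_a = [1, 2, 3, 4, 5, 3, 4, 2]
teacher_b = [1, 2, 3, 4, 5, 3, 5, 2]
print(f"percent agreement = {percent_agreement(teacher_a, teacher_b):.3f}")
print(f"Cohen's kappa     = {cohens_kappa(teacher_a, teacher_b):.3f}")
```

For ordered scales, a weighted variant that penalises a "1" versus "5" disagreement more than a "4" versus "5" disagreement may be more suitable.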

Intra-rater reliability: a type of reliability assessment in
which the same assessment is completed by the same rater on two
or more occasions. These different ratings are then
compared, generally by means of correlation. Since
the same individual is completing both assessments,
the rater's subsequent ratings are contaminated by
knowledge of earlier ratings.

Split half reliability

Splitting a test into two equivalent halves and
then assessing the consistency of the scores
across the two halves of the test.
Divide the test into halves and correlate the
scores from the two halves.
Adjust the correlation between the two halves with the
Spearman-Brown formula to estimate the reliability of the
full-length test.
A low correlation indicates that the test is
unreliable; a high correlation indicates that the
test is reliable.
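
The split-half procedure can be sketched as follows, assuming an odd/even split of the items and the Spearman-Brown correction r_full = 2 * r_half / (1 + r_half). The function names and the item scores are hypothetical; other ways of forming the two halves are equally possible.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sx * sy)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd/even split-half reliability with Spearman-Brown correction.

    item_scores: one list of item scores per respondent.
    """
    odd_half = [sum(person[0::2]) for person in item_scores]   # items 1, 3, ...
    even_half = [sum(person[1::2]) for person in item_scores]  # items 2, 4, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical responses of five people to four items rated 1-5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
reliability = split_half_reliability(scores)
print(f"split-half reliability (Spearman-Brown) = {reliability:.2f}")
```

The correction is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable.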

Coefficient alpha

Lee Cronbach (1951) developed coefficient
alpha, also known as Cronbach's alpha.
Coefficient alpha tells you the degree to which
the items are interrelated.
Rule of thumb:
At a minimum, greater than or equal to .70 for
research purposes and somewhat greater than
that value (e.g. .90) for clinical testing
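
Coefficient alpha can be computed directly from item scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below is illustrative only: the function name cronbach_alpha and the response data are hypothetical, and sample variances are used throughout.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha.

    item_scores: one list of item scores per respondent.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (columns of the matrix).
    item_variances = [variance([person[i] for person in item_scores])
                      for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to four items rated 1-5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

The resulting value can then be compared against the rule of thumb above.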

Correlation coefficients are as follows:

0.00 to +1.00 = the basic range

.60 to .70 = satisfied coefficients
.70 to .80 = stability coefficients
.80 to .90 = customary coefficients
.90 to .95 = sufficient coefficients
.80 to .90 = acceptable reliability
.90 to +1.00 = very good reliability
.95 to +1.00 = acceptable standardised test for internal consistency
James Popham (1990), Modern Educational Measurement: A Practitioner's
Perspective, 2nd edition, Englewood Cliffs, New Jersey: Prentice Hall, p. 127.
Hair (2003)
What are some ways to improve validity?

Make sure your goals and objectives are clearly defined and
operationalized. Expectations of students should be written down.
Match your assessment measure to your goals and objectives.
Additionally, have the test reviewed by faculty at other schools to obtain
feedback from an outside party who is less invested in the instrument.
Get students involved; have the students look over the assessment for
troublesome wording, or other difficulties.
If possible, compare your measure with other measures, or data that may
be available.
Validity and reliability are closely related.
A test cannot be considered valid unless the
measurements resulting from it are reliable.
However, results from a test can be reliable
and not necessarily valid.
Additional information
1. Pearson product-moment correlation
2. Validity and reliability in qualitative and
quantitative research
3. Comparison of values of Pearson's and Spearman's
correlation coefficients on the same sets of data