
Research Design: Defining, Measuring, and Manipulating Variables (Module 5); Reliability and Validity (Module 6)

Defining Variables – Another Aspect: The Operational Definition

• In research, some variables are fairly easy to define, manipulate, and measure.

• For example, if a researcher is studying the effects of exercise on blood pressure, she can examine the
relation by:

• Manipulating the amount of exercise by varying its length of time and observing the results, or

• Manipulating the intensity of the exercise (and monitoring target heart rates).

• She can also periodically measure blood pressure during the course of the study with a machine built to measure in a consistent and accurate manner.

• Does the fact that a machine exists to take this measurement mean that the measurement is always accurate? The answer is both yes and no. (We discuss this issue in Module 6 when we address measurement error.)

• Now let us suppose that a researcher wants to study hunger, depression, or aggression in a group of patients.

• Measuring these variables is not as easy as measuring blood pressure.

• One researcher's definition of what it means to be hungry may differ vastly from another researcher's. Even patients may define hunger in different ways.

• The solution to this problem is for the researcher to define hunger explicitly. This definition is called an operational definition: the criteria the researcher uses to measure or manipulate the variable.

• In other words, the investigator might define hunger in terms of specific activities such as not having
eaten for 12 hours.

• Thus one operational definition of hunger could be that simple: hunger occurs when 12 hours have passed with no food intake.
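A definition this precise can even be written down as a rule. The following minimal sketch (the function name is illustrative; only the 12-hour threshold comes from the definition above) shows how an operational definition turns a fuzzy concept into a checkable criterion:

```python
HUNGER_THRESHOLD_HOURS = 12  # criterion from the operational definition

def is_hungry(hours_since_last_meal: float) -> bool:
    """Operational definition: hunger occurs once 12 hours
    have passed with no food intake."""
    return hours_since_last_meal >= HUNGER_THRESHOLD_HOURS

print(is_hungry(13.5))  # True – more than 12 hours without food
print(is_hungry(5.0))   # False – ate recently
```

Any two researchers applying this rule to the same participant will classify that participant identically, which is exactly the point of operationalizing.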

• Researchers must operationally define all variables: those measured (dependent variables) and those
manipulated (independent variables)

• As another example, suppose a researcher says he measured anxiety in his study.

• The question becomes: how did he operationally define anxiety? There are several possibilities:

• Anxiety can be defined as the number of nervous actions displayed in a 1-hour time period, or

• As a person's score on a GSR (Galvanic Skin Response) machine or on the Taylor Manifest Anxiety Scale. Some measures are better than others, better meaning more reliable and valid (concepts we discuss in Module 6).

• Once other investigators understand how a researcher has operationally defined a variable, they can
replicate the study if they so desire.

• They can better understand the study and whether it has problems. They can also better design their
own studies based on how the variables were operationally defined

Exercise 1. You are a teacher in a school and want to send a team of 3 students to an interschool poetry competition. You have to select the best 3 students in the school. Operationally define BEST THREE STUDENTS.

2. You are observing how many people abide by road rules by counting the number of people who stop at a traffic signal. How will you define "stopping at the traffic signal" – braking, braking for a minimum time, the car rolling slowly, etc.?

Properties of Measurement

• After operationally defining the independent and dependent variables, the next step is to consider the level of measurement of the dependent variable.

• There are four levels of measurement, each based on the characteristics, or properties, of the data,
which can be:

• Identity,

• Magnitude,

• Equal unit size, and

• Absolute zero.

• Identity – In this case numbers are assigned purely for identification.

• The numbers cannot be used in mathematical operations; they merely convey a particular meaning.

• For instance we can assign 1 to Male, 2 to Female for our study.

• Similarly, if participants in a study have different political affiliations, they receive different scores
(numbers)

• Magnitude – a variable can have both identity and magnitude.

• This means the numbers have an inherent order from smaller to larger, for example, position in class, level of education, or rank in an organization.
• Variables having Identity and Magnitude are measured on Ordinal Scale.

• Equal Intervals – also called Equal Unit Size – means that the difference between numbers anywhere on the scale is the same.

• In most business research, rating variables are treated as having equal intervals: the difference between any two adjacent units is the same as between any other two adjacent units. For instance, the difference between 4 and 5 is the same as the difference between 76 and 77, i.e., 1.

• Variables with Equal Intervals, Magnitude and Identification Properties are measured on Interval Scale.

• Absolute/true zero – means that the zero as a response represents the absence of the property being
measured (e.g., no money, no behavior, none correct)

• In other words, it is a property of measurement in which a score of zero indicates an absence of the variable being measured.

• However, 0° on a temperature scale such as Celsius is not an absolute zero: some heat is still present, so we cannot say there is "no temperature."

SCALES OF MEASUREMENT

• The level, or scale, of measurement depends on the properties of the data. There are four scales of
measurement:

• nominal,

• ordinal,

• interval, and

• ratio.

• Each of these scales has one or more of the properties described in the previous section.

• We discuss the scales in order, from the one with the fewest properties to the one with the most, that
is, from the least to the most sophisticated.

• As we see in later modules, it is important to establish the scale of data measurement in order to
determine the appropriate statistical test to use when analyzing the data.

Nominal Scale

• From a statistical point of view, the nominal scale is the lowest measurement level.


• A Nominal Scale is assigned to items that are divided into categories without any order or structure.

• For example, Colors do not have any assigned order,

• We can have 5 colors – Red, Blue, Orange, Green, and Yellow – and number them 1 to 5, 5 to 1, or in any mixed order; the numbers are assigned to the colors purely for identification.

• Numbering them in ascending or descending order does not mean that the colors have an order.

• Each number simply identifies the category assigned.

• The only mathematical operation we can perform with nominal data is to count.

• Another example from research activities is a YES/NO scale, which is nominal.

• It has no order and there is no distance between YES and NO.

Ordinal Scale –

• Next up the list is the Ordinal Scale.

• An Ordinal Scale is a ranking of responses, for instance, ranking cyclists at the end of a race as positions 1, 2, and 3.

• Note that these are ranks: the time gap between positions 1 and 2 may well not be the same as between positions 2 and 3. The distance between points is not equal, but an order is present.

• When responses have an order but the distances between responses are not necessarily the same, the items are placed on an Ordinal Scale.

• Therefore an ordinal scale lets the researcher interpret gross order but not the relative distances between positions.

• This is similar to the three top positions in a class – a difference exists, but it is not equal between positions.

• Ordinal Scale variables have the property of Identity and Magnitude.

• The numbers represent a quality being measured (identity) and can tell us whether a case has more of
the quality measured or less of the quality measured than another case (magnitude). The distance
between scale points is not equal. Ranked preferences are presented as an example of ordinal scales
encountered in everyday life.

Interval Scale
• A typical survey rating scale is treated as an interval scale: for instance, when respondents rate satisfaction with a training on a 5-point scale (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree), an interval scale is being used.

• It is an interval scale because the distance between adjacent scale elements is assumed to be equal, i.e., the distance between Strongly Agree and Agree is assumed to be the same as between Agree and Neutral.

• This means that we can interpret differences in the distance along the scale.

• We contrast this with an ordinal scale, where we can only talk about differences in order, not about the distances between responses.

Properties of Interval Scales

• Interval scales have the properties of:

• Identity

• Magnitude

• Equal distance

• Variables that fulfil the above-mentioned properties are placed on this scale. The equal distance between scale points tells us how many units greater or less one case is than another: the distance between 25 and 35 means the same as the distance between 65 and 75.

Ratio Scale

• A Ratio Scale is the highest level of measurement.

• The factor which clearly defines a ratio scale is that it has a true zero point.

• The simplest examples of ratio scales are measurements of length (disregarding any philosophical points about identifying zero length) and money.

• Having zero length or zero money means that there is no length and no money, whereas zero temperature is not an absolute zero, as some heat is still present.

• Ratio scales of measurement have all of the properties of the abstract number system.

Properties of Ratio Scale

• Identity

• Magnitude
• Equal distance

• Absolute/true zero

• These properties allow us to apply all possible mathematical operations: addition, subtraction, multiplication, and division. The absolute/true zero also lets us say how many times greater one case is than another. Variables having all of the above numerical properties fall on a ratio scale.

http://www.mnestudies.com/research/types-measurement-scales

• Operational Definition of Anxiety

• Non verbal measures – Could be face expressions

• Physiological measures – blood pressure, respiration rate, etc. (These are measurable)

• Scale of Measurement

• Zip Code, P. O. Box No. – Nominal Scale

• Large, medium and small eggs – Ordinal Scale

• Reaction time – Ratio scale

• SAT score – Interval scale

• Class rank - Ordinal Scale

• Football jersey number – Nominal scale

• Miles per gallon – ratio scale
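The scale/property relationship above can be summarised in a small lookup table. The sketch below simply encodes the earlier sections (the example variables are from the list above):

```python
# Properties possessed by each scale of measurement
SCALE_PROPERTIES = {
    "nominal":  {"identity"},
    "ordinal":  {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio":    {"identity", "magnitude", "equal intervals", "absolute zero"},
}

# Example variables classified as in the list above
examples = {
    "zip code": "nominal",
    "class rank": "ordinal",
    "SAT score": "interval",
    "miles per gallon": "ratio",
}

for variable, scale in examples.items():
    print(f"{variable}: {scale} -> {sorted(SCALE_PROPERTIES[scale])}")
```

Note that each scale's property set contains the previous scale's set, which is why the scales run from least to most sophisticated.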

Type of Measures

• Self-Report Measures

• Tests

• Behavioral Measures

• Physical Measures

Type of Measures

• Self-Report Measures: Self-report measures are typically administered as questionnaires or interviews to measure how people report that they act, think, or feel.
• Thus self-report measures aid in collecting data on behavioral, cognitive, and affective events (Leary,
2001).

• Behavioral self-report measures typically ask people to report how often they do something, such as how often they eat a certain food, eat out at a restaurant, or go to the gym – simple statements of fact about oneself.

• Cognitive self-report measures ask individuals to report what they think about something, such as what they think of the canteen service – this involves judgement.

• Affective self-report measures ask individuals to report how they feel about something. Questions
concerning emotional reactions such as happiness, depression, anxiety, or stress lie in Affective domain.
Many psychological tests are affective self- report measures. These tests also fit into the category of
measurement tests described in the next section.

• Tests: Tests are measurement instruments used to assess individual differences in various content
areas.

• Psychologists frequently use two types of tests: 1. Personality tests and 2. Ability tests.

• Many personality tests are also affective self-report measures; they are designed to measure aspects of an individual's personality and feelings about certain things.

• Ability tests, however, are not self-report measures and generally fall into two categories: aptitude tests and achievement tests. Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area.

• Behavioral Measures involve carefully observing and recording behavior.

• Behavioral measures are often referred to as observational measures because they involve observing
what a participant does.

• Behavioral measures can be applied to anything a person or an animal does, for example, the way men and women carry their bags, or how many people follow road signs.

• The observations can be:

• Direct (while the participant is engaging in the behavior) or

• Indirect

• In direct observations, the participants may become cautious and may react in an unnatural way. This
affects the results. This response of participants is called “reactivity”

• Observers may hide themselves, or use a more indirect means of collecting the data (such as
videotape).
• Using an unobtrusive means of collecting data reduces reactivity, that is, participants reacting in an
unnatural way to being observed.

• Physical Measures – Physical measures are usually taken by means of equipment.

• Weight is measured with a scale, and blood pressure and temperature each with a dedicated apparatus.

• Physical measures are much more objective than behavioral measures.

• A physical measure is not simply an observation. Instead, it is a measure of a physical activity that
takes place in the brain or body.

• This is not to say that physical measures are problem free.

• Keep in mind that humans are still responsible for running the equipment that takes the measures and
ultimately for interpreting the data provided by the measuring instrument. Thus even when using
physical measures, a researcher needs to be concerned with the accuracy of the data.

• Self-Report Measures – Subjective/ Objective

• Tests – Subjective/ Objective

• Behavioral Measures - Subjective/ Objective

• Physical Measures - Subjective/ Objective

Meaning of Reliability (Module 6)

• Reliability: An indication of the consistency or stability of a measuring instrument.

• In other words, the measuring instrument must measure exactly the same way every time it is used.

• This consistency means that individuals should obtain a similar reading each time they use the measuring instrument.

• For example, a bathroom scale needs to be reliable, that is, it needs to measure the same way every
time an individual uses it, otherwise it is useless as a measuring instrument.

Error in Measurement

• Consider some of the problems with the four types of measures discussed in the previous module (i.e.,
self-report, tests, behavioral, and physical).

• Some problems, known as method errors, stem from the experimenter and the testing situation. For
example,

• Does the individual taking the measures know how to use the measuring instrument properly?

• Is the measuring equipment working correctly?


• Other problems, known as trait errors, stem from the participants.

• Were the participants being truthful?

• Did they feel well on the day of the test?

• Both types of problems can lead to measurement error. In fact, a measurement is a combination of
the true score and an error score.

• The true score is what the score on the measuring instrument would be if there were no error.

• The error score is any measurement error (method or trait) (Leary, 2001; Salkind, 1997).

• The following formula represents the observed score on a measure, that is, the score recorded for a
participant on the measuring instrument used.

• The observed score is the sum of the true score and the measurement error:

Observed score = True score + Measurement error

• The observed score becomes increasingly reliable (more consistent) as we minimize error and thus
have a more accurate true score.

• True scores should not vary much over time, but error scores can vary tremendously from one testing
session to another.

• How then can we minimize error in measurement?

• We can make sure that all the problems related to the four types of measures are minimized.

• These problems include those in recording or scoring data (method error) and those in understanding
instructions, motivation, fatigue, and the testing environment (trait error).

• The conceptual formula for reliability is:

Reliability = True score / (True score + Error score)

• A reduction in error leads to an increase in reliability, i.e. if there is no error, reliability is equal to 1.00,
the highest possible reliability score.

• As error increases, reliability drops – The greater the error, the lower the reliability of a measure
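The conceptual formula can be illustrated numerically. In classical test theory the ratio is usually computed from score variances; the sketch below (all numbers are simulated and illustrative) shows reliability equal to 1.00 with no error and dropping as error grows:

```python
import random

random.seed(42)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def reliability(error_sd, n=10_000):
    # Observed score = True score + Measurement error
    true = [random.gauss(100, 15) for _ in range(n)]
    observed = [t + random.gauss(0, error_sd) for t in true]
    # Reliability = true-score variance / observed-score variance
    return variance(true) / variance(observed)

print(reliability(error_sd=0))             # 1.0 – no error, perfect reliability
print(round(reliability(error_sd=15), 2))  # roughly 0.5 – error as large as the true-score spread
```

As the error standard deviation shrinks toward zero, the ratio climbs back toward 1.00, mirroring the statement above.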

How to Measure Reliability: Correlation Coefficients

• Reliability is measured using correlation coefficients.

• A correlation coefficient measures the degree of relationship between two sets of scores (readings) and can vary between −1.00 and +1.00.
• The stronger the relationship between the variables, the closer the coefficient to either −1.00 or +
1.00.

• Similarly, the weaker the relationship between the variables, the closer the coefficient is to 0.

• Suppose that, among individuals measured on two variables, the top-scoring individual on variable 1 was also top-scoring on variable 2, the second-highest-scoring person on variable 1 was also second highest on variable 2, and so on down to the lowest-scoring person.

• In this case there would be a perfect positive correlation (+1.00) between variables 1 and 2.

• In the case of a perfect negative correlation (−1.00), the person with the highest score on variable 1 would have the lowest score on variable 2, the person with the second-highest score on variable 1 would have the second-lowest score on variable 2, and so on.

• In reality variables are almost never perfectly correlated.

• Thus most correlation coefficients are less than 1 in absolute value.
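A correlation coefficient of the kind described here – Pearson's r – can be computed directly from two lists of scores. This is a minimal standard-library sketch (the data are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two sets of scores (-1.00 to +1.00)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Same rank order and spacing on both variables -> perfect positive correlation
print(round(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]), 2))  # 1.0
# Highest on variable 1 is lowest on variable 2 -> perfect negative correlation
print(round(pearson_r([1, 2, 3, 4], [40, 30, 20, 10]), 2))  # -1.0
```

With real data the rank orders rarely line up exactly, so the coefficient falls between these two extremes.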

Types of Reliability

• There are four types of reliability:

• Test/retest reliability,

• Alternate-forms reliability,

• Split-half reliability, and

• Interrater reliability.

• Each type provides a measure of consistency, but they are used in different situations.

• Test/Retest Reliability: Repeating the same test on the same individuals on a second occasion is known as test/retest reliability.

• If the test is reliable, we expect the results for each individual to be similar. That is, the resulting
correlation coefficient will be high (close to 1.00).

• This measure of reliability assesses the stability of a test over time.

• However, this is the ideal case. Some error is present in every measurement, so the correlation coefficient will rarely be 1.00, but we expect it to be .80 or higher.

• Alternate-Forms Reliability – Using alternate forms of the testing instrument and correlating the
performance of individuals on the two different forms.
• In this case the tests taken at times 1 and 2 are different but equivalent or parallel (hence the terms
equivalent-forms reliability and parallel-forms reliability are also used)

• For example: 1. We want to find the reliability of a test of mathematics comprehension, so we create a set of 100 questions that measure that construct. We then randomly split the questions into two sets of 50 (set A and set B) and administer them to the same group of students about a week apart – say, one set of 50 questions on Monday and the other 50 to the same students the following Friday or Monday – and correlate the results. 2. We prepare 100 samples of the same material, test 50 in one run and the other 50 on the same machine after some time (this way we also confirm calibration).
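The random split in example 1 can be sketched as follows; the item pool is simulated with hypothetical item IDs:

```python
import random

random.seed(7)

# Hypothetical pool of 100 mathematics-comprehension items
items = [f"Q{i:03d}" for i in range(1, 101)]

# Randomly split into two equivalent 50-item forms (set A and set B)
shuffled = items[:]
random.shuffle(shuffled)
form_a, form_b = shuffled[:50], shuffled[50:]

# Form A is administered on Monday, form B the following week;
# each student's pair of scores is then correlated to estimate reliability.
print(len(form_a), len(form_b))   # 50 50
print(set(form_a) & set(form_b))  # set() – the two forms share no items
```

Shuffling before splitting keeps easy and hard items from clustering in one form, which is what makes the two forms plausibly equivalent.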


• Split-Half Reliability – A third means of establishing reliability is splitting the items on the test into equivalent halves and correlating scores on one half of the items with scores on the other half.

• Split-half reliability gives a measure of the equivalence of the test's content, but not of its stability over time, which is what test/retest and alternate-forms reliability assess.

• The biggest problem with split-half reliability is determining how to divide the items so that the two
halves are in fact equivalent.

• For example, it would not be advisable to correlate scores on multiple choice questions with scores on
short-answer or essay questions.

• What is typically recommended is to correlate scores on the even-numbered items with scores on the odd-numbered items.
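The odd/even split can be sketched like this; the item scores are illustrative (1 = correct, 0 = wrong), and across many participants the two half-scores would then be correlated:

```python
def split_half_totals(item_scores):
    """Total score on the odd-numbered items and on the even-numbered items.
    Items are numbered 1, 2, 3, ... in test order."""
    odd = sum(s for i, s in enumerate(item_scores, start=1) if i % 2 == 1)
    even = sum(s for i, s in enumerate(item_scores, start=1) if i % 2 == 0)
    return odd, even

# One participant's answers on a 10-item test
odd_total, even_total = split_half_totals([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
print(odd_total, even_total)  # 4 3
```

Splitting by odd and even item numbers, rather than first half versus second half, spreads item difficulty and fatigue effects evenly across the two halves.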

• Interrater Reliability – This measures the reliability of observers rather than of tests.

• It is a measure of consistency that assesses the agreement of observations made by two or more raters or judges.

• Let us say that you are observing play behavior in children. Rather than simply making observations on your own, it is advisable to have several independent observers collect data.

• The observers all watch the children playing but independently count the number and types of play behaviors they observe.

• Once the data are collected, interrater reliability needs to be established by examining the percentage
of agreement among the raters.

• If the raters' data are reliable, then the percentage of agreement should be high.

• If the raters are not paying close attention to what they are doing or if the measuring scale devised for
the various play behaviors is unclear, the percentage of agreement among observers will not be high.
• Although interrater reliability is measured using a correlation coefficient, the following formula offers
a quick means of estimating interrater reliability:

Interrater reliability = (Number of agreements / Number of possible agreements) × 100

• Thus, if your observers agree 45 times out of a possible 50, the interrater reliability is 90%, which is fairly high.

• However, if they agree only 20 times out of 50, then the interrater reliability is 40%, which is low.

• Such a low level of agreement indicates a problem with the measuring instrument or with the
individuals using the instrument and should be of great concern to a researcher.

• Because different questions on the same topic are used, alternate-forms reliability tells us whether the questions measure the same concepts (equivalency).

• Whether individuals perform similarly on equivalent tests at different times indicates the stability of a test.

Example: Total observations = 250 times

Disagreements = 38 times

Therefore, number of agreements = 250 − 38 = 212

Interrater reliability = (212 / 250) × 100 = 84.8% ≈ 85%, which is very high interrater agreement.
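The percentage-of-agreement estimate can be sketched directly:

```python
def interrater_reliability(agreements, possible_agreements):
    """Percentage-of-agreement estimate of interrater reliability."""
    return agreements / possible_agreements * 100

# Worked example: 250 observations, raters disagree on 38 of them
total, disagreements = 250, 38
agreements = total - disagreements  # 212
print(round(interrater_reliability(agreements, total), 1))  # 84.8
```

Note this formula only estimates interrater reliability; the full treatment uses a correlation coefficient across raters, as the text states.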

VALIDITY

• In addition to being reliable, measures must also be valid.

• Validity refers to whether a (statistical or scientific) study is able to draw conclusions that are in agreement with statistical and scientific laws.

• In other words, a conclusion drawn from a data set after experimentation is said to be scientifically valid if it is derived in accordance with mathematical and statistical laws.

• There are several types of validity;

• Like reliability, validity is measured by the use of correlation coefficients.

• For instance, if researchers developed a new test to measure a construct such as depression, they might establish the validity of the test by correlating scores on the new test with scores on an already established measure of depression; as with reliability, we would expect the correlation to be positive.

• Coefficients as low as 0.20 or 0.30 may establish the validity of a measure (Anastasi & Urbina, 1997).

• In brief, this means the results are most likely not due to chance.

Content validity:

• Content validity is assessed by a systematic examination of the test content to determine whether it covers a representative sample of the domain of behaviors to be measured.

• This type of validity ensures that the test or questionnaire actually covers all aspects of the variable being studied. If the test is too narrow, it will not predict what it claims to predict.

• In other words, a test with content validity has items that satisfactorily assess the content being
examined.

• To determine whether a test has content validity, researchers consult experts in the area being tested. (In fact, this is a challenge for EM students who prepare questionnaires.)

Face Validity

• Face validity simply addresses whether or not a test looks valid on its surface: does it appear to be an adequate measure of the conceptual variable? It is often confused with content validity.

• Face validity is validity at face value only.

Criterion validity:

• The extent to which a measuring instrument accurately predicts behavior or ability in a given area
establishes criterion validity.

• Two types of criterion validity may be used, depending on whether the test is used to estimate
present performance (concurrent validity) or to predict future performance (predictive validity).

• The SAT and GRE are examples of tests that have predictive validity because performance on the tests
correlates with later performance in college and graduate school, respectively.

• The tests can be used with some degree of accuracy to predict future behavior.

• A test used to determine whether someone currently qualifies as a pilot is a measure with concurrent validity. The test estimates the person's ability at the present time rather than attempting to predict future outcomes.

• Thus concurrent validation is used for the diagnosis of existing status rather than the prediction of
future outcomes.
Construct Validity

• Construct validity is considered by many to be the most important type of validity.

• The construct validity of a test assesses the extent to which a measuring instrument accurately
measures a theoretical construct or trait that it is designed to measure.

• Some examples of theoretical constructs or traits are verbal fluency, neuroticism, depression, anxiety,
intelligence, and scholastic aptitude. One means of establishing construct validity is by correlating
performance on the test with performance on a test for which construct validity has already been
determined. Thus performance on a newly developed intelligence test might be correlated with
performance on an existing intelligence test for which construct validity has been previously established.
Another means of establishing construct validity is to show that the scores on the new test differ across
people with different levels of the trait being measured. For example, if a new test is designed to
measure depression, you can compare scores on the test for those known to be suffering from
depression with scores for those not suffering from depression. The new measure has construct validity
if it measures the construct of depression accurately.

1. Content and construct validity. 2. Face validity. 3. A test that measures something other than what it claims to measure – establish its validity through experiments on those other constructs. 4. It is a concern for the validity of the test, because the test does not measure what it is supposed to measure.
