Danaida B. Marcelo, MS
CODING
▪ Assigning codes to qualitative data
▪ Using numbers to represent categories of
variables
o Ex. Gender – “1” for males, “2” for females
o Ex. Exposure factor – “+” or “1” for exposed, or
“0” for unexposed
▪ The reason for doing this is that some
statistical software cannot process data that
have characters
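A minimal sketch of the coding step in Python, assuming the codes from the examples above. The sample records and the names `GENDER_CODES`, `EXPOSURE_CODES`, and `encode` are hypothetical and only serve the illustration.

```python
# Hypothetical code books that follow the examples in the notes.
GENDER_CODES = {"male": 1, "female": 2}
EXPOSURE_CODES = {"exposed": 1, "unexposed": 0}


def encode(values, code_book):
    """Replace each category label with its numeric code."""
    return [code_book[value] for value in values]


# Made-up records, for illustration only.
genders = ["male", "female", "female", "male"]
exposures = ["exposed", "unexposed", "exposed", "exposed"]

gender_coded = encode(genders, GENDER_CODES)
exposure_coded = encode(exposures, EXPOSURE_CODES)

print(gender_coded)
print(exposure_coded)
```

The codes are labels only: averaging the gender codes, for instance, would not produce a meaningful number.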
❖ ORDINAL
✓ Categories can be ordered but differences
between data values either cannot be
determined or are meaningless
✓ Example: Patient status (0 = worse, 1 = stable, 2
= improved); the codes 0, 1, 2 show rank order
only, so subtracting them does not give a
meaningful quantity
✓ Includes the characteristics of the nominal scale
(categories)
✓ Assigns each measurement to one of a limited
number of categories and ranks them in
graded order
✓ Difference between “worse” and “stable” is not
measurable and at the same time is not equal to
the difference between “stable” and “improved”
✓ The stage of cancer is also an example of an
ordinal variable – stage 1 is the mildest stage while
stage 4 is the most severe – but the
differences between stages are not measurable
and are not equal
o Examples: Disease severity (mild, moderate,
severe), Degree of difficulty of exam
questions (easy, intermediate, difficult), APGAR
score, Pain score and satisfaction score (0-10)
*Values are categorical but can be ordered
because they have an inherent order; differences
between these values cannot be computed

Note: Aside from identifying the variables according to
their role in the hypothesized relationship, it is also
important to define how you are going to measure these
variables. It is imperative to define the variables that
will be included in a research or study according to their
scale of measurement. The scale of measurement is
one factor considered when choosing appropriate
statistical techniques when analyzing study results.
The 4 basic scales of measurement are the nominal,
ordinal, interval and ratio scale. In some books, these
four are termed as levels of measurement instead of
scales of measurement. That is because from nominal
to ratio there is an increasing refinement of
measurement.

❖ INTERVAL
✓ Like an ordinal scale, with the additional property
of having equal intervals; meaningful amounts
of differences between data can be determined
✓ Numbers can be arranged in order (minimum to
maximum, maximum to minimum)
✓ Unlimited number of values that are equally spaced
✓ There is no inherent (natural or true) zero
starting point; the zero is just an arbitrary
measure – therefore, values cannot be multiplied
or divided
✓ Temperature measured in degrees Celsius: 0
degrees Celsius does not mean absence of
temperature; it is actually the freezing point
of water
✓ Examples: IQ score, Temperature (in ˚C)
30˚C - 20˚C = 10˚C
40˚C - 30˚C = 10˚C (can subtract)
But we cannot say that 40˚C is twice as hot as 20˚C
because of the absence of a true zero point (40
is hot, 20 is cold)

❖ RATIO
✓ Like the interval level modified to include the
inherent zero starting point
✓ 0 = absence of the characteristic, e.g. 0 BP =
dead patient
✓ For values at this level, differences and ratios
are meaningful
✓ Examples: Weight measured in kg (if a person
is 80 kg in weight, then we can say he is twice
as heavy as a person of 40 kg weight), Height
in cm, Number of patients seen in a day,
Diastolic and systolic blood pressure (in
mmHg), Hemoglobin level (g/dL), Length of
survival time

… measurement (e.g., interval or ratio) rather than
a lower one (nominal or ordinal).
• Certain variables can be measured in various
scales:
▪ Example 1: Age
o Young or old = NOMINAL
o In years = RATIO
▪ Example 2: Hemoglobin level
o Normal/abnormal = NOMINAL
o In g/dL = RATIO
• In an ideal research, data is collected in
its highest form of scale of measurement

REVIEW: (from Professor’s presentation notes)
In nominal measurement the numerical values just
"name" the attribute uniquely. No ordering of the cases
is implied.
(For example, jersey numbers in basketball are
measures at the nominal level. A player with number 30
is not more of anything than a player with number 15,
and is certainly not twice whatever number 15 is.)

In ordinal measurement the attributes can be rank-ordered.
Here, distances between attributes do not
have any meaning. (For example, on a survey you might
code Educational Attainment as 0=less than H.S.;
1=some H.S.; 2=H.S. degree; 3=some college; 4=college
degree; 5=post college.) In this measure, higher
numbers mean more education. But is the distance from 0
to 1 the same as from 3 to 4? Of course not. The interval
between values is not interpretable in an ordinal measure.

In interval measurement the distance between
attributes does have meaning. For example, when we
measure temperature (in Fahrenheit), the distance from
30 to 40 is the same as the distance from 70 to 80. The interval
between values is interpretable. Because of this, it
makes sense to compute an average of an interval
variable, where it doesn't make sense to do so for
ordinal scales. But note that in interval measurement
ratios don't make any sense - 80 degrees is not twice as
hot as 40 degrees (although the attribute value is twice
as large).
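
The point about differences versus ratios can be checked numerically. The sketch below uses Celsius and Kelvin; the Kelvin scale, which has a true zero, is brought in here only for comparison and is not part of the original notes. The function name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin (a scale with a true zero)."""
    return celsius + 273.15


warm_c, cool_c = 40.0, 20.0

# Differences are the same size on both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: the Celsius ratio depends on the arbitrary zero.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: {ratio_c:.3f} in Celsius, {ratio_k:.3f} in Kelvin")
```

The Celsius ratio of 2 disappears once the zero point moves, which is why "twice as hot" is not a valid statement for an interval variable.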
Finally, in ratio measurement there is always an
absolute zero that is meaningful. This means that you
can construct a meaningful fraction (or ratio) with a
ratio variable.
(Weight is a ratio variable. In applied social research
most "count" variables are ratio, for example, the
number of clients in past six months. Why? Because you
can have zero clients and because it is meaningful to
say that "...we had twice as many clients in the past six
months as we did in the previous six months.")

- It is important to consider the highest form of
measurement when collecting data
- Nominal attributes are only named – WEAKEST
SCALE OF MEASUREMENT
- Ordinal attributes can be ordered
- Interval attributes are like ordinal but can
compute for differences
- Ratio attributes are like interval but can compute
for ratios because there is an absolute zero
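
One way to see why collecting data at the highest form of measurement matters: ratio data can always be collapsed into categories afterwards, but categories cannot be turned back into exact values. The sketch below is illustrative only. The ages are made up, and the cutoff `OLD_AGE_CUTOFF` is a hypothetical choice that does not come from the notes.

```python
OLD_AGE_CUTOFF = 40  # hypothetical cutoff, not from the notes

ages_years = [23, 35, 41, 58, 19]  # made-up ratio-scale data

# Going down a level is always possible: years -> young/old.
age_groups = ["old" if age >= OLD_AGE_CUTOFF else "young" for age in ages_years]

# A mean is available only while the data are still in years.
mean_age = sum(ages_years) / len(ages_years)

print(age_groups)
print(mean_age)
```

Had only "young" or "old" been recorded, the mean age could not be recovered.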
QUANTITATIVE DATA
✓ Variables that can be expressed numerically
▪ Interval
▪ Ratio
• Inferential statistics
➢ Concerned with analysis of data from a
sample leading to predictions or
associations (inferences) about the target
population
➢ Hypothesis testing
NOTE: How to compute for Cumulative %?
Just add the percentages one after another, like:
25.0 + 65.2 = 90.2
90.2 + 8.9 = 99.1
99.1 + 0.9 = 100.0
90.2% of the respondents have good quality of life.
When summarizing the table, you don’t need to
enumerate all the percentages; just highlight the
important ones.

NOTE: What to do?
Instead of displaying each value of a Quantitative
Continuous variable, we create class intervals and then
construct the frequency distribution table.
• CLASS INTERVALS
✓ Categories made from quantitative data
✓ Sets defined by a lower limit and an upper limit

Quantitative Continuous Variables
HISTOGRAM
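
The cumulative percentage arithmetic from the note above can be sketched as a running total. Only the four percentages come from the notes; the variable names are hypothetical, and rounding to one decimal is an assumption made to match the table's precision.

```python
from itertools import accumulate

# Percentages from the note, in table order.
percentages = [25.0, 65.2, 8.9, 0.9]

# Running total, rounded to one decimal to avoid floating-point noise.
cumulative = [round(total, 1) for total in accumulate(percentages)]

for pct, cum in zip(percentages, cumulative):
    print(f"{pct:5.1f}  {cum:5.1f}")
```

The last cumulative value should always reach 100.0, which is a quick check on the table.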
Consider the following data on the duration of sleep (in
hours) of 15 first year medical students the night
before a long exam:

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Mean = (2+4+6+3.5+2.5+6.5+1.5+5.5+3+1.5+4.5+3.7+4.2+1.6+3.4)/15
Mean = 53.4/15 = 3.56

MEDIAN
o Arrange the observations from smallest
to largest (or vice versa)
o Find the middle value
✓ If the number of observations is odd, the
middle value is the median
✓ If the number of observations is even, the
median is the average of the two middle
values
✓ Middle position = (n+1)/2
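
The mean computation for the sleep data, and the middle-value rule for the median, can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are hypothetical.

```python
import statistics

# Sleep duration (hours) of the 15 students, as listed in the notes.
sleep_hours = [2, 4, 6, 3.5, 2.5, 6.5, 1.5, 5.5, 3, 1.5, 4.5, 3.7, 4.2, 1.6, 3.4]

n = len(sleep_hours)
mean_sleep = sum(sleep_hours) / n

# Median: sort, then take the value at position (n + 1) / 2 when n is odd.
ordered = sorted(sleep_hours)
middle_position = (n + 1) / 2
median_sleep = statistics.median(sleep_hours)

print(f"n = {n}, mean = {mean_sleep:.2f}")
print(f"middle position = {middle_position:.0f}, median = {median_sleep}")
```

With n = 15 the middle position is the 8th ordered value, so no averaging of two middle values is needed.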
MODE
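As a sketch, the mode of the sleep data from the mean example can be found with `statistics.multimode`, which returns every value tied for the highest frequency. The variable names are hypothetical.

```python
import statistics

# Sleep duration (hours) of the 15 students, as listed in the notes.
sleep_hours = [2, 4, 6, 3.5, 2.5, 6.5, 1.5, 5.5, 3, 1.5, 4.5, 3.7, 4.2, 1.6, 3.4]

# multimode returns a list, so data with several modes are handled too.
modes = statistics.multimode(sleep_hours)

print(modes)
```

Here only one value occurs more than once, so the list has a single entry.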
Measures of Dispersion / Variability (Spread)
RANGE
• The simplest measure of variability
• Difference between the highest and lowest value
• Range = highest observation – lowest observation
Range = 6.5 – 1.5 = 5
INTERQUARTILE RANGE = Q3 - Q1
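
A minimal sketch of the range and interquartile range for the sleep data. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of Python's `statistics.quantiles`, which may not be the convention used in the lecture. The variable names are hypothetical.

```python
import statistics

# Sleep duration (hours) of the 15 students, as listed in the notes.
sleep_hours = [2, 4, 6, 3.5, 2.5, 6.5, 1.5, 5.5, 3, 1.5, 4.5, 3.7, 4.2, 1.6, 3.4]

# Range = highest observation - lowest observation.
data_range = max(sleep_hours) - min(sleep_hours)

# Quartiles under the default "exclusive" method (an assumption).
q1, q2, q3 = statistics.quantiles(sleep_hours, n=4)
iqr = q3 - q1

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Other quartile methods can give slightly different Q1 and Q3 for the same data, so the method should be stated when reporting an IQR.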
Measures of Absolute Variability:
STANDARD DEVIATION
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$$cv = \frac{sd}{\bar{X}} \times 100\%$$
• Characteristics
➢ Useful in comparing the results obtained by
different persons who are conducting
measurements involving the same variable but
different scales of measurement (QOL using two
different indices)
➢ When there is no variability, SD = 0, CV = 0
✓ When SD = mean, CV = 100%
✓ When SD > mean, CV > 100%
✓ The higher the CV, the higher the variability
➢ CV = (SD/mean) x 100%
✓ It can be used only to summarize quantitative
data
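
The standard deviation and coefficient of variation can be sketched for the sleep data used in the mean example. The code follows the sample formula with n − 1 in the denominator and CV = (SD/mean) x 100%; the variable names are hypothetical.

```python
import math

# Sleep duration (hours) of the 15 students, as listed in the notes.
sleep_hours = [2, 4, 6, 3.5, 2.5, 6.5, 1.5, 5.5, 3, 1.5, 4.5, 3.7, 4.2, 1.6, 3.4]

n = len(sleep_hours)
mean_sleep = sum(sleep_hours) / n

# Sum of squared deviations from the mean, divided by n - 1.
squared_deviations = sum((x - mean_sleep) ** 2 for x in sleep_hours)
sd = math.sqrt(squared_deviations / (n - 1))

# Coefficient of variation, expressed as a percentage of the mean.
cv = sd / mean_sleep * 100

print(f"SD = {sd:.2f} hours")
print(f"CV = {cv:.1f}%")
```

Because the CV is unit-free, it can be compared across variables measured in different units, which the SD alone does not allow.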
SUMMARY: DATA ANALYSIS PART I
Processing Data - GIGO
Describing data
◦ Type of variables
◦ Independent, dependent, extraneous
variables
◦ Nominal, ordinal, interval, ratio
◦ Qualitative, quantitative discrete,
quantitative continuous
◦ Frequency Distributions
◦ Shape of the distribution (normal
distribution, skewed)
◦ Measures of central tendency (mean, median,
mode)

END OF TRANSCRIPTION
Transcription Team 2019
Transcribed by: Heidi Cruz
Edited by: …