Data Collection and Analysis in O & G

Data Collection and Analysis In Obstetrics and
Gynecology
Preamble:
 What is Data?
Measurable characteristics of a sampling unit (or subject) of a
population, that yields information about the population.
 Type of Data:
Broadly, data are of two main types: Categorical or Numerical.
 Categorical Data:
The simplest type of observation that is made on a subject that comes to
the clinic is the allocation (indeed the classification) of the subject to one of
only two categories that relate to the presence or absence of some
attributes.
Examples:
• Pregnant/Not Pregnant
• Married/Single
• Hypertensive/Normotensive
• Diabetic/Non-Diabetic.
More than two categories:

Examples:
• Marital Status: Married/Single/Divorced/Separated
• Blood group: A/B/AB/O
• Degree of pain: Minimal/Moderate/Severe/Unbearable
 Numerical Data:
There are two main types viz: Discrete and continuous.
Discrete Data: Arise when observations take only certain numerical values,
usually obtained through counting.
Examples: Number of children, number of visits to ANC in a year, number of
ectopic heart beats in 24 hours, number of threatened abortions in the last two
years, etc.
Continuous Data: Usually obtained by some form of measurement.
Examples: Height, weight, age, body temperature, blood pressure, serum
cholesterol, etc.
 Other types of Data:

Censored Data: In life (time-to-event) data, it often happens that not all of
the subjects in the sample have failed. That is, in some cases the event of
interest may not be observed, or the exact times-to-failure of some of the
subjects may not be known. These types of data are commonly called censored
data and they are of three types, viz: right censored (or suspended), interval
censored and left censored data.
Right Censored (Suspended): These are the cases (of life data) composed of
subjects that had not failed by the end of the observation period.
Example: If 5 of 8 breast cancer cases had failed by the end of the experiment,
the remaining 3 would be regarded as suspended (right censored) data.
Interval Censored Data: Interval censored data result when there is
uncertainty as to the exact times the units failed within an interval.
Example: Suppose units are inspected every 6 hours, say at 6:00 am,
12:00 noon, 6:00 pm and so on. If 8 were surviving at 6:00 am and only 7 were
surviving when inspected at 12:00 noon, then one can only say
that one failed between 6:00 am and 12:00 noon. The exact time when that
one failed would not be known.
Left Censored Data: In this case, the failure time is only known to be before a
certain time.
Example: Suppose a unit scheduled for inspection after 12 hours is
found to have failed before the inspection. Then all that is known is that the
unit failed sometime before 12 hours (i.e. between 0 and 12 hours) but
not exactly when.
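The three kinds of censoring can be pictured as bounds on an unknown failure time. The sketch below is only an illustration under that assumption: the `Observation` type, the function name and the 24-hour study length are hypothetical and do not come from the text.

```python
import math
from typing import NamedTuple


class Observation(NamedTuple):
    lower: float  # latest time (hours) the subject was known NOT to have failed
    upper: float  # earliest time (hours) the subject was known to have failed


def censoring_type(obs: Observation) -> str:
    """Classify an observation by what is known about its failure time."""
    if obs.lower == obs.upper:
        return "exact"                 # failure time observed directly
    if math.isinf(obs.upper):
        return "right censored"        # had not failed when observation stopped
    if obs.lower == 0:
        return "left censored"         # failed before the first inspection
    return "interval censored"         # failed between two inspections


examples = [
    Observation(6.0, 12.0),        # failed between the 6:00 am and 12:00 noon inspections
    Observation(0.0, 12.0),        # already failed at the inspection after 12 hours
    Observation(24.0, math.inf),   # still surviving at a hypothetical 24-hour study end
    Observation(9.5, 9.5),         # failure time known exactly
]
labels = [censoring_type(o) for o in examples]
print(labels)
```

Storing a lower and an upper bound for every subject keeps all four cases in one structure, which is how interval-based survival methods usually expect their input.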
 Variable:
• A Variable is any attribute, Phenomenon or event that can have different
values.
• A variable can either be quantitative or qualitative
• A quantitative variable describes a characteristic in terms of a numerical
value. The value may vary from subject to subject or from time to time in
the same subject. The value is expressed in units of measurement.
Examples: Height in meters, blood pressure in mm Hg, weight in kilograms,
etc.
• A qualitative variable describes the attribute of a characteristic (by
classifying it into categories to which the subject either belongs or does not
belong).
Examples: State of origin, Tribe or Ethnic group, etc.
Types of Variables: Two types: Continuous and Discrete.
 Continuous Variable:
A variable with a potentially infinite number of possible values in any interval. It
can assume either integral or fractional values and can be measured to
different levels of accuracy. A continuous variable is realized through actual
measurement.
Examples: Weights of babies delivered in a health facility could be 3.14, 2.98,
2.94, 3.10 kg.
 Discrete Variable:
Can have only a finite (countable) number of values in any interval. The values are
invariably whole numbers (integers). A discrete variable is usually realized through
counting.
Examples: Number of children in a family, number of clinics in a community,
number of children delivered within a given period in a Teaching Hospital, etc.
 Collection of Data (In O & G)

 Sources of Data:
There are two main sources of data in healthcare delivery, including O & G:
regular (or routine) systems and ad hoc systems.
 Regular or Routine Data Collection Systems:

A regular or routine data collection system usually consists of established
procedures for collecting data (in the clinics) as they become available. This
could be at national, sub-national or institutional levels. This system provides
a rough indication of the frequency of occurrence of diseases and their
descriptive epidemiology, which serves as leads concerning disease etiology.
The sources of data in this system include information from: hospital (medical)
records, autopsy reports, physician records, etc.
Example: (Part of Patient’s Form)
Patient’s Name: -----------------------------------------
Patient’s Number:
Date of Registration:
Date of Birth:
Sex (1 = male, 2 = female):
Marital status:
Religion:
Ethnic group/Tribe:
Height (m):
Weight (kg):
Blood pressure (mm Hg): Systolic ------ Diastolic ------
Number of Pregnancies:
Number of Deliveries:
Number of Children Alive:
Number of Children Dead:
Number of Abortions:
• The advantage of this system of data collection is that it guarantees
availability of data in every specific area of healthcare delivery.
Ad Hoc Data Collection Systems:

Ad hoc data collection is usually in the form of a (Research) survey to gather
information that may not be available on a regular basis. This at times may
include special investigative studies or it could just be the collection of
additional information as part of the routine data collection. This system gives
a large coverage of the population.
Examples:
• An investigation of the effects of FGM on complications during delivery
• An investigation of breastfeeding practices among women who registered a
birth in the previous year.
• A study to investigate whether the use of hormonal contraceptives affects
the fertility status of the users.
• Ad hoc data collection systems could be extensive, intensive and
expensive. However, an advantage of the ad hoc system is that it provides
accurate and reliable data (when well conducted) in response to the
specific needs of the users. An important tool for an ad hoc data collection
system is an adequate questionnaire.
 Good Questionnaire Design.
 Guidelines for Designing a Questionnaire
 Use simple language
 Avoid long complicated questions (avoid double
negatives)
 Be unambiguous – be clear and simple
 Do not ask general questions if you want specific
answers. Ask only valid questions.
 Do not ask leading questions
 Avoid hypothetical questions about situations outside
the people’s direct experience
 Be careful with embarrassing questions. Do not make
it too difficult for the respondents.
 Use minimum number of questions.
 Pre-coded questions enable you to analyse your replies
easily by the computer, but they may force people to
give wrong answers.
 People tend to choose the first response.
 Ask easy questions first and difficult questions last.
 Pre-test your questionnaires.
 Steps in the Planning of a Survey
Step 1 – Preparation of a detailed written statement of
the objectives of the survey.
 Step 2 – Determination of the items of information
required and methods of collection.
 Step 3 – Definition of the reference population on
which information is to be sought.
 Step 4 – Decision on whether the reference population
is to be studied as a whole or in part (sample).
 Step 5 – Determination of the number of units in the
population to be selected for study during the survey
(sample size).
 Step 6 – Decision on how respondents will be selected
from the population (sampling method).
 Step 7 – Design, testing and validation of the
questionnaires on which observations will be recorded.
 Step 8 – Selection and training of enumerators
(interviewers).
 Step 9 – Collection of data.
Step 10 – Preparation for data analysis.
 Analysis of Data:
The general methodology for the analysis of data (in O & G) is of two types; viz:
Descriptive and Inferential.
Descriptive Statistics Approach for Data Analysis:
 Descriptive Statistics:
Descriptive statistics are the statistical tools for the organization and
summarization of data. They describe a set of data, which eventually provides a
basis for a generalization about a population when only a sample is observed.
In data analysis, descriptive statistics are presented in tables which provide
summary statistics for continuous, numeric variables. The summary statistics
include:
• measures of central tendency such as mean, median and mode
• measures of dispersion (spread of the distribution) such as range and
standard deviation (including variance of the distribution)
• measures of distribution such as skewness and kurtosis which indicate
how much a distribution varies from a normal distribution.
In summary, descriptive statistics describe a set of data and so provide a
basis for a generalization about a population when only a sample is observed.
They point up a characteristic of the population being studied by summarizing
a mass of data into a few simple ideas.
 Organization and Presentation of data

Useful information is usually not immediately evident from a mass of raw data.
Collected data need to be organized in such a way that the information they
contain may clearly reveal the patterns of variation in the distribution.
Organization of data aids the understanding of the structure and
characteristics of the data. Data are usually presented in either tabular or
diagrammatic form.
 Tabular Presentation
This is the presentation of data in tables so as to organize them into a compact
and readily comprehensible form. For example, a frequency distribution table
gives the number of observations at different values or classes of the variable.
Tabular presentation could be handled as:
(a) Single variable frequencies:
• For a qualitative variable (such as the distribution of the state of origin of
100 women who visited the ANC in the last one year).
• For a large data set of a quantitative variable requiring grouping of the
data into classes (such as the distribution of the weight of new born babies
in a Teaching Hospital)
(b) Cross-tabulation:
• Two dimensional tables, in which two variables are cross-tabulated (such
as the cross-classification of weight of babies at birth and economic status
of their parents).
• Three-dimensional tables, in which three variables are cross-classified
(such as outcome of treatment by sex and by age group).
 Diagrammatic presentation
Diagrammatic presentation is the use of a diagram to show the distribution of
data. The methods of diagrammatic presentation of data are:
(a) Qualitative or Categorical Data
 Pie Charts
A circle is divided into sectors with areas proportional to the frequencies or the
relative frequencies of the categories of the variable.
 Bar Charts
The bars are constructed to show the frequency or relative frequency for each
category of the attribute. The bars are usually equal in width. It is important
that the vertical scale should start at zero; otherwise the heights of the bars
will not be proportional to the frequencies.
(b) Quantitative data

• Frequency Histograms
The chosen class intervals should not overlap and should cover the full range
of the data. The area of each bar (not just its height) should be proportional to
the frequency. Unequal class intervals are taken into account by the areas of
the bars.
• Frequency Polygons (Line Charts)

This is constructed by joining the midpoints of the top of each bar of a
histogram. This chart provides ease of visual comparison between two or more
distributions drawn on the same chart.
• Cumulative frequency polygons and cumulative frequency charts
(Ogives).
This is the chart in which the cumulative frequencies are plotted against the
upper class limit of each class. In principle, the ogive can be used to
estimate, by interpolation, the frequency of occurrence of values of the
variable less than or equal to a specified value.
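The grouping behind a histogram and an ogive can be sketched as follows. The birth weights and class boundaries are hypothetical, chosen only to show how class frequencies and cumulative frequencies are obtained.

```python
from itertools import accumulate

# Hypothetical birth weights (kg); these values are not from the text.
weights = [2.4, 2.9, 3.1, 3.3, 2.7, 3.6, 3.0, 3.2, 2.8, 3.9]

# Non-overlapping class boundaries that cover the full range of the data.
edges = [2.0, 2.5, 3.0, 3.5, 4.0]
classes = list(zip(edges, edges[1:]))

# Frequency of each class [lower, upper): the heights of equal-width histogram bars.
freq = [sum(lo <= w < hi for w in weights) for lo, hi in classes]

# Cumulative frequencies: plotted against the upper class boundaries in an ogive.
cum_freq = list(accumulate(freq))

for (lo, hi), f, cf in zip(classes, freq, cum_freq):
    print(f"{lo:.1f} to {hi:.1f}   frequency = {f}   cumulative = {cf}")
```

With equal class widths the bar heights are proportional to the frequencies; with unequal widths each frequency would have to be divided by its class width so that the areas stay proportional.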
 Measures of Location:
One of the first statistics usually computed for a set of data is a measure of
central tendency such as the Mean, Median and the Mode.
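The Mean:
Most frequently used in data analysis. The Mean may be considered as the
center of gravity of the distribution.
For raw data:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For grouped data, with $x_i$ the midpoint and $f_i$ the frequency of class $i$:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$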
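As a small illustration of the two formulas for the mean, the sketch below computes the mean of raw data and of grouped data. The numbers are illustrative only, and for the grouped case the class midpoints are assumed to stand in for the individual values.

```python
def mean_raw(values):
    """Mean of raw data: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_grouped(midpoints, freqs):
    """Mean of grouped data: frequency-weighted average of the class midpoints."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


# Illustrative birth weights (kg).
raw_mean = mean_raw([3.14, 2.98, 2.94, 3.10])

# Hypothetical classes 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0 with frequencies 1, 3, 4, 2.
grouped_mean = mean_grouped([2.25, 2.75, 3.25, 3.75], [1, 3, 4, 2])

print(round(raw_mean, 2), round(grouped_mean, 2))
```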
The Median:
It is the point in the distribution with 50% of the observations on each
side of it; that is, it is the midpoint of the distribution. For an even number of
observations, the median lies between the $\frac{n}{2}$th and $\frac{n+2}{2}$th
positions when the values of the observations are arranged in order of
magnitude. When the number of observations is odd, the Median occupies the
$\frac{n+1}{2}$th position in the ordered arrangement. For the grouped data case, the
Median is estimated by using the expression:
$$\text{Median} = L_1 + \left(\frac{\frac{n}{2} - C_f}{f_i}\right) C_i$$
Where
$L_1$ = lower class boundary of the median class
$n$ = number of observations
$C_f$ = cumulative frequency of the class just before the median class
$C_i$ = median class interval
$f_i$ = frequency of the median class
The Mode:
This is simply the value that occurs most frequently in the distribution. For
the grouped frequency case, the Mode is estimated by using the expression:
$$\text{Mode} = L_1 + \frac{(f - f_b) \times C}{(f - f_a) + (f - f_b)}$$
Where
$L_1$ = lower class boundary of the modal class
$f$ = modal frequency
$f_a$ = frequency of the class after the modal class
$f_b$ = frequency of the class before the modal class
$C$ = modal class interval
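The grouped-data estimates of the median and the mode can be sketched as below, assuming equal class widths and a hypothetical frequency table. The mode uses the standard interpolation in which the numerator is the modal frequency minus the frequency of the class before the modal class.

```python
from itertools import accumulate


def grouped_median(lower_bounds, width, freqs):
    """Median = L1 + ((n/2 - Cf) / fi) * Ci for the class containing the n/2-th value."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    i = next(k for k, c in enumerate(cum) if c >= n / 2)   # index of the median class
    cf_before = cum[i - 1] if i > 0 else 0
    return lower_bounds[i] + (n / 2 - cf_before) / freqs[i] * width


def grouped_mode(lower_bounds, width, freqs):
    """Mode = L1 + (f - fb) * C / ((f - fa) + (f - fb)) for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)       # index of the modal class
    f = freqs[i]
    f_before = freqs[i - 1] if i > 0 else 0
    f_after = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_bounds[i] + (f - f_before) * width / ((f - f_after) + (f - f_before))


# Hypothetical classes 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0 (kg) with their frequencies.
lower_bounds = [2.0, 2.5, 3.0, 3.5]
freqs = [1, 3, 4, 2]
median = grouped_median(lower_bounds, 0.5, freqs)
mode = grouped_mode(lower_bounds, 0.5, freqs)
print(round(median, 3), round(mode, 3))
```

The mode function divides by zero if the modal class and both its neighbours have the same frequency; a fuller version would need to handle that flat case separately.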
 Measure of Variability (Measure of Spread)

 The Range:
The simplest way to describe the spread of a set of data is to quote the lowest
and highest values. The difference between the highest and lowest values gives
the range of the distribution. It is, however, not a satisfactory measure, since it
depends only on the two extreme values, and is therefore not widely used.
 Variance:
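This is the mean of the squared differences (deviations) between the mean and
each observed value, with $n - 1$ as the divisor for sample data. It is mathematically expressed as:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \quad \text{(raw data)}$$
$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1} \quad \text{(grouped data)}$$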
 Standard Deviation:
The square root of the variance:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$
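A minimal sketch of the sample variance and standard deviation for raw data follows, using the $n - 1$ divisor. The data values are hypothetical, and the result is compared with Python's standard `statistics` module as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical observations
s2 = sample_variance(data)           # variance
s = math.sqrt(s2)                    # standard deviation

# The standard library uses the same n - 1 divisor.
print(round(s2, 4), round(statistics.variance(data), 4), round(s, 4))
```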
 Inferential Statistics:
Usually when samples are studied, the investigator will be interested in going
beyond the sample and would want to make inference about the population
from which the sample was drawn. Thus, from the knowledge of the
descriptive statistics such as the mean and variance from sample values,
inferences about the same traits in the population are made. The use of
inferential statistics is basic to medical research. The tools of inferential
statistics include: Confidence Intervals, Tests of Hypotheses, Contingency Tables,
Nonparametric Tests, Regression and Correlation analysis, ANOVA, etc.
 Confidence Interval:
Confidence Interval combines the features of estimates from a sample with
known properties of the normal distribution to get an idea about the
uncertainty associated with a single sample estimate of the population
parameter. A confidence interval gives a range of values which one can be
confident (at a stated level, e.g. 95%) includes the true value.
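C I for a Single Mean ($\mu$)
The $100(1-\alpha)\%$ C I, when $\sigma$ is known or $n$ is large, is
$$\bar{X} \pm Z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$$
OR, when $\sigma$ is estimated by $s$ from a small sample,
$$\bar{X} \pm t_{n-1}(\alpha/2)\,\frac{s}{\sqrt{n}}$$
C I for the Difference of Two Means ($\mu_1 - \mu_2$)
The $100(1-\alpha)\%$ C I is
$$\bar{X}_1 - \bar{X}_2 \pm Z(\alpha/2)\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
OR
$$\bar{X}_1 - \bar{X}_2 \pm t_{n_1+n_2-2}(\alpha/2)\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where
$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
C I for the Single Proportion (P)
The $100(1-\alpha)\%$ C I, with $p$ the sample proportion and $q = 1 - p$, is
$$p \pm Z(\alpha/2)\sqrt{\frac{pq}{n}}$$
C I for the Difference of Two Proportions ($P_1 - P_2$)
The $100(1-\alpha)\%$ C I is
$$p_1 - p_2 \pm Z(\alpha/2)\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$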
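The large-sample intervals for a single mean and a single proportion can be sketched as below. The function names are illustrative, and the normal quantile is taken from Python's standard `statistics.NormalDist` rather than from a printed table. The mean example reuses the diabetic group's figures from Example 1 of the Worked Examples; the proportion example is hypothetical.

```python
import math
from statistics import NormalDist


def z_value(alpha):
    """Upper alpha/2 point of the standard normal distribution, Z(alpha/2)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def ci_mean(xbar, sd, n, alpha=0.05):
    """Large-sample CI for a single mean: xbar +/- Z(alpha/2) * sd / sqrt(n)."""
    half_width = z_value(alpha) * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def ci_proportion(p, n, alpha=0.05):
    """Large-sample CI for a single proportion: p +/- Z(alpha/2) * sqrt(p(1-p)/n)."""
    half_width = z_value(alpha) * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Mean systolic BP of 146.4 mm Hg (SD 18.5) in 100 women.
mean_lo, mean_hi = ci_mean(146.4, 18.5, 100)

# Hypothetical: 30% of 200 women have some attribute.
prop_lo, prop_hi = ci_proportion(0.30, 200)

print(f"mean: {mean_lo:.1f} to {mean_hi:.1f}")
print(f"proportion: {prop_lo:.3f} to {prop_hi:.3f}")
```

For small samples the normal quantile would be replaced by the tabulated $t_{n-1}(\alpha/2)$ value, which the standard library does not provide.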
 Test of Statistical Significance

• Tests of significance are standard statistical procedures for drawing
inferences from sample estimates about unknown population parameters.
• In medical research, tests of significance allow us to decide whether the sample estimates,
or differences between estimates, are within their normal biological variation, commonly
called variability due to chance.
Procedure for testing statistical hypothesis
• State the null hypothesis
• State the alternative hypothesis (indicate 1 – tail or 2 – tail)
• State the level of significance α, the probability of a Type I error (explain Type I and Type II errors)
• Choose the test statistic (explain parametric and non-parametric tests)
• Compute the numerical value of the statistic from the observed data
• Compare the calculated value of test statistic with tabulated values in
appropriate standard distribution tables at a specified probability level of
significance
• Decide whether or not to reject the null hypothesis according to the p-value
Test for Single Mean:
In each case the test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
OR, when $\sigma$ is unknown and is estimated by $S$,
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Hypotheses and Decision:
Case 1 (right tail): $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 > \mu_0$.
Reject $H_0$ if $Z > Z(\alpha)$ OR $T > t_{n-1}(\alpha)$.
Case 2 (left tail): $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 < \mu_0$.
Reject $H_0$ if $Z < -Z(\alpha)$ OR $T < -t_{n-1}(\alpha)$.
Case 3 (two tailed): $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 \neq \mu_0$.
Reject $H_0$ if $|Z| > Z(\alpha/2)$ OR $|T| > t_{n-1}(\alpha/2)$.
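The three decision rules for the Z test of a single mean can be sketched as one function. This is an illustration only: the function name, the `tail` argument and the numbers are hypothetical, and σ is assumed known so that the normal distribution applies.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (Z, reject_H0) for H0: mu = mu0 with known sigma.

    tail = "right" tests H1: mu > mu0, "left" tests H1: mu < mu0,
    and "two" tests H1: mu != mu0.
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, reject


# Hypothetical: sample mean 124 from n = 36, testing mu0 = 120 with sigma = 15.
z, reject = z_test_mean(xbar=124.0, mu0=120.0, sigma=15.0, n=36, tail="right")
print(round(z, 2), reject)
```

The left-tail rule compares Z with the negative critical value, and the two-tailed rule compares the absolute value of Z with the upper α/2 point.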
Test for Difference of Two Means:

H 0 : µ1 = µ 2
H 1 : (a) µ1 > µ 2
(b) µ1 < µ 2
(c) µ1 ≠ µ 2
Test statistics are created along the lines given for the test for single mean, and
the decisions follow accordingly.
Finally, tests of proportions are handled by the use of the Z-test for large samples
or by the use of the t-test for small samples.
 Contingency Tables:
Test for association between two categorical variables is by the use of the $\chi^2$
distribution.
The test statistic is:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - e_i)^2}{e_i}$$
where $O_i$ and $e_i$ are the observed and expected frequencies in cell $i$, and the null
hypothesis of no association is rejected
whenever the calculated value of $\chi^2 > \chi^2_{\upsilon}(\alpha)$,
where $\chi^2_{\upsilon}(\alpha)$ is the value of the chi-squared distribution with $\upsilon$ degrees of
freedom at the $\alpha$-level of significance.
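The chi-squared statistic for a contingency table can be sketched as follows. The counts are hypothetical. The expected frequencies are computed in the usual way as (row total × column total) / grand total, and the degrees of freedom as (rows − 1)(columns − 1); neither rule is spelled out in the text, so both are stated here as assumptions of the sketch.

```python
def chi_square(observed):
    """Chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under no association
            stat += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table: rows = exposure (yes/no), columns = outcome (yes/no).
stat, dof = chi_square([[20, 30], [30, 20]])

# Tabulated chi-squared value for 1 degree of freedom at the 5% level.
CRITICAL_1DF_5PCT = 3.841
reject_h0 = stat > CRITICAL_1DF_5PCT
print(stat, dof, reject_h0)
```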
 Nonparametric Tests:
In the tests for means, proportions and association, there is a fundamental
assumption of the knowledge of the distribution of the test statistics and
indeed the knowledge of the functional form of the distribution of the variables
under consideration. When there is no knowledge of the functional form of the
basic density function of the variables, then it is usually good to resort to the
Nonparametric test such as:
• The Wilcoxon (Rank sum) test
• The Mann-Whitney U – test
• The Median test
• The Sign test
 The Wilcoxon Test (Two Samples)

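Test statistic: $S_W = \sum_{j=1}^{m} R_j$, where $R_j$, $j = 1, 2, \ldots, m$ are the ranks of the $X$'s
in the combined sample of $N = m + n$ observations ($m$ from $X$ and $n$ from $Y$).
Reject H0 when
$$|Z| = \left|\frac{S_W - \frac{m(N + 1)}{2}}{\sqrt{\frac{mn(N + 1)}{12}}}\right| > Z(\alpha/2)$$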
The Mann-Whitney U – Test (Two Samples)
Test statistic: $U = S_W - \frac{m(m + 1)}{2}$
where $S_W$ is as in the Wilcoxon test.
Reject H0 when
$$|Z| = \left|\frac{U - \frac{mn}{2}}{\sqrt{\frac{mn(N + 1)}{12}}}\right| > Z(\alpha/2)$$
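The two rank-based statistics can be sketched together, since U is obtained from the rank sum. The sketch assumes m observations in the X sample, n in the Y sample, N = m + n, and no tied values; the data are hypothetical and far too few for the normal approximation to be trusted, so they only illustrate the arithmetic.

```python
import math


def rank_sum_tests(x, y):
    """Return (S_W, U, Z from S_W, Z from U) for two samples without ties."""
    m, n = len(x), len(y)
    N = m + n
    combined = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(combined)}   # valid only without ties
    s_w = sum(rank[value] for value in x)            # Wilcoxon rank sum of the X's
    u = s_w - m * (m + 1) / 2                        # Mann-Whitney U
    sd = math.sqrt(m * n * (N + 1) / 12)             # common standard deviation
    z_w = (s_w - m * (N + 1) / 2) / sd
    z_u = (u - m * n / 2) / sd
    return s_w, u, z_w, z_u


x = [3.1, 4.0, 5.2]            # hypothetical X sample (m = 3)
y = [1.0, 2.2, 2.9, 3.5]       # hypothetical Y sample (n = 4)
s_w, u, z_w, z_u = rank_sum_tests(x, y)
print(s_w, u, round(z_w, 4), round(z_u, 4))
```

The two Z values coincide because U differs from the rank sum only by the constant m(m + 1)/2, so the two tests lead to the same decision.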
 Regression and Correlation:

A high proportion of data analyses are carried out to study the relationship
between two variables. The purposes of such analysis are:
• To assess whether the two variables are associated.

• To enable the value of one variable to be predicted from any known value
of the other variable
• To assess the amount of agreement between the values of the two
variables.
 Correlation:
Correlation is the method of analysis used when studying the measure of
relationship (association) between two continuous variables, e.g. percentage
of body fat and age of normal adults. The actual measure of the association is
done by calculating the correlation coefficient r. The correlation coefficient r
can take any value between –1 and +1.
The Pearson’s measure of correlation coefficient is expressed as:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
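while the Spearman rank correlation coefficient is expressed as:
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of the $i$th pair of observations.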
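Both coefficients can be sketched in a few lines. The data are hypothetical, and the Spearman function uses the standard form with n(n² − 1) in the denominator, which assumes there are no tied values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of cross-products and squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the smallest value; valid only when there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]      # hypothetical paired observations
y = [2, 4, 5, 3, 7]
r = pearson_r(x, y)
r_s = spearman_rs(x, y)
print(round(r, 4), round(r_s, 4))
```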
 Regression:
Linear regression describes the linear relationship between variables and can be
used to predict the value of one variable for an individual when we only know
the other variable. Consider a simple case of: Fetal weight (kg) and Non-
pregnant Maternal weight. Here we consider the fetal weight as the response
(or outcome) variable while the maternal weight is the predictor variable.
These are also called the dependent and independent variables respectively.
The linear relationship between the dependent (Y) and the independent (X)
variables is given as:
$$Y = \alpha + \beta X$$
The estimates of $\alpha$ and $\beta$ are:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$
Hence,
$$\hat{Y} = \hat{\alpha} + \hat{\beta} X$$
which is used for prediction.
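The least-squares estimates can be sketched directly from the deviation form of the formulas. The function names and the five data pairs are hypothetical and serve only to show the steps.

```python
def fit_line(x, y):
    """Least-squares estimates (alpha_hat, beta_hat) for Y = alpha + beta * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat


def predict(alpha_hat, beta_hat, x_new):
    """Predicted value of Y for a new value of X."""
    return alpha_hat + beta_hat * x_new


x = [1, 2, 3, 4, 5]      # hypothetical predictor values
y = [2, 4, 5, 3, 7]      # hypothetical responses
alpha_hat, beta_hat = fit_line(x, y)
y_at_6 = predict(alpha_hat, beta_hat, 6)
print(round(alpha_hat, 3), round(beta_hat, 3), round(y_at_6, 3))
```

Predictions are most trustworthy for values of X inside the range of the data used to fit the line.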
Multiple Regression:
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$$
e.g. obesity, smoking and snoring:
$$Y_{\text{Snoring}} = \alpha + \beta_1 X_{\text{Smoking}} + \beta_2 X_{\text{Obesity}}$$
Logistic Regression:
Good for prediction of dichotomous (two-category) outcome variables.
 Simple Experimental Design

 One Way ANOVA
In research work or in the handling of patients, comparisons are often made
between several sets of data collected from basically similar populations, such
as groups of patients having the same ailment, where a different drug is used
for each group. Generally, any experiment
designed to compare several treatments (sources of variation) must embody two
important principles of experimental design viz: (i) Replication and (ii)
Randomization. The simplest experimental design which incorporates these
two principles is the Completely Randomized Design, also called the
one-way classification or the one-way analysis of variance, involving one
factor appearing at different levels.
The null hypothesis we would wish to test is:
H0: $\mu_1 = \mu_2 = \ldots = \mu_k = \mu$ versus
H1: At least one of the $\mu_i$ ($i = 1, \ldots, k$) differs from $\mu$.
Test for One-Way Classification
1. State H0 and H1
H0: $\mu_1 = \mu_2 = \ldots = \mu_k$
2. Choose the level of significance, $\alpha$
3. Complete the ANOVA table
ANOVA TABLE
S.V.         d.f.        SS      MS                      F-Ratio
Treatment    k – 1       SStr    MStr = SStr/(k – 1)     Fcal = MStr/MSE
Error        k(n – 1)    SSE     MSE = SSE/(k(n – 1))
Total        kn – 1      SST
4. Under H0 and the model assumptions, Fcal under F – Ratio in
the ANOVA table has the $F_{k-1,\,k(n-1)}$ distribution. Hence, we find the critical
point by reading off $F_{k-1,\,k(n-1)}(\alpha)$ from the F – distribution table for the
appropriate level of significance.
5. Compare the values of Fcal from the ANOVA table and $F_{k-1,\,k(n-1)}(\alpha)$ from
the statistical table.
If Fcal > $F_{k-1,\,k(n-1)}(\alpha)$ then reject the null hypothesis.
6. Draw a conclusion.
Remark
When the sample sizes (i.e. the number of observations in each
treatment) are not all equal, necessary adjustment must be made in the
computation of sums of squares.
Example
Six patients each were tested on four types of oral contraceptive to
investigate the average reaction time.
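No measurements are given for this example, so the sketch below uses a small hypothetical data set (three treatments, three observations each) simply to show how the entries of the ANOVA table are computed when all groups have the same size n.

```python
def one_way_anova(groups):
    """Return (F_cal, (df_treatment, df_error)) for k groups of equal size n."""
    k = len(groups)
    n = len(groups[0])                       # equal group sizes are assumed
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    # Between-treatment and within-treatment (error) sums of squares.
    ss_tr = n * sum((m - grand_mean) ** 2 for m in means)
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / (k * (n - 1))
    return ms_tr / ms_e, (k - 1, k * (n - 1))


# Hypothetical reaction times for three treatments; not data from the text.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_cal, df = one_way_anova(groups)

# Tabulated F value for (2, 6) degrees of freedom at the 5% level.
F_CRITICAL_2_6_5PCT = 5.14
reject_h0 = f_cal > F_CRITICAL_2_6_5PCT
print(round(f_cal, 2), df, reject_h0)
```

For the example in the text, with four contraceptives and six patients each, the same function would apply with k = 4 and n = 6, giving 3 and 20 degrees of freedom.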
Risk Estimation:
                       Disease
Exposure     Yes       No        Total
Yes          a         b         a + b
No           c         d         c + d
Total        a + c     b + d     n = a + b + c + d
Relative Risk (RR)

RR estimates the magnitude of an association between exposure
and disease. It indicates the likelihood of developing the disease in
the exposed group relative to those who are not exposed. It is the
ratio of the incidence of disease in the exposed group divided by the
corresponding incidence of disease in the non-exposed group.
Thus,
$$RR = \frac{a/(a + b)}{c/(c + d)} = \frac{a}{a + b} \cdot \frac{c + d}{c} = \frac{a(c + d)}{c(a + b)}$$
Remarks:
1. RR of 1.0 indicates that the incidence rates of disease in the
exposed and non-exposed groups are identical and thus
indicates that there is no association observed between the
exposure and the disease.
2. A value of RR greater than 1.0 indicates a positive association
or an increased risk among those exposed (to a factor).
3. Analogously, an RR less than 1.0 means that there is an inverse
association or a decreased risk among those exposed.
4. RR may change (in some cases) with time e.g. RR for 1 year
exposure might be different from RR for 10 years exposure.
Odds Ratio (for case-control studies)
These are studies where participants are selected on the basis of their disease
status.
OR ≡ ratio of the odds of exposure among the cases to that
among the controls.
$$OR = \frac{a/c}{b/d} = \frac{ad}{bc}$$
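Both measures follow directly from the four cells of the exposure-by-disease table. The sketch below uses hypothetical counts, with a and b the exposed subjects with and without disease, and c and d the non-exposed subjects with and without disease.

```python
def relative_risk(a, b, c, d):
    """Incidence in the exposed, a/(a+b), divided by incidence in the non-exposed, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/c) divided by odds among controls (b/d) = ad/bc."""
    return (a * d) / (b * c)


# Hypothetical table: 30 of 100 exposed and 10 of 100 non-exposed have the disease.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(round(rr, 2), round(odds, 2))
```

In this illustration the odds ratio is further from 1 than the relative risk; the two are close only when the disease is rare in both groups.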
Worked Examples
Example 1: Blood pressure levels were measured in 100 diabetic and 100
non-diabetic women aged 40 – 49 years. Mean systolic blood pressures were
146.4 mm Hg (with standard deviation of 18.5) among the diabetics and 140.4
mm Hg (with standard deviation of 16.8) among the non-diabetics. By making
the necessary assumptions, calculate the 95% confidence interval for the
difference of means of the blood pressures of the two groups of women.
Solution: Assume that the blood pressures of each of the two groups of
women are normally distributed. Hence, assume that the difference of means
of the blood pressures is also normally distributed.
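Given: $100(1 - \alpha)\% = 95\%$
$\Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05$
$\Rightarrow \alpha/2 = 0.025$
The formula for the $100(1 - \alpha)\%$ CI for the difference of two means is:
$$\bar{X}_1 - \bar{X}_2 \pm Z(\alpha/2)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
This holds since $n_1 = n_2 = 100$ are considered to be large values, so that the
sample standard deviations can stand in for $\sigma_1$ and $\sigma_2$.
Substituting, with $Z(\alpha/2) = Z(0.025) = 1.96$, we have
$$146.4 - 140.4 \pm 1.96\sqrt{\frac{18.5^2}{100} + \frac{16.8^2}{100}}$$
i.e. 6 ± 1.96 × 2.498979792
i.e. 6 ± 4.898
(1.102, 10.898)
∴ The 95% confidence interval for the difference of means is: 1.1 to 10.9 mm Hg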
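The arithmetic of Example 1 can be checked with a few lines of code. The figures are those given in the example; the normal quantile is computed with Python's standard `statistics.NormalDist` instead of being read from a table, so it is 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

# Figures from Example 1: diabetic and non-diabetic women, 100 in each group.
n1 = n2 = 100
mean1, sd1 = 146.4, 18.5
mean2, sd2 = 140.4, 16.8

z = NormalDist().inv_cdf(0.975)                  # Z(0.025), about 1.96
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)    # standard error of the difference
diff = mean1 - mean2
lower, upper = diff - z * se, diff + z * se

print(f"SE = {se:.4f}, 95% CI = {lower:.1f} to {upper:.1f} mm Hg")
```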
Example 2: A team of medical researchers wished to measure the level of
weight gained by users of oral contraceptives. The weights of 12 women were
taken before and after the use of the contraceptive, one year apart.
Unfortunately, one of the women died before the end of the year, and
therefore there was no result for her (this is indicated by * in the data set).
Estimate the weight of the woman that died before the experiment was
concluded.
Weights of Women
Before (X) After (Y)
50 61
55 61
60 59
65 71
70 80
75 76
79.5 *
80 90
85 106
90 98
95 100
100 114
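Solution: First, we shall find the regression line
$Y = \alpha + \beta X$ by estimating $\alpha$ and $\beta$ from the 11 complete pairs.
Complete the table:
x       y       x²       y²       xy
50      61      2500     3721     3050
55      61      3025     3721     3355
60      59      3600     3481     3540
65      71      4225     5041     4615
70      80      4900     6400     5600
75      76      5625     5776     5700
80      90      6400     8100     7200
85      106     7225     11236    9010
90      98      8100     9604     8820
95      100     9025     10000    9500
100     114     10000    12996    11400
Total:
825     916     64625    80076    71790
Using the result of the table we get
$$\hat{\beta} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{11(71790) - (825)(916)}{11(64625) - (825)^2} = \frac{33990}{30250} = 1.1236$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 83.2727 - \frac{33990}{30250}(75) = -1.0000$$
($\hat{\beta}$ is carried unrounded here; rounding it to 1.1236 before multiplying would give −0.9973.)
$$\therefore \hat{Y} = \hat{\alpha} + \hat{\beta} X = -1.0000 + 1.1236 X$$
Hence, when X = 79.5 we have
$$\hat{Y} = -1.0000 + 1.1236 \times 79.5 = 88.33$$
That is, the estimated weight of the woman that died (after one year) would
have been 88.33 kg.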
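The column totals and the fitted line of Example 2 can be reproduced with the computational (sums) form of the slope formula. The data are the 11 complete pairs from the example; the variable names are illustrative. Because the slope is carried at full precision, the intercept comes out as −1.0000, whereas rounding the slope to 1.1236 first gives −0.9973; the prediction is 88.33 kg either way.

```python
# The 11 complete (before, after) weight pairs from Example 2, in kg.
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [61, 61, 59, 71, 80, 76, 90, 106, 98, 100, 114]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Slope and intercept from the column totals.
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha_hat = sum_y / n - beta_hat * sum_x / n

# Estimated "after" weight for the woman whose "before" weight was 79.5 kg.
predicted = alpha_hat + beta_hat * 79.5
print(f"beta = {beta_hat:.4f}, alpha = {alpha_hat:.4f}, prediction = {predicted:.2f} kg")
```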
Example 3: Serum amylase determinations were made on a sample of 15
apparently healthy subjects. The sample yielded a mean of 96 units/100 ml
and a standard deviation of 35 units/100 ml. The population variance was
unknown. Can one conclude that the mean of the population from which the
sample of serum amylase determinations came is different from 120?
Solution:
$H_0: \mu = \mu_0 = 120$
$H_1: \mu \neq 120$
The test statistic is
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Let $\alpha = 0.05$.
Since we have a two-sided test we put $\alpha/2 = 0.025$ in each tail of the
distribution.
∴ we find $t_{14}(0.025) = 2.1448$ (obtained from statistical table)
Computed t:
$$t = \frac{96 - 120}{35/\sqrt{15}} = -2.66$$
∴ $|t| = 2.66$
Decision rule:
Since $|t| = 2.66 > t_{14}(0.025) = 2.1448$,
we shall reject the null hypothesis.
Conclusion: Based on the given data we shall conclude that the mean of
the population from which the sample came is not 120.
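The computed t of Example 3 can be checked as below. The critical value 2.1448 is the tabulated figure quoted in the example, entered as a constant because Python's standard library has no t distribution.

```python
import math

# Figures from Example 3.
n, xbar, s, mu0 = 15, 96.0, 35.0, 120.0

# t = (xbar - mu0) / (S / sqrt(n)), with n - 1 = 14 degrees of freedom.
t_stat = (xbar - mu0) / (s / math.sqrt(n))

T_CRITICAL_14_0025 = 2.1448          # t_14(0.025), as quoted from the statistical table
reject_h0 = abs(t_stat) > T_CRITICAL_14_0025

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```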
Exercise:
At admission two groups of women on two different family planning methods in
clinical trials show the following characteristics.
                         Mean      SD       No. of women
Weight (kg)
  Cycloprovera           56.83     12.48    42
  HRP 102                59.29     15.47    48
Height (cm)
  HRP 102                155.83    6.39     48
Age (years)
  HRP 102                28.46     4.66     48
Systolic BP (mm Hg)
  HRP 102                121.9     9.8      48
Diastolic BP (mm Hg)
  HRP 102                78.9      7.9      48
Find whether the two groups differ substantially at admission.