Biostatistics CDE 2 7 17

Sunday, July 02, 2017 1
9/12/2017 2
INTRODUCTION
TO
About me
Not a statistician!
So the emphasis will be on practical knowledge and application of
statistics, rather than theorems and proofs.
Why you should be here
Interest in statistical reasoning
A desire to learn to use statistics properly
in experimental design and data analysis
To develop your ability to critically assess
scientific (or pseudo-scientific) arguments
What is expected of you
1- Attendance at most lectures
2- Feedback to me: likes, dislikes, and what could be improved

Objectives
Understand the
Fundamental principles
Common tests
Impact of violations
Be able to perform standard statistical
analyses.
Introduction
Session ONE
Some Basic concepts
Statistics is a field of study concerned with
1- collection, organization, summarization
and analysis of data.
2- drawing of inferences about a body of
data when only a part of the data is
observed.
Statisticians try to interpret and
communicate the results to others.
Larson/Farber 4th ed
Branches of Statistics
Descriptive Statistics: Involves organizing,
summarizing, and displaying data.
Describes the important characteristics of the
data.
e.g. Tables, charts, averages, percentages
Inferential Statistics: Involves using sample data
to draw conclusions or make inferences about an
entire population.
Data
The raw material of Statistics is data.
We may define data as figures. Figures
result from the process of counting or
from taking a measurement.
For example:
- When a hospital administrator counts
the number of patients (counting).
- When a nurse weighs a patient
(measurement)
Types of data
Qualitative
Nominal
Ordinal (rating scale)
Quantitative
Discrete
Continuous
Interval
Ratio
Qualitative Data
Qualitative Data: Consists of non-numeric,
categorical attributes or labels
Examples: major, place of birth, eye color
Common statistic calculated: percentages
Quantitative Data
Quantitative data: Numerical measurements
or counts.
Examples: age, weight of a letter, temperature
Common statistic calculated: averages
Quantitative Data:
Discrete vs. Continuous
Discrete data: a finite or countable number of possible data
values, e.g. 0, 1, 2, 3, 4.
e.g.: Number of classes a student is taking
Continuous data: an infinite number of possible data
values on a continuous scale
e.g.: Weight of a baby
Nominal Data
A type of data that are simply names or
labels. Cannot be ranked or ordered by
value objectively.
e.g. Gender: male/female
Can you think of other examples of nominal data?
Ordinal Data
A type of data that are names or labels
which can be sensibly ordered or
ranked.
e.g. Socioeconomic status
Can you think of other examples of ordinal data?
Methods of Collecting Data
Observational study
Survey
Experiment
Simulation
Data Collection
First problem : how to obtain the data.
It is important to obtain good, or
representative, data.
Inferences are made based on statistics
obtained from the data.
Inferences can only be as good as the data.
Process of data collection:
Define the objectives of the survey or experiment.
Example: Estimate the average life of an electronic component.
Define the variable and population of interest.
Example: Length of time for anesthesia to wear off after surgery.
Define the data-collection and data-measuring schemes (questionnaire, scale, ruler, etc.).
Determine the appropriate descriptive or inferential data-analysis techniques.
TASK-1
Experiment: The investigator controls or modifies the environment and
observes the effect on the variable under study.
Survey: Data are obtained by sampling some of the population of interest.
The investigator does not modify the environment.
A variable
It is a characteristic that takes on different
values in different persons, places, or
things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a Dental clinic.
Types of variables
Quantitative Variables
A quantitative variable can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Qualitative Variables
Many characteristics are not capable of being measured. Some of them
can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
Types of quantitative variables
Discrete
A discrete variable is characterized by gaps or interruptions
in the values that it can assume.
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child in an
elementary school.
Continuous
A continuous variable can assume any value within a specified relevant
interval of values assumed by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the observed heights of two people,
we can find another person whose height falls somewhere in between.
TASK-2
Which one of the following is an example of
continuous numerical data?
A = Number of runs made by a cricket player.
B = Speed of a car captured by a speed camera.
C = Your favourite secondary school year level.
D = Shoe sizes.
E = Labor / Liberal preference of 100 people surveyed.
THAT WAS GREAT
Session TWO
Strategies
for understanding the
meanings
of
Data
Key words
frequency table, bar chart, range,
width of interval, mid-interval,
histogram, polygon
Descriptive Statistics
Frequency Distribution
for Discrete Random Variables
Example:
Suppose that we take a sample of size 16 from children in a primary school
and get the following data about the number of their decayed teeth:
3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1
To construct a frequency table:
1- Order the values from the smallest to the largest.
0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5
2- Count how many times each value occurs.

No. of decayed teeth    Frequency    Relative Frequency
0                       1            0.0625
1                       2            0.125
2                       4            0.25
3                       5            0.3125
4                       2            0.125
5                       2            0.125
Total                   16           1
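The two steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `frequency_table` is made up for this sketch, and the data are the 16 decayed-teeth counts from the example.

```python
from collections import Counter

# Decayed-teeth counts for the 16 children in the example.
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]


def frequency_table(values):
    """Return (value, frequency, relative frequency) rows ordered by value.

    Hypothetical helper written for this sketch only.
    """
    counts = Counter(values)          # step 2: count each distinct value
    n = len(values)
    # step 1: sorted() orders the distinct values from smallest to largest
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]


table = frequency_table(teeth)
for value, freq, rel in table:
    print(f"{value:>5} {freq:>10} {rel:>10.4f}")
```

The relative frequencies are simply each frequency divided by the sample size, so they always add up to 1.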
Representing the simple frequency table using the bar chart
We can represent the above simple frequency table using the bar chart.
[Bar chart: Frequency (vertical axis) against Number of decayed teeth (horizontal axis, 0 to 5); bar heights 1, 2, 4, 5, 2, 2]

Frequency Distribution
for Continuous Random Variables
For large samples, we can't use the simple
frequency table to represent the data.
We need to divide the data into groups or intervals
or classes.
So, we need to determine:
The number of intervals (k).
The range (R).
The width of the interval (w).
The Cumulative Frequency:
computed by adding successive frequencies.
The Cumulative Relative Frequency:

computed by adding successive relative
frequencies.
The Mid-interval:
computed by adding the lower bound of the
interval to its upper bound and then dividing
by 2.
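As a rough illustration of these definitions, the sketch below computes the mid-interval, cumulative frequency, relative frequency and cumulative relative frequency for a small grouped table. The class intervals and frequencies are hypothetical values chosen for this sketch, not data from the course.

```python
from itertools import accumulate

# Hypothetical class intervals (lower bound, upper bound) and frequencies.
classes = [(10, 19), (20, 29), (30, 39)]
freqs = [4, 10, 6]

n = sum(freqs)                                            # total sample size
mid_intervals = [(low + high) / 2 for low, high in classes]
cum_freqs = list(accumulate(freqs))                       # successive sums
rel_freqs = [f / n for f in freqs]                        # R.f = freq / n
cum_rel_freqs = [c / n for c in cum_freqs]

for (low, high), mid, f, cf, rf, crf in zip(
    classes, mid_intervals, freqs, cum_freqs, rel_freqs, cum_rel_freqs
):
    print(f"{low}-{high}  {mid:5.1f}  {f:3d}  {cf:3d}  {rf:.4f}  {crf:.4f}")
```

The last cumulative frequency equals the sample size and the last cumulative relative frequency equals 1, which is a quick check on a completed table.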
TASK-3
The following table represents the cumulative frequency, the
relative frequency, the cumulative relative frequency and the
mid-interval.
R.f = freq/n

Class      Mid        Frequency   Cumulative   Relative          Cumulative Relative
interval   interval   (f)         Frequency    Frequency (R.f)   Frequency
30 - 39    34.5       11          11           0.0582            0.0582
40 - 49    44.5       46          57           0.2434            -
50 - 59    54.5       -           127          -                 0.6720
60 - 69    -          45          -            0.2381            0.9101
70 - 79    74.5       16          188          0.0847            0.9948
80 - 89    84.5       1           189          0.0053            1
Total                 189                      1
Solution
From the above frequency table, complete
the table then answer the following
questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40-69 years?
3- Relative frequency of subjects with age between 70-79 years?
4- Relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40-49 years?
Session FOUR
Measures of Central Tendency
key words:
Descriptive statistic, measure of
central tendency, statistic, parameter,
mean, median, mode.
The Statistic and The
Parameter
A Statistic:
It is a descriptive measure computed from the
data of a sample.
A Parameter:
It is a descriptive measure computed from
the data of a population.
Measures of Central
Tendency
A measure of central tendency is a measure which
indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
Mean
-Average of a group of numbers

-Helpful to know the mean because
then you can see which numbers are
above and below the mean
-Very easy to find!
Mean Example
Here are example test scores for a PEDO class.
82 97 86 93 82
To find the Mean, first you must add up all of the
numbers.
82+97+86+93+82= 440
Now, since there are 5 test scores, we will next
divide the sum by 5.
440 ÷ 5= 88
The Mean is 88
Median
-The Median is the middle value on the

list.
-The first step is always to put the
numbers in order.
Median Example
First, let's examine these five test scores.
79 97 86 93 78
We need to put them in order.
97 93 86 79 78
The number in the middle is 86
In this case, the Median is 86
Mode
-The Mode refers to the number that
occurs the most frequently.
-It's easy to remember: the first two
letters are the same! MOde and MOst
Frequently!
Mode Example
Here is a list of temperatures for one week.
Mon. Tues. Wed. Thurs. Fri. Sat. Sun.
77 79 83 77 83 77 82
Again, we will put them in order.
77 77 77 79 82 83 83
77 is the most frequent number, so the mode= 77
Range
-The range is the difference between the

highest and the lowest numbers of the
series.
-All we have to do is put the numbers in
order and subtract!
Range Example
Let's look at the temperatures again.
77 77 77 79 82 83 83
The highest number is 83, and the lowest is 77.
All you need to do is subtract!
83-77= 6
In this case, the Range is 6
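The hand calculations for the mean, median, mode and range can be checked with Python's standard `statistics` module. This is a minimal sketch using the temperature list and the five test scores from the examples; the variable names are chosen for this illustration only.

```python
import statistics

# Temperatures for one week, as listed in the Mode and Range examples.
temps = [77, 79, 83, 77, 83, 77, 82]

temp_mean = statistics.mean(temps)        # sum of values / number of values
temp_median = statistics.median(temps)    # middle value after sorting
temp_mode = statistics.mode(temps)        # most frequent value
temp_range = max(temps) - min(temps)      # highest minus lowest

# Test scores from the Mean example.
scores = [82, 97, 86, 93, 82]
score_mean = statistics.mean(scores)

print(temp_mean, temp_median, temp_mode, temp_range, score_mean)
```

Note that `statistics.median` sorts the data itself, so the list does not need to be ordered first.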
Properties of the
Mean, Median, Mode, Range
Uniqueness. For a given set of data there is
one and only one mean.
Simplicity. It is easy to understand and to
compute.
Affected by extreme values, since all
values enter into the computation.
TASK-4
Now YOU try it!!!
This is the Stat Family!
Dad- 34, Mom- 33, Alex- 5, Jack- 5, Katie- 1
Mean
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Mean?
Remember Mean is the AVERAGE
Try it on your paper and see what you come up with!
Mean
Remember, to find the mean, we have to first add up
all of the numbers.
34+33+5+5+1= 78
Then, since there are 5 people in the family, we next
divide by 5.
78 ÷ 5= 15.6
The Mean in this case is 15.6
Median
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Median?
Remember Median is the MIDDLE NUMBER
Try it on your paper and see what you come up with!
Median
Remember, to find the median, we have to
first put all of the numbers in order.
1 5 5 33 34
The Median in this case is 5
Mode
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Mode?
Remember Mode is the MOST FREQUENT
Try it on your paper and see what you come up with!
Mode
Remember, to find the mode, we have to
first put all of the numbers in order.
1 5 5 33 34
The Mode in this case is 5
Range
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Range?
Remember Range is the DIFFERENCE
Try it on your paper and see what you come up with!
Range
Remember, to find the range, we have to first
put all of the numbers in order.
1 5 5 33 34
The highest age is 34, and the lowest is 1
Now we need to subtract to find the difference

34-1= 33
The range is 33
Session FIVE
Measures of Dispersion
Important statistical terms
Population:
A set which includes all measurements of interest to the researcher
(the collection of all responses, measurements, or counts that are of interest).
Sample:
A subset of the population.
Learning Objectives
Determine when to use sampling instead of a
census.
Distinguish between random and nonrandom
sampling.
Decide when and how to use various sampling
techniques.
Be aware of the different types of error that can
occur in a study.
Understand the impact of the Central Limit
Theorem on statistical analysis.
Reasons for Sampling
Sampling can save money.
Sampling can save time.
For given resources, sampling can broaden
the scope of the data set.
Because the research process is sometimes
destructive, the sample can save product.
If accessing the whole population is impossible,
sampling is the only option.
Types of Sampling
Simple Random Sample
Stratified Random Sample
Cluster sampling
Systematic
Convenience
Math Alliance Project
Simple Random Sample
Every subset of a specified size n
from the population has an equal
chance of being selected
Stratified Random Sample
The population is divided into two or more
groups called strata, according to some
criterion, such as geographic location,
grade level, age, or income, and
subsamples are randomly selected from
each stratum.
Cluster Sample
The population is divided into subgroups
(clusters) like families. A simple random
sample is taken of the subgroups and
then all members of the cluster selected
are surveyed.
Systematic Sample
Every nth member (for example: every
10th person) is selected from a list of all
population members.
Convenience Sample
Selection of whichever individuals are
easiest to reach
It is done at the convenience of the
researcher
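The random sampling schemes described above can be sketched with Python's standard `random` module. This is only an illustration: the population of 100 ID numbers, the two strata, the clusters of 10 and the seed are all hypothetical choices for this sketch.

```python
import random

rng = random.Random(42)               # fixed seed so the sketch is repeatable
population = list(range(1, 101))      # hypothetical member IDs 1..100

# Simple random sample: every subset of size 10 is equally likely.
simple_random = rng.sample(population, 10)

# Systematic sample: random start, then every k-th member of the list.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sample: random subsample from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Cluster sample: randomly choose whole clusters, then take every member.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(simple_random, systematic, stratified, cluster_sample, sep="\n")
```

A convenience sample has no counterpart here because it involves no random selection at all.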
Errors in Sampling
1- Interview error: interaction between
interviewer and person being surveyed
2- Respondent error: respondents have a difficult
time answering the question
3- Measurement error: inaccurate responses
when the person doesn't understand the question or
the question is poorly worded
4- Errors in data collection
SAMPLE SIZE FORMULA
95% Confidence interval is given by: point
estimate ± 1.96 (standard error)
Single mean:
$$SE = \frac{sd}{\sqrt{n}}$$
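A short sketch of the interval for a single mean follows, assuming the standard error is the sample standard deviation divided by the square root of n. The function name `ci95_mean` and the five data values are hypothetical; the multiplier 1.96 is the large-sample normal value, so for a sample this small a t value would normally be used instead.

```python
import math
import statistics


def ci95_mean(sample):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)             # standard error of the mean
    margin = 1.96 * se
    return mean - margin, mean + margin


# Hypothetical measurements used only to exercise the function.
lower, upper = ci95_mean([10, 12, 14, 16, 18])
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```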
TASK-5
A large elementary school has 15 classrooms, with 24 children in
each classroom. A sample of 30 children is chosen by the following
procedure:
Each of the 15 teachers selects 2 children from his or her classroom
to be in the sample by numbering the children from 1 to 24, using a
random digit table to select two different random numbers between
01 and 24. The 2 children with those numbers are in the sample.
Did this procedure give a simple random sample of 30 children from
the elementary school?
a) No, because the teachers were not selected randomly
b) No, because not all possible groups of 30 children had the same
chance of being chosen
c) No, because not all children had the same chance of being chosen
d) Yes, because each child had the same chance of being chosen
e) Yes, because the numbers were assigned randomly to the children
Session
SIX
key words:
Descriptive statistic, measure of
dispersion, range, variance, coefficient of
variation.
Measures of Dispersion:
A measure of dispersion conveys information
regarding the amount of variability present in a set
of data.
Note:
1. If all the values are the same,
there is no dispersion.
2. If the values are different,
there is dispersion.
3. If the values are close to each other,
the amount of dispersion is small.
4. If the values are widely scattered,
the amount of dispersion is large.
Measures of Dispersion
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of variation give information on the
spread or variability or dispersion of the data values.
[Figure: distributions with the same centre but different variation]
Measures of Dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
Range
The range is the difference between the
largest and the smallest values in the dataset
i.e. the maximum difference between data-
points in the list.
It is sensitive to only the most extreme values
in the list. The range of a list is 0 if and only
if all the data-points in the list are equal.
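$$R = x_L - x_S$$
where $x_L$ is the largest value and $x_S$ is the smallest value.
[Figure: range illustrated on a scale from 4 to 16 days]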
Pros & Cons
Advantages
best for symmetric data with no outliers
easy to compute and understand
good option for ordinal data
Disadvantages
doesn't use all of the data, only the extremes
very much affected if the extremes are outliers
only shows maximum spread, does not show shape
TASK-6
Using the student age data find the
range of the data.
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
Inter-quartile Range
(upper quartile - lower quartile)
Essentially describes how much the middle 50% of your
dataset varies
example: if all patients in a Dental Surgery took more-or-less
the same time to be treated, with only one or two
exceptionally quick or long appointments, you would
expect the inter-quartile range to be very small
but if all appointments were either very quick or very
long, with few in between, then the inter-quartile range
would be large
Pros & Cons
Advantages
Good for ordinal data
Ignores extreme values
More stable than the range
Disadvantages
Harder to calculate and understand
Doesn't use all the information (ignores
half of the data-points, not just the outliers)
Tails almost always matter in data and
these aren't included
TASK-7
Using the student age data find the
inter-quartile range.
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
The Variance:
It measures dispersion relative to the scatter of
the values about their mean.
a) Sample Variance ($S^2$):
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean.
b) Population Variance ($\sigma^2$):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $\mu$ is the population mean.
Variance
Advantages:
uses all of the data values
Disadvantages:
the variance is measured in the original
units squared
extreme values or outliers affect the
variance considerably
hard to calculate manually
The Standard Deviation:
is the square root of the variance.
a) Sample Standard Deviation: $S = \sqrt{S^2}$
b) Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$, where
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Standard Deviation
Advantages:
same units of measurement as the values
useful in theoretical work and statistical
methods and inference
Disadvantages:
hard to calculate manually
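Since the variance and standard deviation are tedious by hand, a quick check with Python's standard `statistics` module may help. The eight values below are hypothetical and were picked so that the population results come out as whole numbers.

```python
import statistics

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)    # divides by n - 1
sample_sd = statistics.stdev(data)        # square root of sample variance
pop_var = statistics.pvariance(data)      # divides by N
pop_sd = statistics.pstdev(data)          # square root of population variance

print(sample_var, sample_sd, pop_var, pop_sd)
```

The sample versions are always a little larger than the population versions on the same data because they divide by n - 1 rather than N.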
TASK-8
Using the student age data find the
variance and the standard deviation
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
The Coefficient of Variation (C.V):
It is a measure used to compare the
dispersion in two sets of data, and it is
independent of the unit of measurement.
$$C.V = \frac{S}{\bar{X}} \times 100$$
where S: Sample standard deviation.
$\bar{X}$: Sample mean.
TASK-9
Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds
Solution:
c.v (Sample1)= (10/145)*100= 6.9 %
c.v (Sample2)= (10/80)*100= 12.5 %
So the weights of the 11-year-olds (Sample 2)
show more relative variation.
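The same comparison can be written as a tiny function. This is a sketch only; `coefficient_of_variation` is a name invented here, and the inputs are the means and standard deviations from the two samples above.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    return sd / mean * 100


cv_sample1 = coefficient_of_variation(10, 145)   # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)    # 11-year-olds

print(f"Sample 1: {cv_sample1:.1f} %, Sample 2: {cv_sample2:.1f} %")
```

Both samples have the same standard deviation, so the difference in C.V comes entirely from the difference in means.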
Session SEVEN
Measures of Disease
Frequency
FRACTIONS USED IN DESCRIBING
DISEASE FREQUENCY
RATIO
A fraction in which the numerator is not part
of the denominator.
e.g. Fetal death ratio:
Fetal deaths/live births.
Fetal deaths are not included among live births,
by definition.
PROPORTION
A fraction in which the numerator is part of the

denominator.
e.g. Fetal death rate: Fetal deaths/all births
All births includes both live births and fetal
deaths.
Synonyms for proportions are: a risk and (if
expressed per 100) a percentage.
Most fractions in epidemiology are
proportions.
RATE
Ideally, a proportion in which change
over time is considered, but in
practice, often used interchangeably
with proportion, without reference to
time, (as I did previously for fetal
death rate).
PREVALENCE RATE
Divided into two types:
1. Point prevalence rate
2. Period prevalence rate
POINT PREVALENCE RATE
Proportion of individuals in a
specified population at risk who
have the disease of interest at a
given point in time.
PERIOD PREVALENCE RATE
Proportion of individuals in a
specified population at risk who
have the disease of interest over a
specified period of time.
For example:
annual prevalence rate
lifetime prevalence rate.
(When the type of prevalence rate is not
specified it is usually point prevalence, or its
INCIDENCE RATE
Like prevalence, divided into two
types:
1. Cumulative incidence rate
2. Incidence density
Cumulative incidence rate:
Number of new cases of disease

occurring over a specified period
of time in a population at risk at the
beginning of the interval.
Incidence density:
Number of new cases of disease

occurring over a specified period of
time in a population at risk
throughout the interval.
Session
EIGHT
Probability
The Basis of the
Statistical inference
Key words:
Probability, objective probability,
subjective probability,
conditional probability, Bayes' theorem
Introduction
The concept of probability is frequently
encountered in everyday communication.
For example, a physician may say that a
patient has a 50-50 chance of surviving a
certain operation.

Another physician may say that she is 95
percent certain that a patient has a
particular disease.

Two views of Probability
Objective Probability
Classical and Relative

Classical Probability: If an event
can occur in N mutually exclusive
and equally likely ways, and if m of
these possess a trait, E, the
probability of the occurrence of event
E is equal to m/N.
For example: in the rolling of a
die, each of the six sides is equally
likely to be observed. So, the
probability that a 4 will be observed
is equal to 1/6.
Relative Frequency Probability:
Def: If some process is repeated a
large number of times, n, and if some
resulting event E occurs m times, the
relative frequency of occurrence of E,
m/n, will be approximately equal to the
probability of E: P(E) ≈ m/n.
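The relative frequency idea can be tried out with a small simulation: roll a fair die many times and compare m/n for the event "a 4 is observed" with the classical value 1/6. This is a sketch under assumptions made here (a pseudo-random fair die, 60000 rolls, a fixed seed); the function name is hypothetical.

```python
import random


def relative_frequency_of_four(n_rolls, seed=0):
    """Simulate n_rolls of a fair die and return m/n for the event 'a 4'."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 4)
    return hits / n_rolls


estimate = relative_frequency_of_four(60000)
print(f"relative frequency = {estimate:.4f}, classical value = {1 / 6:.4f}")
```

With more rolls the relative frequency tends to settle closer to 1/6, which is the sense in which m/n approximates P(E).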
Subjective Probability :

Probability measures the confidence

that a particular individual has in the
truth of a particular proposition.
For Example : the probability that a
cure for cancer will be discovered
within the next 10 years.
Rules of Probability
1- Addition Rule
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
2- If A and B are mutually exclusive (disjoint), then
$P(A \cap B) = 0$
Then, the addition rule is
$P(A \cup B) = P(A) + P(B)$.
3- Complementary Rule
$P(A') = 1 - P(A)$
where A' is the complement of event A
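These rules can be verified on a single roll of a fair die using sets and exact fractions. The events chosen here (A = even number, B = number greater than 4) are hypothetical examples for this sketch.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # event: an even number
B = {5, 6}                          # event: a number greater than 4


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(sample_space))


union_direct = prob(A | B)                          # P(A U B) counted directly
union_by_rule = prob(A) + prob(B) - prob(A & B)     # addition rule
complement_A = prob(sample_space - A)               # P(A')

print(union_direct, union_by_rule, complement_A)
```

Subtracting P(A ∩ B) prevents the outcome 6, which belongs to both events, from being counted twice.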
Bayes' Theorem
Definition 1
The sensitivity of the symptom
This is the probability of a positive result
given that the subject has the disease. It is
denoted by $P(T \mid D)$.
Definition 2
The specificity of the symptom
This is the probability of a negative result
given that the subject does not have the
disease. It is denoted by $P(\bar{T} \mid \bar{D})$.
TASK-10
A medical research team wished to evaluate a proposed screening test for
Alzheimer's disease. The test was given to a random sample of 450 patients
with Alzheimer's disease and an independent random sample of 500 patients
without symptoms of the disease. The two samples were drawn from
populations of subjects who were 65 years or older. The results are as follows.

Test Result              Yes (D)   No ($\bar{D}$)   Total
Positive (T)             436       5                441
Negative ($\bar{T}$)     14        495              509
Total                    450       500              950
In the context of this example
a) What is a false positive?
A false positive is when the test indicates
a positive result (T) when the person does
not have the disease ($\bar{D}$).
b) What is a false negative?
A false negative is when a test indicates a
negative result ($\bar{T}$) when the person has the
disease (D).
c) Compute the sensitivity of the symptom.
$$P(T \mid D) = \frac{436}{450} = 0.9689$$
d) Compute the specificity of the symptom.
$$P(\bar{T} \mid \bar{D}) = \frac{495}{500} = 0.99$$
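The sensitivity and specificity can be recomputed directly from the four cells of the screening table. This is a minimal sketch; the variable names (`true_pos`, `false_pos`, and so on) are labels chosen here, not terms used in the slides.

```python
# Cell counts from the Alzheimer's screening table.
true_pos = 436    # positive test, disease present
false_pos = 5     # positive test, disease absent
false_neg = 14    # negative test, disease present
true_neg = 495    # negative test, disease absent

# Sensitivity: P(T | D), positives among those with the disease.
sensitivity = true_pos / (true_pos + false_neg)

# Specificity: P(not T | not D), negatives among those without the disease.
specificity = true_neg / (true_neg + false_pos)

print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.2f}")
```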
Session NINE
Cohort Studies
Cohort Studies
Group by common characteristics
Start with a group of subjects who
lack a positive history of the
outcome of interest yet are at risk for
it (cohort).
Think of going from cause to
effect.
When is a cohort
study warranted?
When good evidence suggests an

association of a disease with a certain
exposure or exposures.
Types of Cohort Studies
Prospective (concurrent)
Retrospective (historical)
Restricted (restricted exposures)
Prospective Studies
Also called
longitudinal
concurrent
incidence studies
Looking into the future
Example:
Framingham Study of coronary heart
disease (CHD)
Design of a Cohort Experiment
Advantages of
Prospective Cohort Studies
Captive groups
Large sample sizes
Certain diseases or risk factors targeted
Can be used to prove cause-effect
Assess magnitude of risk
Baseline of rates
Number and proportion of cases that can be
prevented
Disadvantages of Prospective
Cohort Studies
Large study populations required
not easy to find subjects
Expensive
Unpredictable variables
Results not extrapolated to general
population
Study results are limited
Time consuming/results are delayed
Requires rigid design and conditions
Incidence rate
Here a and b are the exposed subjects with and without the disease,
and c and d are the non-exposed subjects with and without the disease.
Incidence among exposed = $\frac{a}{a+b}$
Incidence among non-exposed = $\frac{c}{c+d}$
Estimation of risk
Relative Risk
$$RR = \frac{\text{incidence of disease among exposed}}{\text{incidence of disease among non-exposed}} = \frac{a/(a+b)}{c/(c+d)}$$
Estimation of Risk
Attributable Risk
$$AR = \frac{\text{incidence among exposed} - \text{incidence among non-exposed}}{\text{incidence among exposed}} = \frac{a/(a+b) - c/(c+d)}{a/(a+b)}$$
Smoking    Lung cancer          Total
           YES       NO
YES        70        6930       7000
NO         3         2997       3000
Total      73        9927       10000
Find out RR and AR for above data
TASK-11
Incidence of lung cancer among smokers
70/7000 = 10 per 1000
Incidence of lung cancer among non-smokers
3/3000 = 1 per 1000
RR = 10 / 1 = 10
(lung cancer is 10 times more common among
smokers than non-smokers)
AR = (10 - 1) / 10 X 100
= 90 %
(90% of the cases of lung cancer among
smokers are attributed to their habit of smoking)
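The same calculation can be expressed as two small functions of the 2x2 table cells. This is a sketch under the assumption that a and b are the exposed with and without disease and c and d are the non-exposed with and without disease; the function names are invented for this illustration.

```python
def relative_risk(a, b, c, d):
    """Incidence among exposed divided by incidence among non-exposed."""
    return (a / (a + b)) / (c / (c + d))


def attributable_risk_percent(a, b, c, d):
    """Share of the incidence among the exposed attributed to the exposure."""
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return (incidence_exposed - incidence_unexposed) / incidence_exposed * 100


# Smoking and lung cancer table: smokers 70 / 6930, non-smokers 3 / 2997.
rr = relative_risk(70, 6930, 3, 2997)
ar = attributable_risk_percent(70, 6930, 3, 2997)

print(f"RR = {rr:.1f}, AR = {ar:.0f} %")
```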
Session TEN
Epidemiologic study
designs
Learning Objectives
To understand the concepts of

different study designs
To learn about the advantages
and disadvantages of several
study designs
Case series
Case Series report new diseases
or health related problems.
They may provide some
descriptive data on exposures to
potential causal factors
Cross-sectional studies
Cross-Sectional Studies
measure existing disease
and current exposure
levels.
They provide some indication
of the relationship between
the disease and exposure or
non-exposure
(contd) Cross Sectional Studies
Advantages
Good design for hypothesis
generation
Can estimate overall and specific
disease prevalence and sometimes
rates
Can estimate exposure proportions in
the population
Can study multiple exposures or
multiple outcomes or diseases
Cross Sectional Studies
Disadvantages
Impractical for rare diseases
Not a useful type of study for
establishing causal relationships
Confounding is difficult to control
No control over sample size for
each exposure by disease subclass
Case-control studies
Case-Control Studies identify existing
disease/s and look back in
previous years to identify previous
exposures to causal factors.
Cases are those who have a disease.
Controls are those without a
disease.
Analyses examine if exposure levels
are different between the groups.
Advantages
Best design for rare diseases
Can be accomplished quickly
since events of interest have already
occurred
Can study several potential
exposures at the same time
Lends itself well to hospital-
based studies and outbreaks
Disadvantages
Problems with temporal sequence of
data
Hard to decide when disease was
actually acquired
Disease may cure the exposure
Miss diseases still in latent period
Can't calculate incidence, population
relative risk or attributable risk
HIGH potential for bias
Experimental Studies
Clinical trials provide the gold
standard of determining the
relationship between garlic and
cardiovascular disease prevention.

Clinical Trials
Randomized
Double-blind
Placebo-controlled
Clinical Trial
[Diagram: Study Population -> Randomize -> Treatment Group -> Outcomes, and Control Group -> Outcomes]
Session ELEVEN
Basic Research
Methods
Why do research?
Validate intuition
Improve methods
Demands of the Job
For publication
Choose a subject
Based on an idea
Based on your experience
Based on your reading
Originality
Choose a study design
Case report
Case series
Case controlled study
Cross sectional
Cohort
Retrospective comparison
Prospective Comparison
Session TWELVE
Tests of Significance
In hypothesis testing, the null
hypothesis is either rejected or
not rejected, depending on whether the p
value is below or above a
predetermined cut-off point, known as
the significance level of the test;
it is usually taken as the 5% level.
t tests
One sample
Two independent samples
Two paired samples
ANOVA
Difference among the groups
Tukey's test
Scheffé test
Mann-Whitney test
ODDS RATIO
ODDS are defined to be the ratio of
probability of success to the probability
of failure.
The estimate of the population odds ratio is
$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
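As an illustration of the formula, the sketch below computes the odds ratio from four cell counts. Reusing the smoking and lung cancer counts from the cohort example is a choice made for this sketch, and `odds_ratio` is a hypothetical function name.

```python
def odds_ratio(a, b, c, d):
    """Cross-product estimate of the odds ratio: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# a, b: exposed with / without disease; c, d: non-exposed with / without disease.
a, b, c, d = 70, 6930, 3, 2997
estimate = odds_ratio(a, b, c, d)
ratio_of_odds = (a / b) / (c / d)      # the same quantity written as (a/b)/(c/d)

print(f"OR = {estimate:.2f}")
```

For these counts the odds ratio (about 10.09) is close to the relative risk of 10, which is what one expects when the disease is rare.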
THANK YOU
11-3-11 162
CONCLUSION
Chance only favors the trained mind
Louis
Bibliography
QUESTIONS?
