9/12/2017 2
INTRODUCTION
TO
About me
Not a statistician!
Why you should be here
Interest in statistical reasoning
A desire to learn to use statistics properly
in experimental design and data analysis
To develop your ability to critically assess
scientific (or pseudo-scientific) arguments
What is expected of you
analyses.
Introduction
Session ONE
Some Basic concepts
Statistics is a field of study concerned with
1- collection, organization, summarization
and analysis of data.
2- drawing of inferences about a body of
data when only a part of the data is
observed.
Statisticians try to interpret and
communicate the results to others.
Larson/Farber 4th ed
Branches of Statistics
Descriptive Statistics: Involves organizing,
summarizing, and displaying data.
Describes the important characteristics of the
data.
e.g. Tables, charts, averages, percentages
Inferential Statistics: Involves using sample data
to draw conclusions or make inferences about an
entire population.
Data
The raw material of Statistics is data.
We may define data as figures. Figures
result from the process of counting or
from taking a measurement.
For example:
- When a hospital administrator counts
the number of patients (counting).
- When a nurse weighs a patient
(measurement)
Types of data
Qualitative
- Nominal
- Ordinal (rating scale)
Quantitative
- Discrete
- Continuous
- Interval
- Ratio
Qualitative Data
Qualitative Data: Consists of non-numeric,
categorical attributes or labels
Nominal Data
A type of data that are simply names or
labels. Cannot be ranked or ordered by
value objectively.
Example: gender (male/female)
Can you give another example of nominal data?
Ordinal Data
Methods of Collecting Data
Observational study
Survey
Experiment
Simulation
Data Collection
First problem: how to obtain the data.
It is important to obtain good, or
representative, data.
Inferences are made based on statistics
obtained from the data.
Inferences can only be as good as the data.
Process of data collection:
Define the objectives of the survey or
experiment.
Example: Estimate the average life of an
electronic component.
Define the variable and population of interest.
Example: Length of time for anesthesia to wear
off after surgery.
Define the data-collection and data-measuring
schemes (questionnaire, scale, ruler, etc.).
Determine the appropriate descriptive or
inferential data-analysis techniques.
TASK-1
Experiment: The
investigator controls or
modifies the
environment and
observes the effect on
the variable under
study.
Types of variables
Quantitative Variables
It can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Qualitative Variables
Many characteristics are not capable of being measured. Some of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
Types of quantitative variables
Discrete
A discrete variable is characterized by gaps or interruptions in the values that it can assume.
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child in an elementary school.
Continuous
A continuous variable can assume any value within a specified relevant interval of values assumed by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.
TASK-2
Which one of the following is an example of
continuous numerical data?
A = Number of runs made by a cricket player.
B = Speed of a car captured by a speed camera.
C = Your favourite secondary school year level.
D = Shoe sizes.
E = Labor/Liberal preference of 100 people surveyed.
THAT WAS GREAT
Session TWO
Strategies for understanding the meanings of Data
Key words
Descriptive Statistics
Frequency Distribution
for Discrete Random Variables
Example:
Suppose that we take a sample of size 16 from children in a primary school
and get the following data about the number of their decayed teeth:
3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1
To construct a frequency table:
1- Order the values from the smallest to the largest.
0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5
2- Count how many times each value occurs.

No. of decayed teeth   Frequency   Relative Frequency
0                      1           0.0625
1                      2           0.125
2                      4           0.25
3                      5           0.3125
4                      2           0.125
5                      2           0.125
Total                  16          1
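The counting step can also be done by computer. The following is a minimal Python sketch, not part of the original slides, that rebuilds the frequency and relative frequency columns from the 16 observations; the function name `frequency_table` is only illustrative.

```python
from collections import Counter

# Number of decayed teeth for the 16 sampled children (data from the example).
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency), ordered by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

table = frequency_table(teeth)
for value, freq, rel in table:
    print(f"{value:>5} {freq:>9} {rel:>9.4f}")
print(f"Total {sum(freq for _, freq, _ in table):>9}")
```

Each relative frequency is simply the frequency divided by the sample size, 16, so the relative frequencies add up to 1.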
Representing the simple frequency table using the bar chart
We can represent the above simple frequency table using a bar chart.
[Bar chart: Frequency (vertical axis) against number of decayed teeth, 0 to 5 (horizontal axis)]
The Mid-interval:
computed by adding the lower bound of the
interval to its upper bound and then dividing
by 2.
TASK-3
The following table represents the mid-interval, the frequency, the cumulative frequency,
the relative frequency and the cumulative relative frequency, where R.f = freq/n.
Class interval | Mid-interval | Frequency (f) | Cumulative Frequency | Relative Frequency (R.f) | Cumulative Relative Frequency
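As a rough illustration of how the columns of such a table relate to each other, here is a short Python sketch. The class intervals and frequencies in it are hypothetical, invented only for this example; they are not the data of TASK-3.

```python
# Hypothetical class intervals as (lower bound, upper bound, frequency).
# These numbers are made up for illustration and do not come from the slides.
classes = [(10, 19, 4), (20, 29, 6), (30, 39, 10)]

def grouped_table(intervals):
    """Add mid-interval, cumulative and relative columns to grouped data."""
    n = sum(freq for _, _, freq in intervals)
    rows = []
    cum_freq = 0
    for lower, upper, freq in intervals:
        cum_freq += freq
        rows.append({
            "interval": f"{lower}-{upper}",
            "mid": (lower + upper) / 2,      # mid-interval
            "freq": freq,
            "cum_freq": cum_freq,            # running total of frequencies
            "rel_freq": freq / n,            # R.f = freq / n
            "cum_rel_freq": cum_freq / n,    # running total of R.f
        })
    return rows

rows = grouped_table(classes)
for row in rows:
    print(row)
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1, which is a quick check on a hand-built table.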
Descriptive Statistics
Measures of Central Tendency
Key words:
Descriptive statistic, measure of
central tendency, statistic, parameter,
mean, median, mode.
The Statistic and The
Parameter
A Statistic:
It is a descriptive measure computed from the
data of a sample.
A Parameter:
It is a descriptive measure computed from
the data of a population.
Measures of Central
Tendency
A measure of central tendency is a measure which
indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
Mean
Mean Example
Here are example test scores for the PEDO class.
82 97 86 93 82
To find the Mean, first you must add up all of the
numbers.
82+93+86+97+82 = 440
Now, since there are 5 test scores, we will next
divide the sum by 5.
440 ÷ 5 = 88
The Mean is 88
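The same two steps, adding up the scores and dividing by how many there are, can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic; `statistics.mean` is the standard library function that does both steps at once.

```python
from statistics import mean

# The five PEDO class test scores from the example.
scores = [82, 97, 86, 93, 82]

total = sum(scores)             # step 1: add up all of the numbers
average = total / len(scores)   # step 2: divide the sum by the number of scores

print(total)         # 440
print(average)       # 88.0
print(mean(scores))  # same result from the standard library
```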
Median
Median Example
First, let's examine these five test scores.
79 97 86 93 78
Mode Example
Range
Range Example
TASK-4
Now YOU try it!!!
This is the Stat Family!
Dad (34), Mom (33), Alex (5), Jack (5), Katie (1)
Mean
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
34+33+5+5+1 = 78
78 ÷ 5 = 15.6
1 5 5 33 34
Mode
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
1 5 5 33 34
Range
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
1 5 5 33 34
The range is 33
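All four summaries for the Stat Family can be checked together. The sketch below uses Python's standard `statistics` module; the five ages add up to 78, so the mean is 15.6, while the median and mode are both 5 and the range is 34 - 1 = 33.

```python
from statistics import mean, median, mode

# Ages of the Stat Family, as listed in the task.
ages = {"Dad": 34, "Mom": 33, "Jack": 5, "Alex": 5, "Katie": 1}
values = sorted(ages.values())   # 1 5 5 33 34

summary = {
    "mean": mean(values),                 # sum of ages / number of people
    "median": median(values),             # middle value of the ordered list
    "mode": mode(values),                 # most frequent value
    "range": max(values) - min(values),   # largest minus smallest
}
print(summary)
```

The mean (15.6) is far from the median (5) because the two adult ages pull it upward, which is a useful reminder that the mean is sensitive to extreme values.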
Session FIVE
Descriptive Statistics
Measures of Dispersion
Important statistical terms
Population:
a set which includes all
measurements of interest
to the researcher
(The collection of all
responses, measurements,
or counts that are of interest)
Sample:
A subset of the population
Learning Objectives
Determine when to use sampling instead of a
census.
Distinguish between random and nonrandom
sampling.
Decide when and how to use various sampling
techniques.
Be aware of the different types of error that can
occur in a study.
Understand the impact of the Central Limit
Theorem on statistical analysis.
Reasons for Sampling
Sampling can save money.
Sampling can save time.
For given resources, sampling can broaden
the scope of the data set.
Because the research process is sometimes
destructive, the sample can save product.
If accessing the population is impossible;
sampling is the only option.
Types of Sampling
Cluster sampling
Systematic
Convenience
Math Alliance Project
Simple Random Sample
Every subset of a specified size n
from the population has an equal
chance of being selected
Stratified Random Sample
The population is divided into two or more
groups called strata, according to some
criterion, such as geographic location,
grade level, age, or income, and
subsamples are randomly selected from
each stratum.
Cluster Sample
The population is divided into subgroups
(clusters) like families. A simple random
sample is taken of the subgroups and
then all members of the cluster selected
are surveyed.
Systematic Sample
Every nth member (for example, every
10th person) is selected from a list of all
population members.
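The four probability-based designs described so far can be sketched in Python with the standard `random` module. Everything about the population here is hypothetical: 100 numbered people, two invented strata and five invented clusters. The random starting point in the systematic sample is a common convention and an assumption of this sketch, not something stated in the slides.

```python
import random

# Hypothetical population of 100 numbered people, with made-up strata and clusters.
population = list(range(1, 101))
strata = {"younger": population[:60], "older": population[60:]}
clusters = {f"family_{k}": population[k * 20:(k + 1) * 20] for k in range(5)}

def simple_random_sample(members, n, rng):
    """Every subset of size n has the same chance of being selected."""
    return rng.sample(members, n)

def stratified_sample(groups, n_per_stratum, rng):
    """Draw a simple random subsample from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in groups.items()}

def cluster_sample(groups, n_clusters, rng):
    """Pick whole clusters at random and survey every member of them."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return [member for name in chosen for member in groups[name]]

def systematic_sample(members, step, rng):
    """Take every step-th member, starting at a random position within the first step."""
    start = rng.randrange(step)
    return members[start::step]

rng = random.Random(2017)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```

Note that the cluster sample contains whole clusters (40 people from 2 families of 20), whereas the stratified sample takes a few people from every stratum.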
Convenience Sample
Selection of whichever individuals are
easiest to reach
It is done at the convenience of the
researcher
Errors in Sampling
1-Interview error: interaction between
interviewer and person being surveyed
2-Respondent error: respondents have a difficult
time answering the question
3-Measurement error: inaccurate responses
when the person doesn't understand the question or
the question is poorly worded
4-Errors in data collection
SAMPLE SIZE FORMULA
95% Confidence interval is given by point
Single mean:
$SE = \dfrac{sd}{\sqrt{n}}$
TASK-5
A large elementary school has 15 classrooms, with 24 children in each classroom. A sample of 30 children is chosen by the following procedure:
Each of the 15 teachers selects 2 children from his or her classroom to be in the sample by numbering the children from 1 to 24, using a random digit table to select two different random numbers between 01 and 24. The 2 children with those numbers are in the sample.
Did this procedure give a simple random sample of 30 children from the elementary school?
a) No, because the teachers were not selected randomly
b) No, because not all possible groups of 30 children had the same chance of being chosen
c) No, because not all children had the same chance of being chosen
d) Yes, because each child had the same chance of being chosen
e) Yes, because the numbers were assigned randomly to the children
Session SIX
Key words:
Descriptive statistic, measure of
dispersion, range, variance, coefficient of
variation.
Descriptive Statistics
Measures of Dispersion:
A measure of dispersion conveys information
regarding the amount of variability present in a set
of data.
Note:
1. If all the values are the same, there is no dispersion.
2. If the values are different, there is dispersion:
a) If the values are close to each other, the amount of dispersion is small.
b) If the values are widely scattered, the amount of dispersion is large.
Measures of Dispersion
Variation
Measures of variation
give information on the
spread or variability or
dispersion of the data
values.
Same centre, different variation
Measures of dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
Range
The range is the difference between the
largest and the smallest values in the dataset
i.e. the maximum difference between data-
points in the list.
It is sensitive to only the most extreme values
in the list. The range of a list is 0 if and only
if all the data-points in the list are equal.
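$R = x_L - x_S$, where $x_L$ is the largest value and $x_S$ is the smallest value.
[Figure: the range marked on a scale running from 4 to 16 days]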
Pros & Cons
Advantages Disadvantages
Inter-quartile Range
18, 19, 18, 25, 22, 20, 21, 45, 33, 20,
18, 18
The Variance:
It measures dispersion relative to the scatter of
the values about their mean.
a) Sample Variance ($S^2$):
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean.
b) Population Variance ($\sigma^2$):
computed over all $N$ values of the population, where $\mu$ is the population mean.
Variance
Advantages:
uses all of the data values
Disadvantages:
the variance is measured in the original
units squared
extreme values or outliers affect the
variance considerably
hard to calculate manually
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Standard Deviation
Advantages:
same units of measurement as the values
useful in theoretical work and statistical
methods and inference
Disadvantages:
hard to calculate manually
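Although the variance and the standard deviation are tedious to compute by hand, they are short to compute by machine. The sketch below is only an illustration: the eight data values are hypothetical (chosen so that the population standard deviation comes out to exactly 2), and the hand-written functions are compared against Python's standard `statistics` module.

```python
import math
from statistics import mean, variance, pvariance, stdev, pstdev

# Hypothetical data, invented for this example (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

print(sample_variance(data), variance(data))       # sample variance, two ways
print(population_variance(data), pvariance(data))  # population variance, two ways
print(math.sqrt(sample_variance(data)), stdev(data))       # sample standard deviation
print(math.sqrt(population_variance(data)), pstdev(data))  # population standard deviation
```

The standard deviation is the square root of the variance, which is why it is expressed in the same units as the original values.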
TASK-8
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18,
18
The Coefficient of Variation (C.V):
It is a measure used to compare the
dispersion in two sets of data, and it is
independent of the unit of measurement.
$$C.V = \frac{S}{\bar{X}} \times 100$$
where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.
TASK-9
Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds
Solution:
c.v (Sample1)= (10/145)*100= 6.9
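The same calculation can be applied to both samples. The short Python sketch below uses the means and standard deviations from the task; the function name `coefficient_of_variation` is only illustrative. For Sample 2 the same formula gives (10/80) × 100 = 12.5.

```python
def coefficient_of_variation(sd, mean):
    """C.V = (standard deviation / mean) * 100, expressed as a percentage."""
    return sd / mean * 100

cv_sample1 = coefficient_of_variation(10, 145)  # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)   # 11-year-olds

print(round(cv_sample1, 1))  # 6.9
print(cv_sample2)            # 12.5
```

Both samples have the same standard deviation (10 pounds), yet the relative variation is larger among the 11-year-olds because their mean weight is smaller.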
Session SEVEN
Measures of Disease
Frequency
FRACTIONS USED IN DESCRIBING
DISEASE FREQUENCY
RATIO
A fraction in which the numerator is not part
of the denominator.
e.g. Fetal death ratio:
Fetal deaths/live births.
Fetal deaths are not included among live births,
by definition.
PROPORTION
PREVALENCE RATE
POINT PREVALENCE RATE
Proportion of individuals in a
specified population at risk who
have the disease of interest at a
given point in time.
PERIOD PREVALENCE RATE
Proportion of individuals in a
specified population at risk who
have the disease of interest over a
specified period of time.
For example:
annual prevalence rate
lifetime prevalence rate.
(When the type of prevalence rate is not
specified it is usually point prevalence, or its
INCIDENCE RATE
Like prevalence, divided into two
types:
1. Cumulative incidence rate
2. Incidence density
Cumulative incidence rate:
Incidence density:
Session EIGHT
Probability
The Basis of the
Statistical inference
Key words:
Objective Probability
Classical and Relative
Classical Probability: If an event
can occur in N mutually exclusive
and equally likely ways, and if m of
these possess a trait, E, the
probability of the occurrence of event
E is equal to m/N.
For example: in the rolling of a
die, each of the six sides is equally
likely to be observed. So, the
probability that a 4 will be observed
is equal to 1/6.
Relative Frequency Probability:
Def: If some process is repeated a
large number of times, n, and if some
resulting event E occurs m times, the
relative frequency of occurrence of E,
m/n, will be approximately equal to
the probability of E. P(E) = m/n.
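The two definitions can be compared on the die example. In the Python sketch below, the classical probability of rolling a 4 is computed as m/N, and the relative frequency is estimated by simulating many rolls. The number of rolls (60,000) and the random seed are arbitrary choices made for this illustration.

```python
import random
from fractions import Fraction

def classical_probability(favourable, total):
    """m / N for N mutually exclusive, equally likely outcomes."""
    return Fraction(favourable, total)

def relative_frequency(event, trials, rng):
    """Repeat the process `trials` times and return m / n for the event."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

p_classical = classical_probability(1, 6)   # one face out of six shows a 4
p_estimate = relative_frequency(lambda face: face == 4, 60_000, random.Random(1))

print(p_classical, float(p_classical))
print(p_estimate)   # close to 1/6, but not exactly equal
```

The simulated value is only approximately 1/6; it tends to get closer as the number of repetitions n grows.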
Subjective Probability :
Rules of Probability
1-Addition Rule
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
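The addition rule can be verified on a small example. The events in this Python sketch are hypothetical and chosen only for illustration: on one roll of a fair die, A is "the face is even" and B is "the face is greater than 3".

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # faces of a fair die
A = {2, 4, 6}                 # hypothetical event: face is even
B = {4, 5, 6}                 # hypothetical event: face is greater than 3

def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

lhs = prob(A | B)                         # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)     # addition rule

print(lhs, rhs)   # both are 2/3
```

Subtracting P(A and B) is needed because the faces 4 and 6 belong to both events and would otherwise be counted twice.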
Definition.1
The sensitivity of the symptom
This is the probability of a positive result
given that the subject has the disease. It is
denoted by P(T|D)
Definition.2
The specificity of the symptom
What is the false negative?
A false negative is when a test indicates a
negative result ($\bar{T}$) when the person has the
disease (D).
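Sensitivity and the false negative rate are two sides of the same count: among people who have the disease, each one tests either positive or negative. The Python sketch below illustrates this with hypothetical counts (90 diseased people testing positive and 10 testing negative); the numbers are invented for the example.

```python
def sensitivity(true_positives, false_negatives):
    """P(T | D): proportion of diseased subjects with a positive result."""
    return true_positives / (true_positives + false_negatives)

def false_negative_rate(true_positives, false_negatives):
    """Proportion of diseased subjects with a negative result."""
    return false_negatives / (true_positives + false_negatives)

# Hypothetical counts among 100 people who have the disease.
tp, fn = 90, 10

print(sensitivity(tp, fn))          # 0.9
print(false_negative_rate(tp, fn))  # 0.1
```

Because every diseased subject falls into exactly one of the two groups, the two proportions always add up to 1.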
Cohort Studies
Cohort Studies
Group by common characteristics
Start with a group of subjects who
lack a positive history of the
outcome of interest yet are at risk for
it (cohort).
Think of going from cause to
effect.
When is a cohort
study warranted?
Prospective (concurrent)
Retrospective (historical)
Restricted (restricted exposures)
Prospective Studies
Also called
longitudinal
concurrent
incidence studies
Looking into the future
Example:
Framingham Study of coronary heart
disease (CHD)
Design of a Cohort Experiment
Advantages of
Prospective Cohort Studies
Captive groups
Large sample sizes
Certain diseases or risk factors targeted
Can be used to prove cause-effect
Assess magnitude of risk
Baseline of rates
Number and proportion of cases that can be
prevented
Disadvantages of Prospective
Cohort Studies
Large study populations required
not easy to find subjects
Expensive
Unpredictable variables
Results not extrapolated to general
population
Study results are limited
Time consuming/results are delayed
Requires rigid design and conditions
Incidence rate
Relative Risk
$$RR = \frac{\text{incidence of disease among exposed}}{\text{incidence of disease among non-exposed}} = \frac{a/(a+b)}{c/(c+d)}$$
Estimation of Risk
Attributable Risk
$$AR = \frac{\text{incidence of disease among exposed} - \text{incidence of disease among non-exposed}}{\text{incidence of disease among exposed}} = \frac{a/(a+b) - c/(c+d)}{a/(a+b)}$$
Smoking    Lung cancer          Total
           YES       NO
YES        70        6930       7000
NO         3         2997       3000
Total      73        9927       10000
TASK-11
Incidence of lung cancer among smokers
70/7000 = 10 per 1000
Incidence of lung cancer among non-smokers
3/3000 = 1 per thousand
RR = 10 / 1 = 10
(lung cancer is 10 times more common among
smokers than non smokers)
AR = (10 - 1) / 10 X 100
= 90 %
(90% of the cases of lung cancer among
smokers are attributed to their habit of smoking)
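The relative risk and attributable risk for the smoking data can be reproduced with a short Python sketch. It assumes the usual 2x2 layout, which the slides do not spell out: a = exposed with disease, b = exposed without disease, c = non-exposed with disease, d = non-exposed without disease. The value b = 6930 is derived as 7000 - 70. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def incidence(cases, total):
    """Incidence as an exact fraction: new cases over persons at risk."""
    return Fraction(cases, total)

def relative_risk(a, b, c, d):
    """Incidence among exposed divided by incidence among non-exposed."""
    return incidence(a, a + b) / incidence(c, c + d)

def attributable_risk_percent(a, b, c, d):
    """Share of the incidence among exposed that is attributable to exposure."""
    exposed = incidence(a, a + b)
    unexposed = incidence(c, c + d)
    return (exposed - unexposed) / exposed * 100

# Smokers: 70 cases out of 7000; non-smokers: 3 cases out of 3000.
a, b = 70, 6930
c, d = 3, 2997

print(incidence(a, a + b) * 1000)             # 10 per 1000
print(incidence(c, c + d) * 1000)             # 1 per 1000
print(relative_risk(a, b, c, d))              # 10
print(attributable_risk_percent(a, b, c, d))  # 90
```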
Session TEN
Epidemiologic study
designs
Learning Objectives
Advantages
Good design for hypothesis
generation
Can estimate overall and specific
disease prevalence and sometimes
rates
Can estimate exposure proportions in
the population
Can study multiple exposures or
multiple outcomes or diseases
Cross Sectional Studies
Disadvantages
Impractical for rare diseases
Not a useful type of study for
establishing causal relationships
Confounding is difficult to control
No control over sample size for
each exposure by disease subclass
Case-control studies
Case-Control Studies identify existing
disease/s and look back in
previous years to identify previous
exposures to causal factors.
Cases are those who have a disease.
Controls are those without a
disease.
Analyses examine if exposure levels
are different between the groups.
Case-control studies
Advantages
Best design for rare diseases
Can be accomplished quickly
since events of interest have already
occurred
Can study several potential
exposures at the same time
Lends itself well to hospital-
based studies and outbreaks
Case-control studies
Disadvantages
Problems with temporal sequence of
data
Hard to decide when disease was
actually acquired
Disease may cure the exposure
Miss diseases still in latent period
Can't calculate incidence, population
relative risk or attributable risk
HIGH potential for bias
Experimental Studies
Randomized
Double-blind
Placebo-controlled
Clinical Trial
[Diagram: the study population is randomized into a treatment group and a control group, and outcomes are measured in each group]
Session ELEVEN
Basic Research
Methods
Why do research?
Validate intuition
Improve methods
For publication
Choose a subject
Based on an idea
Originality
Choose a study design
Case report
Case series
Case controlled study
Cross sectional
Cohort
Retrospective comparison
Prospective Comparison
Session TWELVE
Tests of Significance
In hypothesis testing, the null
hypothesis is either rejected or
not rejected, depending on whether the p
value is below or above a
predetermined cut-off point, known as
the significance level of the test;
it is usually taken as the 5% level.
t tests
One sample
Two independent samples
Two paired samples
ANOVA
Difference among the groups
Tukey's test
Scheffé's test
Mann-Whitney test
ODDS RATIO
ODDS are defined to be the ratio of
probability of success to the probability
of failure.
The estimate of population odds ratio is
$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
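As an illustration, the odds ratio formula can be applied to the smoking and lung cancer counts used earlier in this document (70 cases and 6930 non-cases among smokers, 3 cases and 2997 non-cases among non-smokers). The Python sketch below assumes a and b are the exposed with and without the disease, and c and d the non-exposed with and without the disease; that labelling is an assumption, since the slide does not define the letters.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """(a / b) / (c / d), which simplifies to (a * d) / (b * c)."""
    return Fraction(a * d, b * c)

# Smoking and lung cancer counts reused from the cohort example.
or_smoking = odds_ratio(70, 6930, 3, 2997)

print(or_smoking)                   # 111/11
print(round(float(or_smoking), 2))  # 10.09
```

Because lung cancer is rare in both groups, the odds ratio (about 10.09) is very close to the relative risk of 10 computed from the same table.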
THANK YOU
11-3-11 162
CONCLUSION
Louis
Bibliography
QUESTIONS?