
Descriptive Analysis and

Inferential Analysis
By

Amir Iqbal

Mean
o The mean for quantitative data is obtained by dividing
the sum of all values by the number of values in the data
set:

x̄ = Σx / n

o The following are the ages of all eight employees of a
small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

x̄ = Σx / n = 362 / 8 = 45.25 years

Cont
o Thus, the mean age of all eight
employees of this company is 45.25
years, or 45 years and 3 months.
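
A minimal Python sketch of this calculation, using the standard-library
statistics module (the variable names are illustrative):

import statistics

# Ages of all eight employees
ages = [53, 32, 61, 27, 39, 44, 49, 57]

# Mean = sum of all values / number of values = 362 / 8
mean_age = statistics.mean(ages)
print(mean_age)  # 45.25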

Median
o The median is the value of the middle term in a data set that
has been ranked in increasing order.
The calculation of the median consists of the following two
steps:
1. Sort/Arrange the data set in increasing order
2. Find the middle term in a data set with n values. The value of this term is the
median.

o The following data give the weight lost (in pounds) by a sample
of five members of a health club at the end of two months of
membership:
10 5 19 8 3
Find the median.

Cont
o First, we rank the given data in increasing order as follows:
3 5 8 10 19
o The middle term is 8; therefore, the median is 8.
The median weight loss for this sample of five members of this
health club is 8 pounds.
o The median gives the center with half the data values to the left of
the median and half to the right of the median.
o The advantage of using the median as a measure of central tendency
is that it is less influenced by outliers & skewness.
o Consequently, the median is preferred over the mean as a measure of
central tendency for data sets that contain outliers and/or skewness.
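
A short Python sketch of the two-step median calculation above
(statistics.median sorts internally, but the ranking step is shown
explicitly for clarity):

import statistics

# Weight lost (in pounds) by five health-club members
losses = [10, 5, 19, 8, 3]

# Step 1: arrange the data in increasing order
ranked = sorted(losses)  # [3, 5, 8, 10, 19]

# Step 2: the value of the middle term is the median
print(statistics.median(ranked))  # 8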

Mode
The mode is the value that occurs with the highest
frequency in a data set.

Range

o Range = Largest value − Smallest value

o The range, like the mean, has the
disadvantage of being influenced by
outliers.
o Its calculation is based on two values
only: the largest and the smallest.

Standard Deviation

The standard deviation is the most widely used
measure of dispersion.
The value of the standard deviation tells how
closely the values of a data set are clustered
around the mean.

Cont
o For the data set 82, 95, 67, 92, the mean is x̄ = 336 / 4 = 84.
o The deviations from the mean are:

82 − 84 = −2
95 − 84 = +11
67 − 84 = −17
92 − 84 = +8

o Σ(deviations) = 0

Calculation
Sample standard deviation: s = sqrt( Σ(x − x̄)² / (n − 1) )
Example: s = sqrt( (4 + 121 + 289 + 64) / 3 )
       = sqrt( 478 / 3 ) = sqrt( 159.33 ) = 12.62
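
A Python sketch reproducing this calculation; statistics.stdev uses the
same n − 1 (sample) formula:

import statistics

data = [82, 95, 67, 92]
mean = statistics.mean(data)           # 84
deviations = [x - mean for x in data]  # [-2, 11, -17, 8]; they sum to 0

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
s = statistics.stdev(data)
print(round(s, 2))  # 12.62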

Population parameter and Sample Statistics


A numerical measure such as the mean, median,
mode, range, variance, or standard deviation
calculated for a population data set is called a
population parameter, or simply a parameter.
A summary measure calculated for a sample data set
is called a sample statistic, or simply a statistic.

Inferential Statistics

Inferential statistics
Allow researchers to generalize to a population of
individuals based on information obtained from a
sample of those individuals
Assess whether the results obtained from a sample
are the same as those that would have been
calculated for the entire population

Sampling Distributions
A distribution of sample statistics
A distribution of mean scores
A distribution of the differences between two mean scores
A distribution of the ratio of two variances

Known statistical properties of sampling distributions


The mean of the sampling distribution of means is an excellent estimate of
the population mean
The standard error of the mean (s/√n) is an excellent estimate of the
standard deviation of the sampling distribution of the mean
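
A small simulation sketch of these two properties, assuming a normally
distributed population (numpy is used for convenience; all numbers are
illustrative):

import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 45.0, 6.0, 30

# Draw 10,000 samples of size n and compute each sample's mean
sample_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

# Mean of the sampling distribution of means ~ population mean
print(sample_means.mean())  # close to 45.0

# SD of the sampling distribution ~ standard error = pop_sd / sqrt(n)
print(sample_means.std(), pop_sd / np.sqrt(n))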

Null and Alternative Hypotheses


The null hypothesis represents a statistical
tool important to inferential tests of
significance
The alternative hypothesis usually
represents the research hypothesis related
to the study

Null and Alternative Hypotheses


Comparisons between groups
Null: no difference between the mean scores of the groups
Alternative: differences between the mean scores of the
groups

Relationships between variables


Null: no relationship exists between the variables being
studied
Alternative: a relationship exists between the variables
being studied

Null and Alternative Hypotheses


Acceptance of the null hypothesis
The difference between groups is too small to
attribute it to anything but chance
The relationship between variables is too small to
attribute it to anything but chance

Rejection of the null hypothesis
The difference between groups is so large it can be
attributed to something other than chance (e.g., the
experimental treatment)
The relationship between variables is so large it can
be attributed to something other than chance (e.g., a
real relationship)

Tests of Significance
Statistical analyses to help decide whether to
accept or reject the null hypothesis
Alpha level
An established probability level which serves as
the criterion to determine whether to accept or
reject the null hypothesis
Common levels in education
.01
.05
.10

Type I and Type II Errors


Correct decisions
The null hypothesis is true and it is accepted
The null hypothesis is false and it is rejected

Incorrect decisions
Type I error - the null hypothesis is true and it is
rejected
Type II error - the null hypothesis is false and it is
accepted

One-Tailed and Two-Tailed Tests


One-tailed: an anticipated outcome in a specific direction
Treatment group is significantly higher than the control group, or
Treatment group is significantly lower than the control group

Two-tailed: the anticipated outcome is not directional
Treatment and control groups are equal

Ample justification is needed for using one-tailed tests

Tests of Significance
Two types
Parametric
Nonparametric

Tests of Significance
Four assumptions of parametric tests
Normal distribution of the dependent variable
Interval or ratio data
Independence of subjects
Homogeneity of variance

Advantages of parametric tests
More statistically powerful
More adaptable

Tests of Significance
Assumptions of nonparametric tests
No assumptions about the shape of the
distribution of the dependent variable
Ordinal or categorical data

Disadvantages of nonparametric tests
Less statistically powerful
Require large samples
Cannot answer some research questions

Steps in Statistical Testing


State the null and alternative hypotheses
Set alpha level
Identify the appropriate test of significance
Identify the sampling distribution
Identify the test statistic
Compute the test statistic

Steps in Statistical Testing


Identify the criteria for significance
If computing by hand, identify the critical value of the test statistic
If using SPSS-Windows, identify the probability level of the observed
test statistic

Compare the computed test statistic to the
criteria for significance
If computing by hand, compare the observed test statistic to the
critical value
If using SPSS-Windows, compare the probability level of the
observed test statistic to the alpha level

Steps in Statistical Testing


Accept or reject the null hypothesis
Accept
The observed test statistic is smaller than the
critical value
The observed probability level of the observed
statistic is larger than alpha
Reject
The observed test statistic is larger than the critical
value
The observed probability level of the observed
statistic is smaller than alpha
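
A minimal sketch of this final decision step, assuming the software
reports a probability (p) value for the observed test statistic (both
numbers below are hypothetical):

alpha = 0.05
p_value = 0.03  # observed probability level of the test statistic

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Do not reject the null hypothesis")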

Parametric and Non-Parametric


A parametric statistical test is one that makes assumptions about
the parameters (defining properties) of the population
distribution(s) from which one's data are drawn.
A non-parametric test is one that makes no such assumptions. In
this strict sense, "non-parametric" is essentially a null category,
since virtually all statistical tests assume one thing or another
about the properties of the source population(s).


                         Parametric                  Non-parametric
Assumed distribution     Normal                      Any
Assumed variance         Homogeneous                 Any
Typical data             Ratio or Interval           Ordinal or Nominal
Data set relationships   Independent                 Any
Usual central measure    Mean                        Median
Benefits                 Can draw more conclusions   Simplicity; less affected
                                                     by outliers


Choosing

                                   Choosing a parametric test    Choosing a non-parametric test
Correlation test                   Pearson                       Spearman
Independent measures, 2 groups     Independent-measures t-test   Mann-Whitney test
Independent measures, >2 groups    One-way independent-measures  Kruskal-Wallis test
                                   ANOVA
Repeated measures, 2 conditions    Matched-pair t-test           Wilcoxon test
Repeated measures, >2 conditions   One-way repeated-measures     Friedman's test
                                   ANOVA


PARAMETRIC TESTS
STATISTICAL SIGNIFICANCE
Most analysis is carried out on data from only a sample of the
population, so we must ask:
How likely is it that the results indicate the situation for
the whole population?
Are the results simply occasioned by chance, or are they truly
representative, i.e. are they statistically significant?

To estimate the likelihood that the results are relevant to the
population as a whole, one has to use statistical inference.

ANALYSIS OF VARIANCE
Another common requirement is to look for differences between
values obtained under two or more different conditions,
e.g. a group before and after a training course, or three
groups after different training courses.
A range of tests can be applied to discern the variance,
depending on the number of groups.


Z Test
Uses the Z (standard normal) distribution
Based on the assumption of a normal distribution
Sample size greater than 30
SD of the population is known
Used for inference about population parameters
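
A minimal one-sample z-test sketch, assuming the population SD is known
(all numbers below are hypothetical; scipy.stats supplies the normal
distribution):

import math
from scipy.stats import norm

sample_mean, pop_mean = 46.1, 44.0  # hypothetical values
pop_sd, n = 6.0, 36                 # known population SD; n > 30

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))       # two-tailed p-value
print(z, p_value)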

F Test
Based on the assumption of a normal distribution
Uses the F distribution
Used to compare the variances of two independent samples
Used in ANOVA to assess the model's strength and significance

T Test
Also known as Student's t-test
Uses the t distribution
Applicable even if the sample size is less than 30
SD of the population is unknown (estimated from the sample)
Used for hypothesis testing

Compare Means
1. Means:
What it does: The Means procedure calculates
subgroup means and related univariate statistics
for dependent variables within categories of one
or more independent variables.
Optionally, you can obtain a one-way analysis of
variance.

2. One-Sample T Test

What it does: The One-Sample T Test compares the mean score
of a sample to a known value. Usually, the known value is a
population mean.
Assumption: The dependent variable is normally distributed. You
can check for normal distribution through the Explore command.
Hypotheses: Null: There is no significant difference between the
sample mean and the population mean.
Decision: If the p-value is less than the alpha level, reject H0;
otherwise, do not reject.
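
A sketch of the same test using Python's scipy.stats rather than SPSS
(the sample values and the population mean of 40 are hypothetical):

from scipy import stats

sample = [42, 39, 45, 41, 38, 44, 40, 43]  # hypothetical sample
t_stat, p_value = stats.ttest_1samp(sample, popmean=40)

alpha = 0.05
print(t_stat, p_value)
if p_value < alpha:
    print("Reject H0: the sample mean differs from 40")
else:
    print("Do not reject H0")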

3. Independent-Samples T Test

What it does: The Independent-Samples T Test compares the mean
scores of two groups on a given variable.
Assumptions:
The dependent variable is normally distributed.
The two groups have approximately equal variance on the dependent
variable. You can check this by looking at Levene's Test.
The two groups are independent of one another.
Hypotheses: Null: The means of the two groups are not significantly
different.
Decision: If the p-value is less than 0.05, reject H0; otherwise, do
not reject.
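
A sketch using scipy.stats as an alternative to SPSS; the two
hypothetical groups below illustrate both the Levene check and the
t-test itself:

from scipy import stats

group1 = [78, 85, 90, 72, 88, 81]  # hypothetical scores
group2 = [70, 75, 80, 68, 74, 77]

# Levene's test: a large p suggests approximately equal variances
lev_stat, lev_p = stats.levene(group1, group2)

# Independent-samples t-test (equal_var=False if Levene fails)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=lev_p > 0.05)
print(t_stat, p_value)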

4. Paired-Samples T Test

What it does: The Paired-Samples T Test compares the means of
two variables. It computes the difference between the two variables
for each case, and tests whether the average difference is
significantly different from zero.
For example: We compared the mean test scores before (pre-test) and after
(post-test) the subjects completed a test preparation course. We want to see
if our test preparation course improved people's scores on the test.
Assumption: Both variables should be normally distributed. You can
check for normal distribution with a Q-Q plot.
Hypothesis: Null: There is no significant difference between the
means of the two variables.
Decision: If the significance value is less than .05, there is a
significant difference. If the significance value is greater than .05,
there is no significant difference.
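
A sketch of the pre-test/post-test example using scipy.stats.ttest_rel
(the scores are made up for illustration):

from scipy import stats

pre = [55, 60, 62, 58, 64, 57]   # hypothetical pre-test scores
post = [61, 63, 66, 60, 70, 62]  # same subjects after the course

# Tests whether the average difference is significantly different from zero
t_stat, p_value = stats.ttest_rel(post, pre)
print(t_stat, p_value)
if p_value < 0.05:
    print("Significant difference between pre- and post-test means")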

5. One-Way ANOVA

What it does: The One-Way ANOVA compares the means of two
or more groups based on one independent variable (or factor).
Assumptions:
The dependent variable is normally distributed. You can
check for normal distribution with a Q-Q plot.
The groups have approximately equal variance on the
dependent variable. You can check this by looking at
Levene's Test.
Hypotheses: Null: There are no significant differences between
the groups' mean scores.
Decision: If the significance value is less than .05, there is a
significant difference. If the significance value is greater than .05,
there is no significant difference.
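
A sketch of a one-way ANOVA with scipy.stats.f_oneway, using three
hypothetical groups (e.g., three different training courses):

from scipy import stats

course_a = [82, 85, 88, 79, 90]  # hypothetical scores per course
course_b = [75, 78, 72, 80, 77]
course_c = [85, 92, 88, 91, 86]

f_stat, p_value = stats.f_oneway(course_a, course_b, course_c)
print(f_stat, p_value)
if p_value < 0.05:
    print("At least one group mean differs significantly")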

Non-Parametric Tests
Nonparametric tests may be, and often are, more
powerful in detecting population differences
when certain assumptions are not satisfied.
Non-parametric statistical tests are used when:
The sample size is very small;
Few assumptions can be made about the data;
The data are rank-ordered or nominal;
Samples are taken from several different populations.


Mann-Whitney test:
Used to compare differences between two independent
groups when the dependent variable is either ordinal or
interval/ratio, but not normally distributed.
For example: do attitudes towards pay discrimination, where
attitudes are measured on an ordinal scale, differ based on gender?
Your dependent variable would be "attitudes towards pay
discrimination" and your independent variable would be
"gender", which has two groups: "male" and "female".


Or: do salaries, measured on an interval scale, differ based on
education level?
Your dependent variable would be "salary" and your
independent variable would be "educational level", which has
two groups: "high school" and "university".
The Mann-Whitney U test is the nonparametric alternative to
the independent t-test.
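
A sketch of the salary-by-education example using
scipy.stats.mannwhitneyu (the salary figures are hypothetical):

from scipy import stats

high_school = [28000, 31000, 26500, 30000, 29500]  # hypothetical salaries
university = [35000, 42000, 38500, 40000, 36500]

# Nonparametric alternative to the independent t-test
u_stat, p_value = stats.mannwhitneyu(high_school, university,
                                     alternative="two-sided")
print(u_stat, p_value)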


THANKS
