
Descriptive Analysis and

Inferential Analysis
By

Amir Iqbal

Mean
o The mean for quantitative data is obtained by dividing
the sum of all values by the number of values in the data
set:

x̄ = Σx / n

o The following are the ages of all eight employees of a
small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

x̄ = Σx / n = 362 / 8 = 45.25 years

Cont
o Thus, the mean age of all eight
employees of this company is 45.25
years, or 45 years and 3 months.
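
A minimal Python sketch of this calculation, using the standard-library
statistics module (the variable names are illustrative):

import statistics

# Ages of all eight employees
ages = [53, 32, 61, 27, 39, 44, 49, 57]

# Mean = sum of all values / number of values = 362 / 8
mean_age = statistics.mean(ages)
print(mean_age)  # 45.25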

Median
o The median is the value of the middle term in a data set that
has been ranked in increasing order.
The calculation of the median consists of the following two
steps:
1. Sort/Arrange the data set in increasing order
2. Find the middle term in a data set with n values. The value of this term is the
median.

o The following data give the weight lost (in pounds) by a sample
of five members of a health club at the end of two months of
membership:
10 5 19 8 3
Find the median.

Cont
o First, we rank the given data in increasing order as follows:
3 5 8 10 19
o The middle term is 8; therefore, the median is 8.
The median weight loss for this sample of five members of this
health club is 8 pounds.
o The median gives the center with half the data values to the left of
the median and half to the right of the median.
o The advantage of using the median as a measure of central tendency
is that it is less influenced by outliers & skewness.
o Consequently, the median is preferred over the mean as a measure of
central tendency for data sets that contain outliers and/or skewness.
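
A short Python sketch of the two-step median calculation above
(statistics.median sorts internally, but the ranking step is shown
explicitly for clarity):

import statistics

# Weight lost (in pounds) by five health-club members
losses = [10, 5, 19, 8, 3]

# Step 1: arrange the data in increasing order
ranked = sorted(losses)  # [3, 5, 8, 10, 19]

# Step 2: the value of the middle term is the median
print(statistics.median(ranked))  # 8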

Mode
The mode is the value that occurs with the highest
frequency in a data set.

Range

o Range = Largest value − Smallest value

o The range, like the mean, has the
disadvantage of being influenced by
outliers.
o Its calculation is based on two values
only: the largest and the smallest.

Standard Deviation

The standard deviation is the most widely used
measure of dispersion.
The value of the standard deviation tells how
closely the values of a data set are clustered
around the mean.

Cont
o For the data set 82, 95, 67, 92, the mean is x̄ = 336 / 4 = 84.
o The deviations from the mean are:

82 − 84 = −2
95 − 84 = +11
67 − 84 = −17
92 − 84 = +8

o Σ(deviations) = 0

Calculation
Sample standard deviation: s = sqrt( Σ(x − x̄)² / (n − 1) )
Example: s = sqrt( (4 + 121 + 289 + 64) / 3 )
       = sqrt( 478 / 3 ) = sqrt( 159.33 ) = 12.62
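
A Python sketch reproducing this calculation; statistics.stdev uses the
same n − 1 (sample) formula:

import statistics

data = [82, 95, 67, 92]
mean = statistics.mean(data)           # 84
deviations = [x - mean for x in data]  # [-2, 11, -17, 8]; they sum to 0

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
s = statistics.stdev(data)
print(round(s, 2))  # 12.62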

Population parameter and Sample Statistics


A numerical measure such as the mean, median,
mode, range, variance, or standard deviation
calculated for a population data set is called a
population parameter, or simply a parameter.
A summary measure calculated for a sample data set
is called a sample statistic, or simply a statistic.

Inferential Statistics

Inferential statistics
Allow researchers to generalize to a population of
individuals based on information obtained from a
sample of those individuals
Assess whether the results obtained from a sample
are the same as those that would have been
calculated for the entire population

Sampling Distributions
A distribution of sample statistics
A distribution of mean scores
A distribution of the differences between two mean scores
A distribution of the ratio of two variances

Known statistical properties of sampling distributions


The mean of the sampling distribution of means is an excellent estimate of
the population mean
The standard error of the mean (s/√n) is an excellent estimate of the
standard deviation of the sampling distribution of the mean
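
A small simulation sketch of these two properties, assuming a normally
distributed population (numpy is used for convenience; all numbers are
illustrative):

import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 45.0, 6.0, 30

# Draw 10,000 samples of size n and compute each sample's mean
sample_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

# Mean of the sampling distribution of means ~ population mean
print(sample_means.mean())  # close to 45.0

# SD of the sampling distribution ~ standard error = pop_sd / sqrt(n)
print(sample_means.std(), pop_sd / np.sqrt(n))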

Null and Alternative Hypotheses


The null hypothesis represents a statistical
tool important to inferential tests of
significance
The alternative hypothesis usually
represents the research hypothesis related
to the study

Null and Alternative Hypotheses


Comparisons between groups
Null: no difference between the mean scores of the groups
Alternative: differences between the mean scores of the
groups

Relationships between variables


Null: no relationship exists between the variables being
studied
Alternative: a relationship exists between the variables
being studied

Null and Alternative Hypotheses


Acceptance of the null hypothesis
The difference between groups is too small to
attribute it to anything but chance
The relationship between variables is too small to
attribute it to anything but chance

Rejection of the null hypothesis
The difference between groups is so large it can be
attributed to something other than chance (e.g., the
experimental treatment)
The relationship between variables is so large it can
be attributed to something other than chance (e.g., a
real relationship)

Tests of Significance
Statistical analyses to help decide whether to
accept or reject the null hypothesis
Alpha level
An established probability level which serves as
the criterion to determine whether to accept or
reject the null hypothesis
Common levels in education
.01
.05
.10

Type I and Type II Errors


Correct decisions
The null hypothesis is true and it is accepted
The null hypothesis is false and it is rejected

Incorrect decisions
Type I error - the null hypothesis is true and it is
rejected
Type II error - the null hypothesis is false and it is
accepted

One-Tailed and Two-Tailed Tests


One-tailed: an anticipated outcome in a specific direction
Treatment group is significantly higher than the control group, or
Treatment group is significantly lower than the control group

Two-tailed: the anticipated outcome is not directional
Treatment and control groups are equal

Ample justification is needed for using one-tailed tests

Tests of Significance
Two types
Parametric
Nonparametric

Tests of Significance
Four assumptions of parametric tests
Normal distribution of the dependent variable
Interval or ratio data
Independence of subjects
Homogeneity of variance

Advantages of parametric tests
More statistically powerful
More adaptable

Tests of Significance
Assumptions of nonparametric tests
No assumptions about the shape of the
distribution of the dependent variable
Ordinal or categorical data

Disadvantages of nonparametric tests
Less statistically powerful
Require large samples
Cannot answer some research questions

Steps in Statistical Testing


State the null and alternative hypotheses
Set alpha level
Identify the appropriate test of significance
Identify the sampling distribution
Identify the test statistic
Compute the test statistic

Steps in Statistical Testing


Identify the criteria for significance
If computing by hand, identify the critical value of the test statistic
If using SPSS-Windows, identify the probability level of the observed
test statistic

Compare the computed test statistic to the
criteria for significance
If computing by hand, compare the observed test statistic to the
critical value
If using SPSS-Windows, compare the probability level of the
observed test statistic to the alpha level

Steps in Statistical Testing


Accept or reject the null hypothesis
Accept
The observed test statistic is smaller than the
critical value
The observed probability level of the observed
statistic is larger than alpha
Reject
The observed test statistic is larger than the critical
value
The observed probability level of the observed
statistic is smaller than alpha
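
A minimal sketch of this final decision step, assuming the software
reports a probability (p) value for the observed test statistic (both
numbers below are hypothetical):

alpha = 0.05
p_value = 0.03  # observed probability level of the test statistic

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Do not reject the null hypothesis")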

Parametric and Non-Parametric


A parametric statistical test is one that makes assumptions about
the parameters (defining properties) of the population
distribution(s) from which one's data are drawn.
A non-parametric test is one that makes no such assumptions. In
this strict sense, "non-parametric" is essentially a null category,
since virtually all statistical tests assume one thing or another
about the properties of the source population(s).


                         Parametric                  Non-parametric
Assumed distribution     Normal                      Any
Assumed variance         Homogeneous                 Any
Typical data             Ratio or Interval           Ordinal or Nominal
Data set relationships   Independent                 Any
Usual central measure    Mean                        Median
Benefits                 Can draw more conclusions   Simplicity; less affected
                                                     by outliers


Choosing

                                   Choosing a parametric test    Choosing a non-parametric test
Correlation test                   Pearson                       Spearman
Independent measures, 2 groups     Independent-measures t-test   Mann-Whitney test
Independent measures, >2 groups    One-way independent-measures  Kruskal-Wallis test
                                   ANOVA
Repeated measures, 2 conditions    Matched-pair t-test           Wilcoxon test
Repeated measures, >2 conditions   One-way repeated-measures     Friedman's test
                                   ANOVA


PARAMETRIC TESTS
STATISTICAL SIGNIFICANCE
Most analysis is carried out on data from only a sample of the
population, so we must ask:
How likely is it that the results indicate the situation for
the whole population?
Are the results simply occasioned by chance, or are they truly
representative, i.e. are they statistically significant?

To estimate the likelihood that the results are relevant to the
population as a whole, one has to use statistical inference.

ANALYSIS OF VARIANCE
Another common requirement is to look for differences between
values obtained under two or more different conditions,
e.g. a group before and after a training course, or three
groups after different training courses.
A range of tests can be applied to discern the variance,
depending on the number of groups.


Z Test
Uses the Z (standard normal) distribution
Based on the assumption of a normal distribution
Sample size greater than 30
SD of the population is known
Used for inference about population parameters
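
A minimal one-sample z-test sketch, assuming the population SD is known
(all numbers below are hypothetical; scipy.stats supplies the normal
distribution):

import math
from scipy.stats import norm

sample_mean, pop_mean = 46.1, 44.0  # hypothetical values
pop_sd, n = 6.0, 36                 # known population SD; n > 30

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))       # two-tailed p-value
print(z, p_value)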

F Test
Based on the assumption of a normal distribution
Uses the F distribution
Used to compare the variances of two independent samples
Used in ANOVA to assess the model's strength and significance

T Test
Also known as Student's t-test
Uses the t distribution
Applicable even if the sample size is less than 30
SD of the population is unknown (estimated from the sample)
Used for hypothesis testing

Compare Means
1. Means:
What it does: The Means procedure calculates
subgroup means and related univariate statistics
for dependent variables within categories of one
or more independent variables.
Optionally, you can obtain a one-way analysis of
variance.

2. One-Sample T Test

What it does: The One-Sample T Test compares the mean score
of a sample to a known value. Usually, the known value is a
population mean.
Assumption: The dependent variable is normally distributed. You
can check for normal distribution through the Explore command.
Hypotheses: Null: There is no significant difference between the
sample mean and the population mean.
Decision: If the p-value is less than the alpha level, reject H0;
otherwise, do not reject.
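
A sketch of the same test using Python's scipy.stats rather than SPSS
(the sample values and the population mean of 40 are hypothetical):

from scipy import stats

sample = [42, 39, 45, 41, 38, 44, 40, 43]  # hypothetical sample
t_stat, p_value = stats.ttest_1samp(sample, popmean=40)

alpha = 0.05
print(t_stat, p_value)
if p_value < alpha:
    print("Reject H0: the sample mean differs from 40")
else:
    print("Do not reject H0")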

3. Independent-Samples T Test

What it does: The Independent-Samples T Test compares the mean
scores of two groups on a given variable.
Assumptions:
The dependent variable is normally distributed.
The two groups have approximately equal variance on the dependent
variable. You can check this by looking at Levene's Test.
The two groups are independent of one another.
Hypotheses: Null: The means of the two groups are not significantly
different.
Decision: If the p-value is less than 0.05, reject H0; otherwise, do
not reject.
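
A sketch using scipy.stats as an alternative to SPSS; the two
hypothetical groups below illustrate both the Levene check and the
t-test itself:

from scipy import stats

group1 = [78, 85, 90, 72, 88, 81]  # hypothetical scores
group2 = [70, 75, 80, 68, 74, 77]

# Levene's test: a large p suggests approximately equal variances
lev_stat, lev_p = stats.levene(group1, group2)

# Independent-samples t-test (equal_var=False if Levene fails)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=lev_p > 0.05)
print(t_stat, p_value)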

4. Paired-Samples T Test

What it does: The Paired-Samples T Test compares the means of
two variables. It computes the difference between the two variables
for each case, and tests whether the average difference is
significantly different from zero.
For example: We compared the mean test scores before (pre-test) and after
(post-test) the subjects completed a test preparation course. We want to see
if our test preparation course improved people's scores on the test.
Assumption: Both variables should be normally distributed. You can
check for normal distribution with a Q-Q plot.
Hypothesis: Null: There is no significant difference between the
means of the two variables.
Decision: If the significance value is less than .05, there is a
significant difference. If the significance value is greater than .05,
there is no significant difference.
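
A sketch of the pre-test/post-test example using scipy.stats.ttest_rel
(the scores are made up for illustration):

from scipy import stats

pre = [55, 60, 62, 58, 64, 57]   # hypothetical pre-test scores
post = [61, 63, 66, 60, 70, 62]  # same subjects after the course

# Tests whether the average difference is significantly different from zero
t_stat, p_value = stats.ttest_rel(post, pre)
print(t_stat, p_value)
if p_value < 0.05:
    print("Significant difference between pre- and post-test means")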

5. One-Way ANOVA

What it does: The One-Way ANOVA compares the means of two
or more groups based on one independent variable (or factor).
Assumptions:
The dependent variable is normally distributed. You can
check for normal distribution with a Q-Q plot.
The groups have approximately equal variance on the
dependent variable. You can check this by looking at
Levene's Test.
Hypotheses: Null: There are no significant differences between
the groups' mean scores.
Decision: If the significance value is less than .05, there is a
significant difference. If the significance value is greater than .05,
there is no significant difference.
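
A sketch of a one-way ANOVA with scipy.stats.f_oneway, using three
hypothetical groups (e.g., three different training courses):

from scipy import stats

course_a = [82, 85, 88, 79, 90]  # hypothetical scores per course
course_b = [75, 78, 72, 80, 77]
course_c = [85, 92, 88, 91, 86]

f_stat, p_value = stats.f_oneway(course_a, course_b, course_c)
print(f_stat, p_value)
if p_value < 0.05:
    print("At least one group mean differs significantly")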

Non-Parametric Tests
Nonparametric tests may be, and often are, more
powerful in detecting population differences
when certain assumptions are not satisfied.
Non-parametric statistical tests are used when:
The sample size is very small;
Few assumptions can be made about the data;
The data are rank-ordered or nominal;
Samples are taken from several different populations.


Mann-Whitney test:
Used to compare differences between two independent
groups when the dependent variable is either ordinal or
interval/ratio, but not normally distributed.
For example: do attitudes towards pay discrimination, where
attitudes are measured on an ordinal scale, differ based on gender?
Your dependent variable would be "attitudes towards pay
discrimination" and your independent variable would be
"gender", which has two groups: "male" and "female".


Or: do salaries, measured on an interval scale, differ based on
education level?
Your dependent variable would be "salary" and your
independent variable would be "educational level", which has
two groups: "high school" and "university".
The Mann-Whitney U test is the nonparametric alternative to
the independent t-test.
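
A sketch of the salary-by-education example using
scipy.stats.mannwhitneyu (the salary figures are hypothetical):

from scipy import stats

high_school = [28000, 31000, 26500, 30000, 29500]  # hypothetical salaries
university = [35000, 42000, 38500, 40000, 36500]

# Nonparametric alternative to the independent t-test
u_stat, p_value = stats.mannwhitneyu(high_school, university,
                                     alternative="two-sided")
print(u_stat, p_value)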


THANKS
