Interpretation of Correlation
Correlation refers to a technique used to measure the relationship between two or
more variables. When two things are correlated, it means that they vary together. Positive
correlation means that high scores on one are associated with high scores on the other, and
that low scores on one are associated with low scores on the other. Negative correlation, on
the other hand, means that high scores on the first thing are associated with low scores on the
second. Negative correlation also means that low scores on the first are associated with high
scores on the second. An example is the correlation between body weight and the time spent
on a weight-loss program. If the program is effective, the higher the amount of time spent on
the program, the lower the body weight. Also, the lower the amount of time spent on the
program, the higher the body weight.
Pearson r is a statistic that is commonly used to calculate bivariate correlations.
For example, suppose the correlation between class size and mean reading score is
Pearson r = -0.80, p < .01. What does this mean?
To interpret correlations, four pieces of information are necessary.
1. The numerical value of the correlation coefficient. The absolute value of a correlation
coefficient can vary between 0.0 and 1.0. The closer the absolute value is to 1.0, the stronger
the relationship between the two variables. A correlation of 0.0 indicates the absence of a
linear relationship. Here the absolute value of the correlation coefficient is 0.80, which
indicates the presence of a strong relationship.
2. The sign of the correlation coefficient. A positive correlation coefficient means that as
variable 1 increases, variable 2 increases, and conversely, as variable 1 decreases, variable 2
decreases. In other words, the variables move in the same direction when there is a positive
correlation. A negative correlation means that as variable 1 increases, variable 2 decreases
and vice versa. In other words, the variables move in opposite directions when there is a
negative correlation. In the example, the negative sign indicates that as class size increases,
mean reading scores decrease.
3. The statistical significance of the correlation. A statistically significant correlation is
conventionally indicated by a probability value of less than 0.05. This means that, if there
were no relationship in the population, a correlation coefficient at least this large would be
obtained by chance fewer than five times out of 100, so the result is taken as evidence of a
relationship. For r = -0.80 there is a statistically significant negative relationship between
class size and reading score (p < .001), such that the probability of a correlation this strong
occurring by chance is less than one time out of 1000.
4. The effect size of the correlation. For correlations, the effect size is called the coefficient
of determination and is defined as $r^2$. The coefficient of determination can vary from 0 to 1.00
and indicates the proportion of variation in the scores that can be predicted from the
relationship between the two variables. For r = -0.80 the coefficient of determination is 0.64,
which means that 64% of the variation in mean reading scores among the different classes
can be predicted from the relationship between class size and reading scores. (Conversely,
36% of the variation in mean reading scores cannot be explained.)
A correlation can only indicate the presence or absence of a relationship, not the nature of the
relationship. Correlation is not causation. There is always the possibility that a third variable
influenced the results. For example, perhaps the students in the small classes were higher in
verbal ability than the students in the large classes or were from higher income families or
had higher quality teachers.
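The four pieces of information above can be illustrated with a short calculation. This is a minimal sketch, assuming made-up class sizes and reading scores (they are not the data behind the r = -0.80 example); `pearson_r` is a helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: class size and mean reading score for six classes.
class_size = [15, 18, 22, 25, 30, 34]
reading_score = [82, 85, 78, 74, 75, 68]

r = pearson_r(class_size, reading_score)
r_squared = r ** 2  # coefficient of determination (effect size)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f} ({direction}), r^2 = {r_squared:.2f}")
```

The sign gives the direction, the absolute value gives the strength, and the square gives the proportion of variation that can be predicted. The p-value is not computed here; it would normally come from statistical software.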
F test assumption and uses
Stats: F-Test
The F-distribution is formed by the ratio of two independent chi-square variables divided by
their respective degrees of freedom.
Since F is formed by chi-square, many of the chi-square properties
carry over to the F distribution.
There are two independent degrees of freedom, one for the numerator, and one for the
denominator.
There are many different F distributions, one for each pair of degrees of freedom.
F-Test
The F-test is designed to test if two population variances are equal. It does this by comparing
the ratio of two variances. So, if the variances are equal, the ratio of the variances will be 1.
All hypothesis testing is done under the assumption that the null hypothesis is true.
If the null hypothesis is true, the population variances cancel out of the F test statistic,
which simplifies (dramatically) to the ratio of the two sample variances. This ratio of sample
variances is the test statistic used. If it falls in the critical region, we reject the null
hypothesis that the ratio of the population variances is equal to 1, and with it our
assumption that the variances are equal.
There are several different F-tables. Each one has a different level of
significance. So, find the correct level of significance first, and then look up the
numerator degrees of freedom and the denominator degrees of freedom to find
the critical value.
You will notice that all of the tables only give level of significance for right tail tests. Because
the F distribution is not symmetric, and there are no negative values, you may not simply take
the opposite of the right critical value to find the left critical value. The way to find a left
critical value is to reverse the degrees of freedom, look up the right critical value, and then
take the reciprocal of this value. For example, the critical value with 0.05 on the left with 12
numerator and 15 denominator degrees of freedom is found by taking the reciprocal of the
critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of
freedom.
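The reciprocal rule can be written out as a small helper. This is only a sketch: the right-tail value used below (about 2.62 for 15 and 12 degrees of freedom at 0.05) is an assumed table entry, so check it against the table you are working from.

```python
def left_critical_value(right_critical_reversed_df):
    """Left-tail F critical value from the right-tail value with swapped df."""
    if right_critical_reversed_df <= 0:
        raise ValueError("F critical values are positive")
    return 1.0 / right_critical_reversed_df

# Assumed table entry: right-tail critical value at alpha = 0.05 with
# 15 numerator and 12 denominator degrees of freedom (roughly 2.62).
f_right_15_12 = 2.62

# Left-tail critical value at 0.05 with 12 numerator and 15 denominator df.
f_left_12_15 = left_critical_value(f_right_15_12)
print(f"Left critical value, df = (12, 15): {f_left_12_15:.3f}")
```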
Assumptions / Notes
Divide alpha by 2 for a two-tail test and then find the right critical value.
When the degrees of freedom aren't given in the table, go with the larger critical
value (this happens to be the one for the smaller degrees of freedom). This is so that
you are less likely to reject in error (type I error).
The populations from which the samples were obtained must be normal.
F-Test
Source: Explorable.com, https://explorable.com/f-test
Any statistical test that uses the F-distribution can be called an F-test. It is used when the
sample size is small, i.e. n < 30.
1. F-test for equality of two variances. For example, suppose one is interested in testing
whether there is any significant difference between the mean heights of male and female
students in a particular college. In such a situation, a t-test for difference of means can be
applied.
However, one assumption of the t-test is that the variances of the two populations are equal;
here the two populations are the heights of male and female students. Unless this
assumption is true, the standard (pooled-variance) t-test for difference of means cannot be
carried out.
The F-test can be used to test the hypothesis that the population variances are equal.
2. F-test in ANOVA. For example, suppose that the efficacy of a drug is to be tested at three
levels, say 100mg, 250mg and 500mg. A test is conducted among fifteen human subjects
taken at random, with five subjects being administered each level of the drug.
To test if there are significant differences among the three levels of the drug in terms
of efficacy, the ANOVA technique has to be applied. The test used for this purpose is
the F-test.
3. F-test for testing significance of regression is used to test the significance of the
regression model. The appropriateness of the multiple regression model as a whole
can be tested by this test. A significant F indicates a linear relationship between Y and
at least one of the X's.
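For the drug example, the ANOVA F statistic is the ratio of the between-groups mean square to the within-groups mean square. The sketch below uses invented efficacy scores for fifteen subjects; `one_way_anova_f` is a helper written for this illustration, not a library function.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical efficacy scores for five subjects at each dose level.
dose_100mg = [4, 5, 6, 5, 5]
dose_250mg = [6, 7, 8, 7, 7]
dose_500mg = [8, 9, 10, 9, 9]

f_stat, df1, df2 = one_way_anova_f([dose_100mg, dose_250mg, dose_500mg])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The resulting F is then compared with the critical value for (2, 12) degrees of freedom, or converted to a p-value by software.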
Assumptions
Irrespective of the type of F-test used, one assumption has to be met. The populations from
which the samples are drawn have to be normal. In the case of F-test for equality of variance,
a second assumption has to be satisfied in that the larger of the sample variances has to be
placed in the numerator of the test statistic.
Like the t-test, the F-test is also a small-sample test and may be considered for use if the
sample size is < 30.
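A minimal sketch of the F statistic for equality of variances, with the larger sample variance placed in the numerator as described above. The height values are invented for illustration, and `variance_ratio_f` is a hypothetical helper name.

```python
import statistics

def variance_ratio_f(sample_a, sample_b):
    """F statistic for equality of variances, larger variance on top.

    Returns (F, numerator_df, denominator_df).
    """
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical heights in cm for two small samples of students.
male_heights = [170, 175, 168, 180, 177, 172]
female_heights = [160, 163, 158, 165, 161, 162]

f_stat, df_num, df_den = variance_ratio_f(male_heights, female_heights)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

Because the larger variance is always on top, the statistic is at least 1 and only the right-tail critical value is needed.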
Deciding
In attempting to reach decisions, we always begin by specifying the null hypothesis against a
complementary hypothesis called the alternative hypothesis. The calculated value of the F-test
with its associated p-value is used to infer whether or not to reject the null hypothesis.
All statistical software packages provide these p-values. If the associated p-value is small,
i.e. < 0.05, we say that the test is significant at 5% and one may reject the null hypothesis in
favour of the alternative one.
On the other hand, if the associated p-value of the test is > 0.05, one does not reject the null
hypothesis. Evidence against the null hypothesis will be considered very strong if the p-value
is less than 0.01. In that case, we say that the test is significant at 1%.
Chi-Square Test
Any statistical test that uses the chi-square distribution can be called a chi-square test. It is
applicable both for large and small samples, depending on the context.
For example, suppose a person wants to test the hypothesis that the success rate in a
particular English test is similar for indigenous and immigrant students.
If we take a random sample of, say, 80 students and record both the indigenous/immigrant
status and the success/failure status of each student, the chi-square test can be applied to
test the hypothesis.
There are different types of chi-square test, each for a different purpose. Some of the popular
types are outlined below.
1. Chi-square test for goodness of fit is used to decide whether there is any
difference between the observed (experimental) value and the expected
(theoretical) value.
For example, given a sample, we may like to test if it has been drawn from a normal
population. This can be tested using the chi-square goodness of fit procedure.
2. Chi-square test for independence of two attributes. Suppose N observations are
considered and classified according to two characteristics, say A and B. We may be
interested to test whether the two characteristics are independent. In such a case,
we can use the chi-square test for independence of two attributes.
The example considered above, testing for independence of success in the English test
vis-a-vis immigrant status, is a case fit for analysis using this test.
3. Chi-square test for single variance is used to test a hypothesis on a specific
value of the population variance. Statistically speaking, we test the null hypothesis
$H_0: \sigma^2 = \sigma_0^2$ against the research hypothesis $H_1: \sigma^2 \neq \sigma_0^2$,
where $\sigma^2$ is the population variance and $\sigma_0^2$ is a specific value of the
population variance that we would like to test for acceptance.
In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance $\sigma_0^2$. This is a small sample test to be used only if
sample size is less than 30 in general.
Assumptions
The chi-square test for single variance has an assumption that the population from which
the sample has been drawn is normal. This normality assumption need not hold for the
chi-square goodness of fit test and the test for independence of attributes.
However, while implementing these two tests, one has to ensure that the expected frequency
in any cell is not less than 5. If it is less than 5, the cell has to be pooled with the preceding
or succeeding cell so that the expected frequency of the pooled cell is at least 5.
Since these tests do not involve any population parameters or characteristics, they are also
termed non-parametric or distribution-free tests. An additional important fact about these
two tests is that they can be used for any sample size as long as the assumption on minimum
expected cell frequency is met.
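The test for independence and the expected-frequency check can be sketched as follows. The 2 x 2 counts for the 80 students are invented for illustration, and `chi_square_independence` is a helper written for this example.

```python
def chi_square_independence(table):
    """Return (chi_square, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum((obs - exp) ** 2 / exp
                     for obs_row, exp_row in zip(table, expected)
                     for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical counts for 80 students: rows are indigenous / immigrant,
# columns are success / failure.
observed = [[30, 10],
            [20, 20]]

chi_square, df, expected = chi_square_independence(observed)
min_expected = min(min(row) for row in expected)
print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"smallest expected count = {min_expected:.1f}")
```

If the smallest expected count were below 5, cells would need to be pooled before the statistic is interpreted.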
Hypothesis Testing
The process of hypothesis testing can seem to be quite varied with a multitude of test
statistics. But the general process is the same. Hypothesis testing involves the statement of a
null hypothesis, and the selection of a level of significance. The null hypothesis is either true
or false, and represents the default claim for a treatment or procedure. For example, when
examining the effectiveness of a drug, the null hypothesis would be that the drug has no
effect on a disease.
After formulating the null hypothesis and choosing a level of significance, we acquire data
through observation. Statistical calculations tell us whether or not we should reject the null
hypothesis.
In an ideal world we would always reject the null hypothesis when it is false, and we would
not reject the null hypothesis when it is indeed true. But there are two other scenarios that are
possible, each of which will result in an error.
Type I Error
The first kind of error that is possible involves the rejection of a null hypothesis that is
actually true. This kind of error is called a type I error, and is sometimes called an error of the
first kind.
Type I errors are equivalent to false positives. Let's go back to the example of a drug being
used to treat a disease. If we reject the null hypothesis in this situation, then our claim is that
the drug does in fact have some effect on a disease. But if the null hypothesis is true, then in
reality the drug does not combat the disease at all. The drug is falsely claimed to have a
positive effect on a disease.
Type I errors can be controlled. The value of alpha, which is the level of significance that
we selected, has a direct bearing on type I errors. Alpha is the maximum probability that we
have a type I error. For a 95% confidence level, the value of alpha is 0.05. This means that
there is a 5% probability that we will reject a true null hypothesis. In the long run, one out
of every twenty tests of a true null hypothesis that we perform at this level will result in a
type I error.
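The long-run meaning of alpha can be checked with a small simulation. This is a sketch under stated assumptions: a two-sided z-test with known standard deviation 1, samples of 25, and a null hypothesis that is true by construction. The function name and settings are chosen for illustration.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=25, trials=4000, seed=1):
    """Share of two-sided z-tests that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = 1 / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Data are drawn with mean 0 and sd 1, so H0: mean = 0 really is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should land close to 0.05, which is roughly one false rejection in every twenty tests.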
Type II Error
The other kind of error that is possible occurs when we do not reject a null hypothesis that is
false. This sort of error is called a type II error, and is also referred to as an error of the
second kind.
Type II errors are equivalent to false negatives. If we think back again to the scenario in
which we are testing a drug, what would a type II error look like? A type II error would occur
if we accepted that the drug had no effect on a disease, but in reality it did.
The probability of a type II error is given by the Greek letter beta. This number is related to
the power or sensitivity of the hypothesis test, denoted by 1 - beta.
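For a simple case, beta and power can be computed directly. The sketch below assumes a one-sided z-test with known standard deviation; the effect size, standard deviation and sample size are illustrative values, not taken from the text.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Return (power, beta) for a one-sided z-test (H1: mean above null value)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = effect / (sigma / n ** 0.5)    # true effect in standard-error units
    beta = std_normal.cdf(z_crit - shift)  # probability of a type II error
    return 1 - beta, beta

power, beta = z_test_power(effect=0.5, sigma=1.0, n=25)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

With no true effect, the same function returns a power equal to alpha, which ties type I and type II errors together.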
There are two types of test data and consequently different types of analysis. As
the table below shows, parametric data has an underlying normal distribution
which allows for more conclusions to be drawn as the shape can be
mathematically described. Anything else is non-parametric.
                          Parametric                     Non-parametric
Assumed distribution      Normal                         Any
Assumed variance          Homogeneous                    Any
Typical data              Ratio or Interval              Ordinal or Nominal
Data set relationships    Independent                    Any
Usual central measure     Mean                           Median
Benefits                  More conclusions can be drawn  Simplicity; Less affected by outliers

Tests
                                    Choosing parametric test             Choosing non-parametric test
Correlation test                    Pearson                              Spearman
Independent measures, 2 groups      Independent-measures t-test          Mann-Whitney test
Independent measures, >2 groups     One-way, independent-measures ANOVA  Kruskal-Wallis test
Repeated measures, 2 conditions     Matched-pair t-test                  Wilcoxon test
Repeated measures, >2 conditions    One-way, repeated measures ANOVA     Friedman's test
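To make the Pearson / Spearman pairing concrete: Spearman's coefficient is the Pearson coefficient computed on ranks, which is why it suits ordinal data and is less affected by outliers. This is a minimal sketch with invented data; `average_ranks`, `pearson_r` and `spearman_rho` are helpers written for this illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always rises with x, but the last point is extreme.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 200]
print(f"Pearson r = {pearson_r(x, y):.2f}, Spearman rho = {spearman_rho(x, y):.2f}")
```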
Know the data set. Are your data longitudinal? If they are, consider using a
regression technique such as a GEE that is designed for longitudinal data.
Do your data come from a survey? Have you read the codebook and
documentation for information about how the data were collected? How are
the variables of interest coded? Are there missing values? How are they
coded? How will you handle missing data? You will probably need to recode
and construct new variables. Are there sample weights in the data set? Will
you use a split-sample approach to verify the model?
Check the distributions of the variables. Use q-q plots and box-and-whisker
plots. Run descriptive statistics. Is your outcome variable distributed
normally? What about the explanatory variables? If your variables are not
distributed normally, consider using a transformation such as the natural
logarithm, square root, or quadratic (depending on the direction and degree
of skewness). If the outcome variable is binary, use logistic regression. Are
there outliers or other influential observations that you can see? Consider
their source. Do you need to compute dummy variables and include them in
your model? Check to make sure you've taken care of missing values; they
can throw everything off if they are not adjusted for in the model!
Assess the bivariate associations in the data. Use scatterplots for
continuous variables. Plot linear and nonlinear lines to determine the
bivariate associations. Compute bivariate correlations for continuous
variables. Look for outliers and potential collinearity problems.
Estimate the regression model. Avoid automated variable selection
procedures unless your goal is simply to find the best prediction model.
Assess the results. Are there unusual coefficients (overly large or small;
negative when they should be positive)? Save the collinearity diagnostics;
save the influential observation diagnostics (studentized residuals, leverage
values, Cook's D, and DFFITS); run a scatterplot of deleted studentized
residuals by standardized predicted values; estimate partial residual plots;
ask for a normal probability plot of the residuals; ask for the Durbin-Watson
statistic, if appropriate; compute Moran's I, if using spatial data and it is
available. Assess the goodness-of-fit statistics (adjusted $R^2$; F-value and its
accompanying p-value; SE). Run nested models if appropriate and use nested
F-tests to compare them. These are particularly useful for assessing potential
specification errors.
Check the diagnostics. Are there any collinearity problems? If yes, you
might need to combine variables, collect more data, or, as a last resort, drop
variables. If the collinearity problem involves interaction or quadratic terms,
use centered values such as z-scores to recompute them. Are the residuals
normally distributed? If not, consider a transformation. Do the partial
residual plots provide evidence of nonlinear associations? Is there evidence
of heteroscedasticity? (If the plot is inconclusive visually, try White's test or
Glejser's test.) If yes, consider transforming a variable, weighted least
squares regression, or using Huber-White sandwich estimators. Is there
evidence of autocorrelation? Consider the source and try to correct for it. If
you have spatial data, a spatial regression model may be needed. Use
Prais-Winsten regression or time-series techniques for data collected over time, if
appropriate. Are there influential observations? If yes, consider their source.
Are there coding errors? Will a transformation help? If not, use a robust
regression technique to adjust for influential observations.
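The advice about centering can be seen in a small example: a variable and its square are usually highly correlated, but much less so once the variable is centered first. The predictor values are invented, and `pearson_r` is a helper written for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor values, all positive (for example, ages in years).
x = [20, 25, 30, 35, 40, 45, 50]
x_squared = [v ** 2 for v in x]

# Center x at its mean before squaring.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
x_centered_squared = [v ** 2 for v in x_centered]

r_raw = pearson_r(x, x_squared)
r_centered = pearson_r(x_centered, x_centered_squared)
print(f"raw: r = {r_raw:.3f}; centered: r = {r_centered:.3f}")
```

The drop is this complete only because the invented values are symmetric around their mean; with skewed data centering reduces the correlation without removing it.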
Interpret and present the results. Interpret the unstandardized slopes and
p-values. What do the goodness-of-fit statistics tell you about the model?
Compare the results to the guiding hypotheses. Given the decision rules, are the
hypotheses or the conceptual model supported by the analysis? Consider
presenting predicted values, especially from models that include interaction
terms. Consider graphical presentations of coefficients, nonlinear associations,
and interactions. These can provide intuitive information that is often lost when
presenting only numbers.
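A few of the quantities named in the checklist can be computed by hand for a one-predictor model. This is a sketch on invented data, not a replacement for a statistics package: it fits the line by least squares, then reports the slope, R-squared and the Durbin-Watson statistic. The function names are chosen for this illustration.

```python
def fit_simple_ols(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)

# Hypothetical observations collected in time order.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# R-squared: share of the variation in y accounted for by the fitted line.
mean_y = sum(y) / len(y)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

dw = durbin_watson(residuals)
print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}, DW = {dw:.2f}")
```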
Descriptive statistics are summative methods to depict the data in succinct ways. I will
have you know it was very difficult to write a definition of descriptive statistics that did not
include the word 'descriptive' or 'describe.' My sixth grade teacher always told us never use
the word we're defining in the definition.
Here is a list of descriptive statistics, and then we will move onto talking more about them.
Some of these you will find very familiar, and some may be new. Some may have wordings
that you aren't used to:
Mean
Median
Mode
Range
Standard deviation
Coefficient of variation
Mean
In statistics, mean simply means average score of the sample. This is where you add up all
your values and then divide by the number of participants. To expose you to some
additional terms, sum means simply to add up. Using sum may be new to you if you haven't
taken many math classes, but the term makes it easier to write and more educated
sounding.
A quick example would be:
Number of bites it takes me to eat a fun-sized candy bar: 4, 2, 1, 1, 4, 1, 2, 1.
The sum total is 16, which makes the mean 2. Fairly simple, right?
Mean is useful for helping us understand the average participant's score in your study. It
gives us, the readers, a quick way to formulate what a typical or normal variable is in your
study. This is most often used to describe the average age of the participants but could also
be used to describe the average scores on a test or number of years involved in something.
With this in mind, we can take an individual score and compare it to an average. For
example, if I said the average height of the islanders is 5 feet tall and I'm 6 foot 3, then we
know that in comparison, I am much taller than the average islander.
Median
Median is the middle score after the scores have been arranged in numerical order. For
example, if we looked at the candy bar eating numbers from before: 4, 2, 1, 1, 4, 1, 2, 1, we
will need to reorganize them into 4, 4, 2, 2, 1, 1, 1, 1. As you can see, I put the highest
number first, but you could put the lowest number first, and your median will be the same.
We have eight numbers, so we count halfway. Counting obviously works better with odd
numbers since you land on a single number. With even numbers, you will take the middle
two numbers and then average them. Our middle numbers are 2 and 1. This means our
median is 1.5.
Median is useful for similar reasons to mean: It provides the reader with an understanding
of an average, or normal, participant or measured variable. Median, however, reduces the
effect of outliers, or a point of data that is distant from the others, either extremely high or
extremely low. In our candy-eating example, if it took me 15 bites to eat a really chewy and
delicious candy bar, then this would be an outlier. If you add 15 to the scores, our new mean
would be 3.4, with our median at 2. Which better describes the data?
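The candy bar numbers can be checked with Python's standard `statistics` module, which is one convenient way to do it.

```python
import statistics

# Number of bites per candy bar, from the example above.
bites = [4, 2, 1, 1, 4, 1, 2, 1]
mean_bites = statistics.mean(bites)
median_bites = statistics.median(bites)
print(f"mean = {mean_bites}, median = {median_bites}")

# Add the 15-bite outlier and compare how each measure reacts.
bites_with_outlier = bites + [15]
mean_outlier = statistics.mean(bites_with_outlier)
median_outlier = statistics.median(bites_with_outlier)
print(f"with outlier: mean = {mean_outlier:.1f}, median = {median_outlier}")
```

The mean moves from 2 to about 3.4, while the median only moves from 1.5 to 2.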
Mode
Mode, defined as the most often occurring value, is by far the easiest to compute,
particularly if you have scores in numerical order. In our candy-chewing example, the most
often occurring number is 1. Another way of thinking about mode is what is the most
common number.
But who cares? Well, you should. Why care? Because the candy bar example is not very good
when we're trying to make important decisions, but there are many studies out there where
individual descriptions matter.
For example, if we were looking at a school with financial issues, and the most common type
of staff is administration, what does that tell us? (That like almost everywhere in the world,
bureaucracy thrives so much so that it makes me crazy.) Or what if you were going to be a
senator, and you needed to know what ethnicity was most common in your district? So
there are a lot of ways that mode can help us describe the world we live in.
Range
Range is simply a single number representing the spread of the data. This is where you take
the highest number you have - in our candy example it's 4 - and subtract the smallest
number. In this example, it's 1. This gives us a total range of candy bar bites at 3.
Range is kind of a funny thing. It tells us how spread out the data is, but whether you want
it high or low depends on the study. If you want a
lot of variability, then you want a high range. For example, if you're wondering if a new
teaching program educates both intellectually high and low children, you should have a
wider range. If you want little variability, like a study on people with severe depression, then
you want the range to be relatively small. It all depends on what you're looking at.
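Mode and range for the same candy bar numbers, again using the standard `statistics` module as one possible tool:

```python
import statistics

# Number of bites per candy bar, from the example above.
bites = [4, 2, 1, 1, 4, 1, 2, 1]

mode_bites = statistics.mode(bites)    # most often occurring value
range_bites = max(bites) - min(bites)  # highest value minus lowest value
print(f"mode = {mode_bites}, range = {range_bites}")
```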