
# Pearson r correlation

To determine how strong the relationship between two variables is, a formula is applied to produce
what is referred to as the coefficient value. The coefficient value can range between -1.00 and 1.00.
If the coefficient value is negative, the variables are negatively correlated: as one value increases,
the other decreases. If the value is positive, the variables are positively correlated: both values
increase or decrease together. The closer the value is to -1.00 or 1.00, the stronger the relationship;
values near 0 indicate a weak relationship. Let's look at the formula for the Pearson correlation
coefficient.
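The usual definition is $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. A minimal sketch of that calculation follows; the function name `pearson_r` and the age and income figures are assumptions made up for illustration, not data from the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages and incomes (in thousands), for illustration only
ages = [25, 32, 41, 47, 53, 60]
incomes = [31, 40, 38, 52, 49, 58]
r = pearson_r(ages, incomes)
print(f"r = {r:.2f}")
```

The result always falls between -1 and 1, and its sign gives the direction of the relationship.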

Ex: Let's say you were analyzing the relationship between your participants' ages and reported levels
of income. You're curious whether there is a positive or negative relationship between someone's age
and their income level. After conducting the test, your Pearson correlation coefficient value is +.20.
The sign tells you the correlation is positive, but a value this close to zero means the relationship
is weak, not strong. You could conclude there is a weak positive correlation between one's age and
their income. In other words, as people grow older, their income tends to increase slightly, but age
alone accounts for little of the variation in income.

## Spearman's correlation coefficient

Spearman's correlation coefficient is a statistical measure of the strength of a monotonic relationship
between paired data. In a sample it is denoted by $r_s$ and is by design constrained as follows:
$-1 \le r_s \le 1$. Its interpretation is similar to that of Pearson's, e.g. the closer $r_s$ is to
$\pm 1$, the stronger the monotonic relationship.
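One common way to compute $r_s$ is to rank each variable and take the Pearson correlation of the ranks. The sketch below follows that approach; the helper names `average_ranks` and `spearman_rs` and the sample data are assumptions for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's r_s: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: y increases with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rs(x, y))  # 1.0, because the relationship is perfectly monotonic
```

Because only the ordering matters, a curved but strictly increasing relationship still gives $r_s = 1$.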

## Chi-square

A chi-square ($\chi^2$) statistic is used to investigate whether distributions of categorical variables
differ from one another. Categorical variables yield data in categories, whereas numerical variables
yield data in numerical form.

Ex: A simple monohybrid cross between two individuals that were heterozygous for the trait of interest.
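A sketch of how the statistic could be computed for such a cross, using $\chi^2 = \sum (O - E)^2 / E$. The offspring counts are hypothetical, and the expected 3:1 phenotype ratio assumes complete dominance, which the example above does not state.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical phenotype counts from a monohybrid cross (400 offspring)
observed = [290, 110]  # dominant, recessive
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]  # assumed Mendelian 3:1 ratio

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
CRITICAL_05_DF1 = 3.841  # chi-square critical value for df = 1, alpha = 0.05

print(f"chi-square = {chi2:.3f}, df = {df}")
if chi2 > CRITICAL_05_DF1:
    print("Observed counts differ significantly from the 3:1 ratio")
else:
    print("Observed counts are consistent with the 3:1 ratio")
```

With these made-up counts the statistic is well below the critical value, so the 3:1 hypothesis would not be rejected.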

## Paired t-test

A paired t-test is used to compare two population means where you have two samples in
which observations in one sample can be paired with observations in the other sample.

Examples of where this might occur are:

- Before-and-after observations on the same subjects (e.g. students' diagnostic test results
  before and after a particular module or course).
- A comparison of two different methods of measurement or two different treatments where the
  measurements/treatments are applied to the same subjects (e.g. blood pressure measurements
  using a stethoscope and a dynamap).
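The paired t statistic is the mean of the within-pair differences divided by its standard error, $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom. A minimal sketch, with the function name `paired_t` and the before-and-after scores invented for illustration:

```python
import math

def paired_t(before, after):
    """Return (t, df) for a paired t-test on the after - before differences."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("samples must be paired and contain at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 in the denominator)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    if sd_d == 0:
        raise ValueError("t is undefined when all differences are identical")
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical diagnostic test scores for the same 8 students
before = [18, 21, 16, 22, 19, 24, 17, 21]
after = [22, 25, 17, 24, 16, 29, 20, 23]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value.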

## Independent t-test

Statwing represents t-test results as distribution curves. Assuming there is a large enough sample
size, the difference between these samples probably represents a real difference between the
populations from which they were sampled.

Note: The below discusses the unranked independent samples t-test, the most common form of t-test.

A t-test helps you compare whether two groups have different average values (for example, whether
men and women have different average heights).
Ex:

Let's say you're curious about whether New Yorkers and Kansans spend a different amount of money
per month on movies. It's impractical to ask every New Yorker and Kansan about their movie
spending, so instead you ask a sample of each (maybe 300 New Yorkers and 300 Kansans), and the
averages are \$14 and \$18. The t-test asks whether that difference is probably representative of a real
difference between Kansans and New Yorkers generally or whether it is most likely a meaningless
statistical fluke.

ANOVA table:

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | | k-1 | | |
| Error (or Residual) | | N-k | | |
| Total | | N-1 | | |

Technically, it asks the following: If there were in fact no difference between Kansans and New
Yorkers generally, what are the chances that randomly selected groups from those populations would
be as different as these randomly selected groups are? For example, if Kansans and New Yorkers as
a whole actually spent the same amount of money on average, it's very unlikely that 300 randomly
selected Kansans each spend exactly \$14 and 300 randomly selected New Yorkers each spend
exactly \$18. So if your sampling yielded those results, you would conclude that the difference in the
sample groups is most likely representative of a meaningful difference between the populations as a
whole.
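A sketch of the two-sample t statistic follows. It uses Welch's form, which does not assume equal variances in the two groups; that choice is an assumption, since the text does not say which variant is meant. The spending figures are hypothetical.

```python
import math

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least 2 observations")
    mean_a, mean_b = sum(sample_a) / na, sum(sample_b) / nb
    # Sample variances (n - 1 in the denominator)
    var_a = sum((v - mean_a) ** 2 for v in sample_a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in sample_b) / (nb - 1)
    se2_a, se2_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical monthly movie spending (dollars) in two small samples
group_a = [12, 20, 15, 22, 18, 25, 14, 18]
group_b = [10, 16, 12, 15, 13, 18, 11, 17]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A large absolute t (relative to a t distribution with `df` degrees of freedom) indicates that a difference this big would be unlikely if the two populations had the same mean.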

## Analysis of variance

Analysis of variance (ANOVA) is used to analyze the variance between and within groups whenever
there are more than two groups. If you set the Type I error rate to .05 and you had several
groups, each time you tested one mean against another there would be a .05 probability of
a Type I error. This would mean that with six t-tests the probability of at least one Type I error
could be as high as 0.30 (.05 × 6); if the tests are independent, it is 1 - .95^6 ≈ 0.26. Either
way, this is much higher than the desired .05.
Where:

- X = individual observation,
- N = total number of observations or total sample size.
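The standard one-way ANOVA quantities (sums of squares, degrees of freedom, mean squares and F) can be sketched as follows. The formulas are the usual textbook definitions, not taken from this text, and the function name `one_way_anova` and the three groups of data are hypothetical.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table as a dict."""
    k = len(groups)                          # number of treatments (groups)
    n_total = sum(len(g) for g in groups)    # N, total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-treatments SS: group size times squared distance of each
    # group mean from the overall mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Error (residual) SS: squared distance of each observation X
    # from its own group mean
    ss_error = sum((x - m) ** 2
                   for g, m in zip(groups, group_means) for x in g)

    df_between, df_error = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "ss_between": ss_between, "ss_error": ss_error,
        "ss_total": ss_between + ss_error,
        "df_between": df_between, "df_error": df_error,
        "df_total": n_total - 1,
        "ms_between": ms_between, "ms_error": ms_error,
        "F": ms_between / ms_error,
    }

# Hypothetical observations for three treatment groups
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {table['F']:.2f} on {table['df_between']} and {table['df_error']} df")
```

A single F test of this kind replaces the series of pairwise t-tests, so the Type I error rate stays at the chosen level.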

## Linear regression

In a cause and effect relationship, the independent variable is the cause, and the dependent
variable is the effect. Least squares linear regression is a method for predicting the value of a
dependent variable Y, based on the value of an independent variable X.
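A minimal sketch of a least squares fit, assuming the usual closed-form estimates: slope $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$. The function name `least_squares_fit` and the data points are made up for illustration.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (X, Y) observations
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(x, y)

# Predict Y for a new value of X
x_new = 6
print(f"predicted Y at X = {x_new}: {b0 + b1 * x_new:.2f}")
```

The fitted line minimizes the sum of squared vertical distances between the observed Y values and the line.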

Prerequisites:

- The dependent variable Y has a linear relationship to the independent variable X. To check
  this, make sure that the XY scatterplot is linear and that the residual plot shows a random
  pattern.
- For each value of X, the probability distribution of Y has the same standard deviation σ. When
  this condition is satisfied, the variability of the residuals will be relatively constant across all
  values of X, which is easily checked in a residual plot.
- For any given value of X,
  - The Y values are independent, as indicated by a random pattern on the residual plot.
  - The Y values are roughly normally distributed (i.e., symmetric and unimodal). A
    little skewness is OK if the sample size is large. A histogram or a dotplot will show the
    shape of the distribution.

Ex:
A researcher uses a regression equation to predict home heating bills (dollar cost), based on home
size (square feet). The correlation between predicted bills and home size is 0.70. What is the correct
interpretation of this finding?

(A) 70% of the variability in home heating bills can be explained by home size.
(B) 49% of the variability in home heating bills can be explained by home size.
(C) For each added square foot of home size, heating bills increased by 70 cents.
(D) For each added square foot of home size, heating bills increased by 49 cents.
(E) None of the above.

Solution

The correct answer is (B). The coefficient of determination measures the proportion of variation in the
dependent variable that is predictable from the independent variable. The coefficient of determination
is equal to $R^2$; in this case, $(0.70)^2$ or 0.49. Therefore, 49% of the variability in heating bills can be
explained by home size.
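For simple linear regression with one predictor, the coefficient of determination computed from the fitted line, $1 - SSE/SST$, equals the squared correlation. The sketch below checks this numerically on a small hypothetical data set; the function name and data are assumptions for illustration.

```python
import math

def r_and_r_squared(x, y):
    """Return (r, R^2), with R^2 computed from the least squares residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)   # total sum of squares (SST)
    r = sxy / math.sqrt(sxx * syy)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line (SSE)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return r, 1 - sse / syy

# Hypothetical data chosen so the numbers come out round
r, r2 = r_and_r_squared([1, 2, 3, 4], [2, 3, 5, 4])
print(f"r = {r:.2f}, R^2 = {r2:.2f}, r squared = {r * r:.2f}")

# The arithmetic from the heating bill example
print(f"{0.70 ** 2:.2f}")  # 0.49
```

The same relationship is why a correlation of 0.70 corresponds to 49% of the variability explained, not 70%.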

## Wilcoxon rank-sum test

A popular nonparametric test to compare outcomes between two independent groups is the
Mann-Whitney U test. The Mann-Whitney U test, sometimes called the Mann-Whitney-Wilcoxon test or the
Wilcoxon rank-sum test, is used to test whether two samples are likely to derive from the same
population (i.e., that the two populations have the same shape). Some investigators interpret this test
as comparing the medians between the two populations. Recall that the parametric test compares the
means ($H_0: \mu_1 = \mu_2$) between independent groups.

In contrast, the null and two-sided research hypotheses for the nonparametric test are stated as
follows:

$H_0$: The two populations are equal.

$H_1$: The two populations are not equal.

This test is often performed as a two-sided test and, thus, the research hypothesis indicates that the
populations are not equal as opposed to specifying directionality. A one-sided research hypothesis is
used if interest lies in detecting a positive or negative shift in one population as compared to the other.
The procedure for the test involves pooling the observations from the two samples into one combined
sample, keeping track of which sample each observation comes from, and then ranking lowest to
highest from 1 to n1+n2, respectively.
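The pooling and ranking procedure can be sketched as below. The U formula used, $U_1 = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$ with the test statistic taken as the smaller of $U_1$ and $U_2$, is one common convention and an assumption here, since the text does not give a formula. The function names and sample values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (R1, R2, U): the two rank sums and the smaller U statistic."""
    n1, n2 = len(sample1), len(sample2)
    # Pool the observations; the first n1 ranks belong to sample1
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, min(u1, u2)

# Hypothetical outcomes in two independent groups
group1 = [7, 5, 6, 4, 12]
group2 = [3, 6, 4, 2, 1]
r1, r2, u = mann_whitney_u(group1, group2)
print(f"R1 = {r1}, R2 = {r2}, U = {u}")
```

A small U means the two samples are well separated in the pooled ranking; U would be compared against a critical value or a normal approximation to judge significance.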

## Wilcoxon signed-rank test

The Wilcoxon signed-rank test is the nonparametric test equivalent to the dependent t-test. As the
Wilcoxon signed-rank test does not assume normality in the data, it can be used when this
assumption has been violated and the use of the dependent t-test is inappropriate. It is used to
compare two sets of scores that come from the same participants. This can occur when we wish to
investigate any change in scores from one time point to another, or when individuals are subjected to
more than one condition.
For example, you could use a Wilcoxon signed-rank test to understand whether there was a
difference in smokers' daily cigarette consumption before and after a 6-week hypnotherapy
programme (i.e., your dependent variable would be "daily cigarette consumption", and your two
related groups would be the cigarette consumption values "before" and "after" the hypnotherapy
programme). You could also use a Wilcoxon signed-rank test to understand whether there was a
difference in reaction times under two different lighting conditions (i.e., your dependent variable would
be "reaction time", measured in milliseconds, and your two related groups would be reaction times in
a room using "blue light" versus "red light").
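A sketch of how the signed-rank statistic could be computed for before-and-after data. Dropping pairs with a zero difference and using the smaller of the two signed rank sums as W are common conventions and are assumptions here; the cigarette counts are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_w(before, after):
    """Return (W_plus, W_minus, W) for paired samples."""
    if len(before) != len(after):
        raise ValueError("samples must be paired")
    # Differences after - before; pairs with no change are dropped (assumed convention)
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Rank the absolute differences, then sum the ranks by sign
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical daily cigarette consumption for 5 smokers
before = [20, 15, 30, 25, 18]
after = [15, 16, 22, 25, 12]
w_plus, w_minus, w = signed_rank_w(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}")
```

Here most of the rank weight falls on the negative differences, which is what a reduction in consumption would look like; W would then be compared against a critical value for the number of non-zero pairs.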

## Sign test