The correlation coefficient, r, ranges between -1 and +1 and quantifies the direction and strength of the linear
association between two variables. The correlation between two variables can be positive
(i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e.,
higher levels of one variable are associated with lower levels of the other).
The sign of the correlation coefficient indicates the direction of the association. The magnitude
of the correlation coefficient indicates the strength of the association.
For example, a correlation of r = 0.9 suggests a strong, positive association between two
variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A correlation
close to zero suggests no linear association between two continuous variables.
It is important to note that there may be a non-linear association between two continuous
variables, but computation of a correlation coefficient does not detect this. Therefore, it is always
important to evaluate the data carefully before computing a correlation coefficient. Graphical
displays are particularly useful to explore associations between variables.
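The caveat above can be made concrete with a short sketch (hypothetical data; the helper `pearson_r` is written here from the standard formula, not taken from the original text). A perfectly quadratic relationship is a strong association, yet the correlation coefficient reports essentially zero because the association is not linear:

```python
# Sketch: Pearson's r measures only *linear* association.
# Hypothetical data: y = x^2 on a symmetric range depends perfectly
# on x, yet r comes out as zero.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]            # perfect quadratic relationship
print(pearson_r(xs, ys))            # → 0.0: the linear measure sees nothing
```

This is exactly why a scatter plot should be examined before (or alongside) computing r: the plot would show the U-shape immediately, while the coefficient alone would suggest "no association."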
The figure below shows four hypothetical scenarios in which one continuous variable is plotted
along the X-axis and the other along the Y-axis.
Each point represents an (x,y) pair (in this case the gestational age, measured in weeks, and
the birth weight, measured in grams). Note that the independent variable is on the horizontal
axis (or X-axis), and the dependent variable is on the vertical axis (or Y-axis). The scatter plot
shows a positive or direct association between gestational age and birth weight. Infants with
shorter gestational ages are more likely to be born with lower weights and infants with longer
gestational ages are more likely to be born with higher weights.
The variances of x and y measure the variability of the x scores and y scores around their
respective sample means (x̄ and ȳ, considered separately). The covariance measures the variability of the (x,y) pairs
around the mean of x and the mean of y, considered simultaneously.
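These quantities can be sketched directly from their definitions (sample variance, sample covariance, and the correlation coefficient as the covariance rescaled by the two standard deviations). The gestational-age and birth-weight values below are hypothetical, chosen to fall exactly on a line:

```python
# Sample variance of x, sample variance of y, and sample covariance
# of the (x, y) pairs, each with an n-1 denominator.

def sample_stats(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, var_y, cov_xy

# Hypothetical gestational ages (weeks) and birth weights (grams)
ages = [34, 36, 38, 40, 42]
weights = [2100, 2500, 2900, 3300, 3700]

var_x, var_y, cov_xy = sample_stats(ages, weights)
# r is the covariance divided by the product of the standard deviations
r = cov_xy / (var_x ** 0.5 * var_y ** 0.5)
print(r)  # ≈ 1.0 here, since these hypothetical points lie exactly on a line
```

Dividing the covariance by the two standard deviations is what confines r to the interval [-1, +1], regardless of the units of x and y.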
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also
used to understand which among the independent variables are related to the
dependent variable, and to explore the forms of these relationships. In restricted
circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can
produce illusory or spurious relationships, so caution is advisable; [2] for
example, correlation does not imply causation.
Many techniques for carrying out regression analysis have been developed. Familiar
methods such as linear regression and ordinary least squares regression
are parametric, in that the regression function is defined in terms of a finite number
of unknown parameters that are estimated from the data. Nonparametric
regression refers to techniques that allow the regression function to lie in a specified
set of functions, which may be infinite-dimensional.
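The parametric case can be illustrated with the simplest example named above: ordinary least squares with a single predictor. The regression function is defined by a finite number of unknown parameters (here just an intercept b0 and a slope b1) estimated from the data. The closed-form estimates below follow the standard formulas; the data are hypothetical:

```python
# Ordinary least squares for one predictor: estimate the two
# parameters of the line y = b0 + b1 * x from the data.

def ols_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept from the means
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x with small deviations
b0, b1 = ols_fit(xs, ys)
print(b0, b1)  # slope close to 2, intercept close to 0
```

A nonparametric method, by contrast, would not commit to the two-parameter line at all; it would let the fitted function take whatever shape the data support (e.g., a local-averaging smoother).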
Positive Correlation
Positive correlation occurs when an increase in one variable is accompanied by an increase in
the other.
Negative Correlation
Negative correlation occurs when an increase in one variable is accompanied by a decrease in
the other.
No Correlation
No correlation occurs when changes in one variable are not associated with changes in the other.
Strong Correlation
A correlation is stronger the closer the points lie to a straight line.
Weak Correlation
A correlation is weaker the farther the points are scattered from the line.
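The strong/weak distinction can be sketched numerically (hypothetical, simulated data; `pearson_r` is written here from the standard formula). Both samples below follow the same underlying line, y = 2x, but with different amounts of random scatter around it; the tighter sample yields |r| near 1, the looser one a much smaller |r|:

```python
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)  # fixed seed so the simulation is reproducible
xs = [i / 10 for i in range(100)]
tight = [2 * x + random.gauss(0, 0.5) for x in xs]  # points near the line
loose = [2 * x + random.gauss(0, 8.0) for x in xs]  # points far from the line

print(pearson_r(xs, tight))  # strong: |r| near 1
print(pearson_r(xs, loose))  # weak: |r| noticeably smaller
```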