
STATISTICS (THEORETICAL)

A report submitted to the

Department of Electrical and Computer Engineering,

College of Engineering

University of Duhok

Student name: Mustafa Badeea Abdulaziz

Moodle Email: w47hgqce@gmail.com

Year: 1st

Course: Statistics (Theoretical)

Course Code: GS2101

Instructor: Dr. Mohammed Ali Hussein

Date:
TABLE OF CONTENTS
CORRELATION
DEFINITION
TYPES OF CORRELATION
IMPORTANT NOTES
ADVANTAGES OF CORRELATION
PROBLEM OF CORRELATION DATA
HOW TO CALCULATE CORRELATION
CORRELATION FORMULA
EXAMPLE
CORRELATION COEFFICIENT
DEFINITION
IMPORTANT NOTES
CORRELATION COEFFICIENT EQUATION
THE USE OF CORRELATION COEFFICIENT
CORRELATION COEFFICIENT FORMULA
EXAMPLE
SIMPLE REGRESSION
DEFINITION
THE IMPORTANCE OF REGRESSION
THE GENERAL FORM OF EACH TYPE OF REGRESSION
DIFFERENCES

CORRELATION
Correlation, in the finance and investment industries, is a statistic that measures the
degree to which two securities move in relation to each other. Correlations are used
in advanced portfolio management. Correlation shows the strength of a relationship
between two variables and is expressed numerically by the correlation coefficient,
whose value must fall between -1.0 and +1.0. A perfect positive correlation means
that the correlation coefficient is exactly 1. This implies that as one security moves,
either up or down, the other security moves in lockstep, in the same direction. A
perfect negative correlation (a coefficient of exactly -1) means that two assets move
in opposite directions, while a zero correlation implies no linear relationship at all.

• Correlation does not imply causation.
• Correlation is a statistic that measures the degree to which two variables
move in relation to each other.
• In finance, correlation can measure the movement of a stock against that of a
benchmark index, such as the S&P 500.
• Correlation measures association, but it does not show whether x causes y or
vice versa, or whether the association is caused by a third, perhaps unseen, factor.
• Correlation is a statistical technique that can show whether and how strongly
pairs of variables are related. For example, height and weight are related;
taller people tend to be heavier than shorter people. An intelligent
correlation analysis can lead to a greater understanding of your data.
• Correlation means association; more precisely, it is a measure of the extent to
which two variables are related. A positive correlation exists when one variable
increases as the other increases, or decreases as the other decreases. An
example of positive correlation would be height and weight.
• The strongest linear relationship is indicated by a correlation coefficient of -1
or 1. The weakest linear relationship is indicated by a correlation coefficient
equal to 0. A positive correlation means that if one variable gets bigger, the
other variable tends to get bigger.
• Positive correlation is a relationship between two variables in which both
variables move in tandem, that is, in the same direction. A positive
correlation exists when one variable decreases as the other variable
decreases, or one variable increases while the other increases.
• Correlation is used to describe the linear relationship between two continuous
variables (e.g., height and weight). In general, correlation tends to
be used when there is no identified response variable. It measures the
strength (qualitatively) and direction of the linear relationship between two or
more variables.
• A correlation matrix is a table showing correlation coefficients between sets of
variables. For example, a correlation matrix for 5 variables B1:B5 shows the
coefficient for each combination of two variables. The diagonal of the table is
always a set of ones, because the correlation between a variable and itself is
always 1.

• The scatter diagram and the correlation graph are the two important graphic
methods, while the coefficient of correlation is an algebraic method used for
measuring correlation. The scatter diagram is a graphical method of studying
the correlation between two variables.
• The P-value is the probability that you would have found a result at least as
extreme as the current one if the correlation coefficient were in fact zero
(null hypothesis). If this probability is lower than the conventional 5%
(P<0.05), the correlation coefficient is called statistically significant.
• To determine whether the correlation between variables is significant, compare
the p-value to your significance level. Usually, a significance level (denoted as
α or alpha) of 0.05 works well. An α of 0.05 indicates that the risk of concluding
that a correlation exists when, actually, no correlation exists is 5%.
• The greater the absolute value of the Pearson product-moment correlation
coefficient, the stronger the linear relationship.
• A zero correlation exists when there is no relationship between two variables.
For example, there is no relationship between the amount of tea drunk and the
level of intelligence.
• An example of positive correlation would be height and weight. Taller people
tend to be heavier. A negative correlation is a relationship between two
variables in which an increase in one variable is associated with a decrease in
the other.
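
The P-value described in the notes above can be illustrated with a small sketch. The
code below is only one possible approach, an exact permutation test, and is not the
t-based test that statistical software usually reports. It reuses the X and Y data
from the worked example in this report; the function names `pearson_r` and
`permutation_p_value` are made up for the illustration.

```python
import math
from itertools import permutations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys):
    """Two-sided p-value under the null hypothesis of no association.

    Every reordering of ys is paired with xs; the p-value is the share of
    reorderings whose |r| is at least as large as the observed |r|.
    Only practical for small samples (n! reorderings).
    """
    observed = abs(pearson_r(xs, ys))
    at_least_as_extreme = 0
    total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so the original ordering always counts itself.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Data from the worked example in this report.
X = [41, 19, 23, 40, 55, 57, 33]
Y = [94, 60, 74, 71, 82, 76, 61]

alpha = 0.05  # conventional significance level
p = permutation_p_value(X, Y)
print("p-value:", round(p, 3))
print("significant" if p < alpha else "not significant", "at alpha =", alpha)
```

With 7 pairs there are only 5,040 reorderings, so the full enumeration is cheap; for
larger samples a random subset of reorderings would be the usual compromise.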

Types of Correlation

There are three different types of correlations: positive, negative, and neutral or
no correlation. A perfect positive correlation would mean that if you increased one
variable by one unit, you could predict with 100% accuracy how far the other variable
would increase. Usually, in statistics, we measure four types of correlations:
Pearson correlation, Kendall rank correlation, Spearman correlation, and the
Point-Biserial correlation.

Advantages of correlation
One benefit of correlational research is that it opens up a great deal of further
research to other scholars. It allows researchers to determine the strength and
direction of a relationship so that later studies can narrow the findings down and, if
possible, determine causation experimentally.

Problems of correlation data

While correlational research can suggest that there is a relationship between two
variables, it cannot prove that one variable causes a change in another variable. In
other words, correlation does not equal causation.

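How to Calculate a Correlation:

1. Find the mean of all the x-values and the mean of all the y-values.
2. Find the sample standard deviation of all the x-values (call it sx) and the sample
standard deviation of all the y-values (call it sy).
3. For each of the n pairs (x, y) in the data set, take the x-value minus the mean of
x, times the y-value minus the mean of y.
4. Add up the n results from Step 3.
5. Divide the sum by sx ∗ sy.
6. Divide the result by n - 1, where n is the number of (x, y) pairs.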
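
The steps above can be sketched in a few lines of Python. This is a minimal
illustration that assumes sample standard deviations (dividing by n - 1) and includes
the final division by n - 1 that the standard formula requires; the function name
`correlation_by_steps` is made up for the example.

```python
import math


def correlation_by_steps(xs, ys):
    """Pearson r computed by following the listed steps one at a time."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")

    # Step 1: means of the x-values and the y-values.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: sample standard deviations (denominator n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    # Steps 3 and 4: product of deviations for each pair, then the sum.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Step 5: divide by s_x * s_y; final step: divide by n - 1.
    return total / (s_x * s_y) / (n - 1)


# Points on a rising straight line give r = 1; on a falling line, r = -1.
r_up = correlation_by_steps([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation_by_steps([1, 2, 3], [3, 2, 1])
print(round(r_up, 6), round(r_down, 6))
```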

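Correlation’s Formula

\[
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
\]

where:
r = the correlation coefficient

\bar{X} = the average of observations of variable X

\bar{Y} = the average of observations of variable Y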

EXAMPLE

Investment managers, traders, and analysts find it very important to calculate
correlation because the risk reduction benefits of diversification rely on this statistic.
Financial spreadsheets and software can calculate the value of correlation quickly.

As a hypothetical example, assume that an analyst needs to calculate the correlation
for the following two data sets:

X: (41, 19, 23, 40, 55, 57, 33)

Y: (94, 60, 74, 71, 82, 76, 61)

There are three steps involved in finding the correlation. The first is to add up all the
X values to find SUM(X), add up all the Y values to find SUM(Y), and multiply each X
value by its corresponding Y value and sum the products to find SUM(X,Y):

SUM(X) = (41 + 19 + 23 + 40 + 55 + 57 + 33) = 268

SUM(Y) = (94 + 60 + 74 + 71 + 82 + 76 + 61) = 518

SUM(X,Y) = (41 x 94) + (19 x 60) + (23 x 74) + ... + (33 x 61) = 20,391

The next step is to take each X value, square it, and sum up all these values to find
SUM(X^2). The same must be done for the Y values:

SUM(X^2) = (41^2) + (19^2) + (23^2) + ... + (33^2) = 11,534

SUM(Y^2) = (94^2) + (60^2) + (74^2) + ... + (61^2) = 39,174

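The third step is to use the correlation formula, in its computational form with
n = 7 pairs, to answer the question:

\[
r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}}
\]

In this example, the correlation would be:

r = (7 x 20,391 - (268 x 518)) / SquareRoot((7 x 11,534 - 268^2) x (7 x 39,174 -
518^2)) = 3,913 / 7,248.4 = 0.54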
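
The arithmetic of this example can be checked with a short script. This is a minimal
sketch using only the Python standard library; the variable names are chosen for the
illustration and mirror the SUM(...) quantities above.

```python
import math

# Data sets from the example.
X = [41, 19, 23, 40, 55, 57, 33]
Y = [94, 60, 74, 71, 82, 76, 61]

n = len(X)
sum_x = sum(X)                              # SUM(X)
sum_y = sum(Y)                              # SUM(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))   # SUM(X,Y)
sum_x2 = sum(x * x for x in X)              # SUM(X^2)
sum_y2 = sum(y * y for y in Y)              # SUM(Y^2)

# Computational form of the correlation formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 268 518 20391 11534 39174
print(numerator, round(denominator, 1))      # 3913 7248.4
print(round(r, 2))                           # 0.54
```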
CORRELATION COEFFICIENT
The correlation coefficient is a statistical measure of the strength of the relationship
between the relative movements of two variables. The values range between -1.0
and 1.0. A calculated number greater than 1.0 or less than -1.0 means that there was
an error in the correlation measurement. A correlation of -1.0 shows a
perfect negative correlation, while a correlation of 1.0 shows a perfect positive
correlation. A correlation of 0.0 shows no linear relationship between the movement
of the two variables. There are several types of correlation coefficients, but the one
that is most common is the Pearson correlation (r). This measures the strength and
direction of the linear relationship between two variables. It cannot capture nonlinear
relationships between two variables and cannot differentiate between dependent and
independent variables.

• Correlation coefficients are used to measure the strength of the
relationship between two variables.
• Pearson correlation is the one most commonly used in statistics. This
measures the strength and direction of a linear relationship between two
variables.
• Values always range between -1 (perfect negative relationship) and +1
(perfect positive relationship). Values at or close to zero imply weak or
no linear relationship.
• In some fields of study, correlation coefficient values between -0.8 and +0.8
are not considered important.
• The strength of the relationship varies in degree based on the value of the
correlation coefficient. For example, a value of 0.2 shows there is a positive
correlation between two variables, but it is weak and likely unimportant.
Analysts in some fields of study do not consider correlations important until the
value surpasses at least 0.8. However, a correlation coefficient with an
absolute value of 0.9 or greater would represent a very strong relationship.
• A value of exactly 1.0 means there is a perfect positive relationship between
the two variables. For a positive increase in one variable, there is also a
positive increase in the second variable. A value of -1.0 means there is a
perfect negative relationship between the two variables. This shows that the
variables move in opposite directions: for a positive increase in one variable,
there is a decrease in the second variable. If the correlation between two
variables is 0, there is no linear relationship between them.
• Investors can use changes in correlation statistics to identify new trends
in the financial markets, the economy, and stock prices.
• When the r value is closer to +1 or -1, it indicates that there is a stronger linear
relationship between the two variables. A correlation of -0.97 is
a strong negative correlation while a correlation of 0.10 would be
a weak positive correlation.
• The strongest correlations (r = 1.0 and r = -1.0) occur when data points fall
exactly on a straight line. The correlation becomes weaker as the data points
become more scattered.
• A validity coefficient of .50 indicates moderate validity. The possible range of
the validity coefficient is the same as that of other correlation coefficients in
absolute value (0 to 1), and, in general, validity coefficients tend not to be
that strong; this means that other tests are usually required. It's not unusual
for validity coefficients to max out at around .30.
• A correlation heatmap uses colored cells, typically in a monochromatic scale,
to show a 2D correlation matrix (table) between two discrete dimensions or
event types. Correlation heatmaps are ideal for comparing the
measurement for each pair of dimension values.
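
The correlation matrix that a heatmap displays can be built as follows. This is a
minimal sketch: the three variables `a`, `b`, and `c` are made-up illustrative
values, not data from this report, and the helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square table of r for every pair of named variables."""
    names = list(columns)
    return [[pearson_r(columns[p], columns[q]) for q in names] for p in names]


# Illustrative (made-up) variables: c falls exactly as a rises.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 4, 5],
    "c": [5, 4, 3, 2, 1],
}

matrix = correlation_matrix(data)
for name, row in zip(data, matrix):
    print(name, [round(value, 2) for value in row])
```

Each diagonal entry compares a variable with itself and is therefore 1, and the table
is symmetric because the correlation of a with b equals the correlation of b with a.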
Correlation Coefficient Equation

To calculate the Pearson product-moment correlation, one must first determine the
covariance of the two variables in question. Next, one must calculate each variable's
standard deviation. The correlation coefficient is determined by dividing the
covariance by the product of the two variables' standard deviations.
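
Written out, the description above corresponds to the following expressions. The
notation (sample covariance and sample standard deviations with a divisor of n - 1)
is an assumption made for this sketch; the same value of r results if n is used
consistently in all three quantities.

```latex
\[
r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y},
\qquad
\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\]
\[
s_X = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
s_Y = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2}.
\]
```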

The use of Correlation Coefficient

Correlation statistics can be used in finance and investing. For example, a correlation
coefficient could be calculated to determine the level of correlation between the price
of crude oil and the stock price of an oil-producing company, such as Exxon Mobil
Corporation. Since oil companies earn greater profits as oil prices rise, the correlation
between the two variables is highly positive.

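Correlation Coefficient Formula

\[
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}
\]

where n is the number of (x, y) pairs.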

Example:

Find the value of the correlation coefficient from the following table:

Subject  Age (x)  Glucose level (y)
1        43       99
2        21       65
3        25       79
4        42       75
5        57       87
6        59       81

Solution:

Step 1: Make a chart. Use the given data, and add three more columns: xy, x^2,
and y^2.

Subject  Age (x)  Glucose level (y)  xy     x^2    y^2
1        43       99
2        21       65
3        25       79
4        42       75
5        57       87
6        59       81

Step 2: Multiply x and y together to fill the xy column. For example, row 1
would be 43 × 99 = 4,257.

Subject  Age (x)  Glucose level (y)  xy     x^2    y^2
1        43       99                 4257
2        21       65                 1365
3        25       79                 1975
4        42       75                 3150
5        57       87                 4959
6        59       81                 4779

Step 3: Take the square of the numbers in the x column, and put the result in
the x^2 column.

Subject  Age (x)  Glucose level (y)  xy     x^2    y^2
1        43       99                 4257   1849
2        21       65                 1365   441
3        25       79                 1975   625
4        42       75                 3150   1764
5        57       87                 4959   3249
6        59       81                 4779   3481

Step 4: Take the square of the numbers in the y column, and put the result in
the y^2 column.

Subject  Age (x)  Glucose level (y)  xy     x^2    y^2
1        43       99                 4257   1849   9801
2        21       65                 1365   441    4225
3        25       79                 1975   625    6241
4        42       75                 3150   1764   5625
5        57       87                 4959   3249   7569
6        59       81                 4779   3481   6561

Step 5: Add up all of the numbers in the columns and put the result at the
bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum
of.”

Subject  Age (x)  Glucose level (y)  xy     x^2    y^2
1        43       99                 4257   1849   9801
2        21       65                 1365   441    4225
3        25       79                 1975   625    6241
4        42       75                 3150   1764   5625
5        57       87                 4959   3249   7569
6        59       81                 4779   3481   6561
Σ        247      486                20485  11409  40022

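Step 6: Use the following correlation coefficient formula, with n = 6 subjects.

\[
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}
  = \frac{6(20485) - (247)(486)}{\sqrt{\left[6(11409) - 247^2\right]\left[6(40022) - 486^2\right]}}
\]

The answer is: 2868 / 5413.27 = 0.529809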
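
The chart and the final result can be reproduced with a short script. This is a
minimal sketch using only the Python standard library; the variable names are chosen
for the illustration.

```python
import math

# Age (x) and glucose level (y) for the six subjects in the table.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

# Steps 1 to 4: one row per subject with the columns x, y, xy, x^2, y^2.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, glucose)]

# Step 5: column totals (the Σ row of the chart).
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(column) for column in zip(*rows))
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 247 486 20485 11409 40022

# Step 6: plug the totals into the correlation coefficient formula.
n = len(rows)
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(numerator, round(denominator, 2), round(r, 4))
```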


SIMPLE REGRESSION
The two basic types of regression are simple linear regression and multiple linear
regression, although there are non-linear regression methods for more complicated
data and analysis. Simple linear regression uses one independent variable to explain
or predict the outcome of the dependent variable Y, while multiple linear regression
uses two or more independent variables to predict the outcome. Regression can help
finance and investment professionals as well as professionals in other businesses.
Regression can also help predict sales for a company based on weather, previous
sales, GDP growth, or other types of conditions. The capital asset pricing
model (CAPM) is an often-used regression model in finance for pricing assets and
discovering costs of capital. Regression takes a group of random variables, thought
to be predicting Y, and tries to find a mathematical relationship between them. This
relationship is typically in the form of a straight line (linear regression) that best
approximates all the individual data points. In multiple regression, the separate
variables are differentiated by using subscripts.

• The purpose of regression analysis is to predict an outcome based on
historical data. Regression analysis is used to predict the behavior of a
dependent variable (for example, whether people buy a wine) based on the
behavior of a few or a large number of independent variables (age, height,
financial status).

• Regression is often used to determine how much specific factors, such as the
price of a commodity, interest rates, or particular industries or sectors, influence
the price movement of an asset. The aforementioned CAPM is based on
regression, and it is utilized to project the expected returns for stocks and to
generate costs of capital. A stock's returns are regressed against the returns
of a broader index, such as the S&P 500, to generate a beta for the particular
stock.

• Regression helps investment and financial managers to value assets and
understand the relationships between variables.
• Variables such as the market capitalization of a stock, valuation ratios, and
recent returns can be added to the CAPM to get better estimates for
returns. These additional factors are known as the Fama-French factors,
named after the professors who developed the multiple linear regression
model to better explain asset returns.

• A regression equation is used in statistics to find out what relationship, if any,
exists between sets of data. For example, if you measure a child's height
every year you might find that they grow about 3 inches a year.

• Regression analysis is used in statistics to find trends in data. For example,
you might guess that there's a connection between how much you eat and how
much you weigh; regression analysis can help you quantify that.

• Checking for a low predicted R-squared is a good way to spot a model that
fits the sample but predicts new observations poorly. P-values, predicted and
adjusted R-squared, and Mallows' Cp can suggest different
models. Stepwise regression and best subsets regression are great tools and
can get you close to the correct model.

The Importance of Regression

The importance of regression analysis lies in the fact that it provides a powerful
statistical method that allows a business to examine the relationship between two or
more variables of interest.

The General Form of Each Type of Regression

Simple linear regression

Y = a + bX + u

Multiple linear regression

Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u


Where:

Y = the variable that you are trying to predict (dependent variable).

X = the variable that you are using to predict Y (independent variable).

a = the intercept.

b = the slope.

u = the regression residual.
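
To make the simple linear form Y = a + bX + u concrete, the sketch below estimates
a and b by ordinary least squares. Least squares is an assumption here, since the
report does not name an estimation method, and the function name
`fit_simple_regression` is made up for the illustration. The data are the age and
glucose values from the correlation coefficient example.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX + u."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Age (X) and glucose level (Y) from the correlation coefficient example.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

a, b = fit_simple_regression(ages, glucose)

# u: the residual for each observation (actual Y minus fitted Y).
residuals = [y - (a + b * x) for x, y in zip(ages, glucose)]

print("intercept a =", round(a, 2))
print("slope b =", round(b, 4))
print("residuals u =", [round(u, 2) for u in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is a
quick sanity check on any fitted line.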
