CHAPTER 1 : CORRELATION
1. Definition of correlation
Correlation analysis attempts to determine the degree of relationship between variables. In other words, correlation is an analysis of the co-variation between two or more variables. Correlation expresses the relationship or inter-dependence of two sets of variables: one variable may be called independent and the other dependent. For instance, agricultural production depends on rainfall. Rainfall causes an increase in agricultural production, but an increase in agricultural production has no impact on rainfall. Therefore, rainfall is the independent variable and production the dependent variable.
2. Usefulness of correlation
Correlation is useful in the natural and social sciences. We shall, however, study the uses of correlation in business and economics.
1. Correlation is very useful to economists for studying the relationship between variables, like price and quantity demanded.
2. Correlation analysis helps in measuring the degree of relationship between variables like supply and demand, price and supply, income and expenditure, etc.
3. Correlation analysis helps to reduce the range of uncertainty of our predictions.
4. Correlation analysis is the basis for regression analysis.
There are different types of correlation, but the important types are:
o Positive and negative
o Simple and multiple
o Complex (partial and total)
o Linear and non-linear
Positive and negative correlations depend upon the direction of change of the variables. If two variables tend to move together in the same direction, it is called positive or direct correlation. Example: rainfall and yield of crops, or price and supply.
If two variables tend to move together in opposite directions, it is called negative or inverse correlation. Example: price and demand, or yield of crops and price.
When we study only two variables, the relationship is described as simple correlation. Example: supply of money and price level, or demand and price. In multiple correlation we study more than two variables simultaneously. Example: the relationship of price, demand and supply of a commodity.
When there is a multi-factor relationship but we study the correlation of only two variables, excluding the others, it is called partial correlation. For example, there is a relationship among price, demand and supply; when we study the correlation between price and demand, excluding the impact of supply, it is a partial study of correlation. In total correlation, all the factors affecting the result are taken into account.
If the ratio of change between two variables is constant (the same), there is a linear correlation between these variables. In such a case, if we plot the variables on a graph, we get a straight line. The example below shows a linear correlation between the variables X and Y:
X : 5 10 15 20 25
Y : 4 8 12 16 20
In a non-linear (or curvilinear) correlation the ratio of change between two variables is not constant (same). The graph of non-linear (or curvilinear) relationship forms a curve.
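To make the distinction concrete, here is a minimal Python sketch (not part of the original text) that reuses the X/Y table above and contrasts it with a made-up quadratic series; for a linear relation the ratio Y/X is the same for every pair:

```python
# Ratio of change for the linear table above vs. a made-up non-linear series.
xs = [5, 10, 15, 20, 25]
ys_linear = [4, 8, 12, 16, 20]      # the table above: Y = 0.8 * X
ys_curved = [x ** 2 for x in xs]    # hypothetical curvilinear series

print([y / x for x, y in zip(xs, ys_linear)])  # [0.8, 0.8, 0.8, 0.8, 0.8] -> constant, linear
print([y / x for x, y in zip(xs, ys_curved)])  # [5.0, 10.0, 15.0, ...] -> ratio changes, non-linear
```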
The different methods of finding out the relationship between two variables are:
Graphic method
Mathematical method
o Karl Pearson's Coefficient of Correlation
o Spearman's Rank Coefficient of Correlation
o Coefficient of Concurrent Deviation
o Method of least squares.
4.1.1 Scatter Diagram method
This is the simplest method of finding out whether there is any relationship between two variables. The given data are plotted on graph paper in the form of dots: X values on the horizontal axis and Y values on the vertical axis. From the resulting pattern of dots we can judge the relationship between the two variables.
[Diagrams 1-5: scatter diagrams illustrating different degrees of correlation]
4.1.2 Simple graph or Correlogram
If the values of the two variables X and Y are plotted on graph paper, we get two curves, one for the X values and another for the Y values. These two curves reveal whether or not the variables are related. If both curves move in the same direction, parallel to each other, upward or downward, the correlation between X and Y is said to be positive. If they move in opposite directions, the correlation is said to be negative.
Karl Pearson, a statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. It is the most widely used method, and its result is denoted by the symbol r. A common form of the formula for calculating the Pearsonian coefficient of correlation is:

r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad (1)

where x = X - \bar{X} and y = Y - \bar{Y} are deviations from the means, N is the number of pairs of observations, and \sigma_x, \sigma_y are the standard deviations of X and Y.
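As an illustrative sketch, formula (1) can be computed step by step in Python; the data reuse the linear table from the previous section, so r comes out as exactly 1:

```python
import math

def pearson_r(X, Y):
    """Pearson's r via formula (1): r = sum(xy) / (N * sigma_x * sigma_y),
    where x and y are deviations from the respective means."""
    N = len(X)
    mean_x, mean_y = sum(X) / N, sum(Y) / N
    dev_x = [v - mean_x for v in X]
    dev_y = [v - mean_y for v in Y]
    sigma_x = math.sqrt(sum(d * d for d in dev_x) / N)   # population standard deviation
    sigma_y = math.sqrt(sum(d * d for d in dev_y) / N)
    return sum(a * b for a, b in zip(dev_x, dev_y)) / (N * sigma_x * sigma_y)

# The linear X/Y table gives a perfect positive correlation:
print(pearson_r([5, 10, 15, 20, 25], [4, 8, 12, 16, 20]))   # 1.0
```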
The coefficient of correlation is more reliable if the error of measurement is reduced to a minimum. The probable error of r is:

P.E._r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}
If the value of r is less than the probable error, r is not at all significant. If the value of r is more than six times the probable error, r is significant. Where the probable error is relatively large, a coefficient below 0.3 should not be considered at all; where the probable error is small, the correlation may be taken as definitely existing.
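A small sketch of the probable-error rule, with hypothetical values r = 0.8 and N = 25 (not from the text):

```python
import math

def probable_error(r, N):
    """Probable error of r: P.E. = 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(N)

r, N = 0.8, 25
pe = probable_error(r, N)
print(f"P.E. = {pe:.4f}")                                  # 0.6745 * 0.36 / 5 = 0.0486
print("significant" if r > 6 * pe else "not established")  # r = 0.8 > 6 * 0.0486 -> significant
```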
Spearman's Rank Coefficient of Correlation is computed from the ranks of the observations:
o Compute the difference of the two ranks (R1 and R2) and denote it by D.
o Square each D to get D².
o Compute ΣD².
o Apply the formula (the standard Spearman rank formula):

\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}
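A minimal sketch of these steps in Python, using made-up ranks assigned by two judges to five candidates:

```python
def spearman_rho(R1, R2):
    """Spearman's rank coefficient: rho = 1 - 6 * sum(D**2) / (N * (N**2 - 1)),
    where D is the difference between paired ranks."""
    N = len(R1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
    return 1 - 6 * sum_d2 / (N * (N ** 2 - 1))

# Hypothetical ranks given by two judges:
ranks_judge1 = [1, 2, 3, 4, 5]
ranks_judge2 = [2, 1, 4, 3, 5]
print(spearman_rho(ranks_judge1, ranks_judge2))   # sum(D^2) = 4 -> rho = 0.8
```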
CHAPTER 2 : REGRESSION
1. Definition
Regression estimates or predicts the value of one variable from the given value of another variable.
For instance, price and supply are correlated. So, we can find out the expected amount of supply for a given price.
In other words, regression helps us to estimate one variable, the dependent variable, from the other variable, the independent variable.
Following are the uses of regression analysis:
1. Regression analysis is used in many scientific studies. It is widely used in social sciences like economics, as well as in the natural and physical sciences.
2. It is used to estimate the relation between two economic variables, like income and expenditure. If we know the income, we can find out the probable expenditure.
3. Regression analysis predicts the value of the dependent variable from the values of the independent variables.
Correlation vs. Regression:
1. Correlation estimates the relationship between two or more variables, which move in the same or in opposite directions; both variables (dependent and independent) vary at random. Regression analysis helps to estimate the dependent variable from the independent variable.
2. Correlation estimates the degree of relationship between two (or more) variables, but not the cause and effect of the relationship. Regression analysis indicates the cause and effect of the relationship between the variables.
3. The coefficient of correlation is a relative measure and varies between +1 and -1. The regression coefficient is an absolute measure: if the value of one variable is known, the other can be estimated.
4. In correlation there may be a senseless correlation between two (or more) variables. In regression there is no senseless relationship.
5. Correlation has limited application, being confined only to linear relationships. Regression has wider application, being applicable to both linear and non-linear relationships.
6. Correlation is not applicable for further mathematical treatment. Regression is applicable for further mathematical treatment.
7. If the coefficient of correlation is positive, the two variables are positively correlated, and vice versa. The sign of the regression coefficient shows whether the dependent variable increases or decreases as the independent variable increases.
4.1 Graphic Method
In this method the corresponding values of the independent and dependent variables are plotted on graph paper. The independent variable is plotted on the horizontal axis and the dependent variable on the vertical axis.
4.2 Algebraic Method
In graphical terms, a regression line is a straight line fitted to the data by the method of least squares; it is the line of best fit. There are always two regression lines between two variables, say X and Y. One regression line shows the regression of X upon Y, and the other shows the regression of Y upon X. The regression line of X on Y gives the most probable values of X for any given value of Y. In the same manner, the regression line of Y on X gives the most probable values of Y for any given value of X.
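As a rough sketch with made-up price (X) and supply (Y) figures, both least-squares regression coefficients can be computed in deviation form: b_yx = Σxy/Σx² for Y on X, and b_xy = Σxy/Σy² for X on Y:

```python
def regression_lines(X, Y):
    """Least-squares regression coefficients for both lines:
    b_yx (Y on X) = sum(xy)/sum(x^2) and b_xy (X on Y) = sum(xy)/sum(y^2),
    where x and y are deviations from the means."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    dx = [v - mx for v in X]
    dy = [v - my for v in Y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    b_yx = sxy / sum(d * d for d in dx)
    b_xy = sxy / sum(d * d for d in dy)
    return b_yx, b_xy, mx, my

# Hypothetical price (X) and supply (Y) figures:
X = [10, 12, 13, 12, 16, 15]
Y = [40, 38, 43, 45, 37, 43]
b_yx, b_xy, mx, my = regression_lines(X, Y)
print(f"Y on X: Y - {my:.1f} = {b_yx:.3f} (X - {mx:.1f})")
print(f"X on Y: X - {mx:.1f} = {b_xy:.3f} (Y - {my:.1f})")
```

Note that the product of the two coefficients equals r², which connects the regression lines back to the correlation coefficient.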
The relative position of the two regression lines depends on the degree of correlation between the two variables:
o If there is perfect positive correlation (+1) between the two variables, the two regression lines (X on Y and Y on X) coincide with each other (Annex-2).
o If there is perfect negative correlation (-1) between the two variables, the two regression lines likewise coincide (Annex-2).
o If there is a high degree of correlation, the regression lines lie near to each other (Figure 1).
o If there is a low degree of correlation, the two lines lie further apart (Figure 2).
o If the coefficient of correlation is zero (r = 0), the two variables are independent; in this case the regression lines cut each other at a right angle (Figure 3).
The two regression lines (X on Y and Y on X) cut each other at the point of the averages of X and Y. If we draw a perpendicular from that point to the X-axis we get the average value of X, and if we draw a perpendicular from that point to the Y-axis we get the average value of Y.
CHAPTER 3 : INDEX NUMBERS
An index number may be described as a specialised average designed to measure the change in the level of a phenomenon with respect to time, geographic location or other characteristics such as income, etc. For a proper understanding of the term index number, the following points are to be considered:
An index number represents the general level of magnitude of the changes between two or more situations of a number of variables taken as a whole.
Index numbers are quantitative measures of the general level of growth of prices, production, inventory and other quantities of economic interest.
Index numbers help in framing suitable policies. Many economic and business policies are guided by index numbers. For example, in deciding the increase in the dearness allowance of employees, the employer has to depend primarily upon the cost of living index. If wages and salaries are not adjusted in accordance with the cost of living, strikes and lockouts often follow, causing considerable waste of resources.
Though index numbers are most widely used in the evaluation of business and economic conditions, there are many other fields where they are useful. For example, sociologists may speak of population indices; psychologists measure intelligence quotients, which are essentially index numbers comparing a person's intelligence score with the average for his or her age; health authorities prepare indices to display changes in the adequacy of hospital facilities; and educational research organisations have devised formulae to measure changes in the effectiveness of school systems.
Index numbers are very useful in deflating. They are used to adjust the original data for price changes, or to adjust wages for cost of living changes, and thus transform nominal wages into real wages.
A large number of formulae have been devised for constructing index numbers. They can be grouped under the following two heads: unweighted indices and weighted indices.
In the unweighted indices weights are not expressly assigned, whereas in the weighted indices weights are assigned to the various items. Each of these types may further be divided under two heads, as shown below.
Unweighted indices:
o Simple Aggregative Index Numbers
o Simple Average of Relatives Index Numbers
This is the simplest method of constructing index numbers. When this method is used to construct a price index, the total of the current year prices for the various commodities in question is divided by the total of the base year prices, and the quotient is multiplied by 100. In other words:
Index = (Total of current year prices for various commodities / Total of base year prices for various commodities) × 100

Symbolically,

I = \frac{\sum P_1}{\sum P_0} \times 100

where:
I = index
P1 = current year price of a commodity (the current year prices of all commodities are added)
P0 = base year price of a commodity (the base year prices of all commodities are added)
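A minimal numerical sketch of the simple aggregative formula, with four made-up commodity prices:

```python
# Hypothetical base-year and current-year prices of four commodities:
base_prices    = [20, 50, 40, 30]   # P0
current_prices = [25, 60, 60, 35]   # P1

index = sum(current_prices) / sum(base_prices) * 100
print(f"Simple aggregative index = {index:.1f}")   # 180/140 * 100 = 128.6
```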
When this method is used to construct a price index, price relatives are calculated for the items included in the index, and an average of these relatives is then taken using any one of the measures of central tendency: arithmetic mean, median, mode, geometric mean or harmonic mean. If the arithmetic mean is used to average the price relatives, the formula for computing the index is:
I = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right)}{N}

where:
I = index
P1 = current year price of the commodity
P0 = base year price of the commodity
N = number of commodities whose relative price changes are considered
Although any measure of central tendency can be used to obtain the overall index, price relatives are generally averaged using either the arithmetic or the geometric mean.
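Using the same made-up prices as above, a sketch of the simple average of price relatives with the arithmetic mean:

```python
base_prices    = [20, 50, 40, 30]   # P0 (same hypothetical data as above)
current_prices = [25, 60, 60, 35]   # P1

# Price relative for each commodity: (P1 / P0) * 100
relatives = [p1 / p0 * 100 for p1, p0 in zip(current_prices, base_prices)]
index = sum(relatives) / len(relatives)   # arithmetic mean of the relatives
print(relatives)                          # [125.0, 120.0, 150.0, 116.67]
print(f"Simple average of relatives = {index:.1f}")   # 127.9
```

Note that the two unweighted methods generally give somewhat different index values from the same data (128.6 versus 127.9 here).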
Weighted indices:
o Weighted Aggregative Index Numbers
o Weighted Average of Relatives Index Numbers
Weighted aggregative index numbers are of the simple aggregative type, with the difference that weights are assigned to the various items included in the index. There are various methods of assigning weights:
o Laspeyres' method
o Paasche's method
o Dorbish and Bowley's method
o Fisher's Ideal method
o Marshall-Edgeworth method
o Kelly's method
All these methods carry the names of the persons who suggested them.
Laspeyres' method answers the question: what is the change in the aggregate value of the base period list of goods when valued at given period prices?
The formula for constructing the index is:

I = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100

Steps:
o Multiply the current year prices of the commodities by the base year quantities (weights) and obtain Σp1q0.
o Multiply the base year prices of the commodities by the base year quantities and obtain Σp0q0.
o Divide Σp1q0 by Σp0q0 and multiply the quotient by 100. This gives the price index.
Paasche's method uses current year quantities as weights:

I = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100

Steps:
o Multiply the current year prices of the commodities by the current year quantities and obtain Σp1q1.
o Multiply the base year prices of the commodities by the current year quantities and obtain Σp0q1.
o Divide Σp1q1 by Σp0q1 and multiply the quotient by 100.
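A sketch of both weighted formulas with made-up prices and quantities; Laspeyres weights by base year quantities, Paasche by current year quantities:

```python
# Hypothetical prices (p) and quantities (q) for three commodities:
p0, q0 = [2, 5, 4], [10, 12, 20]   # base year
p1, q1 = [4, 6, 5], [15, 10, 25]   # current year

def dot(a, b):
    """Sum of element-wise products, i.e. sum(p * q)."""
    return sum(x * y for x, y in zip(a, b))

laspeyres = dot(p1, q0) / dot(p0, q0) * 100   # base-year quantities as weights
paasche   = dot(p1, q1) / dot(p0, q1) * 100   # current-year quantities as weights
print(f"Laspeyres = {laspeyres:.1f}, Paasche = {paasche:.1f}")   # 132.5, 136.1
```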
CHAPTER 6 : HYPOTHESIS
1. What Is a Hypothesis?
A hypothesis is a tentative statement about a population parameter that can be tested with sample data. Testing a hypothesis proceeds in the following steps.
Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more groups or factors. In research studies, a researcher is usually interested in disproving the null hypothesis.
Step 2: Specify the Alternative Hypothesis
The alternative hypothesis (H1) is the statement that there is an effect or difference. This is usually the hypothesis the researcher is interested in proving. The alternative hypothesis can be one-sided (specifying only one direction, e.g., lower) or two-sided.
Step 3: Set the Significance Level (α)
The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that there is a 5% chance of accepting the alternative hypothesis when the null hypothesis is actually true. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis.
Step 4: Calculate the Test Statistic and Corresponding P-Value
In another section we present some basic test statistics used to evaluate a hypothesis. Hypothesis testing generally uses a test statistic that compares groups or examines associations between variables. When describing a single sample without establishing relationships between variables, a confidence interval is commonly used instead.
The p-value is the probability of obtaining a sample statistic as extreme as, or more extreme than, the one observed, by chance alone, if the null hypothesis is true. The p-value is determined from the result of the test statistic. Your conclusions about the hypothesis are based on your p-value and your significance level, as follows.
1. P-value <= significance level (α): reject the null hypothesis in favor of the alternative hypothesis. The result is statistically significant.
2. P-value > significance level (α): fail to reject the null hypothesis. The result is not statistically significant.
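A minimal end-to-end sketch of Steps 1-4, assuming SciPy is available; the sample values and the hypothesised mean of 100 are made up for illustration:

```python
# A sketch of Steps 1-4 with a one-sample t-test (SciPy assumed available).
from scipy import stats

sample = [102, 98, 105, 110, 97, 103, 108, 101, 99, 106]

# Step 1 and 2: H0: mean = 100 versus H1: mean != 100 (two-sided).
# Step 3: set the significance level.
alpha = 0.05

# Step 4: compute the test statistic and its two-sided p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

if p_value <= alpha:
    print("Reject H0: the result is statistically significant.")
else:
    print("Fail to reject H0: the result is not statistically significant.")
```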
LEVEL OF SIGNIFICANCE
The methods of inference used to support or reject claims based on sample data are known as tests of significance.
Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been put forward. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug. We would write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In this case we would write Ha: the new drug is better than the current drug, on average.
ONE-SIDED or TWO-SIDED
Hypotheses are always stated in terms of a population parameter, such as the mean μ. An alternative hypothesis may be one-sided or two-sided.
A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis. A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis -- the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
H0: μ = k, Ha: μ > k
or
H0: μ = k, Ha: μ < k.
Hypotheses for a two-sided test for a population mean take the following form:
H0: μ = k, Ha: μ ≠ k.
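As a sketch, the one-sided and two-sided forms differ only in how the p-value is computed; recent SciPy versions (1.6 or later) expose this through the alternative keyword. The sample data here are made up:

```python
# One-sided vs. two-sided p-values for H0: mu = k.
from scipy import stats

sample = [102, 98, 105, 110, 97, 103, 108, 101, 99, 106]
k = 100   # value stated by H0: mu = k

two_sided = stats.ttest_1samp(sample, popmean=k, alternative='two-sided')
greater   = stats.ttest_1samp(sample, popmean=k, alternative='greater')
print(f"Ha: mu != {k}  ->  p = {two_sided.pvalue:.4f}")
print(f"Ha: mu >  {k}  ->  p = {greater.pvalue:.4f}")
```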
Type I Error
Rejecting the null hypothesis when it is in fact true is called a Type I error.
When a hypothesis test results in a p-value that is less than the significance level, the result of the hypothesis test is called statistically significant.
Type II Error
Not rejecting the null hypothesis when in fact the alternate hypothesis is true is called a Type II error.
Note: "The alternate hypothesis" in the definition of Type II error may refer to the alternate hypothesis in a hypothesis test, or it may refer to a "specific" alternate hypothesis.
Example: In a t-test for a sample mean μ, with null hypothesis "μ = 0" and alternate hypothesis "μ > 0", we may talk about the Type II error relative to the general alternate hypothesis "μ > 0", or about the Type II error relative to the specific alternate hypothesis "μ > 1". Note that the specific alternate hypothesis is a special case of the general alternate hypothesis.
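A small simulation sketch (all settings made up) that estimates the Type I error rate under H0 and the Type II error rate against a specific alternative, here μ = 0.5:

```python
# Simulation of the two error types for H0: mu = 0 vs Ha: mu > 0 at alpha = 0.05.
import random
from scipy import stats

random.seed(1)
alpha, n, trials = 0.05, 30, 2000

def rejection_rate(true_mu):
    """Fraction of trials in which H0 is rejected when data come from N(true_mu, 1)."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, 1) for _ in range(n)]
        p = stats.ttest_1samp(sample, popmean=0, alternative='greater').pvalue
        rejections += p <= alpha
    return rejections / trials

print("Type I error rate (H0 true, mu = 0):", rejection_rate(0.0))   # close to alpha
power = rejection_rate(0.5)                      # against the specific Ha: mu = 0.5
print("Type II error rate against mu = 0.5:", 1 - power)
```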