
Session 1 & Session 2

A variate is a linear combination of variables with empirically determined weights. It is the
building block of multivariate analysis.
Multivariate Techniques:
1. Multiple Linear Regression
2. Logistic Regression (Targeting)
3. Factor Analysis (Data Reduction)
4. Cluster Analysis (Market Segmentation)
5. Conjoint Analysis
6. Discriminant Analysis
7. Multi-Dimensional Scaling (Brand Positioning)

Multivariate techniques are classified into two groups –

1. Interdependence Techniques – Variables cannot be classified as dependent and
independent. All the variables are analysed simultaneously in order to find the
relationships and underlying structure among them. The interdependence
techniques that we are going to study are as follows:
 Factor Analysis (data is in the form of variables; it is performed for data
reduction)
 Cluster Analysis (data is in the form of cases, i.e. respondents; it is
performed for segmentation)
 Multi-Dimensional Scaling (data is in the form of objects; it is performed
for brand positioning)
All the above techniques use metric data.

2. Dependence Techniques – Variables can be classified as independent and
dependent. Dependence techniques can be classified further by the measurement
level of the dependent variable:
 Metric (interval, ratio) – Multiple Regression, Conjoint Analysis
 Nonmetric (nominal, ordinal) – Discriminant Analysis, Logistic Regression

Multiple Linear Regression

 Only one dependent variable (unique)
 The dependent variable is metric
 There is more than one independent variable
 Independent variables can be metric or nonmetric
 Independent variables are the causes; the dependent variable is the effect
 Can be used for prediction or forecasting, e.g. a sales forecast (see the sketch below)
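
A minimal sketch of a multiple linear regression in Python with statsmodels. The data
(advertising spend, price, sales) is made up purely for illustration; it is not from the course.

```python
# Minimal multiple linear regression sketch (hypothetical data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
ad_spend = rng.uniform(10, 100, 50)                            # metric IV 1
price = rng.uniform(5, 15, 50)                                 # metric IV 2
sales = 3.0 * ad_spend - 8.0 * price + rng.normal(0, 10, 50)   # metric DV

X = sm.add_constant(np.column_stack([ad_spend, price]))        # add intercept
model = sm.OLS(sales, X).fit()
print(model.summary())        # R-square, standard error, F and t statistics
print(model.predict(X[:5]))   # forecasting: predicted sales for first 5 cases
```
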
Logistic Regression (Targeting)
 Only one dependent variable
 The dependent variable is nonmetric
 There can be one or more independent variables, which can be metric or nonmetric
 When there is a cause-and-effect relationship and the outcome is categorical, we apply
logistic regression (see the sketch below)
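
A minimal sketch of a logistic regression with statsmodels, assuming a hypothetical 0/1
"subscribed" outcome predicted from income; the variable names and data are invented for
illustration.

```python
# Minimal logistic regression sketch (hypothetical targeting data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(20, 120, 200)                    # metric IV
true_p = 1 / (1 + np.exp(-(-6 + 0.08 * income)))      # made-up true probabilities
subscribed = rng.binomial(1, true_p)                  # nonmetric (0/1) DV

X = sm.add_constant(income)
logit = sm.Logit(subscribed, X).fit(disp=0)
print(logit.params)            # coefficients are effects on the log-odds
print(logit.predict(X)[:5])    # predicted probabilities of subscribing
```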

Discriminant Analysis
 Only one dependent variable, which is nonmetric
 There can be one or more independent variables, which must be metric (see the sketch below)
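
A minimal discriminant-analysis sketch with scikit-learn: nonmetric group membership is
predicted from metric variables. The two customer groups are simulated.

```python
# Minimal linear discriminant analysis sketch (simulated groups).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
group_a = rng.normal([2, 2], 1, size=(50, 2))   # metric IVs for group 0
group_b = rng.normal([6, 5], 1, size=(50, 2))   # metric IVs for group 1
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)               # nonmetric DV (group label)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[4, 4]]))                    # classify a new case
print(lda.score(X, y))                          # classification accuracy
```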

Conjoint Analysis
 Only one dependent variable, which can be nonmetric or metric
 One or more independent variables, which must be nonmetric
 Can be used for new product/service development (see the sketch below)
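
A minimal conjoint-analysis sketch: part-worth utilities estimated by ordinary least squares
on dummy-coded attribute levels. The attributes (brand, price level) and the preference
ratings are hypothetical.

```python
# Minimal conjoint analysis sketch: part-worths via dummy-coded OLS.
import pandas as pd
import statsmodels.formula.api as smf

profiles = pd.DataFrame({
    "brand":  ["A", "A", "B", "B", "A", "B", "A", "B"],   # nonmetric IV
    "price":  ["low", "high", "low", "high", "high", "low", "low", "high"],
    "rating": [9, 5, 7, 3, 4, 8, 8, 2],                   # respondent preference (DV)
})
# Nonmetric independent variables enter as categorical (dummy) terms
model = smf.ols("rating ~ C(brand) + C(price)", data=profiles).fit()
print(model.params)   # part-worth utilities relative to the base levels
```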

Factor Analysis
 Factor means a common underlying dimension.
 Factor analysis simplifies the data by grouping variables into factors (see the sketch
after this list).
 The combined score of the different variables in one factor is the summated score.
 For every factor, we can have a summated score.
 We then have to deal only with one summated score per factor.
 Factor analysis is performed to achieve parsimony: representing the dimensions with the
minimum number of variables. (Parsimony: avoiding the indiscriminate use of variables.)
 Specification error means omitting a critical predictor variable that is important for
the analysis.
 Overfitting and multicollinearity:
 Overfitting means drawing sample conclusions that are not generalizable to the
population.
 Multicollinearity is the main villain in multiple regression.
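
A minimal factor-analysis sketch with scikit-learn on simulated data: five observed variables
generated from two underlying dimensions are reduced to two factors, and a summated score is
formed. Which variables to sum per factor is read off the loadings; the grouping below is an
assumption of this toy setup.

```python
# Minimal factor analysis sketch (simulated data, two underlying dimensions).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
latent = rng.normal(size=(100, 2))                       # two hidden dimensions
loadings = np.array([[1.0, 0.0], [0.9, 0.0],
                     [0.0, 1.0], [0.0, 0.8], [0.5, 0.5]])
X = latent @ loadings.T + rng.normal(0, 0.3, (100, 5))   # five observed variables

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_.round(2))       # loadings: which variables form which factor
# Summated score for factor 1: average the variables loading on it (vars 0 and 1)
summated_factor_1 = X[:, [0, 1]].mean(axis=1)
print(summated_factor_1[:5])
```
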
Cluster Analysis
 Cluster analysis is an analytical technique for developing meaningful subgroups of
individuals.
 Segmentation divides the respondents into different segments or subgroups (see the
sketch below).
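
A minimal cluster-analysis sketch: k-means segmentation of respondents on two metric
variables. The two simulated segments (and the age/spend labels) are hypothetical.

```python
# Minimal k-means segmentation sketch (simulated respondents).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
young_spenders = rng.normal([25, 80], [3, 10], size=(40, 2))  # age, spend
older_savers = rng.normal([55, 30], [5, 8], size=(40, 2))
X = np.vstack([young_spenders, older_savers])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # profile of each segment
print(kmeans.labels_[:10])       # segment membership per respondent
```
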
Session 3 & Session 4
Multi-Dimension Scaling
 It is used for Brand Positioning

Multi-Variate Data Analysis


 It works at the macro level.
 Macro level – dependence techniques.
 Macro level – interdependence techniques.
Data Mining
 It works at the micro level.
 Micro level – supervised learning.
 Micro level – unsupervised learning.
Applications of Regression
 Whenever there is a cause-and-effect relationship, we use regression.
 Multiple R is the coefficient of correlation.
 Correlation is the strength of the linear relationship between two variables.
 Correlation is completely unaffected by the units of measurement.
 All correlations lie between -1 and +1.
 Values closer to +1/-1 mean a stronger relationship (roughly beyond +/- 0.7).
 Values closer to 0 mean a weaker relationship (roughly within +/- 0.3).
 A higher coefficient of correlation suggests the possibility of a cause-and-effect
relationship, but does not prove it.
 R square gives the goodness of fit.
 The standard error is also a measure of goodness of fit.
 R square should be as high as possible.
 The standard error should be as low as possible.
 Whenever a residual is greater than double the standard error, that observation is
known as an outlier (see the sketch after this list).
 Remove/ignore outliers and rerun the regression.
 The F statistic is the square of the t statistic (when there is a single independent
variable).
 H0: the independent variables do not cause (have no effect on) the dependent variable.
 If the significance of F is less than 0.05, the null hypothesis is rejected.
 If observed F > critical F, then reject H0.
 Example H0: the area is not significant in determining the price of the plot.
 A Type I error is the probability of committing a false positive.
 Three ways to test the null hypothesis (H0): (1) if t-stat > t-critical, H0 is rejected;
(2) if Sig. (2-tailed) < the alpha value, H0 is rejected; (3) travelling from the lower
tail to the upper tail of the coefficient's confidence interval, if 0 is crossed, H0 is
not rejected.
 The mean of the error terms is zero.
 Error terms are normally distributed around the regression line.
 Levene's test (for homogeneity of variance)
 KS (Kolmogorov-Smirnov) test for normality
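
A sketch of these checks in Python with statsmodels, using invented plot-area/plot-price
data: the F significance, the per-coefficient t statistics, and the "residual greater than
twice the standard error" outlier rule.

```python
# Regression significance tests and the 2-standard-error outlier rule (toy data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
area = rng.uniform(500, 3000, 60)               # plot area
price = 40 * area + rng.normal(0, 8000, 60)     # plot price

fit = sm.OLS(price, sm.add_constant(area)).fit()
print(fit.f_pvalue < 0.05)         # F sig < 0.05 -> reject H0 (area has no effect)
print(fit.tvalues, fit.pvalues)    # t statistics and two-tailed significance

se = np.sqrt(fit.mse_resid)        # standard error of the estimate
outliers = np.abs(fit.resid) > 2 * se
print(np.where(outliers)[0])       # cases flagged as outliers
```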

Session 5 & Session 6


 Beta of a stock: the slope of the least-squares line (the regression line); see the
sketch after this list.
 Least squares minimizes the sum of (observed value - predicted value)².
 It is used to predict the monthly return of a stock from the monthly return of the market.
 Beta is the risk of a stock which remains even after diversification.
 Government bonds and gold have a very low beta.
 Put options have a very low (negative) beta.
 Outliers bring non-normality.
 The regression coefficient of a variable estimates the effect of a unit increase in that
independent variable, after adjusting for all the other independent variables used to
estimate the regression equation.
 Heteroskedasticity: a relationship between the dependent variable and the error terms
(the variance of the errors is not constant).
 The dependent variable of the given data becomes the independent variable while
checking heteroskedasticity.
 Autocorrelation
 Skewness-kurtosis test (for normality)
 Multicollinearity
 Assumptions in Regression:
 Linearity
 Normality
 No Heteroscedasticity
 No Autocorrelation
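
A minimal sketch of estimating a stock's beta, as described at the top of this list: regress
the stock's monthly return on the market's monthly return; the slope of the least-squares
line is beta. The returns are simulated, not real market data.

```python
# Beta of a stock = slope of the least-squares line (simulated returns).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
market_return = rng.normal(0.01, 0.04, 60)                     # 60 months
stock_return = 1.3 * market_return + rng.normal(0, 0.02, 60)   # true beta = 1.3

fit = sm.OLS(stock_return, sm.add_constant(market_return)).fit()
print(fit.params[1])   # estimated beta, close to the simulated 1.3
```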

Session 8
 SPSS reports a Variance Inflation Factor (VIF) for each independent variable.
 If VIF > 5 for any independent variable, it indicates multicollinearity (see the sketch
below).
 If the chocolates are kept at a 5 ft height, the sales will decrease compared to a 6 ft
height.
 If the chocolates are kept at a 6 ft height, the sales will increase by 20.25 units
compared to being kept at a 7 ft height.
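
A sketch of the same VIF check with statsmodels on toy data, where one predictor is
deliberately made collinear with another so that its VIF exceeds 5.

```python
# VIF per independent variable; values above 5 suggest multicollinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(0, 0.1, 100)   # deliberately collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):             # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```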

Session 9
 Durbin-Watson is used to check autocorrelation in SPSS.
 Durbin-Watson should be between 1 and 3.
 A Durbin-Watson of exactly 2 is very good: there is no autocorrelation (see the sketch
below).
 VIF is used to check multicollinearity; in SPSS it should be below 5.
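
A sketch of the Durbin-Watson check with statsmodels: compute the statistic on the
regression residuals; values near 2 indicate no autocorrelation. The time-ordered data is
simulated with independent errors.

```python
# Durbin-Watson statistic on regression residuals (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
x = np.arange(100, dtype=float)
y = 2.0 * x + rng.normal(0, 5, 100)   # independent errors -> no autocorrelation

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))       # close to 2 here
```
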
Session 10
 Next week starts with Logistic regression

Session 11
 Formula for likelihood = IF(subscription cell = 1, probability cell, 1 - probability
cell); the same rule is sketched below.
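
The same per-case likelihood rule as the spreadsheet formula, in Python: the likelihood is
the predicted probability when the outcome is 1, and one minus it when the outcome is 0. The
outcomes and probabilities below are invented.

```python
# Per-case likelihood for a logistic model, as in the spreadsheet formula.
import numpy as np

subscribed = np.array([1, 0, 1, 1, 0])    # hypothetical 0/1 outcomes
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2])   # model-predicted probabilities

likelihood = np.where(subscribed == 1, p, 1 - p)
print(likelihood)                  # per-case likelihood
print(np.prod(likelihood))         # likelihood of the whole sample
print(np.sum(np.log(likelihood)))  # log-likelihood, which the model maximizes
```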

Session 12
 Logistic regression models the logarithm of the odds.
 Odds = probability of an event happening / probability of the event not happening;
Odds = Prob/(1 - Prob)
 Prob = Odds/(1 + Odds) (a worked example follows below)
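
A worked example of the conversions above: with a probability of 0.8, the odds are
0.8/0.2 = 4, the log-odds are ln 4 ≈ 1.386, and converting the odds back recovers the
probability.

```python
# Probability <-> odds <-> log-odds, as in the formulas above.
import numpy as np

prob = 0.8
odds = prob / (1 - prob)        # Odds = Prob / (1 - Prob) -> 4.0
log_odds = np.log(odds)         # the quantity logistic regression models
prob_back = odds / (1 + odds)   # Prob = Odds / (1 + Odds) -> 0.8
print(odds, log_odds, prob_back)
```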

Session 13
 Discriminant Analysis using SPSS.

Session 14
 Conjoint Analysis

Session 15

 Conjoint Analysis
 Logistic
 MDS (Multi-Dimensional Scaling)
 Conjoint
 Discriminant analysis
