Examples
Nominal Scale
Ordinal Scale
Ratio Scale
Metric
Nominal Ordinal
Interval Ratio
Data Types
                   Nominal   Ordinal   Interval   Ratio
Order              no        yes       yes        yes
Equal Step Size    no        no        yes        yes
Absolute Zero      no        no        no         yes
Types of Analysis
Univariate Bivariate Multivariate
Univariate Analysis
Single Variable
Bivariate Analysis
Two Variables
- Covariance
- Chi-square
- Simple Regression
- Correlation Coefficient
Multivariate Analysis
Many Variables
- Multiple Regression
- Correspondence Analysis
- Factor Analysis
- Cluster Analysis
- Discriminant Analysis
- Multidimensional Scaling
- AID (Automatic Interaction Detection)
MVA Analysis
Data Files
Data files come in a wide variety of forms, and SPSS is designed to handle many of them:
- Spreadsheets created in Excel
- Database files created in Dbase
- Database files created in MS Access
- Tab-delimited and other types of ASCII text files
Opening a file
Command: File > Open > Data
- Variable names are read from the first row of the spreadsheet
- Variable names should not be longer than 8 characters; a longer name is truncated
- If the first 8 characters do not create a unique name, the name is modified to make it unique
- By default the data is read from the first worksheet; to read from a different sheet, select the worksheet from the drop-down list

To open an Excel file:
- In the Open File dialog box select the folder in Look in
- Select file type Excel
- Select the Excel file (district.xls in the SPSSTrain folder)
- Select the worksheet
- Click Open
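The truncation rule above can be sketched in Python. This is only an illustration: the source does not say how SPSS modifies a clashing name, so the numeric suffix used here is an assumption, and the function name `make_variable_names` and the sample headers are hypothetical.

```python
def make_variable_names(headers, max_len=8):
    """Truncate each header to max_len characters and keep names unique.

    Assumption: clashes are resolved by replacing trailing characters
    with a running number. The source does not state SPSS's actual scheme.
    """
    seen = set()
    result = []
    for header in headers:
        name = header[:max_len]
        counter = 1
        # Names are compared case-insensitively, so fold case before checking.
        while name.lower() in seen:
            suffix = str(counter)
            name = header[:max_len - len(suffix)] + suffix
            counter += 1
        seen.add(name.lower())
        result.append(name)
    return result


# Hypothetical spreadsheet headers, not taken from district.xls.
names = make_variable_names(["District", "DistrictName", "DistrictCode", "Pop"])
print(names)
```

Short headers such as "Pop" pass through unchanged; only names that collide after truncation are altered.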
- Field names are automatically translated into variable names
- Variable names longer than 8 characters are truncated
- Records marked for deletion are also read, but an additional field D_R is created which contains an asterisk for cases marked for deletion

Command: File > Open > Data
- In the Open File dialog box select file type Dbase
- Select the folder in Look in
- Select the Dbase file (EFED.DBF in the SPSSTrain folder)
MS ACCESS File
Command: File > Open Database > New Query
- In the Database Wizard select MS Access Database and click Next
- In the ODBC Driver Login dialog box click Browse
- Select Pay.mdb in the SPSSTrain folder and click OK
- You can select one table or fields from different tables; tables are shown on the left
- Double-click fields from the various tables and click Next
- Specify relationships and click Next
- Click Next for all cases to be retrieved
- In Define Variables you can change the type of a field from string to numeric for fields holding numeric values
- Results shows the syntax, which can be pasted into the syntax editor
- Click Finish
The Text Wizard can read text files formatted in a variety of ways:
- Tab-delimited
- Space-delimited
- Comma-delimited
- Fixed-field format files

Command: File > Read Text Data
- Select Pay.txt
- Step 1: select no predefined format and click Next
- Step 2: select Delimited, select Yes for variable names, and click Next
- Step 3: keep the default selection and click Next
- Step 4: select comma as the delimiter
- Step 5: change the type of fields if required
- Step 6: click Finish

Repeat with reg.txt, which is in fixed format.
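For comparison, the same idea of reading a comma-delimited file whose first row holds the variable names can be sketched in Python. The contents of Pay.txt are not shown in the source, so `SAMPLE_TEXT` below is a made-up stand-in and `read_delimited` is a hypothetical helper, not part of SPSS.

```python
import csv
import io

# Hypothetical stand-in for a comma-delimited file such as Pay.txt.
SAMPLE_TEXT = "id,name,pay\n1,Asha,25000\n2,Ravi,31000.5\n"


def read_delimited(text, delimiter=","):
    """Read delimited text; the first row supplies the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        converted = {}
        for field, value in record.items():
            # Treat a field as numeric when it parses as a number,
            # otherwise keep it as a string.
            try:
                converted[field] = float(value)
            except ValueError:
                converted[field] = value
        rows.append(converted)
    return rows


rows = read_delimited(SAMPLE_TEXT)
print(rows)
```

Passing `delimiter="\t"` would handle a tab-delimited file in the same way; fixed-field files need column positions instead and are not covered by this sketch.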
Data Transformations
- Compute Variable: computes values for a variable based on numeric transformations of other variables
- Recode Values: modifies values by recoding them. This is useful for combining or collapsing categories. You can recode values within an existing variable or create a new one
- Visual Bander: creates intervals (bands) based on different criteria
- Count: counts occurrences of values within cases
- Rank Cases: you can select multiple ranking methods; a separate ranking variable is created for each method
- Automatic Recode: converts string and numeric values into consecutive integers
Compute Variable
- Use the Employees data.sav file
- Compute the increase in salary as salary - begsalary
- Create the new variable as sal_inc
- Command: Transform > Compute
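The computation itself is a per-case subtraction. A minimal Python sketch, assuming each case is a dictionary with `salary` and `begsalary` keys (the three records below are invented for illustration and are not from the data file):

```python
# Invented example cases; not taken from the Employees data file.
employees = [
    {"salary": 57000, "begsalary": 27000},
    {"salary": 40200, "begsalary": 18750},
    {"salary": 21450, "begsalary": 12000},
]

# Equivalent of computing sal_inc = salary - begsalary for every case.
for case in employees:
    case["sal_inc"] = case["salary"] - case["begsalary"]

print([case["sal_inc"] for case in employees])
```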
Recode Values
Use Employees data.sav and recode salary into salary groups.

Command: Transform > Recode > Into Different Variables
- Double-click Current Salary
- Type Salary_Gps in Output Variable Name
- Type Salary Groups in Label
- Click Change
- Click Old and New Values
- Click Range: Lowest through, type 25000, click Value under New Value, type 1, click Add
- Click Range, type 25001 through 50000, new value 2, click Add
- Click Range, type 50001 through 75000, new value 3, click Add
- Click Range, type 75001 through 100000, new value 4, click Add
- Click Range: through Highest, type 100001, new value 5, click Add
- Click Continue, then click OK
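The five ranges above amount to a lookup against four upper bounds. A hedged Python sketch follows; `recode_salary` is a hypothetical helper, and it assumes salaries are whole numbers, since a value such as 25000.5 falls between the dialog's ranges but would land in group 2 here.

```python
from bisect import bisect_left

# Upper bounds of groups 1 to 4; anything above the last bound is group 5.
UPPER_BOUNDS = [25000, 50000, 75000, 100000]


def recode_salary(salary):
    """Return the salary group (1 to 5) for a whole-number salary."""
    # bisect_left counts the bounds strictly below the salary,
    # so a salary equal to a bound stays in the lower group.
    return bisect_left(UPPER_BOUNDS, salary) + 1


for value in (25000, 25001, 75000, 100000, 100001):
    print(value, recode_salary(value))
```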
Copyright 2001 ACNielsen
Visual Bander
- Open Employees data.sav
- Click Transform > Visual Bander
- Double-click Current Salary and click Continue
- Click Current Salary
- Type the banded variable name as Salary_B
- Click Make Cutpoints
- Choose Equal Width Intervals
- Type 20000 in First Cutpoint Location
- Type 4 as Number of Cutpoints
- Click on Width; the width is calculated automatically
- Click Apply
- Click Make Labels
- Click OK
Count
- Use S3data.sav
- Click Transform > Count
- Type the target variable as TopBox_Count
- Type the target label as Count of attributes rated 5
- Select variables q16r1 to q16r18
- Click Define Values
- Type 5 and click Add
- Click Continue
- Click OK
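What Count does for each case can be sketched in Python as follows. The variable names follow the q16r1 to q16r18 pattern used above; the ratings in `case` are invented, and `count_value` is a hypothetical helper.

```python
# Variable names q16r1 ... q16r18, following the pattern in the steps above.
variables = [f"q16r{i}" for i in range(1, 19)]


def count_value(case, variable_names, target=5):
    """Count how many of the listed variables equal the target value."""
    return sum(1 for name in variable_names if case.get(name) == target)


# Invented respondent: rated everything 3 except three attributes rated 5.
case = {name: 3 for name in variables}
case["q16r2"] = 5
case["q16r7"] = 5
case["q16r18"] = 5

top_box_count = count_value(case, variables)
print(top_box_count)
```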
Rank Cases
- Rank Cases creates new variables containing ranks, normal and Savage scores, and percentile values for numeric variables
- You can rank in ascending or descending order
- Ranking can be done within sub-groups

Steps:
- Open Employee data.sav
- Click Transform > Rank Cases
- Choose the variable Current Salary
- Under Assign Rank 1 to, click Largest value
- In By: select Employment Category
- Click Rank Types, select Rank, and click Continue
- Click Ties, select Sequential ranks to unique values
- Click OK
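The combination chosen above (rank 1 for the largest value, ranking within sub-groups, sequential ranks to unique values for ties) can be sketched in Python. The cases are invented, and `rank_within_groups` with its key names is hypothetical.

```python
from collections import defaultdict


def rank_within_groups(cases, value_key, group_key, rank_key="rank"):
    """Rank cases within each group, largest value first.

    Tied values share a rank and ranks run 1, 2, 3, ... over the
    distinct values (sequential ranks to unique values).
    """
    groups = defaultdict(list)
    for case in cases:
        groups[case[group_key]].append(case)
    for members in groups.values():
        distinct = sorted({m[value_key] for m in members}, reverse=True)
        rank_of = {value: i + 1 for i, value in enumerate(distinct)}
        for member in members:
            member[rank_key] = rank_of[member[value_key]]
    return cases


# Invented cases for illustration only.
cases = [
    {"jobcat": "Clerical", "salary": 30000},
    {"jobcat": "Clerical", "salary": 45000},
    {"jobcat": "Clerical", "salary": 30000},
    {"jobcat": "Manager", "salary": 90000},
    {"jobcat": "Manager", "salary": 120000},
]
rank_within_groups(cases, "salary", "jobcat")
print([c["rank"] for c in cases])
```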
Automatic Recode
Open Employee data.sav and automatically recode Current Salary. This shows how many different values of salary exist.
- Click Transform > Automatic Recode
- Select the variable Current Salary
- Under Recode Starting from, click Lowest value
- Type the new name
- Click Add New Name
- Click OK
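The effect of an automatic recode starting from the lowest value can be sketched in Python. The salary values are invented, and `automatic_recode` is a hypothetical helper.

```python
def automatic_recode(values):
    """Map each distinct value to a consecutive integer, lowest value first."""
    codes = {value: i for i, value in enumerate(sorted(set(values)), start=1)}
    recoded = [codes[value] for value in values]
    return recoded, codes


# Invented salaries for illustration only.
recoded, codes = automatic_recode([57000, 40200, 21450, 40200])
print(recoded)       # the new consecutive integer codes
print(len(codes))    # number of different values that exist
```

The size of the code table is the number of distinct values, which is what the slide uses the recode to find out.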
Command Syntax
Syntax File
A syntax file is simply a text file that contains commands. While it is possible to open a syntax window and type in the commands, it is easier to let SPSS build the syntax file for you. There are three methods of doing this:
- Pasting command syntax from dialog boxes
- Copying syntax from the output log
- Copying syntax from the journal file

For online help, click Help and select Command Syntax Reference. It gives a reference guide for all syntax and the options available.
- Each command must begin on a new line and end with a period (.)
- Most subcommands are separated by a slash (/). The slash before the first subcommand of a command is usually optional
- Variable names must be spelled out fully
- Text included within apostrophes or quotation marks must be contained on a single line
- Each line of command syntax cannot exceed 80 characters
- A period (.) must be used to indicate decimals, regardless of Windows regional settings
- Variable names should not end with a period (.)
- Command syntax is case insensitive
- Three-letter abbreviations can be used for many commands
- Any number of lines can be used to specify a single command
- You can add spaces or break lines at almost any point where a single blank is allowed, such as around slashes, parentheses, arithmetic operators or between variable names

For example,

    FREQUENCIES VARIABLES=JOBCAT GENDER
      /PERCENTILES=25 50 75
      /BARCHART.

and

    freq var=jobcat gender /percent=25 50 75 /bar.

are both acceptable alternatives that generate the same results.
To paste the syntax, open the dialog box, make your selections and click Paste. For example, with S3data.sav:
- Click Analyze > Descriptive
- Select variables
- Click Options
- Select the statistics required
- Click Continue
- Click Paste
You can build a syntax file by copying from the log that appears in the Viewer.
- Before running the analysis, choose Options from the Edit menu
- Click the Viewer tab
- Select Display commands in the log
- As you run analyses, the commands for your dialog box selections are recorded in the log
- By default, all commands executed during a session are recorded in the spss.jnl file
- The journal file is a text file and can be edited like any other text file, then saved as a syntax file
- Because error messages and warnings are recorded in the journal file along with command syntax, it should be edited before use
- Save the edited journal file under a different file name, because the journal file is automatically overwritten for each session
- To locate the file: File > Open > Other, then c:\windows\temp\spss.jnl
Syntax Files
- Syntax files are saved with the *.sps extension
- To open a syntax file, click File > Open > Syntax and select a *.sps file
- To run commands from a syntax file, select the commands and click the Run button (the right-pointing triangle)
Regression Analysis
What is it ?
Regression analysis is a method used to develop an equation relating a single metric criterion variable with a set of predictor variables
What does it do ?
- Estimates the overall relationship between the criterion variable and the set of predictor variables
- Estimates the magnitude, relative importance and statistical significance of the contribution of each predictor to the relationship
- Derives a predictive equation for the criterion variable on the basis of known values of the predictors
To forecast sales, market share, profitability. To model choice, buying patterns, impact of marketing programs. To estimate elasticities, response functions.
- Identify clearly the criterion (dependent) variable to be explained
- Isolate and identify an exhaustive list of predictor (independent) variables that may explain the criterion variable
- Screen and carefully select the predictor variables and decide which to retain
- Estimate the parameters of the regression equation
- Perform diagnostic checks on the output of the analysis to test the adequacy of the model proposed by the researcher
- If the model appears to be adequate, understand and interpret the analysis
- Monitor, maintain and update the model
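The parameter estimation step can be sketched in plain Python using least squares via the normal equations. This is not how SPSS is implemented; it is an illustration only. The data are synthetic, generated exactly from y = 1 + 2*x1 + 3*x2 so the recovered coefficients are easy to check, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_ols(predictors, criterion):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, criterion)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(predictors, criterion, coefs):
    """Coefficient of multiple determination for the fitted equation."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], p)) for p in predictors]
    mean_y = sum(criterion) / len(criterion)
    ss_res = sum((y - f) ** 2 for y, f in zip(criterion, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in criterion)
    return 1 - ss_res / ss_tot


# Synthetic data: criterion is exactly 1 + 2*x1 + 3*x2 (no error term).
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
criterion = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefs = fit_ols(predictors, criterion)
print(coefs, r_squared(predictors, criterion, coefs))
```

With real data the fit would not be exact, and the diagnostic checks in the later steps would then matter.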
Assumptions
- The predictor (independent) variables are metric, and the criterion (dependent) variable is metric, continuous and unbounded
- All variables are measured without error
- All independent variables have non-zero variance
- There is no exact linear relationship (perfect multicollinearity) between two or more independent variables
- At each set of values of the k independent variables, the error terms are normally distributed with mean zero and constant variance (i.e. homoscedasticity)
- Each independent variable is uncorrelated with the error term
- The error terms for different observations are uncorrelated (i.e. no autocorrelation)
Example
A market researcher is interested in consumers' attitudes towards nutritional additives in ready-to-eat cereals. A set of written concept descriptions is prepared in which two characteristics are varied:
- Amount of protein
- Percentage of minimum daily requirement of vitamin D

The researcher obtains consumers' interval-scaled evaluations of ten concept descriptions on a preference rating scale from 1 (dislike extremely) to 9 (like extremely well).

Multiple regression analysis is done taking preference rating as the criterion (dependent) variable and amount of protein and percentage of vitamin D as the predictor (independent) variables.
Data collected
SPSS Output
Importance of predictors
- Heteroscedasticity: large variance associated with certain groups in a cross-sectional study, e.g. larger variance in consumption is associated with heavy users than with light users.
- Autocorrelation: correlated errors from adjacent time periods (panel data), or from cross-sectional observations (such as related errors within different ethnic groups in a market segmentation study), are called autocorrelation or serial correlation.
- The Durbin-Watson statistic is computed from the residuals and compared with tabulated values, which give two limits (a lower and an upper limit) for a given number of independent variables and number of observations.
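The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short Python sketch follows; the residual series are invented to show the two extremes, and the function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Invented residuals: alternating signs push d above 2,
# runs of the same sign pull d below 2.
print(durbin_watson([1, -1, 1, -1]))
print(durbin_watson([1, 1, -1, -1]))
```

Values near 2 suggest little first-order autocorrelation; the computed d is then compared with the tabulated lower and upper limits.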
- Partial regression coefficient (b)
- Beta coefficient: b when all variables are transformed into standard unit variates (zero mean and unit standard deviation)
- Multiple correlation (R)
- Coefficient of multiple determination (R^2)
- Adjusted R^2: deflation of R-Square that takes into account the number of predictor variables and the number of observations in the analysis
- Part correlation: correlation coefficient between Y and the residual that remains after X has been regressed on all other predictors. The square of the part correlation represents the absolute increase in R-Square when X is added to the equation
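Two of these terms reduce to short formulas, sketched here in Python. The adjusted R-Square formula used is the standard one, 1 - (1 - R^2)(n - 1)/(n - k - 1) for n observations and k predictors, and the beta coefficient is b rescaled by the ratio of standard deviations. The function names and the numbers are illustrative assumptions, not SPSS output.

```python
from statistics import stdev


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def beta_coefficient(b, x_values, y_values):
    """Standardized coefficient: b * sd(x) / sd(y)."""
    return b * stdev(x_values) / stdev(y_values)


# Illustrative numbers: R^2 of 0.8 with 10 observations and 2 predictors.
print(adjusted_r_squared(0.8, 10, 2))

# Illustrative data where y = 2 * x exactly, so the slope b is 2.
print(beta_coefficient(2, [1, 2, 3, 4], [2, 4, 6, 8]))
```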
Some terms
- Partial correlation: simple correlation between two residuals, the residual of Y and the residual of X, both obtained by regressing Y and X separately on all the remaining predictor variables
- Standard error: the standard deviation of the actual Y values from the corresponding values fitted by least squares regression. It is a conditional standard deviation, measured from the regression line rather than from the sample mean
- Tolerance: a measure of multicollinearity, equal to 1 - Rj^2, where Rj is the multiple correlation obtained using variable j as the criterion and the other independent variables as predictors
- Variance Inflation Factor (VIF): reciprocal of tolerance
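Tolerance and VIF follow directly from Rj^2, as this small Python sketch shows. The Rj^2 value of 0.75 is an arbitrary illustration and the function names are hypothetical.

```python
def tolerance(rj_squared):
    """Tolerance of predictor j: 1 - Rj^2."""
    return 1 - rj_squared


def variance_inflation_factor(rj_squared):
    """VIF of predictor j: reciprocal of its tolerance."""
    return 1 / tolerance(rj_squared)


# Arbitrary illustration: the other predictors explain 75% of
# the variance of predictor j.
print(tolerance(0.75), variance_inflation_factor(0.75))
```

A predictor unrelated to the others has Rj^2 of 0, giving tolerance 1 and VIF 1; as Rj^2 approaches 1, tolerance approaches 0 and VIF grows without bound.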