R.Ganesan
Associate Professor (Statistics)
Rajiv Gandhi College of Veterinary & Animal Sciences,
Kurumbapet, Puducherry – 605 009.
Email : ashwinarvind@sify.com
Sl. No. Title
1 Introduction to SPSS
DATA MANAGEMENT USING SPSS
2 Creating new SPSS data file
3 Opening other type files in SPSS
4 Sorting cases
5 Merging files – add cases & add variables
6 Selecting cases
7 Computing new variables
8 Recoding variables
DATA COMPILATION & GRAPHS USING SPSS
9 Ungrouped frequency table
10 Grouped frequency table
11 Diagrammatic Representation of Data (Histogram, Multiple Bar Diagram, Pie Diagram)
STATISTICAL DATA ANALYSIS USING SPSS
12 Descriptive Statistics
13 Simple Correlation Coefficient
14 Linear Regression
15 T-test for Two Means
16 Paired t-test
17 One way ANOVA
18 Chi-Square test
1. INTRODUCTION TO SPSS
a. Introduction
The abbreviation SPSS stands for Statistical Package for the Social Sciences.
SPSS is a comprehensive system for analyzing data, available for both personal
and multi-user computers. The package consists of a set of software tools for data
entry, data management, statistical analysis and presentation, and it integrates
complex data and file management, statistical analysis and reporting functions.
SPSS can take data from almost any type of file and use them to generate tabulated
reports, charts, plots of distributions and trends, descriptive statistics and complex
statistical analyses. It is easy to learn and use, includes a full range of data
management and editing tools, provides in-depth statistical capabilities, and offers
complete plotting, reporting and presentation features.
To start SPSS in the Windows environment, double-click the SPSS icon or click
Start – Programs – SPSS Inc – Statistics 17.0 – SPSS Statistics 17.0.
There are a number of different types of editors (windows) in SPSS. The most
important are the Data Editor window and the Viewer window.
i) Data Editor
The data editor offers a simple and efficient spreadsheet-like facility for entering
data and browsing the working data file.
This window displays the contents of the data file. One can create new data files
or modify existing ones. The Data Editor window opens automatically when an
SPSS session is started. This editor provides two views of the data.
Data view. Displays the actual data values or defined value labels.
Variable view. Displays variable definition information, including defined
variable and value labels, data type etc.
With the Data Editor, the data in Data View can be modified in many ways:
change data values; cut, copy and paste data values; add and delete cases;
add and delete variables; and change the order of variables.
ii) Viewer
All statistical results (outputs) like tables, charts and analysis results are displayed
in the Viewer. The output can be edited and saved for later use. A Viewer
window opens automatically the first time you run a procedure that generates
output.
c. Menu Bar in SPSS
Many of the tasks that are to be performed with SPSS start with menu selections.
Each window has its own menu bar with menu selections appropriate for that
window type. The various main menus under SPSS are
File
Edit
View
Data
Transform
Analyze
Graphs
Utilities
Window
Help
Analyze and Graphs menus are available on all windows, making it easy to
generate new output without having to switch windows. Most menu selections
open dialog boxes. One can use dialog boxes to select variables and options for
analysis. Since most procedures provide a great deal of flexibility, not all of the
possible choices can be contained in a single dialog box. The main dialog box
usually contains the minimum information required to run a procedure.
Additional specifications are made in sub-dialog boxes. All the menus mentioned
above have further sub-options. To see them, move the cursor to a menu and
click the left mouse button; a drop-down menu appears. To close a drop-down
menu, place the cursor anywhere outside it and click the left mouse button.
The most frequently used options are provided in the toolbar as small icons. The
toolbar can be customized by adding and removing icons. A tool tip describing the
function of an icon appears when the mouse pointer rests on it.
2. CREATING A NEW SPSS DATA FILE
This section deals with creating a new SPSS data file, saving it, closing it and
reopening it. Consider the data given below on 3 variables, namely, Age, Sex
and BP.
Age 56 42 72 36 63 47 55 49 38 42 68 60
Sex 1 2 1 1 1 2 2 2 2 1 1 1
BP 147 125 160 118 149 128 150 145 115 140 152 155
STEPS
Open SPSS software.
Click on File-New-Data
A blank data screen looking like a spreadsheet (Data Editor Window) appears.
At the bottom are two tabs called “Data View” and “Variable View”.
Click on “Variable View”
To enter the value labels for Sex, place the cursor in the cell under the
“Values” column and click (…). Provide the details in the “Value Labels”
dialog box.
Type Value = 1, Label = Male, click Add. Similarly,
Type Value = 2, Label = Female, click Add.
Click OK to close the Value Labels dialog box.
Third Variable - BP
Enter BP in the “Name” column. The Type, Width and Decimals columns are
filled automatically; the width and decimals can be reset if required.
Enter an additional description of the variable BP (like “Systolic Blood Pressure”)
under the “Label” column.
The “Values” column can be left blank.
Now that the variables have been defined, click on the tab called “Data View” at
the bottom and enter the data given above into the appropriate cells.
To reopen a saved data file, start SPSS.
Click File – Open – Data, browse to locate the file and click Open.
The data file is loaded in the Data Editor window.
3. OPENING OTHER TYPE FILES IN SPSS
Data files of other types can also be opened in SPSS. Here we shall consider
opening an MS-Excel data file in the SPSS Data Editor for statistical analysis.
STEPS
Open SPSS.
Click on File-Open-Data
In the Open Data dialog box, click Desktop on the left-hand side (if your
Excel file is on the desktop).
Choose Excel in the “Files of type” list.
Click on the Excel file – Click Open
The Opening Excel Data Source dialog box appears.
Enable (tick) “Read variable names from the first row of data”, and choose
Worksheet: Sheet1.
Now the entire contents of the Excel file will be available in the SPSS Data
Editor for statistical analysis.
Similarly, text files and database files can also be opened in SPSS.
Thus, data from other types of files can be brought into the SPSS Data Editor
without entering them again.
4. SORTING CASES
This is a data management feature available in SPSS. It sorts a data file in
ascending or descending order of the values of a particular variable.
STEPS
Open SPSS.
Click on File-Open-Data
In the open data dialog box, choose the data file - Click Open.
Now the data file is sorted in ascending order of the variable Age.
OUTPUT
Original order   Sorted by Age
Age BP           Age BP
56 147 36 118
42 125 38 115
72 160 42 125
36 118 42 140
63 149 47 128
47 128 49 145
55 150 55 150
49 145 56 147
38 115 60 155
42 140 63 149
68 152 68 152
60 155 72 160
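For readers who want to check the sorted output outside SPSS, the same ascending sort can be sketched in Python. This is an illustration only, not part of the SPSS procedure; the names `age`, `bp` and `sorted_cases` are assumptions chosen to mirror the Age and BP columns.

```python
# Age and systolic BP values from the data table in Section 2
age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]

# Pair each case's values so that a case stays together while sorting
cases = list(zip(age, bp))

# Sort cases in ascending order of Age; Python's sort is stable,
# so cases with tied Age keep their original file order
sorted_cases = sorted(cases, key=lambda case: case[0])

for a, b in sorted_cases:
    print(a, b)
```

The two cases with Age 42 keep their original order, which agrees with the output table. Passing `reverse=True` to `sorted` would give descending order.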
5. MERGING FILES – ADD CASES, ADD VARIABLES
Often, data on different variables are collected at different times and stored
in different data files. At the time of statistical analysis, we may need the
entire data stored in these files. The Merge Files feature in SPSS facilitates
merging different data files.
Here we shall see how to merge two SPSS files using Add Cases and Add Variables.
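The two merge operations can be pictured with a small Python sketch. The files, the key variable `id` and all values below are hypothetical and are used only for illustration. Add Cases stacks the cases of two files that share the same variables. Add Variables attaches new variables to existing cases by matching a key variable.

```python
# Hypothetical files with the same variables (Add Cases)
file1 = [{"id": 1, "age": 56}, {"id": 2, "age": 42}]
file2 = [{"id": 3, "age": 72}, {"id": 4, "age": 36}]

# Add Cases: the cases of file2 are placed below those of file1
combined = file1 + file2

# Hypothetical file holding a new variable (bp) for the same ids (Add Variables)
bp_file = [
    {"id": 1, "bp": 147},
    {"id": 2, "bp": 125},
    {"id": 3, "bp": 160},
    {"id": 4, "bp": 118},
]

# Add Variables: match cases on the key variable "id";
# a case with no match gets None (comparable to a missing value)
bp_by_id = {row["id"]: row["bp"] for row in bp_file}
merged = [{**row, "bp": bp_by_id.get(row["id"])} for row in combined]

for row in merged:
    print(row)
```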
6. SELECTING CASES
If we need to select only some cases from a data file for analysis (e.g., age > 30),
the Select Cases option in SPSS is very useful. To select cases from the data
file for analysis, follow the steps given below.
STEPS
Open SPSS.
Click on File-Open-Data
In the open data dialog box, Choose the data file – Click Open.
(To get back all cases, click Data – Select Cases – All cases – OK.)
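The effect of Select Cases with an “If condition is satisfied” rule can be sketched in Python on the Age and BP data of Section 2. Every age in those data exceeds 30, so the sketch uses the hypothetical condition age > 50 instead. The variable names are assumptions for illustration.

```python
age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]

# Keep only the cases that satisfy the (hypothetical) condition age > 50
selected = [(a, b) for a, b in zip(age, bp) if a > 50]

print(len(selected), "cases selected")
for a, b in selected:
    print(a, b)
```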
7. COMPUTING NEW VARIABLES
If a new variable has to be computed from the data on the available variables,
the Compute option is very handy. Let us compute a new variable ‘tot’ (the total
of the marks in 5 subjects) using the statistical function “sum”.
STEPS
Open SPSS.
Click on File-Open-Data
In the open data dialog box, choose the data file – Click Open.
(Assume that the marks in 5 subjects are available under the variable names
‘lang’, ‘eng’, ‘maths’, ‘sci’ and ‘social’.)
In the same manner, we can also compute the average of the 5 marks by
creating a new variable ave = tot / 5
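A minimal Python sketch of the same computation follows. It assumes two hypothetical students whose marks are stored under the variable names given above; the mark values are invented.

```python
SUBJECTS = ["lang", "eng", "maths", "sci", "social"]

# Hypothetical marks for two students (one dictionary per case)
students = [
    {"lang": 72, "eng": 65, "maths": 88, "sci": 79, "social": 70},
    {"lang": 55, "eng": 61, "maths": 47, "sci": 58, "social": 64},
]

for s in students:
    # Roughly what COMPUTE tot = SUM(lang, eng, maths, sci, social) does in SPSS
    s["tot"] = sum(s[name] for name in SUBJECTS)
    # Average of the 5 marks: ave = tot / 5
    s["ave"] = s["tot"] / 5
    print(s["tot"], s["ave"])
```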
8. RECODING VARIABLES
The “Recode into Different Variables” option in SPSS reassigns the values of an
existing variable to new values in a new variable. For example, salaries could be
collapsed into a new variable containing salary-range categories.
Here we shall consider converting marks into grades. Recode the variable
MATHSMARK into a new variable, MATHSGRADE, as follows:
Up to 49 – F; 50-59 – B; 60-74 – A; 75 and above – O
STEPS
Open SPSS.
Click on File-Open-Data
In the open data dialog box, Choose the data file – Click Open.
Now click Transform – Recode into Different Variables
The “Recode into Different Variables” dialog box opens.
Choose the input variable as MATHSMARK
Type MATHSGRADE as the output variable name and click the Change
button below it.
Now click the “Old and New Values…” button
The “Recode into Different Variables: Old and New Values” dialog box opens.
Tick “Output variables are strings”
Choose Range, LOWEST through value and type 49
Type F (to denote F grade) as the new value and click Add
Choose Range and type 50 and 59 in the two boxes
Type B (to denote B grade) as the new value and click Add
Choose Range and type 60 and 74 in the two boxes
Type A (to denote A grade) as the new value and click Add
Choose Range, value through Highest and type 75
Type O (to denote O grade) as the new value and click Add
Click Continue and OK
We can now see the original variable MATHSMARK and new coded variable
MATHSGRADE in the data file.
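The recoding rule can be written as a small Python function. This sketch assumes whole-number marks; the function name `maths_grade` and the sample marks are hypothetical. A fractional mark such as 49.5 falls outside every range typed in the dialog box, so SPSS would leave it unrecoded, whereas this sketch would grade it B.

```python
def maths_grade(mark):
    """Recode a whole-number MATHSMARK into MATHSGRADE."""
    if mark <= 49:      # Lowest through 49
        return "F"
    elif mark <= 59:    # 50 through 59
        return "B"
    elif mark <= 74:    # 60 through 74
        return "A"
    else:               # 75 through Highest
        return "O"


# Hypothetical marks chosen to test each range boundary
mathsmark = [35, 49, 50, 59, 60, 74, 75, 98]
mathsgrade = [maths_grade(m) for m in mathsmark]
print(list(zip(mathsmark, mathsgrade)))
```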
9. UNGROUPED FREQUENCY TABLE
Raw data are usually voluminous and difficult to understand. To understand the
data better, data compilation procedures (one-way frequency table, two-way
frequency table) can be used. These procedures reduce the complexity of
voluminous data and bring out the information hidden in the raw data. Consider
the raw data given below on the number of eosinophils encountered in 100 WBC in
each of 67 smears. We shall compile these data into a one-way frequency table.
Eosinophils encountered in 100 WBC in 67 smears
0 1 0 1 2 0 3 2 1 4 1 2
1 1 0 1 2 0 1 2 2 1 2 1
3 2 0 0 0 1 1 0 0 4 1 0
1 2 1 5 1 2 1 4 1 6 1 2
1 2 3 2 2 2 2 3 2 3 2 5
2 3 2 3 5 6 5
STEPS
Enter the data given above in SPSS.
Now from the menu, choose:
Analyze – Descriptive Statistics – Frequencies…
Select the variable eosinophils and move it to the “Variable(s)” box.
Now click “OK” to display the univariate frequency table given below.
Eosinophils Frequency Percent Cumulative Percent
0 11 16.4 16.4
1 20 29.9 46.3
2 20 29.9 76.1
3 7 10.4 86.6
4 3 4.5 91.0
5 4 6.0 97.0
6 2 3.0 100.0
Total 67 100.0
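The frequency table can be cross-checked with a short Python sketch. It counts each value, then computes the percent and cumulative percent rounded to one decimal place, as in the SPSS output. The list transcribes the 67 values given above; the variable names are assumptions.

```python
from collections import Counter

# Eosinophils encountered in 100 WBC, one value per smear (67 smears)
eosinophils = [
    0, 1, 0, 1, 2, 0, 3, 2, 1, 4, 1, 2,
    1, 1, 0, 1, 2, 0, 1, 2, 2, 1, 2, 1,
    3, 2, 0, 0, 0, 1, 1, 0, 0, 4, 1, 0,
    1, 2, 1, 5, 1, 2, 1, 4, 1, 6, 1, 2,
    1, 2, 3, 2, 2, 2, 2, 3, 2, 3, 2, 5,
    2, 3, 2, 3, 5, 6, 5,
]

counts = Counter(eosinophils)
n = len(eosinophils)

# Build rows of (value, frequency, percent, cumulative percent)
table = []
cumulative = 0
for value in sorted(counts):
    freq = counts[value]
    cumulative += freq
    table.append((value, freq, round(100 * freq / n, 1), round(100 * cumulative / n, 1)))

for row in table:
    print(*row)
print("Total", n)
```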
10. GROUPED FREQUENCY TABLE
STEPS
Enter the data given above in SPSS
From the menu, choose:
Transform – Recode into Different Variables…
Create a new variable for the recoded values (Eosinogroup), click the
“Old and New Values” button, give the ranges (9-9.9, 10-10.9, …) for the
recoded (output) variable and then click “OK”.
Now from the menu, choose:
Analyze – Descriptive Statistics – Frequencies…
Select the recoded variable (Eosinogroup) and move it to the
“Variable(s)” box.
Now click “OK” to display the grouped frequency table as given
below.
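The grouping step assigns each value to a class interval of width 1 (9-9.9, 10-10.9, …) and then counts the values in each class. The Python sketch below uses hypothetical one-decimal values, because the data actually recoded in the steps above are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical one-decimal values (not the data used in the steps above)
values = [9.4, 10.2, 10.8, 11.5, 9.9, 12.1, 10.5, 11.0, 12.7, 10.1]

# Lower class limit of each value for classes of width 1: 9-9.9, 10-10.9, ...
lower_limits = Counter(math.floor(v) for v in values)

# Grouped frequency table in ascending order of class
table = [(f"{lo}-{lo}.9", lower_limits[lo]) for lo in sorted(lower_limits)]
for label, freq in table:
    print(label, freq)
```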
11. DIAGRAMMATIC REPRESENTATION OF DATA
The following table shows the distribution of different types of leprosy cases.
B. MULTIPLE BAR DIAGRAM
The following table shows the types of leprosy cases by sex. We shall convert
these data into a multiple bar diagram.
Type            Male   Female   Total
Tuberculoid     77     74       151
Lepromatous     35     33       68
Indeterminate   10     8        18
Borderline      7      5        12
STEPS
Enter the data given above in SPSS and from the menu, choose:
Graphs – Legacy Dialogs – Bars…
Select the “Clustered” icon, choose the “Values of individual cases”
option under “Data in Chart Are” in the “Bar Charts” dialog box that
appears, and click “Define”.
Select the variables Male and Female and move them to the “Bars Represent”
box.
Choose ‘Variable’ under Category Labels, select the variable “Type”
and move it to the “Variable” box.
Now click “OK” to generate the multiple bar diagram as given below.
C. PIE CHART
The following table shows the percentage of people who died due to various
causes.
% of death 36 29 11 24
STEPS
Enter the data given above in SPSS and from the menu, choose:
Graphs – Legacy Dialogs – Pie…
Choose the “Values of individual cases” option under “Data in
Chart Are” in the “Pie Charts” dialog box that appears and
click “Define”.
Select the variable “Percent” and move it to the “Slices Represent” box.
Choose ‘Variable’ for ‘Slice Labels’, select the variable
‘Diseases’ and move it to the “Variable” box.
Now click “OK” to generate the pie chart as given below.
PIE DIAGRAM
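In a pie diagram, each slice's angle is proportional to its share of the total. With percentages, the angle is percentage × 360 / 100. The Python sketch below applies this to the four percentages above; the variable names are assumptions.

```python
# Percentages of death for the four causes in the table above
percent_of_death = [36, 29, 11, 24]

# Angle of each slice in degrees: share of the total times 360
total = sum(percent_of_death)
angles = [p * 360 / total for p in percent_of_death]

for p, angle in zip(percent_of_death, angles):
    print(f"{p}% -> {angle:.1f} degrees")
```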
12. DESCRIPTIVE STATISTICS
STEPS
Select the options “Mean, Median, Mode, Standard Deviation,
Minimum, Maximum” in the dialog box that appears and click
“Continue”.
Now click “OK” to get the output given below.
Statistics
Hb
N 24
Mean 12.38
Median 12.55
Mode 13.00
Minimum 10.00
Maximum 14.60
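The same statistics can be computed with Python's standard `statistics` module. The 24 Hb values behind the output above are not listed, so this sketch uses a few hypothetical values. Note that `statistics.stdev` uses the n - 1 divisor. When several modes exist, SPSS shows the smallest one, so check multimodal data separately.

```python
import statistics

# Hypothetical Hb values (gm%), not the 24 values summarized above
hb = [11.2, 12.4, 13.1, 12.4, 10.8, 14.0]

summary = {
    "N": len(hb),
    "Mean": statistics.mean(hb),
    "Median": statistics.median(hb),
    "Mode": statistics.mode(hb),
    "Std. Deviation": statistics.stdev(hb),  # sample SD, divisor n - 1
    "Minimum": min(hb),
    "Maximum": max(hb),
}

for name, value in summary.items():
    print(f"{name:15} {value:.2f}")
```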
13. SIMPLE CORRELATION COEFFICIENT
Hb (gm%) 6.2 12.2 6.4 7.3 13 8.4 9.9 7.2 10.3 9.7
PCV 18 37 21.4 23 36 28 33 24 32 30
STEPS
Enter the data given above in SPSS
From the menu, choose:
Analyze – Correlate – Bivariate…
Select the variables Hb and PCV and move them to the “Variable(s)” box.
Choose the option “Pearson”.
Now click “OK” to generate the Pearson’s Correlation coefficient
between the entered variables.
OUTPUT
Correlations
Hb PCV
Hb Pearson Correlation 1 .967**
Sig. (2-tailed) .000
N 10 10
PCV Pearson Correlation .967** 1
Sig. (2-tailed) .000
N 10 10
**. Correlation is significant at the 0.01 level (2-tailed).
INTERPRETATION
The correlation between Hb and PCV is 0.967. This is a very high positive
correlation and is significant at the 1% level. Higher values of Hb are
associated with higher values of PCV, and vice versa.
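Pearson's correlation coefficient is r = Sxy / sqrt(Sxx × Syy), where Sxy, Sxx and Syy are the sums of products and squares of deviations from the means. The Python sketch below applies this formula to the Hb and PCV values above. The function name `pearson_r` is hypothetical.

```python
import math

hb = [6.2, 12.2, 6.4, 7.3, 13, 8.4, 9.9, 7.2, 10.3, 9.7]
pcv = [18, 37, 21.4, 23, 36, 28, 33, 24, 32, 30]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hb, pcv)
print(round(r, 3))
```

It gives r ≈ 0.967, the same value as the SPSS output.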
14. LINEAR REGRESSION
Marks                           56 59 72 70 80 85 89 98
Number of Hours studied daily    1  2  3  4  5  6  7  8
STEPS
Enter the data given above in SPSS
From the menu, choose:
Analyze – Regression – Linear…
Select the variable Marks and move it to the “Dependent” box.
Select the variable Hours and move it to the “Independent(s)” box.
Now click “OK” to fit the simple linear regression equation.
OUTPUT
Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t        Sig.
1 (Constant)     49.714    2.114                                           23.519   .000
  hours          5.869     .419                .985                        14.021   .000
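The B column of this output can be reproduced with the least-squares formulas. The slope is b = Sxy / Sxx and the intercept is a = mean(y) - b × mean(x). The Python sketch below uses the data above; the variable names are assumptions.

```python
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [56, 59, 72, 70, 80, 85, 89, 98]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Sums of products and squares of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)

b = sxy / sxx            # slope: B for "hours"
a = mean_y - b * mean_x  # intercept: B for "(Constant)"
print(f"Marks = {a:.3f} + {b:.3f} x Hours")
```

It prints `Marks = 49.714 + 5.869 x Hours`, which matches the B column of the output.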
INTERPRETATION
The fitted regression equation is Marks = 49.714 + 5.869 × Hours. Each additional
hour of daily study is associated with an increase of about 5.87 marks, and the
coefficient of hours is significant (p < 0.01).
15. T-TEST FOR TWO MEANS
Student’s t-test is used to test the equality of two population means using
independent samples drawn from each of the two populations. We shall use this
test to examine whether the mean age at onset of symptoms differs between males
and females suffering from lung cancer. For this purpose, data on age at onset of
symptoms have been collected from 12 male and 12 female lung cancer patients,
as given below.
Males 58 52 50 49 56 52 54 48 41 37 67 70
Females 26 41 57 66 36 55 41 61 53 50 52 37
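Hypothesis
H0: μ1 = μ2, i.e., there is no difference in the mean age at onset of symptoms
between males and females.
H1: μ1 ≠ μ2, i.e., the mean age at onset of symptoms differs between males and
females.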
STEPS
Click “Continue”
OUTPUT
INTERPRETATION
Since the probability value is 0.269 (p > 0.05), we do not reject the null
hypothesis. We conclude that there is no significant difference in the mean age
at onset of symptoms of lung cancer between males and females.
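The t statistic behind this test can be sketched in Python using the pooled-variance (“equal variances assumed”) form. An exact p-value requires the t distribution, which Python's standard library does not provide. The sketch therefore compares |t| with the two-tailed 5% critical value for 22 df, which is 2.074 from t tables.

```python
import math
import statistics

males = [58, 52, 50, 49, 56, 52, 54, 48, 41, 37, 67, 70]
females = [26, 41, 57, 66, 36, 55, 41, 61, 53, 50, 52, 37]

n1, n2 = len(males), len(females)
m1, m2 = statistics.mean(males), statistics.mean(females)
v1, v2 = statistics.variance(males), statistics.variance(females)  # divisor n - 1

# Pooled variance and t statistic (equal variances assumed)
df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

T_CRIT = 2.074  # two-tailed 5% critical value of t for 22 df (from t tables)
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```

For these data t is about 1.13, which is below 2.074. This is consistent with the probability value of 0.269.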
16. PAIRED t-TEST
Student No.             1   2   3   4   5   6   7   8   9   10
Mark before training    34  45  30  23  56  67  50  33  23  46
Mark after training     54  60  33  26  70  80  60  45  35  68
Hypothesis
H0: μ1 = μ2, i.e., there is no difference in mean marks before and after the
training, i.e., the training programme is not effective.
H1: μ1 ≠ μ2, i.e., there is a difference in mean marks before and after the
training.
STEPS
OUTPUT
INTERPRETATION
Since the probability value is 0.000 (p < 0.01), we reject the null hypothesis
and conclude that the mean mark obtained after the training programme is
significantly higher than the mean mark obtained before the training
programme. Thus, the training programme is effective in significantly
increasing knowledge of health awareness.
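A paired t-test works on the difference between the two marks of each student: t = mean(d) / (s_d / √n), with n - 1 degrees of freedom. The Python sketch below uses the data above. The critical value 3.250 (two-tailed 1% level, 9 df) is taken from t tables.

```python
import math
import statistics

before = [34, 45, 30, 23, 56, 67, 50, 33, 23, 46]
after = [54, 60, 33, 26, 70, 80, 60, 45, 35, 68]

# Gain of each student (after minus before)
d = [a - b for a, b in zip(after, before)]
n = len(d)

mean_d = statistics.mean(d)
sd_d = statistics.stdev(d)  # sample SD of the differences
t = mean_d / (sd_d / math.sqrt(n))

T_CRIT_1PCT = 3.250  # two-tailed 1% critical value of t for 9 df (from t tables)
print(f"mean gain = {mean_d}, t = {t:.2f}, df = {n - 1}, reject H0: {abs(t) > T_CRIT_1PCT}")
```

This gives t ≈ 6.36 on 9 df. SPSS subtracts the second variable of the pair from the first. If the pair is entered as (before, after), the t value in the SPSS output therefore has a negative sign.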
17. ONE WAY ANOVA
Group1 11.6 10.3 10 11.5 11.8 11.8 12.1 10.8 11.9 10.7 11.5
Group2 11.2 8.9 9.2 8.8 8.4 9.1 6.3 9.3 7.8 8.8 10 9.7
Group3 9.8 9.7 11.5 11.6 10.8 9.1 10.5 10 12.4 10.7
Hypothesis
H0: μ1 = μ2 = μ3, i.e., the diets are homogeneous; there is no difference between
the 3 diets.
H1: The diets are not homogeneous.
STEPS
Select the variable “group”, move it to the “Factor” box and then click
“OK”.
OUTPUT
Groups Mean Hb
Group 1 11.22
Group 2 8.93
Group 3 10.61
Total 10.21
ANOVA
Hb
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   29.9             2    14.95         14.27   .000
Within Groups    29.3             28   1.047
Total            59.2             30
INTERPRETATION
Since the probability value is 0.000 (p < 0.01), we reject the null hypothesis
and conclude that the mean Hb levels are significantly different among the 3
groups.
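One-way ANOVA splits the total sum of squares into between-group and within-group parts. F is the ratio of the corresponding mean squares. The Python sketch below defines a hypothetical function `one_way_anova` and applies it to the three groups as printed above. The printed groups contain 11, 12 and 10 values (N = 33, within-groups df = 30), whereas the output table has 28 within-groups df (N = 31). The sketch's figures will therefore not match the output exactly.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


# Hb values of the three groups as printed above
group1 = [11.6, 10.3, 10, 11.5, 11.8, 11.8, 12.1, 10.8, 11.9, 10.7, 11.5]
group2 = [11.2, 8.9, 9.2, 8.8, 8.4, 9.1, 6.3, 9.3, 7.8, 8.8, 10, 9.7]
group3 = [9.8, 9.7, 11.5, 11.6, 10.8, 9.1, 10.5, 10, 12.4, 10.7]

ss_b, ss_w, df_b, df_w, f = one_way_anova([group1, group2, group3])
print(f"Between SS = {ss_b:.2f} (df {df_b}), Within SS = {ss_w:.2f} (df {df_w}), F = {f:.2f}")
```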
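18. CHI-SQUARE TEST
                  Survival
Vaccination       Dead   Survived   Total
Vaccinated        30     70         100
Not Vaccinated    55     55         110
Total             85     125        210
Hypothesis
H0: The two attributes, vaccination and death due to the disease, are independent.
H1: The two attributes, vaccination and death due to the disease, are not
independent.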
STEPS
Click the “Cells” button. In the “Crosstabs: Cell Display” dialog box that
appears, choose the “Expected” option and click “Continue”.
OUTPUT
Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)
INTERPRETATION
Since the probability value is 0.003 (p < 0.01), we reject the null hypothesis
and conclude that the two attributes, vaccination and death due to the disease,
are not independent. The death rate is lower among the vaccinated (30 of 100, i.e.,
30%) than among the unvaccinated (55 of 110, i.e., 50%). We therefore further
conclude that vaccination is effective in controlling the disease.
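The Pearson chi-square statistic compares the observed counts O with the expected counts E = (row total × column total) / N, summing (O - E)² / E over all cells. The Python sketch below computes it for the 2 × 2 table above. For 1 degree of freedom, the p-value can be obtained from the standard library as erfc(√(χ²/2)). This sketch gives the uncorrected Pearson statistic. For 2 × 2 tables, SPSS also reports a continuity-corrected value.

```python
import math

# Observed counts: rows = Vaccinated, Not vaccinated; columns = Dead, Survived
observed = [[30, 70],
            [55, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count for this cell
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Valid for 1 df only: P(chi-square > x) = erfc(sqrt(x / 2))
p = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
```

This gives χ² ≈ 8.70 with p ≈ 0.003, consistent with the interpretation above.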