Sunteți pe pagina 1din 19

Data Analysis from Questionnaires

In this tutorial, you will learn simple research using MS excel Data Analysis – Descriptive Statistics
and Pivot Table to analyze your data from questionnaire survey. You may download the spreadsheet example
of this tutorial here.

Click any topics below to enter the tutorial:

Data Example & Research Questions

Data Coding

Descriptive Statistics Data Analysis

Contingency Table

Data Example & Research Questions


We have the following data from Parks visitors (extracted from several Parks study, the actual data
may include hundreds of families and hundreds of variables to more than 10 parks in the city). Here
is the meaning of each variable

• Family is the observation number of the questionnaire. One questionnaire was filled by for
one family, not for individual.
• Time is a quantitative data type, measured in minutes. It is the activity time of the family in
the park
• Mode is a nominal data type consist of four choice of mode to go to park:
1. walk,
2. car,
3. cycling,
4. bus.

The choice is mutually exclusive, that is only one mode for one family.

• Activity is a nominal data type consist of 6 choices of activity in the park:


1. sport,
2. picnic,
3. reading,
4. walk (including with the dog),
5. meditation,
6. jog.

The choices are multiple choices, that one family may have several activities in the
park.

• Satisfaction is an ordinal scale with 5 values:


o -2 = Very dissatisfied,
o - 1 = dissatisfied,
o 0 = indifference,
o 1 = satisfied,
o 2 = Very satisfied.

It measures family satisfaction toward the park's services.


• Playground is a nominal scale (Yes or No) about the existence of children playground

The data is show in this table

Family Time Mode Activity Satisfaction Playground

1 30 1 1, 2, 3 0 N
2 30 3 4,6 1 Y
3 60 2 1, 2 2 Y
4 45 1 5 -1 N
5 30 1 6 1 N
6 60 2 2 2 Y
7 30 3 4 1 N
8 45 2 3, 4 -1 N
9 15 1 6 1 Y
10 60 2 2 2 Y
11 180 4 1, 2, 3, 4 2 Y
12 120 2 1,2,4 2 Y

With these very limited data, we want to know the following research questions :

In general, what is the level of visitors' satisfaction toward parks in the city?

1. What activity most people do in the park?


2. What mode most people use to go to park?
3. Is the existence of children playground in the park may gain higher visitors satisfaction ?
4. Is there any relationship between people activity and their activity time of the family in the
park
5. Is there any relationship between activity time and their mode to go to park?

How are you going to do to answer those research questions? Of course, with this simple data, you
can even use calculator to analyze the data manually. However, we will use MS Excel to
demonstrate what kind of tools is available in MS Excel for your research.

We assume the data above is valid sample (though they are actually too little to be real sample) for
the shake of demonstration purposes only.

Data Coding from Questionnaires


We are going to use Microsoft Excel. Firstly, we need to transform the data into another format that
Microsoft Excel could understand, that is to convert them into numbers one for each of value. Only
variable Activity and Playground need modifications while the others are already in the correct
format. Variable Activities need to be converted to binary format (1 and 0, 1 = chose that activity, 0
= did not do that activity) for each value. Thus, we will convert it into 6 columns according to the
number of values in the variable Activity . Variable Playground need to be converted to binary (1 =
yes, and 0 = no) so that we can use Data Analysis tool in MS excel that can only accept numerical
data. You may download the spreadsheet example of this tutorial here.

Activity
Family Time Mode 1 2 3 4 5 6 Satisfaction Playground
1 30 1 1 1 1 0 0 0 0 0
2 30 3 0 0 0 1 0 1 1 1
3 60 2 1 1 0 0 0 0 2 1
4 45 1 0 0 0 0 1 0 -1 0
5 30 1 0 0 0 0 0 1 1 0
6 60 2 0 1 0 0 0 0 2 1
7 30 3 0 0 0 1 0 0 1 0
8 45 2 0 0 1 1 0 0 -1 0
9 15 1 0 0 0 0 0 1 1 1
10 60 2 0 1 0 0 0 0 2 1
11 180 4 1 1 1 1 0 0 2 1
12 120 2 1 1 0 1 0 0 2 1

Descriptive Statistics Data Analysis


Microsoft Excel has excellent Data Analysis menu and one of them contain Descriptive statistics. To
use the data analysis menu (suppose your MS excel do not have it), you need to install the Add-ins:
Go to menu Tools – Add Ins – check the Analysis Tool pack and then press OK button. Next
time when you open the Tools menu again, you will see Data Analysis in the bottom of Tools menu.

To use Descriptive statistics tool from the Data Analysis:

• Click menu Tools – Data Analysis and you will see Data Analysis dialog. Scroll down and
you will see Descriptive Statistics. Select it and click OK button.

You will get Descriptive Statistics dialog form. In the Input range, select range of your
data that you want to be analyzed. Include the label in the first row and check that check
box. Check also the Summary statistics check box and then click OK button.
The result of Descriptive Statistics tool, after a few formatting, is shown in the figure below. You may
download the spreadsheet example of this tutorial here.

There are many results of data analysis, but do not be confused with all those numbers. Be focus
only on your research questions . (That is why you need to formulate your research questions
before to do any research as an important step). Which one of your research questions could be
answered using the results of your analysis?

From the figure above, you see that the mean satisfaction is 1 (from the scale - 2 to 2 ). You may
conclude that people are quite satisfied with the park conditions. This answers the first research
question of “In general, what is the level of visitors' satisfaction toward parks in the city?”

You may also see that the sum of values in each activity may produce frequency that you may
graph it further.

Relative
Activity frequency frequency
(1) sport 4 18.2%
(2) picnic 6 27.3%
(3) reading 3 13.6%
(4) walk (including with the dog) 5 22.7%
(5) meditation 1 4.5%
(6) jog 3 13.6%
Sum 22 100.0%

We may conclude that the majority of the family who visit the parks are doing picnic or walking with
their dog. This answers the second research question of “What activity most people do in the park?”

From the Mode of Mode, you may conclude that majority of family who visit the parks are using car.
This answers the third research question of “What mode most people use to go to park?”
Contingency Table
Using the data that we have formatted, we want to answer the next research questions:

4. Is the existence of children playground in the park may gain higher visitors satisfaction ?

5. Is there any relationship between people activity and their activity time of the family in the park?

6. Is there any relationship between activity time and their mode to go to park?

Because those research problems are about relationship between two variables, we need to do what
is called Cross Tabulation. Cross Tabulation (or CrossTab for short) is a frequency table between
two or more variables. For readability, it is normally involve less than 4 variables. Cross Tabulation
table has many name for different people. Actually they refer to the same thing. Some statistician
called it Contingency Table while MS excel call it Pivot Table .

In Microsoft Excel, CrossTabs can be automated using Pivot Table. You may use either Pivot Table
icon in the toolbar or using MS Excel Menu Data – Pivot Table and Pivot Chart Report .

When you click the toolbar or menu, Pivot Table wizard will pop up, click Next
In the step 2 of the wizard, you highlight the data including the label of the data in the top as
shown in the following figure

In step 3 of the Pivot Table Wizard, select Layout button


To answer the relationship between variable Playground and Satisfaction , drag and drop the name
of the variables on the right into the diagram. Put Satisfaction button in the row and Playground
button in the column and make another drop to put Satisfaction once again to the Data . It will
appear as Sum of Satisfaction . After that, double click the last button ( Sum of Satisfaction ) and
Pivot Table Field dialog will appear. Select summarized by Count and then click the OK button
twice.
When you go back to the Step 3 of Pivot table wizard, click Finish button.
MS excel will automatically create the Cross Tabulation table. Personally, I don't like to use it
directly because it may contain very long formula. Thus, I prefer to highlight this Pivot Table and
use Menu Edit Copy (CTRL-C).

Then select another cell, and use menu Edit - Paste Special . Click Values options and click OK
button
After we reformat, we need to count the Independent values of the table. We need to do so because
we want to know whether variable Playground has relationship with variable Satisfaction , or not.
We will perform some simple test, called Chi-square test . If the result of the chi-square test
shows that, variable Playground is independent from variable Satisfaction, and then we cannot
conclude any relation between the two variables. Otherwise, we can conclude the relationship.
Sound easy, isn't it?

To get the independent values, we to compute

Don't bother with the formula if you don't understand notation . It is just a symbol for
summation. (click here to learn more about this sigma notation) The meaning is like this: to get the
cell of independent table, we need to multiply the total of rows with the total of columns then divide
this with the total of all data.
Example, for Satisfaction = 1 and Playground = 1, we have data of 2 respondents. The total of rows
is 7 and total of columns is 4, the total of all data is 12. Then, the independent value for that cell is
7*4/12 = 2.333. Do the same thing for all cells in the table we get the table of independent values.
This table has meaning that if variable Playground is 100% independent from variable Satisfaction ,
then the contents of the cells must be equal to these values.

To make sure that variable Playground has relationship (not independent) from variable Satisfaction
, we need to put the degree of independent as small as possible, say less than 5%. The number of
5% is called the error or mistake that may happen by chance (who can avoid that we get it out of
luck). Some people called it significant level .

The problem now is how to get the index to indicate degree of independent? The mathematician are
very smart, they invented probability to represent that “degree of independent”. To determine that
probability, we need to compute the difference between the observe values (from the Pivot table)
and the expected values (from the Independent table), square this difference, and then divide with
the expected value (to get the same unit back) and we sum for all entries of the table. In short,
To get the probability, we need to compute degree of freedom (df), that is

The probability can be obtained using MS Excel function = CHIDIST( , ). Now the Chi-
square test is sound like this:

If probability is lower than 0.05, the two variables have relationship, otherwise we cannot conclude
any relationship between the two variables in the contingency table.

As shown in the figure above, the value of probability is 0.048, thus we may conclude that there is a
relationship between existences of playground with satisfaction level of parks' visitors. This answers
the research question of “Is the existence of children playground in the park may gain higher
visitors satisfaction ?” positively.

Note that if follow old statistical book, it will ask you to compare the Chi-square value ( ) with
the Chi-square from the table. You can get Chi-square value from the table in MS Excel using
function =CHIINV(probability, degree of freedom). If you put the probability 0.05 and 3 degree of
freedom, you will get 7.8147 (look at your Chi square table if you have it, whether it is the same
number). Because you have put probability here, you should use the value of the Chi-square to
compare. The chi-square test will sound like this

If your computed chi-square is larger than the Chi-square value from the table, the
two variables has relationship, otherwise we cannot conclude any relationship between
the two variables in the contingency table.

Another note that MS excel also provide function =CHITEST (actual-range, expected range) to
simplify the computation of probability. In this case, you don't need to create the third table to
compute the value of Chi-square. The result is directly probability that you can compare with the
value of significant level (i.e. 0.05).

For the next research questions, we can do the same steps as above in the new worksheet.

To get the Pivot table of Activity and activity time , drag and drop the time button in the column
and all activities (1 to 6) button in the data . Let it be in the sum field because our activity data is
binary.
Following the same steps as above, we gain the probability that variable Activity is independent
from Time is 46.2%. Since it is larger than 5% as required in the Chi-square test, we conclude that
there is no relationship between people activity and their activity time of the family in the park. This
answers the research questions of “Is there any relationship between people activity and their
activity time of the family in the park?”
For the last research question, we use variable Time in the Column of Pivot table and count of Mode
in the data diagram. The results is shown in the figure below.
Since the chi-square probability is larger than 0.05, we conclude that there is no relationship
between mode to go to parks with the activity time in the park. This answers negatively to the sixth
research questions of “Is there any relationship between activity time and their mode to go to
park?”

You have seen simple data analysis using MS excel Data Analysis – Descriptive Statistics and Pivot
Table can be used to analyze your data from questionnaire survey.