Aug 19, 2010

© Attribution Non-Commercial (BY-NC)

In this tutorial, you will learn how to do simple research with the MS Excel Data Analysis tool (Descriptive Statistics) and Pivot Table to analyze data from a questionnaire survey. You may download the spreadsheet example of this tutorial here.

Data Coding

We have the following data from park visitors (extracted from a study of several parks; the actual data may include hundreds of families and hundreds of variables for more than 10 parks in the city). Here is the meaning of each variable:

• Family is the observation number of the questionnaire. Each questionnaire was filled in for one family, not for an individual.

• Time is a quantitative data type, measured in minutes. It is the activity time of the family in the park.

• Mode is a nominal data type consisting of four choices of mode to go to the park:

1. walk,

2. car,

3. cycling,

4. bus.

The choices are mutually exclusive, that is, only one mode for each family.

• Activity is a nominal data type consisting of six choices of activity in the park:

1. sport,

2. picnic,

3. reading,

4. walk (including with the dog),

5. meditation,

6. jog.

Multiple choices are allowed, that is, one family may have several activities in the park.

• Satisfaction is the family's level of satisfaction with the park, coded on a five-point scale:

o -2 = Very dissatisfied,

o -1 = Dissatisfied,

o 0 = Indifferent,

o 1 = Satisfied,

o 2 = Very satisfied.

• Playground is a nominal scale (Yes or No) indicating whether the park has a children's playground.

Family  Time  Mode  Activity    Satisfaction  Playground
1       30    1     1, 2, 3      0            N
2       30    3     4, 6         1            Y
3       60    2     1, 2         2            Y
4       45    1     5           -1            N
5       30    1     6            1            N
6       60    2     2            2            Y
7       30    3     4            1            N
8       45    2     3, 4        -1            N
9       15    1     6            1            Y
10      60    2     2            2            Y
11      180   4     1, 2, 3, 4   2            Y
12      120   2     1, 2, 4      2            Y

With these very limited data, we want to answer the following research questions:

1. In general, what is the level of visitors' satisfaction with parks in the city?

2. What activity do most people do in the park?

3. What mode do most people use to go to the park?

4. Does the existence of a children's playground in the park lead to higher visitor satisfaction?

5. Is there any relationship between people's activity and the activity time of the family in the park?

6. Is there any relationship between activity time and the mode to go to the park?

How are you going to answer those research questions? Of course, with such simple data, you could even use a calculator to analyze the data manually. However, we will use MS Excel to demonstrate what kinds of tools are available in MS Excel for your research.

We assume the data above are a valid sample (though they are actually too few to be a real sample) for the sake of demonstration only.

We are going to use Microsoft Excel. First, we need to transform the data into a format that Microsoft Excel can understand, that is, to convert them into numbers, one for each value. Only the variables Activity and Playground need modification; the others are already in the correct format. The variable Activity needs to be converted to binary format (1 = did that activity, 0 = did not do that activity) for each value. Thus, we convert it into 6 columns, one for each value of the variable Activity. The variable Playground needs to be converted to binary (1 = yes, 0 = no) so that we can use the Data Analysis tool in MS Excel, which only accepts numerical data.

                    Activity
Family  Time  Mode  1  2  3  4  5  6  Satisfaction  Playground
1       30    1     1  1  1  0  0  0   0            0
2       30    3     0  0  0  1  0  1   1            1
3       60    2     1  1  0  0  0  0   2            1
4       45    1     0  0  0  0  1  0  -1            0
5       30    1     0  0  0  0  0  1   1            0
6       60    2     0  1  0  0  0  0   2            1
7       30    3     0  0  0  1  0  0   1            0
8       45    2     0  0  1  1  0  0  -1            0
9       15    1     0  0  0  0  0  1   1            1
10      60    2     0  1  0  0  0  0   2            1
11      180   4     1  1  1  1  0  0   2            1
12      120   2     1  1  0  1  0  0   2            1
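The same coding step can also be sketched outside the spreadsheet. The Python snippet below is only an illustration of the idea, not part of the tutorial's workbook; the names `raw_activity`, `raw_playground`, `code_activity` and `code_playground` are made up for this example.

```python
# Raw answers as in the first table: family -> activity codes (multiple choice)
# and family -> playground answer. All names here are illustrative assumptions.
raw_activity = {
    1: "1, 2, 3", 2: "4,6", 3: "1, 2", 4: "5", 5: "6", 6: "2",
    7: "4", 8: "3, 4", 9: "6", 10: "2", 11: "1, 2, 3, 4", 12: "1,2,4",
}
raw_playground = {
    1: "N", 2: "Y", 3: "Y", 4: "N", 5: "N", 6: "Y",
    7: "N", 8: "N", 9: "Y", 10: "Y", 11: "Y", 12: "Y",
}

N_ACTIVITIES = 6  # sport, picnic, reading, walk, meditation, jog


def code_activity(answer):
    """Turn an answer such as "1, 2, 3" into six 0/1 flags, one per activity."""
    chosen = {int(code) for code in answer.split(",")}
    return [1 if k in chosen else 0 for k in range(1, N_ACTIVITIES + 1)]


def code_playground(answer):
    """Turn Y/N into 1/0."""
    return 1 if answer == "Y" else 0


# One coded row per family: six activity flags followed by the playground flag.
coded = {
    family: code_activity(answer) + [code_playground(raw_playground[family])]
    for family, answer in raw_activity.items()
}

for family, row in coded.items():
    print(family, row)
```

Each printed row should match the Activity and Playground columns of the coded table above.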

Microsoft Excel has an excellent Data Analysis menu, and one of its tools is Descriptive Statistics. To use the Data Analysis menu (in case your MS Excel does not show it), you need to install the add-in:

• Go to menu Tools – Add-Ins, check the Analysis ToolPak, and then press the OK button. The next time you open the Tools menu, you will see Data Analysis at the bottom of the menu.

• Click menu Tools – Data Analysis and you will see the Data Analysis dialog. Scroll down and you will see Descriptive Statistics. Select it and click the OK button.

• You will get the Descriptive Statistics dialog. In the Input Range, select the range of data that you want to analyze. Include the labels in the first row and check the Labels in first row check box. Also check the Summary statistics check box and then click the OK button.

The result of the Descriptive Statistics tool, after a little formatting, is shown in the figure below.

The data analysis produces many results, but do not be confused by all those numbers. Focus only on your research questions. (That is why formulating your research questions before doing any research is an important step.) Which of your research questions can be answered using the results of your analysis?

From the figure above, you see that the mean satisfaction is 1 (on the scale from -2 to 2). You may conclude that people are quite satisfied with the park conditions. This answers the first research question, “In general, what is the level of visitors' satisfaction toward parks in the city?”
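As a cross-check outside Excel, a few of the summary statistics for Satisfaction can be recomputed with Python's standard `statistics` module. This is only a sketch: the list below is the Satisfaction column of the 12 families typed in by hand, and the `summary` dictionary is an illustrative name, not the layout of Excel's output.

```python
import statistics

# Satisfaction column of the coded table, families 1 to 12.
satisfaction = [0, 1, 2, -1, 1, 2, 1, -1, 1, 2, 2, 2]

# A small subset of what the Descriptive Statistics tool reports.
summary = {
    "count": len(satisfaction),
    "sum": sum(satisfaction),
    "mean": statistics.mean(satisfaction),
    "median": statistics.median(satisfaction),
    "sample standard deviation": statistics.stdev(satisfaction),
    "minimum": min(satisfaction),
    "maximum": max(satisfaction),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean comes out as 1, in agreement with the value read from the figure.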

You may also see that the sum of the values for each activity gives its frequency, which you may graph further.

Activity                           Frequency  Relative frequency
(1) sport                          4          18.2%
(2) picnic                         6          27.3%
(3) reading                        3          13.6%
(4) walk (including with the dog)  5          22.7%
(5) meditation                     1          4.5%
(6) jog                            3          13.6%
Sum                                22         100.0%
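The frequency table is just the column sums of the six binary Activity columns, divided by their total. A minimal Python sketch of that arithmetic follows; `coded_activity` is the Activity block of the coded table typed in by hand, and the variable names are assumptions made for this example.

```python
activity_names = [
    "sport", "picnic", "reading",
    "walk (including with the dog)", "meditation", "jog",
]

# Activity columns 1 to 6 of the coded table, one row per family (1 to 12).
coded_activity = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
]

# Summing a 0/1 column counts the families that chose that activity.
frequency = [sum(column) for column in zip(*coded_activity)]
total = sum(frequency)
relative = [f / total for f in frequency]

for name, f, r in zip(activity_names, frequency, relative):
    print(f"{name:32s} {f:3d} {r:6.1%}")
print(f"{'Sum':32s} {total:3d} {1:6.1%}")
```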

We may conclude that the most common activities of the families who visit the parks are having a picnic and walking (including with the dog). This answers the second research question, “What activity most people do in the park?”

From the mode (the most frequent value) of the variable Mode, you may conclude that the car is the most common mode among the families who visit the parks. This answers the third research question, “What mode most people use to go to park?”
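The mode of a nominal variable is simply its most frequent code. As a hedged illustration, the count can be reproduced in Python with `collections.Counter`; the list is the Mode column of the 12 families, and `mode_labels` is an illustrative lookup built from the variable description earlier in the tutorial.

```python
from collections import Counter

mode_labels = {1: "walk", 2: "car", 3: "cycling", 4: "bus"}

# Mode column of the data table, families 1 to 12.
mode = [1, 3, 2, 1, 1, 2, 3, 2, 1, 2, 4, 2]

counts = Counter(mode)
most_common_code, most_common_count = counts.most_common(1)[0]

for code in sorted(counts):
    print(f"{mode_labels[code]:8s} {counts[code]}")
print("most frequent mode:", mode_labels[most_common_code])
```

With these data the car is used by 5 of the 12 families, followed by walking with 4.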

Contingency Table

Using the data that we have formatted, we want to answer the next research questions:

4. Does the existence of a children's playground in the park lead to higher visitor satisfaction?

5. Is there any relationship between people's activity and the activity time of the family in the park?

6. Is there any relationship between activity time and the mode to go to the park?

Because those research questions are about the relationship between two variables, we need to do what is called Cross Tabulation. A Cross Tabulation (or CrossTab for short) is a frequency table of two or more variables. For readability, it normally involves fewer than 4 variables. The Cross Tabulation table goes by different names, which all refer to the same thing. Some statisticians call it a Contingency Table, while MS Excel calls it a Pivot Table.

In Microsoft Excel, CrossTabs can be automated using a Pivot Table. You may use either the Pivot Table icon in the toolbar or the MS Excel menu Data – Pivot Table and Pivot Chart Report.

When you click the toolbar icon or the menu, the Pivot Table wizard will pop up. Click Next.

In step 2 of the wizard, highlight the data, including the labels in the top row, as shown in the following figure.

To examine the relationship between the variables Playground and Satisfaction, drag and drop the names of the variables on the right into the diagram. Put the Satisfaction button in the Row area and the Playground button in the Column area, then drop Satisfaction once more into the Data area. It will appear as Sum of Satisfaction. After that, double-click this last button (Sum of Satisfaction) and the Pivot Table Field dialog will appear. Select summarize by Count and then click the OK button twice.

When you are back at step 3 of the Pivot Table wizard, click the Finish button.

MS Excel will automatically create the Cross Tabulation table. Personally, I don't like to use it directly because it may contain very long formulas. Thus, I prefer to highlight the Pivot Table and use menu Edit – Copy (CTRL-C).

Then select another cell and use menu Edit – Paste Special. Click the Values option and click the OK button.
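For readers who want to check the Pivot Table by hand, the same count-based cross tabulation can be sketched in a few lines of Python. This is not how Excel builds it internally; it only counts how many families fall into each (Satisfaction, Playground) pair, using the two columns of the coded table typed in by hand.

```python
from collections import Counter

# Satisfaction and Playground columns of the coded table, families 1 to 12.
satisfaction = [0, 1, 2, -1, 1, 2, 1, -1, 1, 2, 2, 2]
playground = [0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Count of families for every (satisfaction, playground) combination.
pair_counts = Counter(zip(satisfaction, playground))

row_values = sorted(set(satisfaction))   # -1, 0, 1, 2 (no family answered -2)
col_values = sorted(set(playground))     # 0 = no playground, 1 = playground

# Rows are Satisfaction levels, columns are Playground values.
crosstab = [[pair_counts[(r, c)] for c in col_values] for r in row_values]

print("Satisfaction", *[f"Playground={c}" for c in col_values])
for r, row in zip(row_values, crosstab):
    print(f"{r:12d}", *[f"{n:12d}" for n in row])
```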

After reformatting, we need to compute the independent (expected) values of the table. We do so because we want to know whether the variable Playground has a relationship with the variable Satisfaction or not. We will perform a simple test, called the Chi-square test. If the result of the chi-square test shows that the variable Playground is independent of the variable Satisfaction, then we cannot conclude any relationship between the two variables. Otherwise, we can conclude that there is a relationship. Sounds easy, doesn't it?

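The independent (expected) value $E_{ij}$ of the cell in row $i$ and column $j$ is computed from the observed counts $O_{ij}$ of the Pivot Table:

$$E_{ij} = \frac{\left(\sum_{k} O_{ik}\right)\left(\sum_{l} O_{lj}\right)}{\sum_{l}\sum_{k} O_{lk}}$$

Don't worry about the formula if you don't understand the $\Sigma$ notation. It is just a symbol for summation. The meaning is this: to get a cell of the independent table, multiply the total of its row by the total of its column, then divide by the total of all data.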

For example, for Satisfaction = 1 and Playground = 1, we have data from 2 respondents. The row total (Satisfaction = 1) is 4, the column total (Playground = 1) is 7, and the total of all data is 12. Then the independent value for that cell is 4*7/12 = 2.333. Doing the same for all cells gives the table of independent values. The meaning of this table is that if the variable Playground were 100% independent of the variable Satisfaction, the contents of the cells would equal these values.
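The rule "row total times column total divided by grand total" can be applied to every cell at once. The sketch below assumes the observed counts obtained by counting the 12 families in the coded table (rows are Satisfaction = -1, 0, 1, 2; columns are Playground = 0, 1); the variable names are illustrative.

```python
# Observed counts: rows are Satisfaction -1, 0, 1, 2; columns are Playground 0, 1.
observed = [
    [2, 0],
    [1, 0],
    [2, 2],
    [0, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for each cell.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print("  ".join(f"{value:6.3f}" for value in row))
```

The cell for Satisfaction = 1 and Playground = 1 comes out as 2.333, as in the worked example.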

To be confident that the variable Playground has a relationship with (is not independent of) the variable Satisfaction, we need the degree of independence to be as small as possible, say less than 5%. This 5% is the error that we accept may happen by chance (nobody can rule out getting a result by luck). Some people call it the significance level.

The problem now is how to get an index that indicates the degree of independence. Mathematicians are very smart: they use a probability to represent that “degree of independence”. To determine that probability, we compute the difference between the observed values $O_{ij}$ (from the Pivot Table) and the expected values $E_{ij}$ (from the independent table), square this difference, divide by the expected value (to get the same unit back), and sum over all entries of the table. In short,

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

To get the probability, we also need the degrees of freedom (df), that is

$$df = (\text{number of rows} - 1)(\text{number of columns} - 1)$$

The probability can be obtained using the MS Excel function =CHIDIST($\chi^2$, df). The Chi-square test then reads:

If the probability is lower than 0.05, the two variables have a relationship; otherwise we cannot conclude any relationship between the two variables in the contingency table.

As shown in the figure above, the value of the probability is 0.048, so we may conclude that there is a relationship between the existence of a playground and the satisfaction level of park visitors. This answers the research question “Is the existence of children playground in the park may gain higher visitors satisfaction ?” positively.
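The whole test can be reproduced as a rough cross-check in Python using only the standard library. The sketch assumes the observed counts from the 12 families (rows are Satisfaction = -1, 0, 1, 2; columns are Playground = 0, 1). The helper `chi_square_sf` is a hypothetical stand-in for Excel's CHIDIST; it uses the closed-form upper-tail probability that exists for whole-number degrees of freedom and is not Excel's own implementation.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable.

    Valid for x >= 0 and positive integer df only (closed-form series).
    """
    half = x / 2.0
    if df % 2 == 0:
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    k = (df - 1) // 2
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


# Observed counts: rows are Satisfaction -1, 0, 1, 2; columns are Playground 0, 1.
observed = [[2, 0], [1, 0], [2, 2], [0, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
probability = chi_square_sf(chi_square, df)

print(f"chi-square = {chi_square:.4f}, df = {df}, probability = {probability:.3f}")
```

With these counts the probability rounds to 0.048, the value quoted above.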

Note that an older statistics book will ask you to compare the computed Chi-square value ($\chi^2$) with the Chi-square value from a table. You can get the table value in MS Excel using the function =CHIINV(probability, degrees of freedom). If you enter probability 0.05 and 3 degrees of freedom, you will get 7.8147 (check it against your Chi-square table if you have one). Because you have already fixed the probability here, you compare Chi-square values instead. The chi-square test then reads:

If your computed chi-square is larger than the Chi-square value from the table, the two variables have a relationship; otherwise we cannot conclude any relationship between the two variables in the contingency table.
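The table value can also be recovered numerically. The sketch below is an assumption-laden illustration rather than a replacement for CHIINV: it handles only 3 degrees of freedom, for which the upper-tail probability has a simple closed form, and finds the critical value by bisection. The computed chi-square of about 7.8857 is what the Playground and Satisfaction counts of the 12 families give when the formula is applied by hand; the function names are made up for this example.

```python
import math


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value_df3(alpha, lo=0.0, hi=100.0):
    """Find x with upper-tail probability alpha by bisection (the tail is decreasing in x)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi_square_sf_df3(mid) > alpha:
            lo = mid  # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2


critical = critical_value_df3(0.05)

# Chi-square of the Playground x Satisfaction table, computed from the 12 families.
computed = 7.8857
has_relationship = computed > critical

print(f"critical value = {critical:.4f}")
print("relationship" if has_relationship else "cannot conclude a relationship")
```

Both forms of the test agree here: the computed value is slightly above the critical value, just as the probability is slightly below 0.05.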

Note also that MS Excel provides the function =CHITEST(actual range, expected range) to simplify the computation of the probability. In this case, you don't need to create the third table to compute the value of Chi-square. The result is directly the probability that you can compare with the significance level (i.e. 0.05).

For the next research questions, we can follow the same steps as above in a new worksheet.

To get the Pivot Table of Activity and activity time, drag and drop the Time button into the Column area and all the activity buttons (1 to 6) into the Data area. Leave them summarized by Sum because our activity data are binary.
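Because the activity columns hold only 0 and 1, summing them within each Time value counts the families that did each activity. A minimal Python sketch of this Pivot Table follows, with the Time and Activity columns of the coded table typed in by hand; the names are illustrative.

```python
# Time column (minutes) and Activity columns 1 to 6, families 1 to 12.
time = [30, 30, 60, 45, 30, 60, 30, 45, 15, 60, 180, 120]
coded_activity = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
]

time_values = sorted(set(time))

# For each Time value, the sum of every activity flag over the families with that time.
sums_by_time = {t: [0] * 6 for t in time_values}
for t, flags in zip(time, coded_activity):
    for k, flag in enumerate(flags):
        sums_by_time[t][k] += flag

# Print with activities as rows and Time values as columns, like the Pivot Table.
print("Activity", *[f"{t:4d}" for t in time_values])
for k in range(6):
    print(f"{k + 1:8d}", *[f"{sums_by_time[t][k]:4d}" for t in time_values])
```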

Following the same steps as above, we find that the probability that the variable Activity is independent of Time is 46.2%. Since it is larger than the 5% required in the Chi-square test, we cannot conclude any relationship between people's activity and the activity time of the family in the park. This answers the research question “Is there any relationship between people activity and their activity time of the family in the park?”

For the last research question, we use the variable Time in the Column area of the Pivot Table and Count of Mode in the Data area. The result is shown in the figure below.

Since the chi-square probability is larger than 0.05, we cannot conclude any relationship between the mode to go to the parks and the activity time in the park. This answers the sixth research question, “Is there any relationship between activity time and their mode to go to park?”, negatively.

You have seen how simple data analysis with the MS Excel Data Analysis tool (Descriptive Statistics) and Pivot Table can be used to analyze your data from a questionnaire survey.
