
STATISTICS

WITH COMPUTER APPLICATION

By

DR. HIPOLITO P. PALCON


BSE, MA in Mathematics
MA in Measurement & Evaluation
Doctor of Philosophy in Sociology & Anthropology
(UP Diliman)
Cell : 0921-4603232

A MODULE
For Exclusive Use of Graduate Students

TABLE OF CONTENTS

Title Page

Module 1. Introduction to Basic Ideas in Statistics

Lesson 1. Statistic Defined


Lesson 2. Measurement of Variable

Module 2. Descriptive Statistics

Lesson 1. Statistical Notation


Lesson 2. Frequency Distribution
Lesson 3. Measures of location or Central Tendency
Lesson 4. Measures of Dispersion or Variability
Lesson 5. Measures of Skewness and Kurtosis

Module 3. Inferential Statistics

Lesson 1. Testing Hypothesis


Lesson 2. Test of Relations: Predictions and Correlations
Lesson 3. Test of Independence and Proportion:
Analysis of Frequencies
Lesson 4. Test of Significant Differences between Means:
t-test and F-test

Module 4. Analysis of Computer Outputs

Lesson 1. Preparing the Program for Computer Encoding


Lesson 2. Interpreting Results of Computer Outputs

Bibliography
Appendix

Module 1. Definition and Function of Statistics

Objectives: At the end of the lesson, students are expected to:

1. Define statistics.

2. Differentiate a population from a sample, use the different sampling procedures, and identify the best method of determining the sample size.

3. Enumerate the different kinds of variables, their characteristics and measures.

Lesson 1. Statistic Defined

Statistics, in plural form, refers to any kind of data, both qualitative and quantitative. In singular form, it is a branch of knowledge that deals with the collection, presentation, analysis and interpretation of data obtained from surveys and experiments. You will study statistics as a branch of scientific methodology. As such, its essential purpose is to describe and draw inferences about the numerical properties of a population from a given sample of that population.

Collection. Data can be collected through an interview, formal or casual. A formal interview entails the preparation of guide questions and/or an approved schedule for the interviewees. A casual interview is done informally, without guide questions. Data can also be collected through survey questionnaires, standard or researcher-made tests, documents, and various techniques of observation.

Presentation of Data. Data can be presented in textual, tabular and graphical forms. For example, a researcher gathered the following data on the expenditures of ABC Company for the past five years: P2,416,025 in 2010; P2,117,680.75 in 2011; P1,986,592 in 2012; P1,876,458 in 2013; and P974,697 in 2014. These can be presented in textual, tabular and graphical forms.

Textual

For the past five years, ABC Company had total expenditures of P2,416,025 in 2010; P2,117,680.75 in 2011; P1,986,592 in 2012; P1,876,458 in 2013; and P974,697 in 2014.

Tabular
Table 1. Total Expenditures
of ABC Co. from 2010- 2014
______________________________
Year Expenditure
______________________________
2010 P2,416,025.00
2011 2,117,680.75
2012 1,986,592.00
2013 1,876,458.00
2014 974,697.00
______________________________

Graphical

a. Polygon (line)

b. Histogram (bar)

c. Ogive (cumulative < or > )

Note that statistical data are frequently arranged and presented in the form of tables. These tables are designed to enable readers to grasp, with minimal effort, the information the tables are intended to convey. In constructing tables for insertion in term papers, theses, or manuscripts, the following points should be kept in mind:

1. Every table should be self-explanatory.

2. The title should be precise, stating clearly what the table is all about.

3. Columns of numbers should be appropriately labeled and arranged in a logical sequence.

4. The information contained in a table may be partitioned by the insertion of horizontal and/or vertical lines.

5. Tables should be appropriately numbered and should be inserted in the text close to where they are first mentioned.

Graphic representation is often of great help in enabling us to comprehend the features of a frequency distribution. A graph is a geometric image of a set of data: it enables us to think about a problem in visual terms and to make comparisons. One form is the histogram, which presents the data as bars. Another is the polygon, which presents the data as lines. Using a histogram or polygon, the reader can easily visualize the intervals and make the comparisons that are important in the analysis and interpretation of data. Graphs answer questions such as: What is the increase….improvement….? Later, you will learn how to make tables of statistical results.
Activity 1.

A. What statistical data do you have in your office or company?

Enumerate at least five of them. Collect one type of data and present this using any

form of presentation appropriate to the kind of data.

B. The following are a firm’s quarterly sales in thousands of pesos in each of the three major

markets, A, B and C.

Market
A B C
First Quarter 225 110 350
Second Quarter 175 180 290
Third Quarter 120 100 425
Fourth Quarter 210 220 510

Present these data in tabular and graphical forms.

Lesson 2. Population and Sample

In everyday language, the term population refers to groups or aggregates of people. In statistics, the term generally refers to defined groups or aggregates of people, animals, objects, materials, measurements, or events of any kind. The statistician’s concern is with properties which are descriptive of the group. The entire group is called the population. For example, we may refer to a population of children with an IQ of 90 and above, of factories involved in the production of t-shirts, or of rank-and-file employees in the government.

In research, studying the entire population is often not practical because it is time-consuming and expensive. Thus, we resort to studying a sample of this population. In other words, we select only a small group, called the sample, for actual testing. This sample is small but representative of the population. By representative, we mean that the sample should bear whatever characteristics the population has. For example, if the population is characterized as 5’6” tall and above, both male and female, aggressive, and 50 to 65 years of age, the selected sample must have all those characteristics. The members that make up the sample are selected using one of the following sampling techniques.

Probability Sampling

This technique means that every member of the population has a chance of being selected, and the selection of one member does not influence the selection of another. Sampling procedures such as the following are used:

Random Sampling. This sampling technique may use the following procedures:

a) Lottery - the names of all members are put inside a box. The researcher then draws as many names as are needed to make up the sample.

b) Use of the Table of Random Numbers - the researcher may want to select, say, 100 students from a population of 8,500 students. Any four-digit numbers picked from the table from 0001 to 8500 will identify the members of the sample.

c) Systematic Random Sampling - a procedure which uses a fixed interval as the basis for choosing the sample. Say, if 10 will be chosen from a population of 100, divide the population size by the desired sample size to get the interval:

100 ÷ 10 = 10

From the result, get every 10th member. The sample of 10 consists of members 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100.

d) Stratified Random Sampling - is used to pick sample members when the researcher has reason to believe that the population is composed of distinct subgroups or strata. These subgroups or strata are characteristics or variables of the population which may influence differences in the results of the study. For instance, a certain insurance company decides to study the attitude of its clients toward the benefits given by the company. The company decides to pick a sample of 200 from 2,000 clients, but the clients vary from one another in terms of age and monthly or yearly premium paid. It is appropriate to subdivide the clients into subgroups or strata and then, for each subgroup, apply the simple random sampling procedure.

_____________________________________________
Subgroup           Population     Sample
(age group)        Size (N)       Size (n = 10% of N)
_____________________________________________
12 yrs & below        156            16
13-18                 178            18
19-23                 235            23
24-29                 386            39
30-39                 420            42
40 and above          620            62
_____________________________________________
TOTAL               2,000           200
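The lottery and systematic procedures described above can be sketched in Python. This is only an illustration: the function names are hypothetical, members are assumed to be numbered from 1 to N, and the population size is assumed to be an exact multiple of the sample size.

```python
import random


def systematic_sample(population_size, sample_size):
    """Return the positions picked by systematic sampling.

    Assumes members are numbered 1..population_size and that the
    population size is an exact multiple of the sample size.
    """
    interval = population_size // sample_size  # e.g. 100 // 10 = 10
    return [interval * k for k in range(1, sample_size + 1)]


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw member numbers at random without replacement (the lottery idea)."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Every 10th member out of 100, as in the systematic sampling example.
print(systematic_sample(100, 10))

# 100 student numbers drawn from a population of 8,500.
print(simple_random_sample(8500, 100, seed=1)[:5])
```

For stratified sampling, the same random draw would be repeated inside each subgroup, using the subgroup's own N and n.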

Other sampling procedures may also be applied, such as cluster sampling, which is the successive sampling of units. It follows the same procedures as stratified sampling, depending upon the characteristics of the population and the purpose of the study. Finally, in all these procedures, the researcher uses simple random sampling after all subgroups, clusters or units have been formed.
Non-probability Sampling

This technique does not use random sampling. It is weak, yet for special purpose, it

can be used with care in selecting samples by “replicating studies with different sample”

(Kerlinger, 1986:119).

a) Quota Sampling – uses knowledge of the strata of the population, such as sex, race, region, and so on, as the basis for selecting the sample members. Each stratum is unique: its members share the same characteristics, which differ from those of the other strata.

b) Purposive Sampling – is characterized by the use of judgment and a deliberate effort to obtain representative samples by including typical groups in the sample (e.g., company managers, presidents, mentally retarded children), presumably because there are only a few members.

c) Accidental Sampling – the weakest form of sampling, which takes whatever samples are at hand because of convenience.

Sample Size

Every researcher may ask, “What is the right sample size?” A rough-and-ready rule is “Use as large a sample as possible.” Another question that must be asked is: How much error is likely given the sample size? The figure below shows the relationship between sample size and error, that is, the deviation from population values. The curve shows that the smaller the sample, the larger the error, and the larger the sample, the smaller the error (Kerlinger, 1986).

[Figure: curve of error (vertical axis, small to large) against size of sample (horizontal axis, small to large).]

Selecting a Representative Sample

A good sample must be as nearly representative of the entire population as possible, and ideally it must provide the whole of the information about the population from which it has been drawn. For a sample to be representative of the population, two important aspects should be considered:

Population characteristics

Adequacy of sample size

The first is addressed by the sampling technique. The second is addressed by the choice of sample size. The researcher should consider the following observations on sampling (Srivastava, 1994: 52-53).

1. The larger the sample, the smaller the magnitude of sampling error.

2. Survey-type studies probably should have larger samples than are needed in experimental studies.

3. When sample groups are to be subdivided into smaller groups, the researcher should select a large enough sample so that the subgroups are of adequate size for the purpose.

4. Subject availability and cost factors are legitimate considerations in determining sample size.

5. In survey-type studies, a sample of 10 to 20 percent is usually recommended for a population of not less than 100.

After subgroups according to population characteristics have been made, the following formula (Slovin, 1960) may be used, given a predetermined sampling error (.05 or .01):

$$n = \frac{N}{1 + N e^{2}}$$

where

n = sample size
N = population size
e = sampling error
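As a quick check of the sample-size formula above, the computation can be sketched in Python. The function name is hypothetical, and rounding the result up to the next whole person is an assumption, not something the module prescribes.

```python
import math


def sample_size(population_size, error):
    """Sample size from n = N / (1 + N * e**2)."""
    return population_size / (1 + population_size * error ** 2)


# Illustration with N = 500 and e = .05.
n = sample_size(500, 0.05)
print(round(n, 2))    # the formula's value
print(math.ceil(n))   # rounded up to a whole number of respondents (assumed convention)
```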

Activity 2.

A. The following are populations and their sizes. Determine the sample members using the indicated sampling procedures:

N n for a n for b
1. 500 workers of GSIS ___ _____

2. 3,560 students of ABC School ___ _____

3. 10,000 factory workers of ABC Co. ___ _____

4. 5,420 farmers of Nueva Ecija ___ _____

5 1,275 members of Women Federation ___ _____

Sampling Procedures:
a. using systematic sampling

b. using the Table of Random Numbers

B. From the above data, how would you classify them into subgroups?

C. Determine the sample size using Slovin’s formula.
Lesson 3. Measurement of Variables

A variable refers to a property whereby the members of a group or set differ from one another. Members of a group may differ in sex, age, performance, or height. These are variables. The categories of a variable are labeled with numbers. These values are called variates. For example,

Sex Value

Male 2

Female 1

Age Value

6 & below (Pre-school age) 1

7-12 (Elem age) 2

13-16 (High School age) 3

These values are important in statistics because these correspond to the scores that are

measured.

Classification of Variables

Variables can be classified as dependent and independent. Consider the expression y = f(x). Here, the value of y depends on the value of x. Therefore,

y → dependent variable

x → independent variable

Let’s consider two variables with functional relationship such as age and performance. Here,

performance is dependent on age. Therefore, age is the independent variable and performance

is the dependent variable. Give your own examples of pairs of variables which have functional

relationships.

Another classification is continuous and discrete variables. A continuous variable may take any value within a defined range of values. Examples of continuous variables are height and weight. A discrete variable can take specific values only. Examples are the size of a family, the value obtained in rolling a die, and the frequencies of responses to a questionnaire.

Another classification of variables is important to statisticians. This classification is based on differences in the type of information which different operations of classification or measurement yield. They are the following:

Variable      Statements that can be made
Nominal       equality or difference
Ordinal       greater than or less than (rank ordering)
Interval      equality of intervals, with an arbitrarily defined zero point
Ratio         equality of ratios, with a true zero point

Examples:

Color and brand of cars are nominal variables because you can only make statements such as “same brand” or “different color.” You cannot say that blue is greater than red.

Height is ordinal when the people in a group are only ranked from shortest to tallest.

Temperature in degrees Celsius is an interval variable because its zero point is arbitrary. Variables that have a true zero point, such as age, length and weight, are ratio variables.

Statistical methods exist for the analysis of data composed of nominal, ordinal, interval, and ratio variables. The kind of measurement a variable has is important in deciding which statistical test to use. Variables are the stuff of which statistics is made. You will understand variables better when you study inferential statistics, where you decide which statistic should be used to test the interrelationships of variables.

Measuring Variables

Some variables are easy to measure, for example, age in years. So we say that two sample members are 15 and 12 years of age. In research, there are variables which need to be described or defined by looking into some indicators before one can measure them. Consider, for instance, productivity. What indicators will you use to measure productivity? How will you define productivity? You may consider productivity in terms of the number of boxes or items sold during the day. In this case, you can measure it using the following quantitative label/code and description:

No. of Items Value/Score Description

1–5 1 Poor
6 – 10 2 Fair
11 – 15 3 Good
16 – 20 4 Very Good
21 and Above 5 Excellent
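The coding scheme in the table above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and rejecting counts below 1 is an assumption, since the table does not say how zero items should be coded.

```python
def productivity_score(items_sold):
    """Map the number of items sold to the (value, description) pair in the table."""
    if items_sold < 1:
        # The table starts at 1 item; how to code 0 is not stated in the module.
        raise ValueError("the table does not define a score below 1 item")
    bands = [
        (5, 1, "Poor"),
        (10, 2, "Fair"),
        (15, 3, "Good"),
        (20, 4, "Very Good"),
    ]
    for upper_limit, value, description in bands:
        if items_sold <= upper_limit:
            return value, description
    return 5, "Excellent"  # 21 and above


print(productivity_score(7))
print(productivity_score(21))
```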

Activity 3

A. Identify what type of variables are the following – continuous or discrete?

1. age
2. color of the eyes
3. number of residents in a community
4. number of employees
5. type of building

B. Identify whether the following variables are nominal, ordinal, interval or ratio according to how you measure each one:

1. Race                      6. Efficiency
2. Religion                  7. Library collection
3. Intelligence quotient     8. Managerial skill
4. Productivity              9. Attitude
5. Performance              10. Weight

C. Choose pairs of variables with functional relationships, label them and then

identify what type of variable each one is.

D. Define the following variables, measure them and then describe.

1. attitude towards work


2. fluency in the use of English language
3. attendance in meeting
4. efficient secretary
5. saleable cars in 2012

E. Write down 10 variables in your own discipline and classify these according to
type and identify how you will measure each one.

MODULE 2. DESCRIPTIVE STATISTICS

Objectives: At the end of this module, the students should be able to:

1. Draw the graphical representation of frequencies with the use of the histogram, polygon, and ogive to show the kind of distribution the data have.

2. Compute measures of location, dispersion, skewness and kurtosis of given sets of data using the theorems of summation notation.

3. Compare and contrast the different measures of location, dispersion, skewness, and kurtosis with a graphical representation of the distribution or curve.

Lesson 1. Statistical Notation

The most commonly used form of notation in statistics is summation notation. A set of values, measurements or observations is denoted by $x_1, x_2, x_3, \ldots, x_n$, where n denotes the total number of observations represented by x. So, we say that the summation of the observations $x_i$, where n = 5, is

$$x_1 + x_2 + x_3 + x_4 + x_5 = \sum_{i=1}^{5} x_i = \sum x_i$$

Rules for Summation Notation

Theorem 1. If every variate value in a group is multiplied by a constant number or factor, that factor may be removed from under the summation sign and written outside as a factor. Thus,

$$\sum c x_i = c x_1 + c x_2 + c x_3 + c x_4 = c (x_1 + x_2 + x_3 + x_4) = c \sum x_i$$

Theorem 2. The summation of a constant over N terms is equal to Nc. Thus,

$$\sum c = c + c + \cdots + c = Nc$$

Theorem 3. The summation of the sum of any number of terms is the sum of the summations of these terms taken separately. Thus,

$$\sum (x_i + y_i + z_i) = \sum x_i + \sum y_i + \sum z_i$$
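The three summation theorems can be checked numerically. The sketch below uses made-up values for x, y, z and c (they are not taken from the module) and compares the two sides of each theorem.

```python
# Illustrative values only; any numbers would do.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
z = [10, 20, 30, 40]
c = 3

# Theorem 1: a constant factor can be moved outside the summation sign.
lhs1 = sum(c * xi for xi in x)
rhs1 = c * sum(x)

# Theorem 2: summing a constant over N terms gives N * c.
n_terms = len(x)
lhs2 = sum(c for _ in range(n_terms))
rhs2 = n_terms * c

# Theorem 3: the summation of a sum is the sum of the summations.
lhs3 = sum(xi + yi + zi for xi, yi, zi in zip(x, y, z))
rhs3 = sum(x) + sum(y) + sum(z)

print(lhs1 == rhs1, lhs2 == rhs2, lhs3 == rhs3)
```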

Activity 1. Given

x1 = 5 y1 = 2
x2 = 6 y2 = 3
x3 = 12 y3 = 7
x4 = 15 y4 = 10

Find
1. $\sum x_i$
2. $\sum y_i$
3. $\sum (x_i + y_i)$
4. $\sum (3x_i + y_i)$
5. $\sum (x_i - y_i)$

Lesson 2. Frequency Distribution

A frequency distribution is an arrangement of data that shows how often each value occurs, which is especially useful for voluminous data. When the individual scores are many, it is advisable to reduce them to a smaller number of classes, grouping together the values that fall within a given class interval. Consider the following set of data (IQ scores of 45 applicants):

91 88 106 115 96 90

86 99 102 112 102 89

95 105 103 96 97 87

101 100 89 90 96 84

99 86 90 84 100 112

105 94 90 89 98

91 90 92 88 104

84 89 107 91 100

Steps in making the frequency table:

1. Find n (sample size): n = 45.

2. Find the range by getting the difference between the highest and lowest values.

Range = 115 – 84 = 31

3. Divide the range by any desired interval (2, 3, 5, 7, …). The number of classes that results should be at least 9 but not more than 20. Dividing 31 by 3 gives approximately 10, which is an acceptable number of groups or classes. Therefore, the interval is 3.

4. Now, what number nearest the highest value, 115, is divisible by 3? The answer is 114. Therefore, the first group or class is the class interval 114 – 116, and the last group should contain the lowest score (in this example, 84).

5. Make the class intervals and complete the information you want to include in the table, such as the frequency, midpoint, limits (lower and upper), and the cumulative frequency (less than and greater than).

Class        Tally         Frequency   Midpoint   Limits           cum f<   cum f>
_________________________________________________________________________________
114 – 116    /                 1         115      113.5 – 116.5      45        1
111 – 113    //                2         112      110.5 – 113.5      44        3
108 – 110                      0         109      107.5 – 110.5      42        3
105 – 107    ////              4         106      104.5 – 107.5      42        7
102 – 104    ////              4         103      101.5 – 104.5      38       11
 99 – 101    /////-/           6         100       98.5 – 101.5      34       17
 96 – 98     /////             5          97       95.5 – 98.5       28       22
 93 – 95     //                2          94       92.5 – 95.5       23       24
 90 – 92     /////-////        9          91       89.5 – 92.5       21       33
 87 – 89     /////-//          7          88       86.5 – 89.5       12       40
 84 – 86     /////             5          85       83.5 – 86.5        5       45
_________________________________________________________________________________
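The tallying in steps 1 to 5 can also be done by computer. The following Python sketch groups the 45 IQ scores into classes of width 3 starting from the class 114 – 116; the function name and the dictionary keys are hypothetical choices made for this illustration.

```python
scores = [
    91, 88, 106, 115, 96, 90,
    86, 99, 102, 112, 102, 89,
    95, 105, 103, 96, 97, 87,
    101, 100, 89, 90, 96, 84,
    99, 86, 90, 84, 100, 112,
    105, 94, 90, 89, 98,
    91, 90, 92, 88, 104,
    84, 89, 107, 91, 100,
]


def frequency_table(data, width, top_lower_limit):
    """Build the classes from the highest class down to the one holding the minimum."""
    rows = []
    lower = top_lower_limit
    while lower + width - 1 >= min(data):
        upper = lower + width - 1
        rows.append({
            "class": (lower, upper),
            "f": sum(lower <= value <= upper for value in data),
            "midpoint": (lower + upper) / 2,
            "limits": (lower - 0.5, upper + 0.5),
        })
        lower -= width

    running = 0
    for row in rows:              # cum f>: add frequencies from the top class down
        running += row["f"]
        row["cum_gt"] = running
    running = 0
    for row in reversed(rows):    # cum f<: add frequencies from the bottom class up
        running += row["f"]
        row["cum_lt"] = running
    return rows


for row in frequency_table(scores, width=3, top_lower_limit=114):
    print(row["class"], row["f"], row["midpoint"], row["cum_lt"], row["cum_gt"])
```

Comparing the printed frequencies with a hand tally is a useful way to catch counting slips.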

Many observations can be made about the distribution. The midpoint is the middle value of each class interval. This midpoint can be used to represent the whole class, such as when you are making the graph. The cumulative frequencies less than and greater than are obtained by adding the frequencies from below and from above, respectively. Percentages can also be computed for the cumulative frequencies by dividing the corresponding frequency by the total number of observations (n) and multiplying by 100 (i.e., 36/45 x 100).

A histogram and an ogive (the graph for cum f< and cum f>) can be made as follows:

[Figure: a histogram of the IQ scores. Horizontal axis: class midpoints from 85 to 115; vertical axis: frequency.]

[Figure: an ogive of the IQ scores. Horizontal axis: class midpoints from 85 to 115; vertical axis: cumulative frequency up to 45.]

Activity 2

A. For the following ungrouped or raw data, make a frequency table with an appropriate interval and then draw its graphical representations, such as the histogram and ogive.

B. Answer the following from the graphs:

1. How many applicants got an IQ score below 90?

2. What percent of applicants are above average, with an IQ score of 96 and above?

3. If you decide to pass a score of 95 and above, how many of the applicants will you consider passing?

Data:

99 100 89 87 88 99 89 105 106 105 112


87 86 79 90 94 92 90 108 109 111 110
79 85 87 96 90 83 80 84 97 90 92
120 115 97 90 92 91 90 113 114 118 116
108 90 76 77 79 80 81 89 89 96 94
86 88 91 79 88 95 97 101 107 118 100
84 88 97 91

Lesson 3. Measure of Location or Central Tendency

After the collection of data from a common source, individual observations are not likely to have the same value. It is impractical to keep all the values in mind. What we need is a single value that we may consider typical of the set of data as a whole. This single value is one of the three most common measures of location or central tendency: the mean, the median and the mode. Consider the following values or observations:

14, 9, 26, 30, 15, 12, 8, 21, 30, 20, 30

Mode. The mode is the most frequently appearing value. In the measurements above, the mode is 30, which appears three times. Some sets of data have only one mode (unimodal); if there are two modes, the set is bimodal; if there are three or more, it is multimodal.

Median. The median is the middle value of an ordered array, when the items are arranged in ascending or descending order of magnitude.

When n is an odd number

Example: 15, 20, 7, 12, 18, 25, 30, 21, 9

Arranged in ascending order: 7, 9, 12, 15, 18, 20, 21, 25, 30

where n = 9

Position of the median = (9+1)/2 = 10/2 = 5th value (counted either from the left or from the right)

Median = 18

When n is an even number

Example: 8, 7, 12, 10, 15, 24, 18, 21

Arranged in ascending order: 7, 8, 10, 12, 15, 18, 21, 24

Position = 8/2 = 4th value (count from the left and from the right, then get the average)

Counted from the left, the 4th value is 12; counted from the right, the 4th value is 15.

Median = (12+15)/2 = 27/2 = 13.5

Properties of the Median

1. The median always exists in a set of numerical data.

2. The median is not affected by extreme values, whereas the mean is.

3. The median can be used to characterize qualitative data that can be ranked, e.g., quality categories such as

1 good

2 better

3 best

The median is 2, “better.”

Mean. The most familiar measure of central tendency is the mean, sometimes called the arithmetic mean or average. We find it by adding the values in a set of data and dividing the total by the number of values that were added.

For example:

$x_i$ = 8, 12, 21, 14, 18, 27

$$\bar{x} = \frac{\sum x_i}{n} = \frac{100}{6} \approx 16.67$$

Properties of the Mean

1. For a given set of data, there is one and only one mean.

2. Since every value goes into its computation, it is affected by the magnitude of each value.
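The mode, median and mean of the examples in this lesson can be verified with Python's standard `statistics` module. This is offered only as a checking aid; the variable names are arbitrary.

```python
import statistics

# Mode: the most frequently appearing value.
observations = [14, 9, 26, 30, 15, 12, 8, 21, 30, 20, 30]
mode_value = statistics.mode(observations)

# Median with an odd number of values: the middle value of the sorted list.
odd_set = [15, 20, 7, 12, 18, 25, 30, 21, 9]
median_odd = statistics.median(odd_set)

# Median with an even number of values: the average of the two middle values.
even_set = [8, 7, 12, 10, 15, 24, 18, 21]
median_even = statistics.median(even_set)

# Mean: the sum of the values divided by how many there are.
mean_set = [8, 12, 21, 14, 18, 27]
mean_value = statistics.mean(mean_set)

print(mode_value, median_odd, median_even, round(mean_value, 2))
```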

Comparing Mode, Median and Mean

In the following frequency curves, the mode, median and mean are located as shown:

[Fig. 1. Symmetrical curve: Mean = Median = Mode.]

[Fig. 2. Negatively skewed curve: from left to right, Mean, Median, Mode. The mean is less than the median.]

[Fig. 3. Positively skewed curve: from left to right, Mode, Median, Mean. The mean is greater than the median.]

A comparison of the mean, median and mode may be made when all these have been calculated for the same frequency distribution. In Figure 1, the distribution is symmetrical. The mean, median and mode coincide. In Figure 2, the distribution is negatively skewed. The mean is less than the median and also less than the mode. In Figure 3, the distribution is positively skewed, where the mean is greater than the median and the mode.

Activity 3

1. In the following sets of data, find the mode, median and mean

Set A. 17, 24, 18, 12, 36, 42, 18, 24

Set B. 18, 21, 76, 35, 42, 58, 39, 45, 50

Set C. 1, 14, 12, 18, 17, 81, 90, 100

2. Locate the mode, median and mean for sets A and C using a frequency curve.

Lesson 4. Measure of Dispersion or Variability

Of utmost concern to the statistician is the variation in measurements, such as the work performance of individuals in a given sample.

Consider the following measurements for two groups.

A 9 13 15 18 20

B 2 8 10 27 28

The two samples above have the same mean, 15. However, by inspection, the variation in sample B is greater than the variation in sample A. Among the various measures used to describe variation are the
1. range
2. mean deviation
3. standard deviation
4. variance

Range. The range is the simplest measure of variation. It is taken as the difference between the largest and smallest measurements or values. In our samples above:

Range for Sample A = 20 – 9 = 11


Range for Sample B = 28 – 2 = 26

Mean Deviation. The mean deviation is the sum of the absolute deviations of every measurement from the mean, divided by the number of observations. Absolute value means without regard to algebraic sign.

Consider the following:

Sample A: 10, 10, 10, 10, 10

Sample B: 1, 4, 7, 10, 13

Sample C: 1, 5, 20, 25, 29

By inspection, Sample C is more variable than Sample B. Obviously, Sample A has no variation. Computing the mean deviation (MD) for Samples B and C, we have,

Sample B                                  Sample C

x     $x - \bar{x}$   $|x - \bar{x}|$           x     $x - \bar{x}$   $|x - \bar{x}|$

1        -6         6                   1       -15        15

4        -3         3                   5       -11        11

7         0         0                  20         4         4

10        3         3                  25         9         9

13        6         6                  29        13        13

Mean of Sample B = 35/5 = 7               Mean of Sample C = 80/5 = 16

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

Sample B: MD = 18/5 = 3.6

Sample C: MD = 52/5 = 10.4

From the results above, the mean deviation for Sample C is 10.4, which means that its measurements deviate from the mean by 10.4 on average. The measurements in Sample B deviate from their mean by only 3.6 on average. Therefore, we can say that the values in Sample C are more variable than those in Sample B.
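The range and mean deviation computations above can be sketched in Python as follows. The helper names are hypothetical; the samples are the ones used in this lesson.

```python
def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def mean_deviation(data):
    """Mean deviation: average absolute deviation from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(value - mean) for value in data) / len(data)


sample_a = [10, 10, 10, 10, 10]
sample_b = [1, 4, 7, 10, 13]
sample_c = [1, 5, 20, 25, 29]

for name, sample in [("A", sample_a), ("B", sample_b), ("C", sample_c)]:
    print(name, value_range(sample), mean_deviation(sample))
```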

Standard Deviation and Variance

In our computation of the mean deviation, some deviations from the mean are positive and others are negative, so their sum is equal to 0. To get around this, we took the absolute values of the deviations. In the case of the variance and standard deviation, we square the deviations to do away with the negative values. Thus, for our Sample B, the variance and standard deviation are computed as follows:

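x     $x - \bar{x}$   $(x - \bar{x})^2$
___________________________

1        -6        36
4        -3         9
7         0         0
10        3         9
13        6        36

$\bar{x}$ = 35/5 = 7 and $\sum (x - \bar{x})^2 = 90$

Variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{90}{5} = 18 \quad \text{or} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{90}{4} = 22.5$$

Standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{90}{5}} \approx 4.2 \quad \text{or} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{90}{4}} \approx 4.7$$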

The formulas above use n and n – 1. Which should be used? Remember that we want an unbiased estimate of the population variance. Because the deviations are taken from the sample mean rather than from the population mean, only n – 1 of them are free to vary, and dividing by n – 1 corrects for this. So, for an unbiased estimate from a sample, we use the formula with n – 1. When n is big enough, that is, greater than 25, the two formulas give nearly the same result.

We noted from our computations that when all the values in a set of data are located near their mean, there is a small amount of variation or dispersion. A set of data with some values located far from the mean has a large amount of dispersion. Expressing these relationships in terms of the standard deviation, we say that the standard deviation is small when the values are concentrated near the mean. When it is large, the values are dispersed widely about the mean.

Activity 4

A. The following table gives the details of an investment consisting of the prices of

stock per share in 10 months:

Month Stock A price/share Stock B price/share

January P 9.00 P 9.00

February 21.00 21.00

March 20.00 15.00

April 18.00 12.00

May 15.00 10.00

June 15.00 11.00

July 16.00 12.50

August 20.00 15.00

September 20.00 16.00

October 10.00 14.00

B. Compute the

a) Range of Stock A and B

b) Mean Deviation of Stock A and B

c) Standard Deviation of Stock A and B

d) Variance of Stock A and B

C. Which is the more stable stock to invest in? Why?
Lesson 5. Measures of Skewness and Kurtosis

Consider the following distributions.

SET A. Skewness of a Distribution

Figure 1

Figure 2

Figure 3

SET B. Kurtosis of a Distribution

Figure 4

Figure 5

Figure 6

Skewness refers to the symmetry or asymmetry of the frequency distribution. Figure 1 is symmetrical; that is, half of its frequencies fall to the left of the mean and the other half to the right. It follows the properties of the normal curve.

[Figure: normal curve with horizontal axis from -3 to 3; 50% of the area lies on each side of 0.]

Figure 2 is negatively skewed; that is, more frequencies are concentrated at the right end of the distribution. Figure 3 is positively skewed; that is, more frequencies are found at the left end.

Kurtosis refers to the flatness or peakedness of the distribution. One kind is the normal distribution, called mesokurtic (see Figure 4). Another kind, in which the frequencies are more evenly distributed from the left end to the right, is flat and is called platykurtic (see Figure 5). The third kind, in which the frequencies are more concentrated at the middle, is more peaked and is called leptokurtic (see Figure 6).

To find out the symmetry and kurtosis of a given distribution, we use the concept of moments. The term “moments” originates in mechanics (Ferguson, 1982: 72). The first four moments about the mean are computed first:

$$m_1 = \frac{\sum (x - \bar{x})}{N} = 0$$

$$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{(N - 1)\, s^2}{N}$$

$$m_3 = \frac{\sum (x - \bar{x})^3}{N}$$

$$m_4 = \frac{\sum (x - \bar{x})^4}{N}$$

To measure the skewness of data, we use the third moment ($m_3$):

$$g_1 = \frac{m_3}{m_2 \sqrt{m_2}}$$

where

$g_1$ = skewness

$m_3$ = third moment

$m_2$ = second moment

To measure kurtosis, the fourth moment is used, the formula of which is

$$g_2 = \frac{m_4}{(m_2)^2} - 3$$

where

$g_2$ = kurtosis

$m_4$ = fourth moment

$m_2$ = second moment

To interpret the results:

If $g_1 = 0$ (that is, $m_3 = 0$), the distribution is symmetrical.

If $g_1 \neq 0$, it is asymmetrical: negatively skewed if $g_1$ is negative, or positively skewed if $g_1$ is positive.

If $g_2 = 0$, it is a normal (mesokurtic) distribution;

if $g_2 > 0$, it is leptokurtic; and

if $g_2 < 0$, it is platykurtic.

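To illustrate, let’s find $g_1$ and $g_2$ for this set of measurements:

A: 6 8 10 12 14

x      $x - \bar{x}$   $(x - \bar{x})^2$   $(x - \bar{x})^3$   $(x - \bar{x})^4$

6        -4        16       -64       256

8        -2         4        -8        16

10        0         0         0         0

12        2         4         8        16

14        4        16        64       256

Sum       0        40         0       544

$$\bar{x} = \frac{50}{5} = 10$$

$$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{40}{5} = 8$$

$$m_3 = \frac{\sum (x - \bar{x})^3}{N} = \frac{0}{5} = 0$$

$$g_1 = \frac{m_3}{m_2 \sqrt{m_2}} = \frac{0}{8 \sqrt{8}} = 0$$

$g_1 = 0$, symmetrical.

$$m_4 = \frac{\sum (x - \bar{x})^4}{N} = \frac{544}{5} = 108.8$$

$$g_2 = \frac{m_4}{(m_2)^2} - 3 = \frac{108.8}{8^2} - 3 = \frac{108.8}{64} - 3 = 1.7 - 3 = -1.3$$

$g_2 = -1.3$, platykurtic.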
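The moment computations in this illustration can be sketched in Python. The function names are hypothetical, and the moments are taken about the mean and divided by N, as in the worked example.

```python
import math


def central_moment(data, order):
    """Moment about the mean of the given order, divided by N."""
    n = len(data)
    mean = sum(data) / n
    return sum((value - mean) ** order for value in data) / n


def skewness_g1(data):
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    return m3 / (m2 * math.sqrt(m2))


def kurtosis_g2(data):
    m2 = central_moment(data, 2)
    m4 = central_moment(data, 4)
    return m4 / m2 ** 2 - 3


set_a = [6, 8, 10, 12, 14]
print(central_moment(set_a, 2), central_moment(set_a, 4))
print(skewness_g1(set_a), round(kurtosis_g2(set_a), 1))
```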

Activity 5

A. Compute the skewness of the following sets:

SET A.

9, 12, 18, 24, 6, 12, 10, 24, 28, 15, 36, 40, 27, 9, 10, 10,

24, 20, 19, 20

SET B:

9, 10, 12, 8, 10, 7, 10, 14, 10, 28, 30, 36

B. Decide whether the following distribution is flat or peaked by computing the kurtosis:

48, 56, 38, 20, 76, 84, 29, 37, 35, 58, 60, 64, 78, 100, 96

84, 80, 90, 90, 92, 78, 86, 72, 59, 60, 59, 54

75, 35, 45

Module 3. Inferential Statistics

Objectives: At the end of the module, the students should be able to:

1. conduct tests of hypotheses about

a. test of relationship

b. test of proportions

2. make statements of hypotheses of given research problem using the null

hypothesis.

3. Follow the steps in hypothesis testing in the conduct of solving problems.

Lesson 1. Hypothesis Testing

A hypothesis is a conjectural or speculative statement about the magnitudes of the

population parameters using sample data. The purpose of hypothesis testing is to help one

reach a decision about a population by examining the data contained in a sample from that

population.

Steps in Hypothesis Testing

1. Statement of the hypothesis.

The hypothesis is stated in null form, that is, as an unbiased statement of no relation or difference. For example, take the statement that 60% of the employees in a firm have had one to two years of college. The null hypothesis is the statement that the proportion is 60%; symbolically,

Ho : p = 0.60 (null)

H1 : p ≠ 0.60 (alternate, non-directional), or

p > 0.60 or p < 0.60 (alternate, directional)

Another example is when a researcher wants to find out whether male workers have a higher work performance (mean performance = 96) than female workers (mean performance = 84). The null hypothesis is

Ho : x̄1 = x̄2, where

x̄1 = mean of males

x̄2 = mean of females

2. Identify what test statistic is appropriate to use to test the null hypothesis.

3. Compute the test statistic.

4. Decide whether to accept or reject the null hypothesis.

To reject: The computed value of the test statistic should be greater than the tabular value of that test statistic,

tc > ttab (and likewise for r, F, and χ²).

To accept: The computed value of the test statistic should be lower than the tabular value,

tc < ttab (and likewise for r, F, and χ²).

[Figure: normal curve over a scale from −3 to 3, marking an assumed tabular value (2.47) and an assumed computed value (3.29).]

The computed value falls either in the region of rejection or in the region of acceptance, depending on how it compares with the tabular value. In the above figure, the null hypothesis is rejected, and the alternate hypothesis that there is a significant difference is accepted.

5. Decide the level of significance. In most cases, the 5% (0.05) and 1% (0.01) levels are used, for both two-tailed and one-tailed tests. When do you decide that the statement calls for a two-tailed or a one-tailed test?

Two-tailed (non-directional) test: used when the conjectural statement has no predetermined bias against one group, or the researcher has no idea that one group is better than the other (this is when testing differences between two groups). For example,

Ho : x̄1 = x̄2

H1 : x̄1 ≠ x̄2

[Figure: normal curve for a two-tailed test. At α = .05 the region of acceptance is 1 − α = .95, with a region of rejection of α/2 = 0.025 in each tail, beyond −1.96 and 1.96. At α = .01 each tail holds 0.005, beyond −2.58 and 2.58.]

One-tailed (directional) test: used when the statement has a predetermined bias against one group. For example, the researcher wants to test if boys have a higher performance in Mathematics than girls.

Ho : x̄boys = x̄girls

H1 : x̄boys > x̄girls (or x̄boys < x̄girls when the bias runs the other way)

[Figure: normal curve for a one-tailed test. At α = 0.05 the critical value is 1.645, with 1 − α = 0.95; at α = 0.01 the critical value is 2.33.]

Note that these critical values at the 0.05 and 0.01 levels vary from one test statistic to another.

6. Reject or accept the null hypothesis given the degrees of freedom (df) and level of

significance.

7. Make your conclusion. Conclusions depend upon your decision on the null hypothesis. When you reject the null hypothesis, your conclusion is based on the alternate hypothesis, which is the accepted hypothesis. For example, if you rejected the null hypothesis

Ho : x̄1 = x̄2

in favor of

H1 : x̄1 ≠ x̄2

then you accept the alternate hypothesis (H1), which means your conclusion is that there is a significant difference between the means of boys and girls.
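
The decision rule in the steps above can be sketched in Python for a test statistic that follows the normal curve. This is only an illustration: the names `critical_z` and `decide` are hypothetical, the one-tailed case is assumed to be an upper-tail test, and the critical values come from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Tabular z for a given level of significance."""
    # A two-tailed test splits alpha between the two tails.
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

def decide(z_computed, alpha=0.05, two_tailed=True):
    """Compare the computed value with the tabular value."""
    z_tab = critical_z(alpha, two_tailed)
    # Two-tailed: use the absolute value; one-tailed: upper tail assumed.
    value = abs(z_computed) if two_tailed else z_computed
    return "reject Ho" if value > z_tab else "accept Ho"

for alpha in (0.05, 0.01):
    print(alpha,
          round(critical_z(alpha), 2),                    # two-tailed
          round(critical_z(alpha, two_tailed=False), 3))  # one-tailed
print(decide(3.29))
```

For t, r, F, and χ² the comparison is the same, but the tabular value also depends on the degrees of freedom and must be read from the corresponding table.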

Activity 1.

A. State the Ho and H1 of the following problems:

1. The quality control department of a food processing firm found out that the

mean net weight per package of cereal must not be less than 30 ounces. In a

survey of 15 packages, the mean was 18 ounces.

2. One psychologist found out that the mean sociability index of sales

representative is 48.6 while that of accountants of the same company is 32.8.

B. State some inferences that you know and then transform these into null form (using

symbols or descriptions).

Lesson 2. Test of Relationship

The study of paired measurements is closely related to correlation and prediction. Our concern here is with the problem of describing the degree or magnitude of the relation between two variables. The statistic used is the correlation coefficient, here the Pearson product-moment correlation coefficient, or r. This statistic applies to interval-ratio variables.

Researchers may want to find out what relation exists between attitude (x) and work performance (y) of employees. The relation between the two sets of paired measurements can be computed using the simple Pearson r.

Xi    Yi

x1    y1

x2    y2

x3    y3

x4    y4

x5    y5

The formula is defined as

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

The degree of relationship between x and y may take values from −1 to +1. There is a perfect negative relation when r = −1 and a perfect positive relation when r = +1.

The scatter diagrams show degrees of relation between x and y.

[Figure: four scatter diagrams of y against x: a perfect positive relation (r = 1), a perfect negative relation (r = −1), r between −1 and 1, and r = 0.]

A positive relationship between attitude (x) and work performance (y) can be stated and interpreted as

a) The more positive the attitude a worker has, the better is his performance (r is positive).

b) The more negative the attitude, the less he performs.

x    y

+    +    Positive r (high)

−    −

+    −    Negative r

−    +

Consider the following paired measurements taken from a survey of 8 employees, where

x = wage (in thousands)

y = cost of living (in thousands)

   x      x²      y      y²      xy

   8      64      6      36      48
   4      16      5      25      20
   6      36      3       9      18
  12     144     10     100     120
  24     576     15     225     360
  18     324      9      81     162
  14     196      7      49      98
   3       9      2       4       6

Compute the following:

a. Σx = 89

b. Σx² = 1365

c. Σy = 57

d. Σy² = 529

e. Σxy = 832

$$r = \frac{8(832) - 89(57)}{\sqrt{\left[8(1365) - (89)^2\right]\left[8(529) - (57)^2\right]}} = \frac{1583}{\sqrt{(2999)(983)}} = \frac{1583}{1716.98}$$

r = 0.92 (highly correlated)

Therefore, the relationship is: the higher the salary of an employee, the higher his cost of living is, and the lower the salary, the lower is the cost of living.

When r is squared,

r² = (0.92)² = 0.85 (explained variance)

and 1 − r² = 1 − 0.85 = 0.15 (unexplained variance).

This result can also be interpreted as follows: 85% of the variation in cost of living is accounted for by wage, and 15% is attributable to factors other than wage.
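
As a check on the hand computation, the raw-score formula for r can be written as a small Python function. This is a sketch only; `pearson_r` is an illustrative name, and the two lists are the wage and cost of living columns of the table above. For these data it gives an r of about 0.92.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from raw scores, using the sums in the formula above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

wage = [8, 4, 6, 12, 24, 18, 14, 3]   # x, in thousands
cost = [6, 5, 3, 10, 15, 9, 7, 2]     # y, in thousands
r = pearson_r(wage, cost)
print(round(r, 2), round(r ** 2, 2))  # r and explained variance
```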

Prediction

When two variables are highly correlated, prediction of one variable is possible from

a knowledge of the other. The presence of a nonzero correlation ( r ) between x and y implies

that if we know something about x, we know something about y and vice versa. If knowing x

implies some knowledge of y, a prediction of y from x is possible. The greater the value of

correlation between x and y, the more accurate the prediction of one variable from the other.

The Linear Regression of y on x

Using the idea of the equation of a straight line given by

y = bx + a, where

b = the slope of the line

a = constant (where the line intercepts the y axis)

[Figure: a straight line on x and y axes with a right triangle ABC drawn under it; slope = AC/BC.]

The regression line for predicting Y from X is given by

y’ = byx X + ayx, where

y’ = predicted value of y

byx = slope

ayx = constant

The values of byx and ayx may be calculated as follows:

$$b_{yx} = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2}$$

$$a_{yx} = \frac{\sum y - b_{yx}\sum x}{N}$$

Similarly, predicting X from a knowledge of Y is calculated by the following:

x’ = bxy Y + axy, where

x’ = predicted value of x

bxy = slope

axy = constant (where the line intercepts the x axis)

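$$b_{xy} = \frac{N\sum xy - \sum x \sum y}{N\sum y^2 - (\sum y)^2}$$

$$a_{xy} = \frac{\sum x - b_{xy}\sum y}{N}$$

For the wage (x) and cost of living (y) data,

$$b_{yx} = \frac{8(832) - 89(57)}{8(1365) - (89)^2} = \frac{1583}{2999} = 0.53$$

$$a_{yx} = \frac{57 - 0.53(89)}{8} = \frac{57 - 47.17}{8} = 1.23$$

Therefore, y’ = 0.53x + 1.23.

If we want to predict y given x = 20, then

y’ = 0.53 (20) + 1.23

= 10.60 + 1.23

= 11.83 or 12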
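
The regression of y on x can also be computed directly from the raw scores. The sketch below is illustrative (`regression_line` is not a name from the module) and keeps the slope unrounded, so the intercept comes out near 1.25 instead of the 1.23 obtained by hand with the slope rounded to 0.53; the prediction for x = 20 still rounds to 12.

```python
def regression_line(x, y):
    """Return (slope, intercept) of the regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

wage = [8, 4, 6, 12, 24, 18, 14, 3]   # x, in thousands
cost = [6, 5, 3, 10, 15, 9, 7, 2]     # y, in thousands
b_yx, a_yx = regression_line(wage, cost)
predicted = b_yx * 20 + a_yx          # predicted y when x = 20
print(round(b_yx, 2), round(a_yx, 2), round(predicted, 2))
```

Swapping the two arguments, `regression_line(cost, wage)`, gives bxy and axy for predicting x from y.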

The computation of correlations involving three or more variables involves multiple correlation (R). This is a tedious process of computation, so it is best left to the computer. The last chapter will give you the output, the correlation matrix, and how to analyze these results.

Note that for every pair of variables, one variable is the independent variable and the other is the dependent variable, the variable that is influenced by the other. In our example about wage (x) and cost of living (y), y is the one influenced by wage; that is, the amount we spend is attributed to or influenced by the amount we earn.

To test if the degree of relationship is significant, compare the computed r with the tabular r given N − 2 degrees of freedom at the .05 or .01 level. For example, for the wage and cost of living data with N = 8 and df = 6, the tabular values are r (.05) = 0.707 and r (.01) = 0.834. Since the computed r of about 0.92 is greater than the tabular (or critical) r, we reject the null hypothesis,

Ho : ρ = 0 (there is no relationship in the population)

H1 : ρ ≠ 0

Our conclusion is that there is a significant relationship between wage and cost of living. We can say with confidence that cost of living is significantly influenced by the amount we earn.

Activity 2

A. Infer the relationships between pairs of variables as follows. Identify the independent

variable (IV) and dependent variable (DV).

1. A study conducted to analyze the relationship between production and

manufacturing expenses.

2. A firm wants to find out if sales volume is related to effective buying income.

3. A market analyst wants to find out if there exists a relation between traffic and

market sales strategy.

B. Compute the correlations of age and efficiency ratings of 20 assembly line employees.

X Y

44 61
44 41
45 89
43 76
40 79
52 67
43 73
46 94
53 96
43 77
51 60
50 78
61 74
47 82
62 70
34 70
51 60
48 67
51 72
57 80

C. From the result in B, predict the efficiency ratings for each of the following ages:

Age Efficiency Rating

29 __________
70 __________
24 __________

Lesson 3. Test of Independence and Proportions

In any research situation, we may wish to compare a set of observed frequencies with a set of theoretical frequencies. In such situations, the chi-square (χ²) is used and is defined by

$$\chi^2 = \sum \frac{(o - e)^2}{e}, \quad \text{where}$$

o = observed frequency

e = expected frequency

Activity 3

A. In tossing a coin 300 times, the following results are obtained

No. of Head = 90 and No. Tails = 210

Is the coin balanced or biased? Why or why not?

B. A certain business journal showed that 30% of Makati Investors are Chinese, 12%

Americans, 42% Japanese, 9% other nationalities, and only 3% Filipinos. A survey

of 200 investors in Makati showed the following results:

Chinese 56
American 41
Japanese 60
Filipino 25
Others 18

Total = 200

Test the hypothesis that observed (surveyed) frequencies is equal to expected.

C. The following contingency table shows a relation between pass and fail on ratings

of job performance of 100 employees. Test the hypothesis that job performance is

independent of examination results.

                          Rating
____________________________________________________________
         Below Average :  Average :  Above Average :  Total
____________________________________________________________
Pass :        11             25            35           71
Fail :        15              7             7           29
____________________________________________________________
Total         26             32            42          100

Consider a market research study. Two brands of soap, A and B, were distributed to 200 households. After using them, the households were asked which brand they preferred. The results show that 112 preferred A and 88 preferred B.

Ho: There is no difference in consumer preference for the two brands of soap, A and B; that is, a 50:50 split exists.

H1: One brand of soap is significantly preferred over the other.

               Brand A    Brand B    Total

Preference       112         88       200

Computation:

Brand     e      o     o − e    (o − e)²    (o − e)²/e
______________________________________________________
A        100    112      12       144         1.44
B        100     88     −12       144         1.44
______________________________________________________
                                        χ² =  2.88

χ² (tabular) with df = 1: 3.84 at .05; 6.64 at .01

Since the computed χ² = 2.88 is less than the tabular value at the .05 level with df = 1, the null hypothesis is accepted. It can be concluded that no difference exists in consumer preference between brands A and B.
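
The chi-square for the soap example can be reproduced in a few lines of Python. This is a minimal sketch; `chi_square` is an illustrative name, and the expected frequencies assume the 50:50 split stated in the null hypothesis.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [112, 88]    # households preferring brand A and brand B
expected = [100, 100]   # 50:50 split of 200 households
chi2 = chi_square(observed, expected)
print(round(chi2, 2))
```

The same function applies to Activity 3, items A and B, once the expected frequencies are worked out from the stated proportions.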

Test of Independence

In test of independence, two variables are involved. These are usually nominal

variables. The question is whether the two variables are independent of each other. The

data are arranged in a table called contingency table.

Variable : A1 : A2 : Total
________________________________________________
B1 20 10 30
B2 30 40 70
_________________________________________________
Total : 50 50 100

For example, 200 males and females were asked if they were smoking. The results are shown in the following table, with the expected frequencies in parentheses.

              Response
Sex :      yes            no           Total
______________________________________________
M        96 (66.6)     24 (53.4)        120
F        15 (44.4)     65 (35.6)         80
______________________________________________
Total      111            89            200

Ho: Smoking is independent of sex

H1: Smoking is related or associated to sex

To compute the chi-square (χ²), we first compute the expected frequencies as follows:

  O     E                        O − E     (O − E)²     (O − E)²/E

 96     120(111)/200 = 66.6       29.4      864.36        12.98
 24     120(89)/200 = 53.4       −29.4      864.36        16.19
 15     80(111)/200 = 44.4       −29.4      864.36        19.47
 65     80(89)/200 = 35.6         29.4      864.36        24.28
                                                        ---------
                                                   χ² =   72.91

The computed χ² value is very much greater than the tabular value of χ² = 6.64 at the .01 level with df = 1. The null hypothesis is rejected. Therefore, it can be safely concluded that smoking is associated with or dependent on sex; that is, males tend to smoke more than females.
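
For a contingency table, the expected frequency of each cell is (row total × column total) / grand total. The sketch below applies this to the smoking table, using the observed counts 96, 24, 15 and 65 from the computation; `chi_square_independence` is an illustrative name, not part of the module.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: male, female; columns: yes, no
smoking = [[96, 24],
           [15, 65]]
chi2, df = chi_square_independence(smoking)
print(round(chi2, 2), df)
```

The function works for any number of rows and columns, so it can also be used for the 2 × 3 table in Activity 3, item C.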

Lesson 4. Test of Differences Between Means

Test of significance may be applied to the difference between means of

a. two independent samples

b. the same sample under two different conditions

c. three or more independent samples or groups

All tests can only be applied to interval-ratio scales or variables.

Significance of the Difference Between Means of


Two Independent Samples

Two independent groups with N1 and N2 cases and means x̄1 and x̄2 are assumed to be drawn from normally distributed populations with equal variances. If these assumptions are warranted, then the sample means can be tested for a significant difference using the statistic t (Student’s t),

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}, \quad \text{where}$$

x̄1 → mean of the first group

x̄2 → mean of the second group

s(x̄1 − x̄2) → standard error of the difference between the two means

Steps:

1. Compute the respective means for the two groups and then get their difference.

2. Compute the sum of squared deviations from the mean for each group and add these together to obtain the pooled variance (s²). The formula is given by

$$s^2 = \frac{\sum (X_1 - \bar{x}_1)^2 + \sum (X_2 - \bar{x}_2)^2}{N_1 + N_2 - 2}$$

3. With the known pooled variance (s²), compute the standard error of the difference between means:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2}{N_1} + \frac{s^2}{N_2}}$$

4. Now, you are ready to compute t:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$$

The null hypothesis being tested here is that there is no significant difference between the two means,

Ho: x̄1 − x̄2 = 0 or x̄1 = x̄2

H1: x̄1 ≠ x̄2 for a non-directional or two-tailed test

x̄1 > x̄2 or x̄1 < x̄2 for a directional or one-tailed test.

The degrees of freedom (df) for this test are N1 + N2 − 2, tested for significance at the .05 or .01 level.

Let’s consider the illustration.

Example 1. A sociologist wants to find out if the employees in two government agencies differ in their perception of the positive effects of a liberalized or global economy brought about by the APEC meeting. A test was administered on their degree of perception. The table below shows the scores of both groups:

Sample A    Sample B

   24          30
   30          28
   36          26
   20          45
   42          20
   38          15
   25          21
   27          27
   48          20
   40          32
               30
               21

___________________________

Ho : x̄A = x̄B

H1 : x̄A ≠ x̄B

df = N1 + N2 − 2 = 10 + 12 − 2 = 20

Level of significance α = 0.05

Computation:

1. x̄A = 330/10 = 33 and x̄B = 315/12 = 26.25

2. Squared deviations from each group mean:

  A     (x − x̄A)²       B     (x − x̄B)²

 24        81          30       14.06
 30         9          28        3.06
 36         9          26        0.06
 20       169          45      351.56
 42        81          20       39.06
 38        25          15      126.56
 25        64          21       27.56
 27        36          27        0.56
 48       225          20       39.06
 40        49          32       33.06
                       30       14.06
                       21       27.56
_______________________________________________
Sum       748                  676.22

$$3.\quad s^2 = \frac{748 + 676.22}{10 + 12 - 2} = \frac{1424.22}{20} = 71.21$$

$$4.\quad s_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{71.21}{10} + \frac{71.21}{12}} = \sqrt{7.12 + 5.93} = 3.612$$

$$5.\quad t = \frac{33 - 26.25}{3.612} = 1.869$$

t tabular (.05) with df = 20 is 2.086 > t = 1.869

Decision: Accept the null hypothesis since t-computed is less than t-tabular (1.869 < 2.086).

Conclusion: Employees of both government agencies perceived the effect of liberalization of the Philippine economy equally. Although Sample B has a smaller mean than Sample A, this difference is not significant.
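
The four steps for two independent samples translate directly into code. The sketch below is illustrative only (`pooled_t` is not a name from the module) and uses the scores as they appear in the computation table, that is, 10 scores for Sample A beginning with 24 and 12 scores for Sample B.

```python
from math import sqrt

def pooled_t(a, b):
    """Return (t, df) for two independent samples with pooled variance."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = sum(a) / n1, sum(b) / n2
    # Step 2: sums of squared deviations, pooled over both groups
    ss1 = sum((x - mean1) ** 2 for x in a)
    ss2 = sum((x - mean2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    # Step 3: standard error of the difference between means
    se = sqrt(pooled_var / n1 + pooled_var / n2)
    # Step 4: t-ratio
    return (mean1 - mean2) / se, n1 + n2 - 2

sample_a = [24, 30, 36, 20, 42, 38, 25, 27, 48, 40]
sample_b = [30, 28, 26, 45, 20, 15, 21, 27, 20, 32, 30, 21]
t, df = pooled_t(sample_a, sample_b)
print(round(t, 2), df)
```

The computed t is then compared with the tabular t for the returned df, as in the decision step above.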

t-test Correlated Groups

The t-test for correlated groups applies to interval-ratio scores from the same sample or group exposed to two different conditions. These conditions may be:

a. Pre-test and post-test comparison, to see a significant improvement, increase or decrease after a treatment is given.

b. Developing parallel tests, to see the reliability of the test through time or the validity of a test in terms of content, administered to the same group.

__________________________________

O1 ---------> X ---------> O2

pre-test    treatment    post-test

O2 – O1 = difference

__________________________________

The null hypothesis being tested here is that there is no significant gain or improvement after a treatment is given. This treatment can be time, a certain new drug, a new program, training, or any strategy meant to affect or influence attitude or any physical characteristic in the sample. The post-test score will show this change. If the difference is found to be significant, the change can be attributed to the treatment.

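Thus, the hypotheses are stated:

Ho: x̄1 = x̄2 or x̄1 – x̄2 = D̄ = 0

H1: x̄1 ≠ x̄2 or x̄1 – x̄2 = D̄ ≠ 0

The formula for t is given by either of these two:

$$t = \frac{\sum D}{\sqrt{\dfrac{N\sum D^2 - (\sum D)^2}{N - 1}}}$$

or

$$t = \frac{\bar{D}}{s_e}, \quad \text{where}$$

D̄ → mean difference

se → standard error of the mean difference

$$s_e^2 = \frac{s_D^2}{N} \quad \text{and} \quad s_D^2 = \frac{\sum (D - \bar{D})^2}{N - 1}$$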

Example 2. A company manager wants to find out if work efficiency can be improved by applying Total Quality Management (TQM). He devised an instrument to measure work efficiency and administered it as a pre-test to 20 rank-and-file employees before TQM was applied. After six months of TQM, he administered the same test to see if there was a significant improvement. The following are the pre-test and post-test scores.

Pre-test    Post-test    D    D²

80 90 10 100
86 88 2 4
78 90 12 144
80 86 6 36
90 92 2 4
92 95 3 9
87 88 1 1
75 80 5 25
78 80 2 4
80 84 4 16
84 80 -4 16
84 80 -4 16
92 90 -2 4
90 85 -5 25
87 84 -3 9
80 86 6 36
82 80 -2 4
85 80 -5 25
84 89 5 25
___________________________________
ΣD = 31    ΣD² = 507

$$t = \frac{31}{\sqrt{\dfrac{20(507) - (31)^2}{20 - 1}}} = \frac{31}{21.98} = 1.41$$

t tabular (.05) with df = 19 is 2.093 > t = 1.41

Decision: Accept the null hypothesis.

Conclusion: There is no significant difference between the pre-test and post-test scores; that means TQM did not bring about a significant change in the work performance of employees.
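
The first formula for the correlated-groups t needs only N, ΣD and ΣD². The sketch below is illustrative; `paired_t_from_sums` and `paired_t` are hypothetical helper names, and the call at the end uses the totals reported in the example (ΣD = 31, ΣD² = 507, N = 20).

```python
from math import sqrt

def paired_t_from_sums(sum_d, sum_d2, n):
    """t for correlated groups from the sum of D and the sum of D squared."""
    return sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

def paired_t(pre, post):
    """Same t, starting from the paired pre-test and post-test scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    return paired_t_from_sums(sum(diffs),
                              sum(d * d for d in diffs),
                              len(diffs))

t = paired_t_from_sums(31, 507, 20)
print(round(t, 2))
```

The degrees of freedom are N − 1, here 19, which is the row used for the tabular t above.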

Analysis of Variance: One-Way and Two-Way

Analysis of variance is a method of dividing the variation observed in experimental data into different parts. It is used to test the significance of the differences between the means of three or more populations, where a different treatment is applied to each of k samples, each comprising n members. We may wish to test the effectiveness of k treatments such as methods of instruction, methods of training, exposure to different programs, different dosages of drugs, and exposure to managers using different management styles. The problem of testing the significance of the differences between a number of means arises from experiments that study variation in a dependent variable with variation in an independent variable.

Some Experimental Designs

A. One-way classification

Environmental Conditions

Group A Group B Group C

(Restricted) (Free) (Combination)

NA NB NC

Ho: x̄A = x̄B = x̄C

H1: at least one of the means x̄A, x̄B, x̄C differs from the others

B. Two-way classification

Environmental Conditions

Position : Group 1 : Group 2 : Group 3 : Total


---------------------------------------------------------------------------------
Clerical cell – 1 cell – 2 cell – 3 R-1
_______________________________________________________
Supervisory cell – 4 cell – 5 cell – 6 R-2
_______________________________________________________
Managerial cell – 7 cell – 8 cell – 9 R-3
_______________________________________________________
TOTAL G-1 G-2 G-3 N
________________________________________________________

Data Analysis

Data for one-way ANOVA are analyzed with the use of the F-test for k groups, where k is equal to or greater than three (3). A summary table for the analysis is made, such as the following:

Summary Table for One-Way ANOVA
__________________________________________________________
Source of      :  Sum of   :  Degrees of  :  Mean      :  F
Variation         Squares     Freedom        Estimate
__________________________________________________________
Between Groups :              k – 1
Within Groups  :              N – k
__________________________________________________________
TOTAL                         N – 1
__________________________________________________________
** p < 0.01

For two-way classification, the F-test is also used and the analysis is made with the following summary table:

Summary Table for Two-Way ANOVA
__________________________________________________________
Source of       :  Sum of   :  Degrees of   :  Mean      :  F
Variation          Squares     Freedom         Estimate
__________________________________________________________
Between Rows                   R – 1
Between Columns                C – 1
Interaction                    (R – 1) (C – 1)
Within Cells                   RC (n – 1)
__________________________________________________________
TOTAL                          nRC – 1
__________________________________________________________

The null hypotheses tested for two-way ANOVA consist of the following:

1. Ho1 : There are no significant differences between rows.

2. Ho2 : There are no significant differences between columns.

3. Ho3 : There is no significant interaction effect between rows and columns

under different conditions or treatment and groups.

Interaction effects can be understood by an illustrative example and graphs of

mean shown below:

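A. No Interaction Effects

[Figure: mean scores (vertical scale 20 to 80) plotted against Methods 1, 2 and 3, with one line for each of the age groups A, B and C.]

B. With Interaction Effects

[Figure: mean scores (vertical scale 0 to 80) plotted against Methods 1, 2 and 3, with one line for each of the High, Average and Low groups.]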

When significant differences are found between groups in a one-way ANOVA, say with k = 4, there is a need to find out which pairs of group means differ significantly. For this, an a posteriori t-test, the Studentized Range or Newman-Keuls Method, the Tukey Method, or the Scheffé Method may be used. With four groups there are six pairs of means to compare:

________________________________________

Comparison            F

________________________________________

1,2
1,3
1,4
2,3
2,4
3,4
_______________________________________

Using the Scheffé Method, the following steps should be done:

1. Calculate the F-ratio for each pair of means using the within-group variance estimate, s²w:

$$F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_w^2/n_1 + s_w^2/n_2}$$

2. Consult the table of F and obtain the value of F required at the .05 or .01 level, for df1 = k – 1 and df2 = N – k.

3. Calculate the quantity F’, which is F’ = (k – 1) F, where F is the tabular value from step 2.

4. Compare the computed F of each pair with F’. To be significant at the required level, the computed F must be greater than or equal to F’.

Example 3. Scores in productivity for three groups of salesmen exposed to three different kinds of training program. Test the hypothesis:

Ho: There are no significant differences among the three methods of training.

                       Group
_______________________________________________________
               A          B          C
               5          9          1
               7         11          3
               6          8          4
               3          7          5
               9          7          1
               7                     4
               4
               2
_______________________________________________________
n              8          5          6        N = 19
Ti            43         42         18        T = 103
x̄i          5.38       8.40       3.00       T²/N = 558.37
Σx²ij        269        364         68        ΣΣx²ij = 701
T²i/ni     231.13     352.80      54.00       ΣT²i/ni = 637.93
_______________________________________________________

Sum of Squares
______________________________________________
Between    637.93 – 558.37 = 79.56
Within     701 – 637.93 = 63.07
______________________________________________
Total      701 – 558.37 = 142.63

Summary Table for One-Way ANOVA
__________________________________________________________
Source of   :  Sum of   :  df   :  Mean      :  F
Variation      Squares             Estimate
__________________________________________________________
Between         79.56       2       39.78       10.09 **
Within          63.07      16        3.94
__________________________________________________________
TOTAL          142.63      18
__________________________________________________________
** p < .01

The null hypothesis that there are no significant differences between groups of salesmen exposed to three different methods of training is rejected, since the computed F-ratio = 10.09 is greater than the tabular F-ratio = 6.23 at the .01 level with df = 2/16. Therefore, it can be concluded that exposure to the different training programs has significantly affected the productivity of salesmen; Group B got the highest mean, followed by Group A and then Group C.
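
The sums of squares in a one-way ANOVA follow the same pattern for any number of groups, so they are easy to script. The sketch below is illustrative (`one_way_anova` is not a name from the module); it uses the computational formulas T²/N, ΣT²i/ni and ΣΣx²ij shown in the example, with the three groups entered as listed (8, 5 and 6 scores).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, mean square within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total                   # T^2 / N
    sum_sq_scores = sum(x * x for g in groups for x in g)     # sum of all x^2
    sum_group_terms = sum(sum(g) ** 2 / len(g) for g in groups)  # sum Ti^2/ni
    ss_between = sum_group_terms - correction
    ss_within = sum_sq_scores - sum_group_terms
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k, ms_within

groups = [
    [5, 7, 6, 3, 9, 7, 4, 2],   # Group A
    [9, 11, 8, 7, 7],           # Group B
    [1, 3, 4, 5, 1, 4],         # Group C
]
f_ratio, df_between, df_within, ms_within = one_way_anova(groups)
print(round(f_ratio, 2), df_between, df_within, round(ms_within, 2))
```

The F-ratio is then compared with the tabular F for the two returned degrees of freedom.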

To find out which of the pairs differ significantly, the Scheffé Method is used, taking Groups A, B and C as Groups 1, 2 and 3:

Table for Comparison of Pairs
Using the Scheffé Method
______________________________
Pairs          F of pair
______________________________
1,2             7.12 ns
1,3             4.93 ns
2,3            20.18 *
______________________________
* p < .05

Computation of F of pairs using the formula

$$F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_w^2/n_1 + s_w^2/n_2}$$

$$F(1,2) = \frac{(8.40 - 5.38)^2}{3.94/5 + 3.94/8} = \frac{9.1204}{1.2805} = 7.12$$

$$F(1,3) = \frac{(5.38 - 3.00)^2}{3.94/8 + 3.94/6} = \frac{5.6644}{1.1492} = 4.93$$

$$F(2,3) = \frac{(8.40 - 3.00)^2}{3.94/5 + 3.94/6} = \frac{29.16}{1.4447} = 20.18$$

The tabular F (.05) = 3.63 with df = 2/16, so the required value is F’ = (k – 1) F = 2 (3.63) = 7.26. The only pair whose F is higher than 7.26 is pair 2,3. This shows that a significant difference exists between Groups 2 and 3, while the differences between Groups 1 and 2 and between Groups 1 and 3 are not significant by this method.
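
The pairwise comparisons can be scripted by following steps 1 to 4 of the Scheffé Method as listed earlier. This is a sketch under stated assumptions: `scheffe_pairs` is an illustrative name, the within-group mean square (3.94) and the tabular F (3.63 at the .05 level) are typed in from the example, and the group means are left unrounded, so the pair F values may differ slightly in the second decimal from hand computation.

```python
def scheffe_pairs(groups, ms_within, f_tabular):
    """Return (F', {pair: (F of pair, significant?)}) for all pairs."""
    k = len(groups)
    f_required = (k - 1) * f_tabular          # step 3: F' = (k - 1) F
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            n_i, n_j = len(groups[i]), len(groups[j])
            mean_i = sum(groups[i]) / n_i
            mean_j = sum(groups[j]) / n_j
            # step 1: F-ratio for the pair of means
            f_pair = (mean_i - mean_j) ** 2 / (ms_within / n_i + ms_within / n_j)
            # step 4: significant when F of the pair reaches F'
            results[(i + 1, j + 1)] = (f_pair, f_pair >= f_required)
    return f_required, results

groups = [
    [5, 7, 6, 3, 9, 7, 4, 2],   # Group 1 (A)
    [9, 11, 8, 7, 7],           # Group 2 (B)
    [1, 3, 4, 5, 1, 4],         # Group 3 (C)
]
f_required, pairs = scheffe_pairs(groups, ms_within=3.94, f_tabular=3.63)
print(round(f_required, 2))
for pair, (f_pair, significant) in pairs.items():
    print(pair, round(f_pair, 2), significant)
```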

Analysis of variance can also be two-way, with two independent variables analyzed simultaneously. The same is true of three-way classification, with three independent variables analyzed at the same time. Insofar as computations are concerned, it is recommended that they be done on the computer using SPSS (Statistical Package for the Social Sciences). The next chapter will give you knowledge about interpreting and analyzing computer printouts using the different test statistics useful in research.

Module 4. Analysis of Computer Outputs

Objectives: at the end of this module, the students should be able to:

1. program data collected from surveys and experiments ready for encoding in the

computer.

2. Interpret results of computer printouts for both descriptive and inferential test

statistics.

Lesson 1. Preparing the Program, the SPSS Program, for Encoding.

You have learned the different types of variables and how to define them in order to be measurable. To a researcher and statistician, measuring, encoding, and labeling variables for encoding in the computer are very important. The following tables show the program for computer encoding based on three research problems.

1. To what extent is the new training program acceptable to selected employees of ABC company?

2. Are there significant differences in the level of acceptability when respondents are

grouped according to the following variables?

a. sex

b. age

c. educational attainment

d. position

3. Is there a significant relationship between the employees’ acceptability level and the variables above?

Steps:

1. Identify what variables are used as found in the research problems. These are:

Acceptability – measured in terms of 5-point scale

4.45 - 5.00 100%


3.45 - 4.44 75%
2.45 - 3.44 50%
1.45 - 2.44 25%
1.44 - below 0%
Sex – measured as

                    Code
Female      -   1
Male        -   2

Age – measured in years (e.g. 36 years old)

Educational Attainment – measured using the following scores

Score
High School Graduate - 1
College Undergraduate - 2
College Graduate - 3
With MA Units - 4
MA degree holder - 5

Position – measured using the following codes

Score/Code
Rank and File - 1
Supervisory - 2
Managerial - 3

2. Enter the score in a table for each corresponding respondent for all
variables.

Respondent                   Educ’l                     Score in
No.          Sex    Age      Attainment    Position     Acceptability

1 1 36 1 1 89
2 1 28 3 2 72
3 2 25 1 1 90
4 2 36 3 1 76
5 2 29 3 3 74
6 1 45 4 3 86
7 2 39 2 1 84
8 2 45 5 2 90
9 2 52 4 2 42
10 1 56 1 1 92
11 1 58 4 3 68
12 2 61 4 3 89
13 1 48 3 1 80
14 1 39 5 2 84
15 2 53 1 1 64
16 1 27 2 1 87
17 1 60 5 2 60
18 2 50 5 3 78
19 2 42 2 1 58
20 2 35 3 2 69
21 2 30 2 1 46
22 1 54 4 3 78
23 1 50 2 1 90
24 2 48 5 3 65
25 2 40 3 2 86
____________________________________________________

3. Encode all entries made (as in the above table) in the computer.

Open SPSS for Windows

Command: Type in New Data
Input the Program: Type in the Variable name

Program the appropriate test statistics (refer to the ANALYZE DATA menu).

Statistical tests (based on our research problems)

Problem # 1 → Mean
Problem # 2 → Group Mean
    t-test → a) Sex – 1, 2
    t-test → b) Age – 1 (39 and below), 2 (40 and above)
    ANOVA → c) Education – 1 to 5
    ANOVA → d) Position – 1 to 3
Problem # 3 → Correlation Matrix

Printout

B. t-test for differences between sex and score for acceptability level

C. t-test for Age differences in acceptability level

D. One-way ANOVA for level of Acceptability in terms of differences in

education

E. One-way ANOVA on the level of acceptability differences in terms of

differences in position

F. Correlation Matrix for all Variables (sex, age, education, position and level of

acceptability)

Lesson 2. Interpreting Results of Computer Outputs

The following are the tables taken from the printouts based on the program made for

statistical analysis. What you should do is to present the data, using the different ways of

presenting data you have learned (textual, tabular, and graphical).

Table 1. Mean acceptability Level


by Sex
___________________________________
Group : f : Mean : SD : Se
___________________________________

Female : 11 80.54 10.17 3.067

Male : 14 72.21 15.75 4.21


___________________________________

Total : 25
___________________________________

Table 2. Mean acceptability Level
by Age
__________________________________
Age : f : Mean : SD : Se
Group
___________________________________

1 (39 yrs. : 10 77.10 13.2 4.18


& below)

2 (40 yrs.
& above) : 15 75.07 14.86 3.84
___________________________________

Total : 25
___________________________________

Table 3. Mean acceptability Level


by Education
_______________________________________
Group : f : Mean : SD : Se
_______________________________________
1 (High School) 4 83.75
2 (Col Undergrad) 5 73.00
3 (Col Grad) 6 76.17
4 (W/MA/MS) 5 72.60
5 (W/Doctoral) 5 75.40
_______________________________________
Total
_______________________________________

Table 4. Mean acceptability Level

by Position
_______________________________________
Group : f : Mean : SD : Se
_______________________________________
1 (Rank-and-File) 11 78.09
2 (Supervisory) 7 71.86
3 (Managerial) 7 76.86
_______________________________________
Total
_______________________________________

Table 5. Result of t-test for Sex
Differences on Level of Acceptability
____________________________________________
Statistics : Grp 1 (Female) : Grp 2 (Male)
____________________________________________
n (No. of Cases)        11          14
x̄ (Mean)             80.54       72.21
x̄1 – x̄2                   8.33
____________________________________________
t-value (df = 23) = 1.52 NS
____________________________________________
NS: p > .05, since t (tabular; .05) = 2.069
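
The figures in Tables 1 and 5 can be reproduced from the encoded data without SPSS. The sketch below is illustrative only: it pairs each respondent's sex code with the acceptability score from the encoding table, treats code 1 as Group 1 and code 2 as Group 2 as the printout does, and computes the pooled-variance t described in Module 3. The helper name `describe` is hypothetical.

```python
from math import sqrt

# (sex code, acceptability score) for respondents 1 to 25
records = [
    (1, 89), (1, 72), (2, 90), (2, 76), (2, 74),
    (1, 86), (2, 84), (2, 90), (2, 42), (1, 92),
    (1, 68), (2, 89), (1, 80), (1, 84), (2, 64),
    (1, 87), (1, 60), (2, 78), (2, 58), (2, 69),
    (2, 46), (1, 78), (1, 90), (2, 65), (2, 86),
]

def describe(scores):
    """Return (n, mean, standard deviation with n - 1 in the divisor)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    return n, mean, sd

group1 = [score for code, score in records if code == 1]
group2 = [score for code, score in records if code == 2]
n1, mean1, sd1 = describe(group1)
n2, mean2, sd2 = describe(group2)

# Pooled variance and t for two independent groups
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
t = (mean1 - mean2) / sqrt(pooled_var / n1 + pooled_var / n2)
print(n1, round(mean1, 2), round(sd1, 2))
print(n2, round(mean2, 2), round(sd2, 2))
print(round(t, 2), n1 + n2 - 2)
```

Grouping by the age, education, or position codes instead of the sex code gives the group means of Tables 2 to 4 in the same way.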

Activity 4.

A. Test the hypothesis that there are no differences in ratings of performance of

employees coming from the different departments

Dept. A Dept. B
(Production) (Accounting)
90 86 90 87
96 78 85 86
86 70 80 84
85 80 84 80
80 82 90 76
78 81 90 75
95 96 87 78
92 89 80 89
84 94 84 90
80 99 83 90

B. Test the hypothesis that power consumption significantly increased in 2008.

Power Consumption for year 2006 and 2008 in thousands of KWH

Year Months
2006 Jan Feb Mar April May June July Aug Sept. Oct. Nov. Dec.

2007 4.5 3.6 5.8 9.2 8.7 6.9 7.2 7.5 8.2 8.4 9.2 9.4
2008 5.3 6.9 12.5 12.2 10.6 10.8 10.9 10.8 11.5 12.7 12.9 13.5
