Introduction
Statistics may be defined as the branch of mathematics that deals with the systematic
method of collecting, classifying, presenting, analyzing, and interpreting quantitative or
numerical data.
DIVISION OF STATISTICS
A. COLLECTION OF DATA:
The data collected must be valid, reliable, relevant to the problem at hand, and
consistent with other information. Data collected may be classified
as:
a. Primary data – refer to data obtained directly from an original source by
means of actual observations or by conducting interviews. The direct
source could be an individual or family group, business entities or private
and government agencies.
b. Secondary data – refer to data or information that come from existing
records (published and/or unpublished) in usable form, such as surveys,
censuses, business journals and magazines, newspapers, commercial
publications, theses and dissertations, research papers, etc.
c. Internal data – data taken from the company’s own records of operations
such as sales records, production records, personnel records, etc.
d. External data – data that come from outside sources and not from the
company’s own records.
B. PROCESSING OF DATA:
After data have been collected, they have to be processed. Processing of
data includes:
a. Editing – its purpose is to detect errors and omissions and to ensure
that the data gathered are accurate, consistent with other information, and
complete, and that they are arranged in a way that facilitates
classification.
b. Coding – refers to assigning numerals and other symbols to the data
collected so that they can be grouped into a limited number of classes or
categories.
c. Classification – refers to sorting the data and grouping them on the
basis of similarity. The purpose of classification is to enable us to quickly see all
the possible characteristics in the data collected.
C. PRESENTATION OF DATA.
Types of Graphs:
1. Bar Graph. The simplest form of graphic presentation, generally intended for
comparing simple magnitudes. It may be either a horizontal bar graph or a
vertical bar graph.
[Figure: bar graph of business classification]
2. Line Graph. The most widely used practical device, effective in showing a trend
(changes in value) over a period of time.
3.Circle or Pie Chart. A circle divided into parts whose sizes are proportional to
the magnitude or percentages they represent. Used to show the component parts
of a whole.
D. ANALYSIS OF DATA.
E. INTERPRETATION OF DATA.
Cumulative frequency – is used in getting the values of the median, quartiles,
deciles, and percentiles.
Data – refer to statistical facts, principles, opinions, and various items from different
sources.
Grouped data – are properly organized and classified data, such as data arranged in a
frequency distribution.
TYPES OF MEASUREMENT
The data can be classified into two types. These are the continuous and
discontinuous or discrete data.
Continuous data – are measures like feet, pounds, kilos, minutes, and meters.
These kinds of data can be measured to varying degrees of precision; for example,
1 yard equals 3 feet (1 yd = 3 ft; 1 ft = 12 in.).
SCALES OF MEASUREMENT
According to Stevens, there are four types of scales used in the sciences.
These are the nominal, ordinal, interval, and ratio scales.
Interval scale – numbers that reflect differences among items. Examples are
test scores, grades of students, ages, blood pressures, and Fahrenheit and
Celsius temperatures.
Ratio scale – the highest type of scale. The basic difference between the interval
and ratio scales is that the ratio scale has a true zero point. Examples of ratio-scale
measures are length, weight, loudness, width, and so on.
STATISTICAL SYMBOLS
x = y    x equals y
x ≠ y    x is not equal to y
x > y    x is greater than y
x < y    x is less than y
The characteristics of the population are called parameters while the
characteristics of the sample are called statistics.
Measure            Parameter (population)   Statistic (sample)
Mean               μ (mu)                   x̄
Number of cases    N                        n
Proportion         P                        p
Variance           σ²                       s²
Summation Notation
Example 1. If N = 5 and the observations are X1 = 2; X2 = 4; X3 = 3; X4 = 5; X5 = 6,
find the sum of the five values of Xi using summation notation.
Solution:
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 2 + 4 + 3 + 5 + 6 = 20$$
Example 3. Let a be a constant. Find the sum of the values when the constant has
been added to each. Use Example 2, where N = 3 and X1 = 5; X2 = 4; X3 = 1.
Solution:
$$\sum_{i=1}^{N} (X_i + a) = (X_1 + a) + (X_2 + a) + (X_3 + a) = 5 + a + 4 + a + 1 + a = 10 + 3a$$
So we can say that the sum of the values of a variable when a constant has been
added to each is equal to the sum of the values of the variable plus N times the
constant. Therefore,
$$\sum_{i=1}^{N} (X_i + a) = \sum_{i=1}^{N} X_i + Na$$
Example 4. Let a be a constant that has been subtracted from each observation Xi.
Find the sum using summation notation, with N = 4 and X1 = 4; X2 = 7; X3 = 1; X4 = 5.
Solution:
$$\begin{aligned}\sum_{i=1}^{N} (X_i - a) &= (X_1 - a) + (X_2 - a) + (X_3 - a) + (X_4 - a)\\ &= (4 - a) + (7 - a) + (1 - a) + (5 - a)\\ &= 17 - 4a\end{aligned}$$
So, the sum of the values of a variable when a constant has been
subtracted from each is equal to the sum of the values of the variable
minus N times the constant. Therefore,
$$\sum_{i=1}^{N} (X_i - a) = \sum_{i=1}^{N} X_i - Na$$
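The two summation rules can be checked numerically. The short Python sketch below uses the observations from Examples 3 and 4; the constant a = 2 is an arbitrary value chosen only for illustration (it does not appear in the examples), and the function names are hypothetical.

```python
# Numerical check of the rules sum(X_i + a) = sum(X_i) + N*a
# and sum(X_i - a) = sum(X_i) - N*a.

def sum_plus_constant(xs, a):
    """Sum of (x + a) over all observations."""
    return sum(x + a for x in xs)

def sum_minus_constant(xs, a):
    """Sum of (x - a) over all observations."""
    return sum(x - a for x in xs)

ex3 = [5, 4, 1]      # Example 3 observations (N = 3)
ex4 = [4, 7, 1, 5]   # Example 4 observations (N = 4)
a = 2                # hypothetical constant, for illustration only

print(sum_plus_constant(ex3, a), sum(ex3) + len(ex3) * a)    # 16 16
print(sum_minus_constant(ex4, a), sum(ex4) - len(ex4) * a)   # 9 9
```

Any other value of a gives matching pairs as well, since each rule holds for every constant.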
THE NATURE OF STATISTICS: Statistical investigation can be classified into two major
functions:
1. Descriptive Statistics – methods of collecting and presenting data. It includes the
computation of measures of central tendency (central location) as well as the
measures of dispersion or variability. It also includes the construction
of tables and graphs.
Sampling – the method of getting a small part of the population, called a sample, that
serves as a representative of the population.
Note: If the population under study is too large to handle and would entail too much time, cost,
and effort, taking samples is a good alternative. It should be noted that when only a small part of the
population is considered, sampling error should be expected. Thus, in drawing conclusions
about the population from which a sample is drawn, the researcher should learn how to draw
samples that are truly representative of the population. Different sampling techniques include
simple random sampling, stratified sampling, cluster sampling, and multistage
sampling.
A simple random sample is a subset of a statistical population in which each
member of the subset has an equal probability of being chosen. A simple
random sample is meant to be an unbiased representation of a group.
Note: A problem commonly encountered is determining the sample size.
It is not advisable to set a fixed percentage of the population; instead, the margin of error,
which ranges from 1% to 10% in social science research, should be considered. The
sample size, relative to the population size, is computed with this formula:
$$n = \frac{N}{1 + Ne^2}$$
Where: N = the population size
e = the margin of error
n = the sample size
Example 1. Find the sample size if the population size is 2500 at 95% accuracy (e = 0.05).
$$n = \frac{2500}{1 + 2500(0.05)^2} = 344.83 \approx 345$$
With N = 200 and e = 0.03:
$$n = \frac{N}{1 + Ne^2} = \frac{200}{1 + 200(0.03)^2} = 169.49 \approx 169$$
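The formula n = N / (1 + Ne²), commonly called Slovin's formula, is simple to script. This minimal Python sketch reproduces the two computations above; the function name `sample_size` is hypothetical, and rounding to the nearest whole number follows the examples (some researchers prefer to always round up).

```python
def sample_size(population, margin_of_error):
    """n = N / (1 + N * e**2), returned unrounded."""
    return population / (1 + population * margin_of_error ** 2)

n1 = sample_size(2500, 0.05)
n2 = sample_size(200, 0.03)
print(round(n1, 2), round(n1))  # 344.83 345
print(round(n2, 2), round(n2))  # 169.49 169
```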
II. MEASURES OF CENTRAL TENDENCY
a. Mean (grouped data):
$$\bar{X} = \frac{\Sigma fM}{n} \qquad \bar{X} = A_m + \left(\frac{\Sigma fd}{n}\right) i$$
Where: X̄ = the mean
ΣfM = the summation of the products of frequencies and midpoints
Σfd = the summation of the products of frequencies and deviations
Am = assumed mean, the midpoint of the class where the zero
deviation is placed
n = the number of cases or scores
i = the class interval
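b. Median – the middle value of the distribution. Half of the values in the distribution fall
below the median, and the other half fall above it. It is the most appropriate
locator of center values.
$$Me = L_{me} + \left(\frac{n/2 - f_b}{f_w}\right) i$$
where Lme = the lower boundary of the median class, fb = the cumulative frequency just
below the median class, fw = the frequency of the median class, and i = the class interval.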
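c. Mode – the value that appears with the highest frequency. It is determined by the
formula:
$$Mo = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) i$$
where Lmo = the lower boundary of the modal class, d1 = the difference between the frequency
of the modal class and that of the class just before it, and d2 = the difference between the
frequency of the modal class and that of the class just after it.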
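As a cross-check for hand computations, the sketch below applies the grouped-data formulas for the mean, median, and mode to the frequency distribution of Exercise 1 further below. It assumes whole-number class limits (so each lower boundary is the lower limit minus 0.5), equal class widths, and classes listed from lowest to highest; the function names are hypothetical.

```python
# Grouped-data mean, median, and mode (sketch).

def midpoints(lower_limits, width):
    # Midpoint of a class L..(L + width - 1) is L + (width - 1) / 2
    return [L + (width - 1) / 2 for L in lower_limits]

def grouped_mean(lower_limits, freqs, width):
    mids = midpoints(lower_limits, width)
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

def grouped_median(lower_limits, freqs, width):
    target = sum(freqs) / 2
    cum_below = 0  # fb: cumulative frequency below the current class
    for L, f in zip(lower_limits, freqs):
        if cum_below + f >= target:
            # Me = Lme + ((n/2 - fb) / fw) * i, with Lme = L - 0.5
            return (L - 0.5) + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("empty distribution")

def grouped_mode(lower_limits, freqs, width):
    k = freqs.index(max(freqs))  # first modal class if there is a tie
    f_before = freqs[k - 1] if k > 0 else 0
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - f_before
    d2 = freqs[k] - f_after
    # Mo = Lmo + (d1 / (d1 + d2)) * i  (undefined when d1 + d2 == 0)
    return (lower_limits[k] - 0.5) + d1 / (d1 + d2) * width

# Exercise 1 distribution, lowest class (65-69) first
lower_limits = [65, 70, 75, 80, 85, 90, 95]
freqs = [2, 8, 10, 9, 7, 2, 2]
width = 5

print(grouped_mean(lower_limits, freqs, width))             # 80.125
print(grouped_median(lower_limits, freqs, width))           # 79.5
print(round(grouped_mode(lower_limits, freqs, width), 2))   # 77.83
```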
Fractiles
a. Quartiles – are values that divide the distribution into 4 equal parts. These are
Q1, at or below which 25% of the distribution lies; Q2, at or below which 50% of the
distribution lies; and Q3, at or below which 75% of the distribution lies.
fb1 = the less-than-or-equal-to cumulative frequency just below the
Q1 class
b. Deciles – are values that divide the distribution into 10 equal parts.
The deciles are D1, D2, D3, …, D9.
Note: You will notice that quartiles, deciles, and percentiles use the same form as the
median formula; they differ only in the subscripts and in the position term (kn/4, kn/10,
or kn/100 in place of n/2).
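Because all fractiles share the median's form, one function can cover them. This is a sketch under the same assumptions as before (whole-number class limits, classes ordered from lowest to highest); `fractile` and its `k`/`parts` parameters are hypothetical names. It is applied to the Exercise 1 distribution further below.

```python
# General grouped-data fractile: the k-th of `parts` equal divisions.
# parts = 4 -> quartiles, 10 -> deciles, 100 -> percentiles; k = 1, parts = 2 -> median.

def fractile(lower_limits, freqs, width, k, parts):
    n = sum(freqs)
    position = k * n / parts       # kn/4, kn/10, or kn/100
    cum_below = 0                  # cumulative frequency below the fractile class
    for L, f in zip(lower_limits, freqs):
        if cum_below + f >= position:
            # Lower boundary (L - 0.5) plus the interpolated share of the class
            return (L - 0.5) + (position - cum_below) / f * width
        cum_below += f
    raise ValueError("position lies outside the distribution")

lower_limits = [65, 70, 75, 80, 85, 90, 95]   # Exercise 1, lowest class first
freqs = [2, 8, 10, 9, 7, 2, 2]
width = 5

q1 = fractile(lower_limits, freqs, width, 1, 4)
q3 = fractile(lower_limits, freqs, width, 3, 4)
d8 = fractile(lower_limits, freqs, width, 8, 10)
p65 = fractile(lower_limits, freqs, width, 65, 100)
print(q1, round(q3, 2), round(d8, 2), round(p65, 2))  # 74.5 85.21 86.64 82.83
```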
MEASURES OF VARIATION:
A. Range – the difference between the upper boundary of the highest class (UBHC)
and the lower boundary of the lowest class (LBLC).
$$R = UBHC - LBLC$$
C. Quartile Deviation:
$$QD = \frac{Q_3 - Q_1}{2}$$
F. Standard Deviation:
$$S = \sqrt{S^2}$$
a. Coefficient of Variation (CV):
$$CV = \frac{S}{\bar{X}} \times 100\%$$
Coefficient of Quartile Deviation (CQD):
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$$
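The sketch below computes the range, quartile deviation, variance, standard deviation, CV, and CQD for the Exercise 1 distribution further below. The variance formula itself is not shown in these notes, so the code assumes the sample variance S² = Σf(M − x̄)² / (n − 1); a text that divides by n will get slightly different S², S, and CV. It also assumes whole-number class limits and classes ordered from lowest to highest.

```python
import math

# Measures of variation for grouped data (sketch).
lower_limits = [65, 70, 75, 80, 85, 90, 95]   # Exercise 1, lowest class first
freqs = [2, 8, 10, 9, 7, 2, 2]
width = 5

def quartile(k):
    # Median-style interpolation at position kn/4
    n = sum(freqs)
    position = k * n / 4
    cum_below = 0
    for L, f in zip(lower_limits, freqs):
        if cum_below + f >= position:
            return (L - 0.5) + (position - cum_below) / f * width
        cum_below += f
    raise ValueError("position lies outside the distribution")

n = sum(freqs)
mids = [L + (width - 1) / 2 for L in lower_limits]
mean = sum(f * m for f, m in zip(freqs, mids)) / n

# R = UBHC - LBLC
range_ = (lower_limits[-1] + width - 0.5) - (lower_limits[0] - 0.5)

q1, q3 = quartile(1), quartile(3)
qd = (q3 - q1) / 2
cqd = (q3 - q1) / (q3 + q1) * 100

# Assumed sample variance with divisor n - 1
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / (n - 1)
sd = math.sqrt(variance)
cv = sd / mean * 100

print(range_, round(qd, 2), round(variance, 2), round(sd, 2), round(cv, 2), round(cqd, 2))
# 35.0 5.36 56.01 7.48 9.34 6.71
```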
Illustrative Problem:
Problem: The following is the distribution of the ages of 100 employees of Philippine
Christian University during the term of Carlito S. Puno as President.
Class   f   M      fM     |M − x̄|   f|M − x̄|   d   fd   d²   fd²   ≤cf
54-59   1   56.5   56.5   23.22     23.22      3   3    9    9     100
o) S = √ 68.84 = 8.30
Exercises:
1. For the given frequency distribution table, determine the following: a) Mean b)
Me c) Mo d) Q3 e) Q1 f) D8 g) D5 h) P65 i) P35 j) R k) IQR l) QD m) MAD n)
S² o) S p) CV q) CQD
Classes   f   M   fM   d   fd   d²   fd²   |M − x̄|   f|M − x̄|   ≤cf
95 – 99 2
90 – 94 2
85 – 89 7
80 – 84 9
75 – 79 10
70 – 74 8
65 – 69 2
2. In the given frequency distribution table, determine: a) Mean b) Me c) Mo d) Q1
e) Q3 f) D5 g) D7 h) P25 i) P65 j) R k) IQR l) QD m) MAD n) S² o) SD
Classes   f   M   fM   d   fd   d²   fd²   |M − x̄|   f|M − x̄|   ≤cf
60 - 64 6
55 - 59 7
50 - 54 10
45 - 49 8
40 - 44 8
35 - 39 5
30 - 34 4
25 - 29 2
V. HYPOTHESIS TESTING
In either accepting or rejecting a null hypothesis, an incorrect decision can be
made. A null hypothesis can be accepted when it should have been rejected, or rejected
when it should have been accepted. Thus, in accepting or rejecting the null hypothesis, two
types of decision errors could be committed.
Another widely used (non-parametric) test of significance is the χ² (chi-square) test. χ²
can test for significant differences between the observed distribution of data among
categories and the expected distribution of data based on the null hypothesis (or for a
significant relationship). It is used in one-sample analysis, two independent
samples, or k independent samples.
Illustrative Problem.
Test the hypothesis that there is no significant relationship between the gender of the
employees and their job satisfaction level, if in a certain school the following results
were obtained, at the 0.05 level of significance.
Gender    Low   Medium   High   Total
Male      45    60       55     160
Female    9     10       10     29
Total     54    70       65     189
I.Statement of hypothesis:
Ho: There is no significant relationship between the gender of the employees
and their job satisfaction level.
H1: There is a significant relationship between the gender of the employees and
their job satisfaction level.
II. Statistical test: use the χ² test of independence (r × c contingency table).
Level of significance and critical value:
α = 0.05 and df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
Critical χ² value = 5.99
Expected value:
$$E = \frac{C_t \times R_t}{G_t}$$
where Ct = column total, Rt = row total, and Gt = grand total.
III. Computation:
Male/low: E = (54 × 160)/189 = 45.71
Male/medium: E = (70 × 160)/189 = 59.26
Male/high: E = (65 × 160)/189 = 55.03
Female/low: E = (54 × 29)/189 = 8.29
Female/medium: E = (70 × 29)/189 = 10.74
Female/high: E = (65 × 29)/189 = 9.97
O    E       O − E   (O − E)²/E
45   45.71   −0.71   0.011028
60   59.26   0.74    0.009241
55   55.03   −0.03   0.000016
9    8.29    0.71    0.060808
10   10.74   −0.74   0.050987
10   9.97    0.03    0.000090
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.132170$$
Decision: Since the critical χ² value of 5.99 > the computed χ² value of 0.132170, the
null hypothesis, Ho, is accepted while the alternative hypothesis, H1, is rejected.
Therefore, there is no significant relationship between the gender of the employees and
their job satisfaction level.
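The whole χ² computation can be scripted as a check. This Python sketch of a χ² test of independence uses the observed counts from the gender by job-satisfaction table (rows = male, female; columns = low, medium, high). The function name is hypothetical, and the critical value of 5.99 still comes from the χ² table.

```python
# Chi-square test of independence for an r x c table (sketch).
observed = [
    [45, 60, 55],  # male: low, medium, high
    [9, 10, 10],   # female: low, medium, high
]

def chi_square_independence(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = []
    chi2 = 0.0
    for i, row in enumerate(table):
        expected_row = []
        for j, o in enumerate(row):
            e = col_totals[j] * row_totals[i] / grand_total  # E = Ct x Rt / Gt
            expected_row.append(e)
            chi2 += (o - e) ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

chi2, df, expected = chi_square_independence(observed)
print(round(expected[0][0], 2), round(chi2, 4), df)  # 45.71 0.1332 2
```

With unrounded expected values the statistic is about 0.1332, slightly different from the hand computation because of rounding; either way it is far below the critical value of 5.99.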
Note: In any hypothesis test that compares the critical value of the test statistic
with the computed value, when the critical value is greater than the computed
value (critical value > computed value), the null hypothesis is accepted, leading to the
rejection of the alternative hypothesis. But when the critical value is less than the
computed value (critical value < computed value), the null hypothesis is rejected, leading
to the acceptance of the alternative hypothesis.
Exercise:
Test the hypothesis that there is no significant relationship between the students'
class level and their attitudes toward fraternities, using the 5% level of significance.
Students Favorable Neutral Unfavorable Total
Junior 80 60 70
Senior 100 50 70
Total
B. LINEAR CORRELATION
Student number      1    2    3    4    5    6    7    8    9    10
English grade       93   89   84   91   90   83   75   81   84   77
Mathematics grade   91   86   80   88   89   87   78   78   85   76
Formula:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
I. Statement of Hypotheses:
Computation:
$$t = 0.89\sqrt{\frac{10 - 2}{1 - (0.89)^2}} = 5.52$$
Conclusion: Inasmuch as the critical t-value of 2.306 < the computed t-value
of 5.52, the null hypothesis (Ho) is rejected while the alternative hypothesis (H1)
is accepted. Therefore, there is a significant, high correlation between grades in English
and grades in Mathematics.
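The value r = 0.89 used above can be reproduced from the grade table with the computational formula for Pearson's r. This Python sketch (function names hypothetical) computes r and then the t statistic for testing r. Note that plugging in the unrounded r (about 0.8916) gives t ≈ 5.57; the 5.52 above comes from rounding r to 0.89 first. Either way t far exceeds 2.306.

```python
import math

english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 77]
math_grades = [91, 86, 80, 88, 89, 87, 78, 78, 85, 76]

def pearson_r(x, y):
    # r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²))
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def t_for_r(r, n):
    # t = r * sqrt((n - 2) / (1 - r²)), with n - 2 degrees of freedom
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = pearson_r(english, math_grades)
print(round(r, 4))                    # 0.8916
print(round(t_for_r(0.89, 10), 2))    # 5.52
print(round(t_for_r(r, 10), 2))       # 5.57
```

On Python 3.10 or later, `statistics.correlation(english, math_grades)` from the standard library should return the same r.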
Exercise:
Ten employees in one industrial organization have the following numbers of years of
experience (X) and yearly salaries (Y) (given in thousand pesos). Compute the Pearson
product-moment correlation coefficient (r) for the data and interpret the result.
SN X Y XY X2 Y2
1 7 18
2 11 16
3 33 25
4 24 22
5 5 19
6 18 23
7 35 24
8 12 19
9 9 21
10 10 26
C. ANALYSIS OF VARIANCE (ANOVA)
Used for testing the null hypothesis that the means of several populations are equal.
The means of 3 or more populations that follow normal distributions can be compared
simultaneously in just one application of this test. This test, therefore, is
a generalization of the z-test and t-test for two normal population means.
ANOVA uses a single-factor, fixed-effects model to compare the effects of one factor
on a continuous dependent variable. It uses squared deviations, or variances, so that the
distances of individual data points from their own group mean or from the
grand mean can be summed.
The test statistic for ANOVA is the F-ratio, which compares the variance from two
sources: between groups and within groups.
Formula:
$$F = \frac{MSB}{MSW}$$
Where:
MSB = mean square between = SSB/dfB (sum of squares between divided by degrees of freedom between)
Degrees of freedom for SSB: dfB = k − 1, where k pertains to the number of groups
or samples
ΣX = Xa + Xb + Xc + …
Illustrative Problem:
Three brands of infant powdered milk (infant formula) were given to three groups of
8 infants, and the results were monitored for a certain period of time during an outreach
program of a certain university in Cavite. The results in terms of weight gain are
tabulated below:
Test the hypothesis that there is no significant difference in the mean growth of the three
groups of infants given the three brands of infant powdered milk at the 0.01 level.
I.STATEMENT OF HYPOTHESES
Ho: There is no significant difference in the mean growth of the three
groups of infants given the three brands of infant powdered milk.
H1: There is a significant difference … given the three brands of infant
powdered milk.
Solution (completing the table below):
IV. Computation:
$$F\text{ ratio} = \frac{MSB}{MSW}$$
ΣX = XA + XB + Xc = 33.1 + 28.5 + 26.1 = 87.7
$$\Sigma X^2 = \Sigma X_A^2 + \Sigma X_B^2 + \Sigma X_C^2 = 330.31$$
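The remaining steps can be carried out from the totals already obtained. The Python sketch below applies the standard computational formulas for a one-way ANOVA, assuming (as stated in the problem) 8 infants per group, the group totals ΣXA = 33.1, ΣXB = 28.5, ΣXC = 26.1, and ΣX² = 330.31 taken over all 24 observations. The function name is hypothetical; the resulting F must still be compared with the tabular F at the 0.01 level.

```python
# One-way ANOVA from group totals (sketch, computational formulas).

def one_way_anova_from_totals(group_totals, group_sizes, sum_of_squares):
    """Return (F, dfB, dfW).

    group_totals: ΣX for each group
    group_sizes: n for each group
    sum_of_squares: ΣX² over all observations
    """
    N = sum(group_sizes)
    correction = sum(group_totals) ** 2 / N                   # (ΣX)² / N
    sst = sum_of_squares - correction                        # total SS
    ssb = sum(t ** 2 / n for t, n in zip(group_totals, group_sizes)) - correction
    ssw = sst - ssb                                          # within-groups SS
    k = len(group_totals)
    df_b, df_w = k - 1, N - k
    msb, msw = ssb / df_b, ssw / df_w
    return msb / msw, df_b, df_w

F, df_b, df_w = one_way_anova_from_totals([33.1, 28.5, 26.1], [8, 8, 8], 330.31)
print(round(F, 2), df_b, df_w)  # 4.98 2 21
```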
Exercise
A. Three administrators were tasked with packing noodles in plastic cups that must each
weigh 200 grams. A random sample of 6 plastic cups was weighed, and the
results are tabulated below. Test the hypothesis that there is no significant
difference in the average weight of the cup noodles packed by the 3
administrators at the 0.05 level.
             ADMINISTRATOR
Cup   A     B     C
1 198 188 199
2 201 195 200
3 196 193 198
4 201 196 201
5 199 200 198
6 196 190 197
D. REGRESSION ANALYSIS
This section deals with the simplest type of prediction. When we take the
observed values of X to estimate or predict the corresponding Y values, the process is
called simple prediction. When more than one X variable is used, the outcome is a
function of multiple predictors. Simple and multiple predictions are made using a
technique called regression analysis.
$$y = a + bx$$
where: y = predicted value
a = y-intercept
b = slope of the line (regression coefficient)
To find the y-intercept (a),
$$a = \bar{y} - b\bar{x}$$
where ȳ = mean of the y-values and x̄ = mean of the x-values.
Illustrative Problem
Dr. Fred Santos, the Administrator of the biggest university in Asia, would
like to estimate the number of enrollees to be expected in the 7th week of their
2-month-long (8-week) school promotion. The numbers of enrollees during the
past 6 weeks are tabulated below.
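Solution.
Week (x)   Enrollees (y)   xy     x²
1          6               6      1
2          5.5             11     4
3          6.4             19.2   9
4          5.1             20.4   16
5          4.9             24.5   25
6          6.6             39.6   36
7          ?
Σx = 21    Σy = 34.5       Σxy = 120.7   Σx² = 91
$$\bar{x} = \frac{\Sigma x}{n} = \frac{21}{6} = 3.5 \qquad \bar{y} = \frac{\Sigma y}{n} = \frac{34.5}{6} = 5.75$$
$$b = \frac{n\Sigma xy - \Sigma x\,\Sigma y}{n\Sigma x^2 - (\Sigma x)^2} = \frac{6(120.7) - 21(34.5)}{6(91) - 21^2} = \frac{-0.3}{105} = -0.0029$$
$$a = \bar{y} - b\bar{x}$$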
$$a = 5.75 - (-0.0029)(3.5) = 5.76$$
By the regression equation,
$$y = a + bx = 5.76 + (-0.0029)(7) = 5.74$$
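The least-squares computation above can be reproduced in a few lines. This Python sketch uses the six weeks of enrollment data and the computational formula for the slope b; the function name `least_squares` is hypothetical.

```python
# Least-squares line y = a + bx (sketch).
weeks = [1, 2, 3, 4, 5, 6]
enrollees = [6, 5.5, 6.4, 5.1, 4.9, 6.6]

def least_squares(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # a = y-bar - b * x-bar
    return a, b

a, b = least_squares(weeks, enrollees)
predicted_week7 = a + b * 7
print(round(b, 4), round(a, 2), round(predicted_week7, 2))  # -0.0029 5.76 5.74
```

Because the slope is so close to zero, the fitted line is nearly flat, and the week-7 forecast is essentially the six-week average of 5.75.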
E. The table below shows the monthly income (x) and the monthly expenses (y) of 7
families in a certain barangay in Makati. Estimate the monthly expenditure of the
family whose income is P 8250.
Parametric tests are tests that require normally distributed data whose level of
measurement is interval or ratio.
Types of parametric tests: the t-test, z-test, F-test, and analysis of variance for
tests of difference; r, the Pearson product-moment coefficient of correlation, for
tests of relationship or association; and simple linear regression analysis and
multiple regression analysis for prediction and forecasting.
I. The t-Test. The t-test is used to compare two means: the means of two
independent samples or two independent groups, and the means of correlated
samples before and after a treatment. Ideally, the t-test is used when there
are fewer than 30 samples, but some researchers use the t-test even when there
are more than 30 samples.
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where: t = the t-test value
X̄1 = mean of group 1
X̄2 = mean of group 2
SS1 = sum of squares of group 1
SS2 = sum of squares of group 2
n1 = number of observations in group 1
n2 = number of observations in group 2
Solution:
Male (X1)   X1²   Female (X2)   X2²
14          196   12            144
18          324   9             81
17          289   11            121
16          256   5             25
4           16    10            100
14          196   3             9
12          144   7             49
10          100   2             4
9           81    6             36
17          289   13            169
ΣX1 = 131   ΣX1² = 1891   ΣX2 = 78   ΣX2² = 738
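n1 = 10   n2 = 10
X̄1 = 13.1   X̄2 = 7.8
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{13.1 - 7.8}{\sqrt{\left(\dfrac{174.9 + 129.6}{10 + 10 - 2}\right)\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}}$$
t = 2.88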
H0: x1 = x2
H1: There is a significant difference between the performance of male
and female AB students in spelling.
H1: x1 ≠ x2
III. Level of Significance:
α = .05; df = (n1 + n2) − 2 = 10 + 10 − 2 = 18; tabular t = 2.101. If the computed t-value
is greater than the tabular/critical value, reject the null (H0).
IV. Conclusion: Since the computed t-value of 2.88 is greater than the tabular t-value
of 2.101 at the .05 level of significance with 18 degrees of freedom, the null
hypothesis is rejected in favor of the alternative hypothesis. This means that
there is a significant difference between the performance of male and female
AB students in spelling. It implies that the male students perform better than
the female students, considering that the mean score of the male
students (13.1) is greater than the mean score of the female students
(only 7.8).
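As a check on the hand computation, this Python sketch implements the pooled-variance t-test for independent samples exactly as in the formula above, using the male and female spelling scores from the solution table. The function names are hypothetical; the tabular value 2.101 still comes from the t table.

```python
import math

male = [14, 18, 17, 16, 4, 14, 12, 10, 9, 17]
female = [12, 9, 11, 5, 10, 3, 7, 2, 6, 13]

def sum_of_squares(xs):
    # SS = ΣX² - (ΣX)² / n
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

def pooled_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    pooled_variance = (sum_of_squares(x1) + sum_of_squares(x2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2

t, df = pooled_t(male, female)
print(round(t, 2), df)  # 2.88 18
```

If SciPy is installed, `scipy.stats.ttest_ind(male, female)`, which assumes equal variances by default, should report the same t.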
Exercise 1.
Two groups of experimental rats were injected with a tranquilizer at 1.0 mg and 1.5
mg doses, respectively. The time, in seconds, that it took them to fall asleep is
given below. Use the t-test for independent samples at the .01 level to test the null
hypothesis that the difference in dosage has no effect on the length of time it took them
to fall asleep.
1.0 mg. dose 9.8 13.2 11.2 9.5 13.0 12.1 9.8 12.3 7.9 10.2 9.7
1.5 mg. dose 12.0 7.4 9.8 11.5 13.0 12.5 9.8 10.5 13.5
Exercise 2.
To find out whether a new serum would arrest leukemia, 16 patients, who had all
reached an advanced stage of the disease, were selected. Eight patients received the
treatment and eight did not. Survival time was measured from the time the experiment
was conducted.
No Treatment ( x1) 2.1 3.2 3.0 2.8 2.1 1.2 1.8 1.9
With treatment ( x2 ) 4.2 5.1 5.0 4.6 3.9 4.3 5.2 3.9
$$t = \frac{\bar{D}}{\sqrt{\dfrac{\Sigma D^2 - \dfrac{(\Sigma D)^2}{n}}{n(n - 1)}}}$$
where: D̄ = the mean difference between the pretest and the posttest
ΣD² = the sum of the squared differences between the pretest and the posttest
ΣD = the summation of the differences between the pretest and the posttest
n = the sample size
Pretest (X1)   Posttest (X2)   D   D²
20 25 -5 25
30 35 -5 25
10 25 -15 225
15 25 -10 100
20 20 0 0
10 20 -10 100
18 22 -4 16
14 20 -6 36
15 20 -5 25
20 15 5 25
18 30 -12 144
15 10 5 25
15 16 -1 1
20 25 -5 25
18 10 8 64
40 45 -5 25
10 15 -5 25
10 10 0 0
12 18 -6 36
20 25 -5 25
ΣD = −81   ΣD² = 947
$$\bar{D} = \frac{-81}{20} = -4.05$$
$$t = \frac{-4.05}{\sqrt{\dfrac{947 - \dfrac{(-81)^2}{20}}{20(20 - 1)}}} = -3.17$$
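The paired (correlated-samples) t-test above can be verified with a short Python sketch. It takes D = pretest − posttest, as in the table, and applies the formula for t directly; the function name `paired_t` is hypothetical.

```python
import math

pretest = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20, 18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
posttest = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15, 30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

def paired_t(x1, x2):
    d = [p - q for p, q in zip(x1, x2)]   # D = X1 - X2 for each student
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    mean_d = sum_d / n
    standard_error = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
    return mean_d / standard_error, n - 1

t, df = paired_t(pretest, posttest)
print(round(t, 2), df)  # -3.17 19
```

If SciPy is installed, `scipy.stats.ttest_rel(pretest, posttest)` computes the same statistic from the differences pretest − posttest.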
Solving by Stepwise Method:
I. Problem: Is there a significant difference between the pretest and the posttest on
the use of programmed materials in English?
II. Hypotheses: H0: There is no significant difference between the pretest and the posttest;
the use of the programmed materials did not affect the
students' performance in English.
H1: The posttest result is higher than the pretest result.
III.Level of Significance:
α = .05
df = n-1= 20 -1 = 19
t@ .05 = -1.729 = -1.73
Exercise
An admission test was administered to 100 randomly selected incoming freshmen in the
College of Nursing and Veterinary Medicine. The mean scores of the two samples were
x̄1 = 90 and x̄2 = 85, and the variances of the test scores were 40 and 35, respectively.
Is there a significant difference between the two groups? Use the .01 level of significance.