
St. Paul University Philippines

A Course Presentation in
STATISTICS
Definition
Statistics is a science that
deals with the collection,
presentation, analysis and
interpretation of data.
Definition
Statistics is a collection of
methods for planning
experiments, obtaining data
and then organizing,
summarizing, presenting,
analyzing, interpreting and
drawing conclusions based on
the data.
Divisions of
Statistics
Descriptive and
Inferential Statistics
Descriptive Statistics is a statistical procedure concerned with describing the characteristics and properties of a group of persons, places or things.
Example
A teacher computes the average grade of her students and then determines the top ten students.
Inferential Statistics is a statistical procedure used to draw inferences about the properties or characteristics of people, places or things on the basis of information obtained from a small portion of a large group.
Example
A dermatologist tests the
relative effectiveness of a
new brand of medicine in
curing pimples and other
skin diseases.
Basic
Terminologies
Population vs. Sample
A population is the complete
collection of elements (scores,
people, measurements, and so on) to
be studied.
A sample is a sub-collection of
elements drawn from a population.
Parameter vs. Statistic
A parameter is a numerical
measurement describing some
characteristic of a population.
A statistic is a numerical
measurement describing some
characteristic of a sample.
Data
Data are facts, or a set of information or observations, under study. More specifically, data are gathered by the researcher from a population or from a sample. Data may be classified into two categories: qualitative and quantitative.
Nature of Data
Qualitative vs.
Quantitative Data
Qualitative (or categorical or
attribute) data can be separated into
different categories that are
distinguished by some non-numerical
characteristic.
Quantitative data consist of
numbers representing counts or
measurements.
Discrete vs. Continuous
Data
Discrete data result from either a
finite number of possible values or a
countable number of possible values.
(That is, the possible values can be counted: 0, 1, 2, and so on.)
Continuous data result from
infinitely many possible values that
can be associated with points on a
continuous scale in such a way that
there are no gaps or interruptions.
Levels of
Measurement
Nominal Level of
Measurement
The nominal level of measurement is
characterized by data that consist
of names, labels, or categories only.
The data cannot be arranged in an
ordering scheme. This is used when
we want to distinguish one object
from another for identification
purposes.
Ordinal Level of
Measurement
The ordinal level of measurement
involves data that may be arranged in
some order, but differences between
data values either cannot be
determined or are meaningless.
Interval Level of
Measurement
The interval level of measurement is
like the ordinal level, with the
additional property that meaningful
amounts of differences between data
can be determined. However, there is no inherent (natural) zero starting point.
Example: body temperature, year
(1955, 1843, 1776, 1123, etc.)
Ratio Level of
Measurement
The ratio level of measurement is the interval level modified to include the inherent zero starting point. For values at this level, both differences and ratios are meaningful.
Example: weights of plastic, lengths
of movies, distances traveled by cars
Data Gathering
Techniques
The main objective of
Statistics
To help us make wise decisions.
Decision-making is an important part of our lives. Everybody makes decisions almost every day.
For instance, students decide which college course to take that could give them a high salary and a better future.
Mothers decide which brand of milk to buy.
Business-minded people decide whether to put their money in the bank or to open a business or a factory.
Collecting Data

In conducting a study or
research, collection of data is
the first step. Data may be
gathered from primary or
secondary sources.
Two Sources of
Data
Primary Sources of Data
Primary sources of statistical data are government institutions, business agencies, and other organizations, for example, the National Statistics Office (NSO), as well as information derived from personal interviews.
Secondary Sources of
Data
Secondary sources are books, encyclopedias, journals, magazines, and research or studies conducted by other individuals.
Different Ways of
Collecting Data
The Direct or Interview
Method
In this method, the researcher has direct contact with the interviewee and obtains the information needed by asking the interviewee questions. This method is usually used in business research.
For example, a business firm would
interview residents of a certain barangay
regarding their favorite brand of
toothpaste, soap or shoes. TV personnel
would ask televiewers about their
favorite noontime show. Even political
analysts use this method to determine
public opinion or preferences for
candidates in upcoming elections.
Using this method, the researcher can get more accurate responses, since clarifications can be made if the interviewee or respondent does not understand the question. However, this method is costly and time-consuming.
The Indirect or
Questionnaire Method
This method makes use of a written questionnaire. The researcher gives or distributes the questionnaire to the respondents either by personal delivery or by mail. Using this method, the researcher can save a lot of time and money in gathering the information needed, because questionnaires can be given to a large number of respondents at the same time.
However, the researcher cannot expect all distributed questionnaires to be retrieved, because some respondents simply ignore them. In addition, clarifications cannot be made if the respondent does not understand a question.
The Registration Method
This method of collecting data is governed by law. For example, births and deaths are registered in the National Statistics Office for records and future use. The number of registered cars can be found at the Land Transportation Office (LTO).
The list of registered voters in the Philippines is found at the Commission on Elections (COMELEC). This method of gathering data is perhaps the most reliable because it is enforced by law.
The Experimental
Method
This method is usually used to find cause-and-effect relationships. Scientific researchers often use this method.
For example, agriculturists may want to know the effect of a new brand of fertilizer on the growth of plants. The new fertilizer will be applied to ten sets of plants.
Determining Adequate
Sample Size
In research, we seldom use the entire population because of the cost and time involved. In fact, most researchers do not use the population in their study. Instead, a sample, which is a small representative portion of the population, is used. The characteristics of the entire population are described using the characteristics observed in the sample.
To determine the sample size from a given population size, Slovin's formula is used.
Sampling Formula (Slovin's)
$$n = \frac{N}{1 + Ne^2}$$
where n = sample size
N = population size
e = margin of error
Observe that there is a margin of error. When we use a sample, we do not get the actual value of the parameter but only an estimate of it. Hence, there is an error associated with using a sample.
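Slovin's formula is easy to apply by hand, but a small Python sketch makes the arithmetic repeatable. The function name `slovin_sample_size` is illustrative only, not from any library. It rounds n to the nearest whole number, which matches how the worked examples in this presentation round; a common, more conservative alternative is to always round up.

```python
def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e**2), rounded to the nearest whole number."""
    if population <= 0:
        raise ValueError("population size N must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin of error e must be between 0 and 1")
    n = population / (1 + population * margin_of_error ** 2)
    return round(n)


print(slovin_sample_size(10_000, 0.10))  # 99
print(slovin_sample_size(10_000, 0.05))  # 385
```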
Examples of finding the sample size
1. A group of researchers will conduct a survey to find out the opinion of residents of a particular community regarding the oil price hike. If there are 10,000 residents in the community and the researchers plan to use a sample with a 10% margin of error, what would the sample size be?
Example for Slovin's Formula
Solution: Here N = 10 000 and e = 10% or 0.10. Substituting the given values in the formula, we have
$$n = \frac{10\,000}{1 + (0.10)^2(10\,000)} = \frac{10\,000}{1 + (0.01)(10\,000)} = \frac{10\,000}{101} = 99.01 \approx 99$$


Example 2.
Suppose that in Example 1, the researchers would like to use a 5% margin of error. What should be the size of the sample?
Solution: Here N = 10 000 and e = 5% or 0.05. Substituting the given values in the formula, we have
$$n = \frac{10\,000}{1 + (0.05)^2(10\,000)} = \frac{10\,000}{1 + (0.0025)(10\,000)} = \frac{10\,000}{1 + 25} = \frac{10\,000}{26} = 384.62 \approx 385$$


What do you observe about the sample size as we reduce the margin of error?
If you want a more accurate result, should you consider a larger sample?
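To explore these questions, this sketch tabulates Slovin's n for N = 10 000 at several margins of error. The helper is the same illustrative one-line function (hypothetical name), and the margins 3% and 1% are extra values chosen for illustration, not taken from the examples.

```python
def slovin_sample_size(population, margin_of_error):
    # Slovin's formula n = N / (1 + N * e**2), rounded to the nearest whole number
    return round(population / (1 + population * margin_of_error ** 2))


N = 10_000
sizes = {e: slovin_sample_size(N, e) for e in (0.10, 0.05, 0.03, 0.01)}
for e, n in sizes.items():
    print(f"e = {e:.0%}: n = {n}")
```

As e shrinks, the term N·e² shrinks too, so n grows and approaches N itself.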
SAMPLING
TECHNIQUES
Definition
Sampling may be defined as
measuring a small portion of
something and then making a general
statement about the whole thing
(Bradfield & Moredock, 1957)
Why do we need
sampling?
Sampling makes possible the study of
a large, heterogeneous population.
Sampling is for economy, speed, and
accuracy.
Sampling keeps the sources of data from being entirely consumed.
General Types of
Sampling
There are two general types of sampling:
Probability Sampling
Non-Probability Sampling
Probability Sampling
The sample is a proportion (a certain percent) of the population, and it is selected from the population by some systematic means in which every element of the population has a known, nonzero chance of being included in the sample.
Non-Probability Sampling
The sample is not a proportion of the
population and there is no system in
selecting the sample. The selection is
dependent on the situation from
which the sample is taken. This technique lacks objectivity in the selection of the sample. It is sometimes called subjective sampling.
Types of Non-Probability
Sampling are
Convenience Sampling
Quota Sampling
Purposive Sampling
Convenience Sampling
This is used because of the convenience it offers to the researchers.
Example: A researcher who wishes to find out the most popular noontime show may simply interview respondents over the telephone.
Quota Sampling
In this type of sampling, the proportions of the various subgroups in the population are determined, and the sample is drawn so that it has the same percentages.
Example: Suppose we want to determine teenagers' favorite brand of T-shirt. If there are 1000 female and 1000 male teenagers and we want to draw 150 members for our sample, we can select 75 female and 75 male teenagers from the population without using randomization.
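The quotas in this example are each subgroup's share of the population applied to the desired sample size. A minimal Python sketch; `quota_allocation` is a hypothetical helper name. Rounding each quota separately can make the total differ slightly from the target when subgroup shares are uneven, and choosing the actual members within each quota is still left to the researcher, without randomization.

```python
def quota_allocation(subgroup_sizes: dict, sample_size: int) -> dict:
    """Give each subgroup a quota proportional to its share of the population."""
    total = sum(subgroup_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in subgroup_sizes.items()
    }


quotas = quota_allocation({"female": 1000, "male": 1000}, 150)
print(quotas)  # {'female': 75, 'male': 75}
```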
Purposive Sampling
This is based on certain
criteria laid down by the
researcher. People who
satisfy the criteria are
interviewed.
In purposive sampling, the respondents are chosen on the basis of their knowledge of the information desired.
Example: If research is to be conducted on the history of a place, the older residents of the place must be consulted and included in the sample.
Example: Suppose the target is to find out the effectiveness of a certain kind of shampoo. Of course, bald people will not be included in the sample.
Types of Probability
Sampling are
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
RANDOM SAMPLING
Simple Random Sampling is a sampling
technique where members of the
population are selected in such a way that
each member has an equal chance of being
selected.
It is also called the lottery or raffle type
of sampling.
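Python's standard `random.sample` draws without replacement and gives every member the same chance of being selected, which makes it an electronic version of the lottery method. The population of 100 ID numbers and the seed are made up for illustration; the seed only makes the draw reproducible.

```python
import random

population = list(range(1, 101))       # hypothetical member IDs 1..100
rng = random.Random(2024)              # fixed seed so the draw can be repeated
sample = rng.sample(population, k=10)  # 10 distinct members, each equally likely
print(sorted(sample))
```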
Stratified Sampling
With stratified sampling, the population is subdivided into at least two different subpopulations (or strata) such that members of the same stratum share the same characteristic (such as gender), and then a sample is drawn from each stratum.
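A stratified draw can be sketched as a separate simple random sample inside each stratum. The gender strata below, their sizes, and the fixed count of 5 per stratum are all hypothetical; a study could instead make each stratum's share proportional to its size.

```python
import random


def stratified_sample(strata: dict, n_per_stratum: int, rng: random.Random) -> dict:
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


strata = {
    "female": [f"F{i}" for i in range(1, 51)],  # 50 hypothetical female members
    "male": [f"M{i}" for i in range(1, 41)],    # 40 hypothetical male members
}
sample = stratified_sample(strata, 5, random.Random(7))
for name, members in sample.items():
    print(name, members)
```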
Systematic Sampling
In systematic sampling, one chooses a starting point and then selects every kth (such as every 5th) element in the population.
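In this sketch the starting point is picked at random among the first k elements, which is a common choice but an assumption here, since the definition only says "a starting point". Python slicing then takes every kth element. The list of 50 IDs and k = 5 are illustrative.

```python
import random


def systematic_sample(population: list, k: int, rng: random.Random) -> list:
    """Choose a random start within the first k elements, then take every kth one."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = rng.randrange(k)  # 0-based index in 0..k-1
    return population[start::k]


population = list(range(1, 51))  # hypothetical list of 50 IDs
sample = systematic_sample(population, 5, random.Random(3))
print(sample)
```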
Cluster Sampling
In cluster sampling, the population area is divided into sections (or clusters), a few of those sections are randomly selected, and then all the members of the selected sections are included in the sample.
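Unlike stratified sampling, cluster sampling randomly picks whole sections and then keeps every member of them. A minimal sketch; the six areas of five households each are hypothetical.

```python
import random


def cluster_sample(clusters: dict, n_clusters: int, rng: random.Random) -> dict:
    """Randomly select n_clusters sections and keep all members of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Six hypothetical areas (clusters), each with five households
clusters = {f"Area {c}": [f"{c}{i}" for i in range(1, 6)] for c in "ABCDEF"}
selected = cluster_sample(clusters, 2, random.Random(11))
sample = [member for members in selected.values() for member in members]
print(list(selected), sample)
```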
Measures of Central
Tendency
Numerical values which describe the average or typical performance of a given group in terms of certain attributes.
They serve as a basis for determining whether the group is performing better or worse than other groups.
Mean
The most reliable and the most sensitive measure of central tendency.
It is the most widely used measure.
It is commonly known as the average, although the median and the mode are also known as averages.
Mean:
It comes in 2 different forms:
1) Simple Mean
2) Weighted Mean
Example 1:
A study was done on 5 typical fast-food
meals in Metro Manila. The following table
shows the amount of fat, in number of
teaspoons, present in each meal. Calculate
the mean amount of fat for these 5 fast-
food meals.

Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
How to compute the simple mean:
The simple mean is obtained by adding all the values/observations of a certain variable and dividing the sum by the total number of values, cases or observations.

To obtain the simple mean amount of fat for the 5 fast-food meals:
Mean = (14 + 18 + 22 + 10 + 16)/5
Mean = 80/5 = 16
This means that the 5 fast-food meals contain an average of 16 teaspoons of fat per meal, which is too much.
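The simple mean can be written as a short Python function that follows the definition directly: add the observations and divide by how many there are. The helper name `simple_mean` is illustrative.

```python
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}  # fat per meal, in teaspoons


def simple_mean(values) -> float:
    """Sum of the observations divided by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


print(simple_mean(fat_tsp.values()))  # 16.0
```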
Example 2:
The following represents the final
grades obtained by a nursing
student one summer term:
Anatomy (5 units) - - - 93
Chemistry (3 units) - - - 88
SOT 2 (2 units) - - - 89
Find the student's weighted average.
To solve for the weighted average of the student, we have
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{93(5) + 88(3) + 89(2)}{10} = \frac{465 + 264 + 178}{10} = \frac{907}{10} = 90.7 \text{ (Excellent)}$$
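The weighted mean formula translates directly into Python: multiply each grade by its weight (the units), add the products, and divide by the total weight. `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights) -> float:
    """Compute sum(w_i * x_i) / sum(w_i)."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grades = [93, 88, 89]  # Anatomy, Chemistry, SOT 2
units = [5, 3, 2]      # weights
print(weighted_mean(grades, units))  # 90.7
```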
Example 3:
The following represents the responses of
50 randomly chosen respondents in one
item of a research questionnaire:
Very Strongly Agree (5) - - - 17
Strongly Agree (4) - - - 11
Agree (3) - - - 9
Disagree (2) - - - 12
Strongly Disagree (1) - - - 1
Find the weighted response of the
respondents.
To solve for the weighted response, we have
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(17) + 4(11) + 3(9) + 2(12) + 1(1)}{50} = \frac{85 + 44 + 27 + 24 + 1}{50} = \frac{181}{50} = 3.62 \text{ (Strongly Agree)}$$
Table of Interpretation (5-point Likert Scale)
4.20 - 5.00 Very Strongly Agree
3.40 - 4.19 Strongly Agree
2.60 - 3.39 Agree
1.80 - 2.59 Disagree
1.00 - 1.79 Strongly Disagree
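Combining the weighted mean with the table of interpretation gives the small sketch below: the response counts act as weights on the scale values 1 to 5, and the result is matched to the first band whose lower limit it reaches. Matching on lower limits is an assumption about how to treat values such as 4.195 that fall between the printed bands; `interpret` and `BANDS` are hypothetical names.

```python
# Lower limit of each band in the 5-point interpretation table, highest first
BANDS = [
    (4.20, "Very Strongly Agree"),
    (3.40, "Strongly Agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def interpret(mean: float) -> str:
    """Return the verbal interpretation of a 5-point Likert mean."""
    if not 1.0 <= mean <= 5.0:
        raise ValueError("a 5-point Likert mean must lie between 1 and 5")
    for lower, label in BANDS:
        if mean >= lower:
            return label
    raise AssertionError("unreachable: every mean >= 1.00 matches a band")


# Scale value -> number of respondents choosing it (Example 3)
counts = {5: 17, 4: 11, 3: 9, 2: 12, 1: 1}
n = sum(counts.values())
weighted_response = sum(scale * freq for scale, freq in counts.items()) / n
print(f"{weighted_response:.2f} ({interpret(weighted_response)})")  # 3.62 (Strongly Agree)
```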
The Median
What is the Median?
The median is . . .
A positional measure that divides the set of data into two equal parts.
The score/observation that is centrally located between the highest and the lowest observation.
Determined by arranging the data into an array (in order of magnitude).
Example 1:
A study was done on 5 typical fast-food meals in Metro Manila. The following table shows the amount of fat, in number of teaspoons, present in each meal. Calculate the median amount of fat for these 5 fast-food meals.

Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
Median for an Odd Sample
The array for the data is:
10, 14, 16, 18, 22
To obtain the median fat content of the 5 meals, we use the median formula for an odd sample, since n = 5:
$$\text{Median} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{5 + 1}{2}\right)^{\text{th}} \text{ item} = 3^{\text{rd}} \text{ item} = 16$$
Median for an Even Sample
The following are sample scores obtained from a 75-item summative test (n = 12):
48, 53, 63, 65, 45, 47, 52, 48, 63, 54, 63, 53
Array: 45, 47, 48, 48, 52, 53, 53, 54, 63, 63, 63, 65
Since n = 12 (even), the median is the mean of the 6th and 7th items:
$$\text{Median} = \frac{6^{\text{th}} \text{ item} + 7^{\text{th}} \text{ item}}{2} = \frac{53 + 53}{2} = 53$$
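The odd and even cases fit in one function: sort the data into an array, then take the middle item, or the mean of the two middle items when n is even. This sketch sorts the raw scores itself, so it works directly from the twelve values as listed (sorting them places 53 in both the 6th and 7th positions). Python's standard `statistics.median` applies the same rule and is used here as a cross-check.

```python
from statistics import median as stdlib_median


def median(values):
    """Middle item of the array (odd n) or mean of the two middle items (even n)."""
    array = sorted(values)  # arrange the data in an array
    n = len(array)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return array[mid]  # the ((n + 1) / 2)th item; 0-based index n // 2
    return (array[mid - 1] + array[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th items


fat = [14, 18, 22, 10, 16]
scores = [48, 53, 63, 65, 45, 47, 52, 48, 63, 54, 63, 53]
print(median(fat), stdlib_median(fat))
print(sorted(scores), median(scores), stdlib_median(scores))
```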
Mode
The mode is
The most popular score.
The score having the highest frequency.
The most frequently occurring score.
The least reliable measure of central tendency.
Determined by inspection.
A set of data is said to be unimodal (or monomodal) if it has only one mode.
Example: 33, 35, 35, 38, 40, 46
Its mode is 35.
A set of data is said to be bimodal if it has two modes.
Example: 33, 35, 35, 38, 40, 40, 46
Its modes are 35 and 40.
A set of data is said to be multimodal if it has more than two modes.
Example: 33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60
Its modes are 35, 40, 46 and 58.
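Finding the modes by inspection amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from Python's standard library. Treating data in which every value occurs only once as having no mode is an assumption, consistent with the point that the mode does not always exist; `modes` and `modality` are hypothetical helper names.

```python
from collections import Counter


def modes(data) -> list:
    """Return all values sharing the highest frequency; [] if every value occurs once."""
    counts = Counter(data)
    if not counts:
        raise ValueError("no data")
    top = max(counts.values())
    if top == 1:
        return []  # assumption: no value repeats, so there is no mode
    return sorted(value for value, freq in counts.items() if freq == top)


def modality(data) -> str:
    found = modes(data)
    if not found:
        return "no mode"
    return {1: "unimodal", 2: "bimodal"}.get(len(found), "multimodal")


for data in (
    [33, 35, 35, 38, 40, 46],
    [33, 35, 35, 38, 40, 40, 46],
    [33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60],
):
    print(modes(data), modality(data))
```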
Grouped Data
What is a Frequency
Distribution?
A Frequency
Distribution is a tabular
representation of data
consisting of intervals
and their respective
frequencies.
Other ways of presenting data are . . .
Bar Chart
Line Graph
Pie Chart
Scatter Plot
How to construct a Frequency Distribution:
Determine the range, R = HO - LO, where HO is the highest observation and LO is the lowest observation.
Determine the ideal class interval (ICI).
Determine the class size (i) using the formula i = R/ICI.
Construct the class intervals.
Tally the data and determine the frequency for each interval.
The class intervals in a frequency distribution must:
Not overlap.
Be complete, so that every observation can be tallied in one of the intervals.
Have a uniform class size.
Data:
77 77 85 72 69 80 75 69 80 64
72 68 48 60 44 87 52 74 72 76
63 81 56 71 54 76 81 78 55 74
82 59 40 73 61 80 58 75 63 48
46 51 80 42 65 54 79 57 72 67
Total = 3342; Mean = 3342/50 = 66.84
Frequency Distribution
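The construction steps can be sketched in Python for the 50 scores above. Two assumptions are labeled in the code: ICI is treated as the desired number of classes and set to 10 (the slides do not state a value), and i = R/ICI is rounded up to a whole number so that the classes cover every score. Intervals start at the lowest observation and use whole-number limits so they do not overlap; `frequency_distribution` is a hypothetical helper name.

```python
import math

scores = [
    77, 77, 85, 72, 69, 80, 75, 69, 80, 64,
    72, 68, 48, 60, 44, 87, 52, 74, 72, 76,
    63, 81, 56, 71, 54, 76, 81, 78, 55, 74,
    82, 59, 40, 73, 61, 80, 58, 75, 63, 48,
    46, 51, 80, 42, 65, 54, 79, 57, 72, 67,
]


def frequency_distribution(data, ideal_classes=10):
    """Return (lower, upper, frequency) rows for whole-number data."""
    lo, hi = min(data), max(data)
    data_range = hi - lo  # R = HO - LO
    # Assumption: ICI = desired number of classes; i = R / ICI is rounded up
    size = max(1, math.ceil(data_range / ideal_classes))
    rows = []
    lower = lo
    while lower <= hi:
        upper = lower + size - 1  # non-overlapping whole-number limits
        frequency = sum(1 for x in data if lower <= x <= upper)  # tally
        rows.append((lower, upper, frequency))
        lower += size
    return rows


table = frequency_distribution(scores)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}\t{'|' * frequency}\t{frequency}")
print("Total:", sum(f for _, _, f in table))
```

Under these assumptions R = 87 - 40 = 47, i = 4.7 rounded up to 5, and the table has ten classes from 40-44 to 85-89.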
Uses of the Measures
of Central Tendency
The Mean is used
For interval and ratio measurements
When there are no extreme values in a
distribution since it is easily affected by
extremely high or extremely low scores
When further statistical computations are to be made
When the greatest reliability of the
measure of central tendency is wanted
since its computations include all the given
values
The Median is used
For ordinal and ranked measurements
When there are extreme values, thus the
distribution is markedly skewed
For an open-ended distribution, that is, one in which the lowest or the highest class interval, or both, has no fixed limit (e.g., 50 and below or 100 and above)
When one wants to know whether cases fall within the upper half or the lower half of a distribution.
The Mode is used
For nominal and categorical data
When a rough or quick estimate of a
central value is wanted
When the most popular or the most
typical case or value in a distribution
is wanted
Limitations of the
Measures of Central
Tendency
The Limitations of the Mean
It is the most widely used average, since it is the most familiar. However, it is often misused. It should not be used if the values or items do not cluster substantially.
If the given values do not tend to cluster
around a central value, the mean is a poor
measure of central location.
It is easily affected by extremely large or
small values. One small value can easily pull
down the mean.
The mean alone cannot be used to compare distributions, since the means of 2 or more distributions may be the same while their other characteristics are entirely different. The means of distribution A, whose values are 80, 85 and 90, and distribution B, whose values are 86, 85 and 84, are both 85. We cannot conclude, however, that both distributions possess the same characteristics, since their patterns of dispersion or variation are markedly different despite having the same mean.
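The point about distributions A and B can be checked numerically with Python's `statistics` module: both means are 85, but the sample standard deviation (a measure of variability) is 5 for A and only 1 for B.

```python
from statistics import mean, stdev

distributions = {"A": [80, 85, 90], "B": [86, 85, 84]}
for name, values in distributions.items():
    # Same center, different spread
    print(name, "mean =", mean(values), "standard deviation =", stdev(values))
```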
The Limitations of the Median
It is easily affected by the number of
items in a distribution.
It cannot be determined unless the given values are arranged according to magnitude.
If a distribution contains many values, it becomes a laborious task to arrange them according to magnitude.
Its value is not as accurate as the mean, since it is just an ordinal statistic.
The Limitations of the Mode
It is seldom used, since it does not always exist.
Its value is just a rough estimate of
the center of concentration of a
distribution.
It is very unstable since its value
easily changes depending on the
approaches used in finding it.
Measures of Variability
Measures of Variability
Indicate how spread out the scores are. The larger the measure of variability, the more spread out the scores are, and the group is said to be heterogeneous; the smaller the measure, the less spread out the scores are, and the group is said to be homogeneous.
Measures of Variability
The statistical tool used to describe the degree to which scores/observations are scattered/dispersed.
It is also used to determine the degree of consistency/homogeneity of scores.
Measures of Variability
Range (R) = HO - LO
Mean Absolute Deviation (MAD)
Standard Deviation (s)
Variance (s²)
Coefficient of Variation (CV)
The following are the scores obtained by
two groups of 2nd year ASHE students in
N101:
Group A Group B
30 30
28 20
27 18
25 16
25 15
23 15
21 14
20 13
18 12
12 12
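For Group A, n = 10 and the mean is 229/10 = 22.9.
$$R = 30 - 12 = 18$$
$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{41.2}{10} = 4.12$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{256.9}{10 - 1}} = \sqrt{28.54} = 5.34$$
$$s^2 = \frac{256.9}{10 - 1} = 28.54$$
$$CV = \frac{s}{\bar{x}} \times 100 = \frac{5.34}{22.9} \times 100 = 23.32\%$$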
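The Group A figures can be reproduced with a short Python function; `variability` is a hypothetical helper name. Like the slide, it divides by n - 1 for the variance (the sample variance). Unrounded, the CV comes to about 23.33%; the slide's 23.32% results from dividing the rounded standard deviation 5.34 by 22.9. Group B is included for comparison, since the slide computes only Group A.

```python
import math


def variability(scores):
    """Range, mean absolute deviation, sample variance, standard deviation and CV."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "range": max(scores) - min(scores),             # HO - LO
        "MAD": sum(abs(x - mean) for x in scores) / n,
        "variance": variance,
        "SD": sd,
        "CV%": sd / mean * 100,                         # undefined if the mean is 0
    }


group_a = [30, 28, 27, 25, 25, 23, 21, 20, 18, 12]
group_b = [30, 20, 18, 16, 15, 15, 14, 13, 12, 12]
results = {"A": variability(group_a), "B": variability(group_b)}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```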
Problem:
Two seemingly equally excellent BSN students are vying for an academic honor, but only one can be chosen to receive the award. The following are their grades used as the basis for the award:
Franzen : 91, 90, 94, 93, 92
Rico : 92, 92, 90, 94, 92
Whom do you think deserves to get
the award?
Guiding Principle
The lesser the value of the
measure, the more consistent,
the more homogeneous and
the less scattered are the
observations in the set of
data.
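Applying the guiding principle to the problem above: both students average 92 and both have a range of 4, so the range cannot separate them, but the mean absolute deviation and the standard deviation can. A minimal sketch using Python's `statistics` module that computes these measures for each student:

```python
from statistics import mean, stdev

grades = {
    "Franzen": [91, 90, 94, 93, 92],
    "Rico": [92, 92, 90, 94, 92],
}

summary = {}
for student, g in grades.items():
    avg = mean(g)
    summary[student] = {
        "mean": avg,
        "range": max(g) - min(g),
        "MAD": sum(abs(x - avg) for x in g) / len(g),
        "SD": stdev(g),  # sample standard deviation (n - 1)
    }
    print(student, {k: round(v, 2) for k, v in summary[student].items()})

# Guiding principle: the smaller standard deviation marks the more consistent student
most_consistent = min(summary, key=lambda s: summary[s]["SD"])
print("More consistent:", most_consistent)
```

With these grades, Rico's MAD (0.8) and standard deviation (about 1.41) are both smaller than Franzen's (1.2 and about 1.58), so by the guiding principle Rico's grades are the more consistent.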
