Statistics is the study of the methods for describing and interpreting quantitative
information, including techniques for organizing and summarizing data and techniques for making
generalizations and inferences from data. The first of these two broad classes of methods is called
descriptive statistics, and the second is called inferential statistics.
Descriptive statistics refers to the procedures for organizing, summarizing, and describing
quantitative information which is called data. For example, a basketball fan is accustomed to
checking over his favorite player’s shooting average; the sales manager relies on charts showing
the sales distribution of an enterprise.
The second class of statistics, inferential statistics, includes methods for making inferences
about a larger group of individuals on the basis of data collected on a much smaller group.
1. Entity. When we make observations about persons, places, and things, we call that which
is being observed an entity, regardless of the type of unit involved.
2. Variable. A characteristic that assumes different values for different entities is called a
variable. By contrast, a characteristic that retains the same value from entity to entity is
called a constant. The different values that one observes (or measures) are called
observations.
3. Quantitative Variable. A quantitative variable is one whose values are expressible as
numerical quantities, such as measurements and counts. A measurement taken on a
quantitative variable conveys information regarding amount.
4. Qualitative variable. A qualitative variable is one that is not measurable or countable.
Many characteristics can only be classified. A measurement taken on a qualitative variable
conveys information regarding attributes.
5. Discrete Variable. A discrete variable is one that can assume only certain values within
an interval. A discrete variable is characterized by interruptions between values that the
variable can assume.
6. Continuous variable. There is a continuum of values that a continuous variable can
assume: all whole numbers and all values in between.
7. Population. The largest collection of values of some variable in which there is interest
constitutes the population of these values.
8. Sample. A sample is a part of a population.
Summarizing Data
An ordered array is a list of the observations in order of magnitude. The order may be
from smallest value to the largest value or from the largest to the smallest.
A frequency distribution is any device, such as a graph or table, that displays the values that
a variable can assume along with the frequency of occurrence of these values, either individually
or as they are grouped into a set of mutually exclusive and exhaustive intervals.
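As a minimal sketch of the two ideas above, the following Python snippet builds an ordered array and an ungrouped frequency distribution. The data (number of children per household) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical observations; not taken from the notes.
observations = [2, 0, 3, 1, 2, 2, 4, 1, 0, 2]

# Ordered array: the observations listed in order of magnitude.
ascending = sorted(observations)
descending = sorted(observations, reverse=True)

# Ungrouped frequency distribution: each distinct value with its count,
# listed from the smallest value to the largest.
frequency = dict(sorted(Counter(observations).items()))

print(ascending)
for value, count in frequency.items():
    print(value, count)
```

The frequencies always add up to the number of observations, which is a quick check on any frequency table.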
Class intervals are contiguous, nonoverlapping intervals selected in such a way that they
are mutually exclusive and exhaustive. That is, each and every value in the set of data can be
placed in one and only one of the intervals.
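The requirement that every value fall in exactly one interval can be sketched in Python as follows. The ages, the starting limit of 10, and the class width of 10 are all hypothetical choices made for illustration.

```python
# Hypothetical data and class layout; not taken from the notes.
ages = [12, 15, 21, 24, 27, 33, 35, 38, 41, 49, 52, 58]
lower, width, n_classes = 10, 10, 5

# Each interval is [start, start + width): contiguous and nonoverlapping.
intervals = [(lower + k * width, lower + (k + 1) * width)
             for k in range(n_classes)]

# Count how many observations fall in each interval.
counts = [sum(1 for a in ages if lo <= a < hi) for lo, hi in intervals]

# Mutually exclusive and exhaustive: every value is counted exactly once.
assert sum(counts) == len(ages)

for (lo, hi), f in zip(intervals, counts):
    print(f"{lo}-{hi - 1}: {f}")
```

Using half-open intervals is one way to guarantee that a value on a boundary is placed in only one class.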
4. Organize the class intervals and proceed to construct your frequency distribution table.
# Also Discuss: True class limits (class boundaries); lower/upper class limits, class marks.
Sometimes one wants a cumulative frequency distribution. The entries in the cumulative
frequency < column are obtained by adding the number of observations from the first (smallest)
interval through the interval in question, inclusive. A cf < indicates the number of observations
that fall below a specified upper boundary. Meanwhile, to obtain the entries in the cumulative
frequency > column, we add the number of observations from the last (largest) interval back
through the interval in question, inclusive. A cf > indicates the number of observations that
fall above a specified lower boundary.
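The two cumulative columns can be sketched in Python as running totals taken from opposite ends of the table. The class frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, ordered from the smallest interval to the largest.
frequencies = [2, 3, 3, 2, 2]

# cf<: running total from the first interval through each interval.
cf_less = list(accumulate(frequencies))

# cf>: running total from the last interval back through each interval.
cf_greater = list(accumulate(reversed(frequencies)))[::-1]

print(cf_less)
print(cf_greater)
```

The last entry of cf < and the first entry of cf > both equal the total number of observations.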
IV. Descriptive Measures
The Arithmetic Mean is the most popular measure of central tendency. We find it by adding
all the values in a set of data and dividing the total by the number of values that were
summed.
Ungrouped data: $\bar{x} = \dfrac{\sum x_i}{n}$
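A short Python check of the ungrouped-data formula, using hypothetical values:

```python
import statistics

# Hypothetical data; not taken from the notes.
values = [4, 8, 15, 16, 23, 42]

# Add all the values and divide the total by the number of values summed.
mean = sum(values) / len(values)

# The standard library gives the same result.
assert mean == statistics.mean(values)

print(mean)
```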
4. The mean cannot be obtained by inspection; it is a computed value and therefore can
be manipulated algebraically.
The median is that value above which half the values lie and below which the other half
lie. If the number of items is odd, the median is the value of the middle item of an ordered
array, when the items are arranged in ascending (or descending) order of magnitude. If the
number of items is even, no item has an equal number of values above and below
it. In this case, the median is equal to the mean or average of the two middle values.
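The odd and even cases for ungrouped data can be sketched in Python as follows. The function name `median` and the sample lists are illustrative only.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule described above."""
    ordered = sorted(values)          # the ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the middle item of the ordered array.
        return ordered[mid]
    # Even number of items: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))
print(median([7, 1, 5, 3]))
```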
Grouped data: $\text{Median} = L + \dfrac{j}{f}\, i$
where:
L = the lower boundary of the class interval in which the median is located.
j = the number of values still needed to reach the median after the lower
limit of the interval containing the median has been reached (j = n/2 - cf<,
where cf< is the cumulative frequency of the classes below L).
i = class size.
f = the frequency in the class interval containing the median.
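A minimal sketch of the grouped-data median, assuming the usual interpolation formula Median = L + (j/f) i that the symbols above describe. The function name, the class boundaries, and the frequencies are hypothetical.

```python
def grouped_median(boundaries, frequencies):
    """Interpolated median of a frequency distribution.

    boundaries  -- list of (lower, upper) true class boundaries, ascending
    frequencies -- list of class frequencies, in the same order
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cf_before = 0  # cf< of the classes below the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cf_before + f >= n / 2:
            j = n / 2 - cf_before   # values still needed to reach the median
            i = upper - lower       # class size
            return lower + (j / f) * i
        cf_before += f
    raise ValueError("boundaries and frequencies do not match")


# Hypothetical table: classes 10-19, 20-29, ..., 50-59 with true boundaries.
boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5)]
frequencies = [2, 3, 3, 2, 2]
print(grouped_median(boundaries, frequencies))
```

Here n/2 = 6 and the first two classes hold 5 values, so the median lies in the third class with j = 1, f = 3, and i = 10.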
information regarding the magnitude of measurements near the center of the data set is
available.
The median for a frequency distribution is that value or point on the horizontal axis of the
histogram of the distribution at which a perpendicular line divides the area of the histogram
into two equal parts.
The mode for ungrouped discrete data is the value that occurs most frequently. If all the
values in a set of data are different, there is no mode. When we want to find the mode of
a frequency distribution, we usually specify the modal class, which is defined as the class
interval containing the largest number of values.
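The mode for ungrouped data and the modal class for a frequency distribution can be sketched in Python as follows. The helper name `modes`, the data, and the class labels are hypothetical. Returning every tied value is a design choice of this sketch, not something the notes specify.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # All values are different, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 0, 3, 1, 2, 2, 4, 1, 0, 2]))
print(modes([1, 2, 3]))

# Modal class: the class interval containing the largest number of values.
classes = ["10-19", "20-29", "30-39", "40-49"]   # hypothetical class labels
frequencies = [2, 7, 4, 1]
modal_class = classes[frequencies.index(max(frequencies))]
print(modal_class)
```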
B. Measures of Dispersion
Once we have computed the mean of a data set, we want to know the extent to which the
values differ from this mean. We use the term dispersion to describe the degree to which
a set of values varies about its mean. When the values are close to the mean, they exhibit
less dispersion than when some of the values are much larger and/or much smaller than the
mean.
The range is the difference between the largest and the smallest values in a set of data. For
grouped data, the range is simply the difference between the exact upper limit of the highest
class interval and the exact lower limit of the lowest class interval.
The variance uses all the deviations of values from their mean. It is the average of the
squared deviations of the individual values from the mean of the data set.
The standard deviation is simply the positive square root of the variance.
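The range, variance, and standard deviation can be computed step by step as in the sketch below, using hypothetical data. The notes describe the variance as an average of squared deviations, which corresponds to dividing by n. Dividing by n - 1 for a sample is a common convention that is assumed here, not stated in the notes, so both versions are shown.

```python
import math

# Hypothetical data; not taken from the notes.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Squared deviations of the individual values from the mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Dividing by n gives the plain average of the squared deviations.
variance_n = sum(squared_deviations) / n

# Dividing by n - 1 is the usual sample convention (an assumption here).
variance_n_minus_1 = sum(squared_deviations) / (n - 1)

# Standard deviation: the positive square root of the variance.
sd_n = math.sqrt(variance_n)
sd_n_minus_1 = math.sqrt(variance_n_minus_1)

print(data_range, variance_n, sd_n)
```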
Sometimes the need arises to compare the variability present in two sets of data. This
usually can be done by comparing the two variances or standard deviations if the data sets
satisfy two conditions: 1) the same unit of measurement is employed in both data sets; 2)
the means of the two data sets are approximately equal. If either of these conditions is not
met, we need a relative measure of dispersion for use in comparing the variability of the
two data sets. One such relative measure of dispersion is the coefficient of variation. The
sample coefficient of variation (CV) is equal to the ratio of the standard deviation to the
mean. That is, $CV = s / \bar{x}$. The CV is frequently multiplied by 100 and expressed as a percent.
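A short Python sketch of the CV, using two hypothetical data sets that have the same standard deviation but different means. The function name is illustrative, and `statistics.stdev` is used on the assumption that s denotes the sample standard deviation with an n - 1 divisor.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV as a percent: (s / mean) * 100."""
    s = statistics.stdev(values)      # sample standard deviation (n - 1 divisor)
    return s / statistics.mean(values) * 100


# Hypothetical data sets with equal standard deviations but unequal means.
set_a = [10, 20, 30]
set_b = [100, 110, 120]

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Although both sets have a standard deviation of 10, the first varies far more relative to its mean, which is what the CV is meant to show.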