
© Dr. Valerie P. Muehsam, 2006

Quantitative Techniques in Business

Introduction to Statistics

In the business world, and in fact, in practically every aspect of daily living, quantitative techniques are used to assist in

decision making. Why? Unlike the classroom, in the “real world” there is often not enough information available to guarantee

making a correct decision. For instance, if advertisers would like to know how many households in the United States with televisions

are tuned to a particular television show, at a particular date and time, it would be impossible to determine without the complete

cooperation of every household and an astonishing amount of time and money. If a consumer protection agency wanted to determine

the true proportion of prescription drug users who also use unregulated, over-the-counter herbal supplements, this information would

most likely not be available. As a result of the inability to determine characteristics of interest, the application of statistics and other

quantitative techniques has developed.

Statistics is defined as the process of collecting a sample and organizing, analyzing, and interpreting the data. The numeric values

which represent the characteristics analyzed in this process are also referred to as statistics. When information related to a particular

group is desired, and it is impossible or impractical to obtain this information, a sample or subset of the group is obtained and the

information of interest is determined for the subset. For instance, if someone is interested in the average annual income of all the

students with majors in the College of Business Administration at Sam Houston State University, the only way this information could

be obtained is if the annual income of every student in this population could be collected, recorded and analyzed without error. Since

this would take considerable time and money, and since the probability of collecting the data necessary to determine the true annual

salary of the students is small, a sample of this population will be taken. The sample mean annual salary of the sample of students will

be determined and used to estimate the true mean annual salary of all the students with majors in the College of Business

Administration at Sam Houston State University.

The study of statistics consists of two types: descriptive statistics and inferential statistics. Descriptive statistics are

characteristics, usually numeric, used to describe a particular data set. An example of a descriptive statistic would be the average final

exam grade of ten students in an elementary statistics class. This average test score is used to indicate a “typical value” for the exam

grades of the ten students. Inferential statistics, on the other hand, are similar to descriptive statistics in that each is calculated from a

sample, but the difference is the use of the statistic. In inferential statistics, the statistic is used to make inference, or make decisions,

about the entire population of interest. In other words, we take a sample and calculate a statistic and use that statistic to make

inference about the actual value of the characteristic in the entire population.

For instance, there are many descriptive characteristics of a firm’s customers that its management would like to know, but

this information may be difficult or impossible to determine. Measurement of each and every customer of a large retail firm is nearly

impossible. Even if the information were gathered, it would be unlikely that it would be timely.

Unfortunately, managers do not always know what mean (average) weekly demand for a product will be or what

proportion of television viewers will watch a particular show. Since these parameters of interest are not known, and usually

impossible or impractical to determine, the parameters will be estimated using partial information gathered from a sample.
For instance, if the desired parameter is the mean annual salary of the income earning residents of a particular county, a

sample of 200 of these residents could be obtained and the annual salary of each resident (element) in the sample could be determined

and the mean annual salary of the sample residents calculated. If the sample is drawn in a random fashion from a frame, or list, of the entire

population, and if we use correct statistical techniques, the sample mean annual salary (a statistic) may be a good estimate of the true

mean annual salary (a parameter) of all the residents of this county.

A population includes all the elements of interest. We use the term “element” to represent each individual unit of a group

in which we have interest. For instance, elements may refer to people (e.g., customers), records (e.g., all loan accounts at a particular bank), products (e.g., we are interested in the proportion defective), etc. The notation used in statistics to represent the population size

is “N”. In our example above, the population of interest would be all the income earning residents of the county. Each of these

residents is an element in our population. If the population of the income earning residents in the county was 50,000 then N = 50,000.

The size of the population, N, is often not known.

A sample is a subset of the population. The notation for the sample size is “n”. In our previous example, the sample

would be the 200 residents we sampled out of all the income earning residents in the county. In this case n = 200.

A parameter is a characteristic, usually numeric, of the population. Populations have many parameters but researchers are

often interested in only one or two of these characteristics. For instance, in our example above, the parameter of interest is the

population mean annual salary of all the income earning residents of the county. The mean annual salary is but one of many other

characteristics of this population that may be of interest and could also be estimated. The proportion of these residents who support a

particular school bond issue and the mean age of the residents are two examples of other parameters that may be of interest.

A statistic is a characteristic, usually numeric, of the sample. Samples, like populations, also have many statistics that may

be calculated. For each parameter of a population, there is a corresponding statistic that may be calculated from a sample. An

important item to remember is that a statistic is a random variable, which means that different samples may result in different

values for the statistic. For instance, in the example above, the statistic is the sample mean annual income of the 200 residents of the

county. This value is called the “sample mean” because it is calculated from the sample.

Although the sample mean is our “best guess” for the value of the population mean, it is one of many possible values that

could be calculated from different samples of size 200. In other words, there are many samples of 200 that could be collected from the

population of 50,000 residents. Unfortunately, even if we take a random sample of 200, we could end up with the most affluent 200

residents in the county. The sample mean calculated from this sample would not be representative of the population. The possibility

of collecting a sample like this cannot be ignored. We will, however, learn to use statistical techniques that allow us to estimate the

probability of getting a value for the sample statistic that is not a good estimate of the population parameter.

The use of statistics to estimate parameters of interest is not guaranteed to be successful. If the estimate is not “good” the

result could be a faulty decision that, in turn, could result in loss of time and/or revenue. We must not allow quantitative techniques to

make decisions for us; we must use these techniques only as tools to assist us in decision making.

Scale of Data Measurement

Before any statistical technique is employed, a researcher must determine the type of data that is to be collected. In a

general sense, there are two types of data: qualitative data and quantitative data.

Qualitative data categorizes an element by a non-numeric attribute. For instance, if we are interested in which political

party a resident belongs to, we are categorizing the resident using qualitative data: Democratic, Republican, Independent, etc.

Qualitative data is often the data we are interested in gathering in the social sciences and particularly in business. For instance, much

of what we want to know in business is related to attitudes or behavior of consumers. The data is not numeric and therefore more

difficult to analyze. We often calculate the proportion of elements with a particular characteristic (e.g., the proportion of residents who

own their own home) but many techniques cannot be used on this type of data.

There are two types of qualitative data: nominal data and ordinal data. Nominal data is, in terms of structure, the

lowest form of data. Nominal data is qualitative data that has no natural order. Examples of nominal data include: gender; political

affiliation; type of car owned; product model; etc. Data comprised of “numbers” can also be qualitative data. Zip codes, area codes,

and telephone numbers are examples of data that are qualitative. In math terms, these data are not “real” numbers because they do not

represent numeric measures. One way to determine whether “numbers” are numeric measures is to consider whether one might be

interested in an average of these “numbers”. If a number can be replaced with letters, words or symbols without losing any

information then this indicates that a “number” is NOT a numeric measure. Ordinal data is qualitative data that has a natural order.

Examples of ordinal data include: military rank; size of clothing using S, M, L, XL; place in which a race was finished; condition of a

used appliance using POOR, AVERAGE, GOOD, EXCELLENT; etc. While ordinal data has an order, the intervals between the

rankings are not equal intervals. Thus, while ordinal data has more structure than nominal data, math functions on the data, such as

differences, are not valid.

Quantitative data categorizes an element by a numeric measure. Quantitative data are true numbers and, as a result, more

quantitative techniques are available for use with this data. Quantitative data can be divided into two types of data: interval data and

ratio data. Interval data is quantitative data that has no natural starting point or zero level. Examples of interval data include

Fahrenheit temperature and scores on IQ tests. Each of these is a numeric measure, but neither has a natural starting point

or zero level. Zero degrees Fahrenheit is not the absence of temperature just as there is no zero level for a test of intelligence. Interval

data can be used for any technique that requires quantitative data; however, we must realize that ratios have no meaning with this type

of data since there is no natural zero level. For example, 50 degrees Fahrenheit is not twice as warm as 25 degrees Fahrenheit. Ratio

data is quantitative data that has a natural starting point or zero level. Most quantitative data falls into this scale of data measurement.

Examples of ratio scaled data include height, weight, rate of return, net income, etc. Since there is a natural zero level, ratios have

meaning.

Measures of Central Tendency

Once we have decided the type of data that we are going to collect, we must determine the type of techniques that are

appropriate for analyzing the data. The first organizational technique we will most likely perform is to order the data from smallest

value to largest value. We order the data to get an idea about the range of the values observed. Consider a particular example: if we have collected annual income figures from 1,000 households, what might we be interested in knowing about these data? Perhaps we

would be interested in a typical annual income value for the data set. Typical values are often referred to as Measures of Central

Tendency. Measures of central tendency are attempts to identify typical values which are representative of the 1,000 observations

collected. The three most common measures of central tendency are the mean, the median and the mode. All three of these

measures are referred to as “average” or “typical” values although they are each different measures of typical.

The first, and most popular, measure of central tendency is the arithmetic mean, hereafter referred to as simply the mean.

The mean is calculated as the sum of the observations divided by the number of observations. The sample mean is denoted $\bar{x}$ and the formula for calculating the sample mean is: $\bar{x} = \frac{\sum x}{n}$. The population or true mean is denoted $\mu$ (the Greek letter “mu”) and is calculated the same way as the sample mean except that all elements in the population are measured.
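As a quick illustration, here is a minimal sketch in Python; the income values are hypothetical.

```python
# A minimal sketch of the sample mean, using hypothetical salary data.
incomes = [28000, 31500, 24000, 45000, 30250]  # hypothetical sample, n = 5

n = len(incomes)
x_bar = sum(incomes) / n  # sum of the observations divided by their count
print(x_bar)              # 31750.0
```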

The mean requires at least interval scaled data, which means it is only valid for true numeric measures. The mean is often

referred to as the “gravitational center of the data set” which is similar to the balancing point of the data. If equal weights were

placed on a scale representing a number line for each observation in a data set, the mean would be the point at which the scale

balances. Since each observation has an equal weight, the magnitudes of the values influence the mean. The mean, while certainly the

most commonly used measure of central tendency, is not always a good measure of “typical.” For instance, data sets that include

extreme values relative to the rest of the data “pull” the mean in that direction. Extremely small values cause the mean to be “small”

and extremely large values cause the mean to be “large.” The result is that the mean is not a “good” measure of typical and in fact,

may be larger or smaller than all values except the extreme one. When extreme values occur in a data set, we often use another

measure of typical referred to as the median. For instance, a typical income is often best expressed as the median

income rather than the mean income since there is a lower limit (zero) but not an upper limit on income.

The median is the second most commonly used measure of central tendency and is referred to as the positional average.

The median is the center value in an ordered data set. If the data set has an odd number of observations then the median is the value

found in the center of the distribution of ordered values. If the sample set has an even number of values then the median is the mean

of the two values surrounding the center of the data set. The median is also P50, the fiftieth percentile. This means that 50% or half of

the values are smaller than the median and half of the values or 50% are greater than the median. The procedure for finding the

median is:

1. Order the data set from smallest to largest (or largest to smallest). NOTE: this requires that the data can be

ordered, so the median cannot be found for nominal data.

2. Find i, which is the location or position of the median. This position can be calculated by using the following formula: $i = \frac{n+1}{2}$, where n is the size of the sample.

3. If i is an integer then the median is the value found at the ith position in the ordered data set. If i is not an

integer, then the median is the mean of the two values surrounding the ith position.

The median is often denoted as M or $\tilde{x}$.
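The three-step procedure translates directly into code. A minimal sketch, using hypothetical data (positions in the comments are 1-based, as in the text):

```python
# A sketch of the median procedure described above (hypothetical data).
def median(data):
    ordered = sorted(data)          # step 1: order the data
    n = len(ordered)
    i = (n + 1) / 2                 # step 2: position of the median (1-based)
    if i.is_integer():
        return ordered[int(i) - 1]  # step 3: i is an integer
    lo = int(i)                     # step 3: i is not an integer, so take
    return (ordered[lo - 1] + ordered[lo]) / 2  # the mean of the two neighbors

print(median([7, 1, 5, 3, 9]))      # n = 5, i = 3, median = 5
print(median([7, 1, 5, 3, 9, 11]))  # n = 6, i = 3.5, mean of 5 and 7 = 6.0
```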
The last of the more common Measures of Central Tendency is called the mode. The mode is the most commonly

occurring value in a data set; in other words, the value that occurs with the greatest frequency. The mode, unlike either the mean or

the median, does not have to be unique. A data set can have more than one mode or no mode at all. A data set with one mode is referred to as unimodal; with two modes, bimodal; and with three or more modes, multimodal. There is no

universal notation for the mode and the mode is valid for any type of data.
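A sketch of finding the mode(s) with Python's collections.Counter, covering the unimodal, bimodal, and no-mode cases on hypothetical data:

```python
# A sketch of the mode; a data set may have one mode, several, or none.
from collections import Counter

def modes(data):
    counts = Counter(data)
    top = max(counts.values())  # the greatest frequency
    if top == 1:
        return []               # every value occurs once: no mode
    return [value for value, f in counts.items() if f == top]

print(modes([2, 4, 4, 7, 9]))  # [4]    -- unimodal
print(modes([2, 2, 4, 7, 7]))  # [2, 7] -- bimodal
print(modes([1, 2, 3]))        # []     -- no mode
```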

Measures of Data Variation

Besides a measure of “typical,” what else might we want to know about a data set? Do the measures of central tendency

tell us all we need to “know” about the observations we have collected? Certainly not; in fact, two data sets could have the same mean and be completely different in terms of dispersion. Consider that we “know” the mean depth of a lake where we plan our next office

picnic. Suppose the mean depth of the lake is 4 feet; is this all we need to know about the depth of this lake? No. We need to know how much the values (depths) vary around 4 feet. The depth of the lake could be 4 feet at every point and have a mean of 4 feet or

the depth of the lake could vary greatly around four feet and still have a mean of 4 feet. There could be places where the depth is a

few inches and other places where the depth is 10 feet. This information about how the data are dispersed is very important

(especially for those of us who cannot swim). The study of statistics could appropriately be referred to as the study of variability since

many of the techniques employ the comparison of the variability of typical values in different groups to determine whether or not

these values are the same or different between groups. Measures of Data Variation (variability, dispersion, or spread) are attempts

to describe how spread out the values in a particular data set are, or how much they vary. All measures of data variation or dispersion

require quantitative data to calculate and are nonnegative. The measures of data variation are zero (if all the values are equal) or

positive. A “large” measure of spread indicates a more dispersed data set while a “small” measure indicates a more tightly grouped

data set.

The easiest measure of spread to calculate is the range. The range is the difference between the largest or maximum value

and the smallest or minimum value. The notation and formula for the range is: $R = H - L$, where H is the largest or maximum value and L is the smallest or minimum value. The range, while simple to calculate, is only informative if it is “small.” “Small” and

“large” are relative terms and must be determined relative to the magnitude of the values measured. For instance, a range of $3 for

dinner could be characterized as “small” if we are eating at a five-star restaurant in a pricey hotel in New York City where the dinner

entrees range in price from $12.00 to $35.00 but may be characterized as “large” if we’re eating at a local fast-food restaurant. If the

range is “small” it means that the two extreme values are very close to each other, so the rest of the values must also be tightly

grouped. If the range is “large” we know that the extreme values are a long way from each other but we know nothing about the

distribution of the rest of the observations. Since the range only uses two values in its calculation, we are provided with limited

information.
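The range takes one line to compute. A sketch with hypothetical dinner prices:

```python
# The range R = H - L for a hypothetical set of dinner prices.
prices = [12.00, 18.50, 22.00, 35.00, 15.75]
R = max(prices) - min(prices)  # H = 35.00, L = 12.00
print(R)                       # 23.0
```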

Like our favorite measure of central tendency, the mean, we might like to come up with a measure of variability that

incorporates all the values in the data set as opposed to using only the two values needed to calculate the range. We might be

interested in finding out, on the average, how much the values vary around a “typical value.” In an effort to describe the variability of

a data set we could measure the distance each value is from the mean, our standard measure of “typical.” The distance a value is from

the mean is called the “deviation from the mean” and is found by subtracting the mean from a particular value. This deviation from

the mean can be negative (if the value is smaller than the mean), positive (if the value is bigger than the mean), or zero (if the value is

equal to the mean). To calculate the average deviation from the mean, we could sum the deviations from the mean for each value in

the data set and divide by the number of observations in our sample. Unfortunately, although a good idea intuitively, this value will

always be zero since the mean is the gravitational center of the data set; as a result, the deviations from the mean sum to zero and so the average deviation would be zero (0): $\frac{\sum(x - \bar{x})}{n} = 0$. This occurs because the deviations from the mean
that are negative offset the deviations from the mean that are positive. We can avoid this problem by using the absolute value or

square of the deviations from the mean.

The Mean Absolute Deviation (MAD) is the sum of the absolute deviations from the mean divided by the sample size: $\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$. The MAD is used in financial analysis to determine the variability in stock prices from the expected
price. Unfortunately, while the MAD is the “best” measure of spread for descriptive purposes, it is not useful for inferential statistics

since the distribution of an absolute value function is not smooth.
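A sketch of the MAD on hypothetical observations; it also confirms that the raw deviations from the mean sum to zero:

```python
# Mean Absolute Deviation: the average absolute distance from the mean.
data = [4, 8, 6, 5, 7]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n   # 6.0

raw_sum = sum(x - x_bar for x in data)       # always 0: deviations offset
mad = sum(abs(x - x_bar) for x in data) / n  # absolute values avoid that
print(raw_sum, mad)                          # 0.0 1.2
```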

The sample variance, denoted $s^2$, is the sum of the squared deviations from the mean divided by the sample size less one

(n-1). Continuing our effort to find an average deviation from the mean, we square the deviations from the mean to eliminate any

negative values so our numerator is not equal to zero, and then divide by the sample size less one. Our denominator is made smaller

(hence our variance is made larger) as an adjustment to our estimate for the true population variance, denoted $\sigma^2$ (sigma squared), since we calculate the sample variance, $s^2$, using the sample mean, $\bar{x}$, instead of the true population mean, $\mu$ (mu). The true measure of variability for the population should be calculated according to each value’s distance from $\mu$, the population mean. The adjustment in the denominator makes our estimate larger than without the adjustment to account for the estimate ($\bar{x}$) used in the numerator. Since we would prefer to have a “small” measure of variability because this indicates that the mean, $\bar{x}$, is a good

measure of “typical” since most of the values are “close to” the mean, adjusting our estimate for the variance to be larger is considered

to be conservative. We are unsure of the true value of the mean so we use the value of the sample mean to estimate the variability in

the data. The deviations from the mean are estimated using deviations from the sample mean. It is said that we lose one degree of

freedom (df) in the denominator for every estimate in the numerator. All variances are of the form: sum of squares divided by

degrees of freedom.

The problem with the variance is that the value is in squared units. For instance, if we are measuring the dollar amount

spent on lunch, the variance will be in dollars squared. Since squared units make interpretation difficult, we normally take the square

root of the variance to return to the original units of measurement. The positive square root of the sample variance, $s^2$, is the sample standard deviation, s. The sample standard deviation, s, is our estimate for the true population standard deviation, denoted $\sigma$ (sigma), which is the positive square root of the population variance, $\sigma^2$. The definitional formula for the sample variance, $s^2$, is

given below followed by an algebraic manipulation which we call the computational formula. The computational formula is easier and

faster to calculate but intuitively the definitional formula makes more sense as our estimate of the “average” (squared) deviation from

the mean.

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$ = the sample variance

$s = \sqrt{s^2}$ = the sample standard deviation
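A sketch on hypothetical data confirming that the definitional and computational formulas agree:

```python
# Sample variance two ways, plus the sample standard deviation.
import math

data = [4, 8, 6, 5, 7]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n

# Definitional formula: sum of squared deviations over n - 1.
s2_def = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Computational formula: algebraically identical, faster by hand.
s2_comp = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

s = math.sqrt(s2_def)      # the sample standard deviation
print(s2_def, s2_comp, s)  # 2.5 2.5 1.58...
```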

Although we rarely calculate parameters, the following formulae are given for the population variance and the population standard

deviation.

$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ = the population variance

$\sigma = \sqrt{\sigma^2}$ = the population standard deviation.

Uses of the Standard Deviation

The standard deviation of a sample is an attempt to estimate the typical distance that values in the data set differ from the

mean. We use the standard deviation as the step-size to estimate the percentage of values that lie within one, two, or three steps

of the mean. For example, Chebyshev’s Theorem, which applies to any distribution regardless of its shape, states that at least $\left(1 - \frac{1}{k^2}\right)100\%$ of the values will fall within k standard deviations of the mean. Since Chebyshev’s Theorem applies to any distribution regardless of shape, the information learned is less specific than we might like. In other words, using the formula, we would discover

that at least 75% of the observations (in any distribution) lie within 2 standard deviations of the mean. This means that 75%-

100% of the values will fall within two standard deviations of the mean. While some information is better than none, we would like to

be more precise in our estimate of this percentage. For certain known distributions, we can more precisely estimate the percentage of

values that lie within one, two or three standard deviations of the mean.
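A sketch evaluating the Chebyshev bound for a few values of k:

```python
# Chebyshev's Theorem: at least (1 - 1/k^2) of the values lie within
# k standard deviations of the mean, for ANY distribution (k > 1).
for k in (1.5, 2, 3):
    bound = 1 - 1 / k ** 2
    print(f"k = {k}: at least {bound:.1%} of values within {k} sd")
# k = 2 gives at least 75.0%; k = 3 gives at least 88.9%
```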

The Empirical Rule, which only applies to a normal distribution, provides us with much more information about this

particular distribution than Chebyshev’s Theorem. The Empirical Rule states that for any normal distribution, approximately 68% of

the values will fall within one standard deviation of the mean, approximately 95% of the values will fall within two standard

deviations of the mean, and approximately 99.7% of the values will fall within three standard deviations of the mean. This much more

precise information is only true for data distributed normally. The normal distribution, sometimes referred to as the Gaussian

distribution after Carl Friedrich Gauss, who discovered that certain errors follow a normal distribution, is bell-shaped and symmetrical, and models

the behavior of many random variables. We will discuss the normal distribution as well as its probability distribution later in the

course.
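A simulation sketch checking the Empirical Rule against randomly generated normal data; the mean and standard deviation are hypothetical, and the exact percentages vary slightly from run to run:

```python
# Empirical Rule check: count how many normally distributed values fall
# within 1, 2, and 3 standard deviations of the mean.
import random

random.seed(1)
mu, sigma, n = 100, 15, 100_000  # hypothetical normal population
values = [random.gauss(mu, sigma) for _ in range(n)]

for k in (1, 2, 3):
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    print(f"within {k} sd: {inside / n:.1%}")  # about 68%, 95%, 99.7%
```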

Measures of Position or Location

Measures of central tendency and measures of data variation are single values that describe an entire data set. Measures of

position or location are measures of an individual value and indicate the relative position of that value to the other values in the data

set. A commonly used measure of position is a percentile. Aptitude tests often provide an individual’s percentile ranking to let them

know how they did relative to others who took the test. To determine what test score exceeds a certain percentage of test scores, we

first divide our data set into 100 equal parts and then count in to determine the location of the value that corresponds to the percentile

we are interested in.

The kth percentile, Pk, is that value which is equal to or greater than k% of the observations and is less than or equal to the

remaining (100-k)% of the observations.

The procedure for calculating the kth percentile is:

1. Order the data from smallest to largest value.

2. Find $\frac{nk}{100}$, where n is the sample size and k is the percentile you are calculating.

3. (a) If $\frac{nk}{100}$ is not an integer, then i, the position of the kth percentile, will be the next larger integer. For example, if $\frac{nk}{100} = 4.5$ then i = 5.

(b) If $\frac{nk}{100}$ is an integer, then i, the position of the kth percentile, will be $\frac{nk}{100} + 0.5$. For example, if $\frac{nk}{100} = 6$ then i = 6.5.

4. (a) If i is an integer (3a above) then the kth percentile is the value found at the ith position. For example, in 3a above, i =

5, so the kth percentile is the 5th value in the ordered data set.

(b) If i is not an integer (3b above) then the kth percentile is the mean of the two values surrounding the ith position.

For example, in 3b above, i = 6.5, so the kth percentile is the mean of the sixth and seventh values in the ordered

data set.
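The four steps above can be written out directly. A minimal sketch on a hypothetical data set; note that the 50th percentile reproduces the median:

```python
# A sketch of the kth-percentile procedure described above.
def percentile(data, k):
    ordered = sorted(data)     # step 1: order the data
    n = len(ordered)
    pos = n * k / 100          # step 2: nk/100
    if pos != int(pos):        # step 3a: not an integer, so i is the
        i = int(pos) + 1       #   next larger integer
        return ordered[i - 1]  # step 4a: the value at position i
    lo = int(pos)              # step 3b: integer, so i = nk/100 + 0.5
    return (ordered[lo - 1] + ordered[lo]) / 2  # step 4b: mean of neighbors

data = [2, 4, 4, 5, 6, 7, 8, 9]  # hypothetical, n = 8
print(percentile(data, 25))      # nk/100 = 2   -> mean of 4 and 4 = 4.0
print(percentile(data, 50))      # nk/100 = 4   -> mean of 5 and 6 = 5.5
print(percentile(data, 90))      # nk/100 = 7.2 -> i = 8 -> 9
```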

Sometimes, instead of being interested in what data point has a certain percentage above it or below it, researchers are

interested in determining the value that is “typical” for the “center” group of values. For example, suppose we are charged with the

responsibility of developing the curriculum for a kindergarten class. The students in a class of kindergarteners could differ

tremendously in terms of acquired knowledge. Suppose, in an effort to develop the curriculum, we give each student in the class an

aptitude test to measure his/her abilities in basic knowledge. The scores may vary greatly since some of the students may have

attended preschool since they were very young while others may not have attended at all. If we do not have the resources to have

a multi-level curriculum, then we would develop a curriculum that was targeted at those “in the middle” in terms of their aptitude

scores. Since we are interested in targeting the center of the distribution of aptitude scores, we will determine what constitutes the

“middle 50%” and gear our curriculum at those students.

Quartiles, which are just specific percentiles, allow us to divide our data into four equal groups. The first or lower

quartile, Q1, is equal to the 25th percentile, P25. The second or mid-quartile, Q2, is equal to the 50th percentile, P50, which is also the median, M. The third or upper quartile, Q3, is equal to the 75th percentile, P75. We use these quartiles to help us determine

characteristics of the middle 50% of our data. For example, the Interquartile Range (IQR), is the range of the middle 50% of the

data. Like the range, the IQR is a measure of data variation or dispersion but instead of indicating the range of all the data like the

range does, the IQR indicates the range of only the middle 50%. Like other Measures of Data Variation, the IQR requires quantitative

data to calculate. The formula for the IQR is: $IQR = Q_3 - Q_1$. To calculate the IQR, the first and third quartiles are

determined by finding the corresponding percentile, i.e., Q3=P75 and Q1=P25.

The Mid-Quartile Range, (MQR), is a statistic we calculate to determine a “typical” value in the middle group of

observations. The MQR is a Measure of Central Tendency and is the mean of the extreme values of the middle 50% of the

observations. It is not the mean of all observations in the middle 50%, but instead we find the mean of the first and third quartiles.

The formula for the MQR is: $MQR = \frac{Q_1 + Q_3}{2}$.
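Reusing the percentile() sketch from the previous section, the quartiles, IQR, and MQR follow in a few lines (data again hypothetical):

```python
# Quartiles, IQR, and MQR; assumes the percentile() sketch defined earlier.
data = [2, 4, 4, 5, 6, 7, 8, 9]  # hypothetical, n = 8

q1 = percentile(data, 25)  # first or lower quartile, P25
q3 = percentile(data, 75)  # third or upper quartile, P75

iqr = q3 - q1              # the range of the middle 50% of the data
mqr = (q1 + q3) / 2        # mean of Q1 and Q3: a "typical" middle value
print(q1, q3, iqr, mqr)    # 4.0 7.5 3.5 5.75
```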
Another measure of position or location is called the Z-score or Z value. The Z-score for a particular value in a data set

indicates the number of standard deviations that value is from the mean. Z-scores can be negative (if the value is less than the

mean), positive (if the value is larger than the mean), or equal to zero (if the value is equal to the mean). The Z-score for the mean is

always zero. For example, a value with a Z-score of 1.35 is 1.35 standard deviations above the mean. A value with a Z-score of –

2.12 is 2.12 standard deviations below the mean.

Z-values can be calculated, and a Standard Normal Table used, to determine approximately what proportion of the values,

for a normal distribution, are above or below a particular value, or between two values in a distribution.
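A sketch of the Z-score calculation; the mean and standard deviation are hypothetical, and NormalDist from Python's standard library stands in for a Standard Normal Table:

```python
# Z-score: how many standard deviations a value lies from the mean.
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical population mean and standard deviation
x = 130
z = (x - mu) / sigma  # 2.0 -> two standard deviations above the mean
print(z)

std_normal = NormalDist()     # mean 0, standard deviation 1
print(std_normal.cdf(z))      # ~0.977: proportion of values below x
print(1 - std_normal.cdf(z))  # ~0.023: proportion of values above x
```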

Frequency Distributions

Terminology:

Defn: The frequency, f, for a value or a class of values is the number of times that value or class of values
occurs in the data set.

We are simply counting how often a value or set of values occurs in the data set.
1. What is the minimum number of times a value or class of values occur(s) in a data set? The minimum number of times a
value or class of values can occur is zero (0). What is the maximum number of times a value or class of values can occur
in the data set? The maximum number of times a value or class of values can occur in the data set is n, or the total number
of values in the data set.

0 ≤ f ≤ n

2. If we add the frequencies for each value or set of values it will sum to n.

Σ f = n

Defn: The relative frequency, f/n, for a value or a class of values is the proportion of time that value or class of values occurs in the data set (the frequency divided by the total number of observations).

1. What is the minimum proportion of time a value or class of values occur(s) in a data set? The minimum proportion of time
a value or class of values can occur is zero (0). What is the maximum proportion of time a value or class of values can
occur in the data set? The maximum proportion of time a value or class of values can occur in the data set is one (1).

0 ≤ f/n ≤ 1

2. If we add the relative frequencies for each value or set of values it will sum to one (1).

Σ f/n = 1

Defn: The cumulative frequency, F, for a value or a class of values is the number of times that value or any
smaller value occurs in the data set.

We are simply keeping a running total.


1. Cumulative frequencies are non-decreasing (this means the values cannot decrease—they can level off but they can’t go
down).
2. The cumulative frequency for the last value or class of values is n.
3. We must have at least ordinal scaled data to find cumulative frequencies.

Defn: The cumulative relative frequency, F/n, for a value or a class of values is the proportion of time that value
or any smaller value occurs in the data set.

We are simply keeping a running total of relative frequencies or proportions.


1. Cumulative relative frequencies are non-decreasing.
2. The cumulative relative frequency for the last value or class of values is one (1).
3. We must have at least ordinal scaled data to find cumulative relative frequencies.
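A sketch that builds all four measures for a small hypothetical data set:

```python
# Frequency (f), relative frequency (f/n), cumulative frequency (F), and
# cumulative relative frequency (F/n) for a hypothetical data set.
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4]  # hypothetical (at least ordinal) data
n = len(data)
counts = Counter(data)

F = 0  # running total for the cumulative frequency
print("value   f    f/n    F    F/n")
for value in sorted(counts):  # cumulative measures need ordered values
    f = counts[value]
    F += f
    print(f"{value:5} {f:3} {f/n:6.3f} {F:4} {F/n:6.3f}")
# The f column sums to n = 7; the last F is n and the last F/n is 1.0.
```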

