
Median and Mean of a Density Curve

The median of a density curve is the equal-areas point, the point that divides
the area under the curve in half.
The mean of a density curve is the balance point, at which the curve would
balance if made of solid material.
The median and mean are the same for a symmetric density curve. They both lie at
the center of the curve. The mean of a skewed curve is pulled away from the
median in the direction of the long tail.
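As a rough numerical illustration of the balance point and the equal-areas point, the sketch below uses a hypothetical right-skewed density, f(x) = 2(1 - x) on [0, 1], which is not taken from the slides. The function name `mean_and_median` and the midpoint-sum approach are assumptions chosen for simplicity.

```python
def density(x):
    """Hypothetical right-skewed density on [0, 1]; its long tail is on the right."""
    return 2.0 * (1.0 - x)


def mean_and_median(f, lo, hi, steps=100_000):
    """Approximate the mean (balance point) and median (equal-areas point)
    of a density f on [lo, hi] using a midpoint Riemann sum."""
    dx = (hi - lo) / steps
    area = 0.0      # cumulative area under the curve
    moment = 0.0    # cumulative x * f(x) * dx
    median = None
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        slice_area = f(x) * dx
        area += slice_area
        moment += x * slice_area
        # The median is the first point where half the area has accumulated.
        if median is None and area >= 0.5:
            median = x
    return moment / area, median


mean, median = mean_and_median(density, 0.0, 1.0)
print(f"mean = {mean:.4f}, median = {median:.4f}")
```

For this density the mean (about 0.333) lies to the right of the median (about 0.293), that is, toward the long tail.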
Statistics
Founded in 1890, the Literary Digest magazine was famous for its success in conducting polls
to predict winners in presidential elections. The magazine correctly predicted the winners in the
presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest
between Alf Landon and Franklin D. Roosevelt, the magazine sent out 10 million ballots and
received 1,293,669 ballots for Landon and 972,897 ballots for Roosevelt, so it appeared that
Landon would capture 57% of the vote.
Well, Landon received 16,679,583 votes to the 27,751,597 votes cast for Roosevelt. Instead
of getting 57% of the vote as suggested by the Literary Digest poll, Landon received only
37% of the vote. In that same 1936 presidential election, George Gallup used a much smaller
poll of 50,000 subjects, and he correctly predicted that Roosevelt would win.

Data are collections of observations (such as measurements, genders, survey
responses).
Statistics is the science of planning studies and experiments, obtaining data,
and then organizing, summarizing, presenting, analyzing, interpreting, and
drawing conclusions based on the data.
A population is the complete collection of all individuals (scores, people,
measurements, and so on) to be studied. The collection is complete in the
sense that it includes all of the individuals to be studied.
A census is the collection of data from every member of the population.
A sample is a subcollection of members selected from a population. We should
consider these factors:
Context of the data
Source of the data
Sampling method


Flipping of coin

Data A plural noun (the singular form is datum) which means a set of known or given things,
facts. Note that data can be numerical (e.g. age of people) or non-numerical (e.g. gender of
people).

statistics Without a capital letter, i.e. in its lower-case form, this means a set of numerical
data or figures that have been collected systematically.

Statistics With a capital letter this is a proper noun that means the set of methods and
theories that can be used to arrange, analyse and interpret statistics.

A variable A quantity that varies, the opposite of a constant. For example, the number of
mobile phones sold per day in a shop is a variable, whereas the number of hours in a day is a
constant. In the expressions that we will use to summarize methods a capital letter, usually X
or Y, will be used to represent a variable.

Value A specific amount that it is possible for a variable to be. For example, the number of
mobile phones sold per day could be 25 or 43 or 51. These are all possible values of the
variable number of phones sold.

Random This adjective refers to something that occurs in an unplanned way. A random
variable is a variable whose observed values arise by chance. The number of new accounts a
bank opens during a month is a variable that is random, whereas the number of days in a
month is a variable that is not random, i.e. its observed values are pre-determined.

Distribution The pattern exhibited by the observed values of a variable when they are
arranged in order of magnitude. A theoretical distribution is one that has been deduced, rather
than compiled from observed values.

Population Generally this means the total number of persons residing in a defined area at a
given time. In Statistics a population is the complete set of things we want to investigate.
These may be human such as all the people who have visited a supermarket, or inanimate
such as all the policies issued by an insurance company.

Sample A subset of the population, that is, a smaller number of items picked from the
population. A random sample is a sample whose components have been chosen in a random
way, that is, on the basis that any single item in the population has no more or less chance
than any other to be included in the sample.


Copyright 2004
Pearson Education, Inc.
Business

The etymology of "business" relates to the state of being busy, either as an
individual or as society as a whole, doing commercially viable and profitable
work.
A business (also known as an enterprise or firm) is
an organization engaged in the trade of goods, services, or both
to consumers.

Business statistics can be described as the collection, summarization,
analysis, and reporting of numerical findings relevant to a business
decision or situation.
Why Statistics

Time has three phases: past, present, and future.
The continuation and growth of any business depend on strategic decisions about
finance, operations, or the market.
Decision making is crucial, whether it is based on intuition or on information and
knowledge.
Data (facts of the present) -> Analysis -> Information -> Knowledge
Knowledge-based decisions rest on some model.
There is a time lag between awareness of an impending event or need and the
occurrence of that event.
This is the lead time, and hence planning and forecasting are needed.
An occurrence is either random or has a causal relation.
Statistics helps here.

Properties of Estimators
Statistics: describe samples.
Parameters: describe the population.
But we use statistics to estimate population parameters.
Desirable properties of an estimator:
1. Sufficiency
2. Unbiasedness
3. Resistance
4. Efficiency
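Sample Variance as an Unbiased Estimator
Biased: $s^2 = \frac{\sum (y - \bar{y})^2}{n}$
Unbiased: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
Example population: y: 1, 2, 3
$\mu = \frac{\sum y}{n} = \frac{6}{3} = 2$
$\sigma^2 = \frac{\sum (y - \mu)^2}{n} = \frac{2}{3} = 0.667$
Samples of two from the above population
If sample y: 1, 2, then
Biased: $s^2 = \frac{\sum (y - \bar{y})^2}{n} = 0.25$
Unbiased: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = 0.50$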
n versus n-1: All permutations
Sample   Mean   Var(n)   Var(n-1)
1,1      1.0    0        0
1,2      1.5    .25      .50
1,3      2.0    1.0      2.0
2,1      1.5    .25      .50
2,2      2.0    0        0
2,3      2.5    .25      .50
3,1      2.0    1.0      2.0
3,2      2.5    .25      .50
3,3      3.0    0        0
E =      2.0    .333     .667
Not all n-1 estimates are better than their n counterparts.
But, on average, n-1 is superior (Unbiased).
Remember: $\sigma^2 = 0.667$
E = expected value
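The comparison of the n and n-1 divisors can be checked by enumerating every ordered sample of two drawn with replacement from the population 1, 2, 3. This is a minimal sketch; the helper names `var_n` and `var_n_minus_1` are illustrative and not from the slides.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]


def var_n(sample):
    """Variance with divisor n (the biased estimator when applied to a sample)."""
    m = mean(sample)
    return sum((y - m) ** 2 for y in sample) / len(sample)


def var_n_minus_1(sample):
    """Variance with divisor n - 1 (the unbiased estimator)."""
    m = mean(sample)
    return sum((y - m) ** 2 for y in sample) / (len(sample) - 1)


# All 9 ordered samples of size two, drawn with replacement.
samples = list(product(population, repeat=2))

expected_var_n = mean(var_n(s) for s in samples)
expected_var_n1 = mean(var_n_minus_1(s) for s in samples)
sigma2 = var_n(population)  # population variance uses divisor n

print(f"E[Var(n)]   = {expected_var_n:.3f}")
print(f"E[Var(n-1)] = {expected_var_n1:.3f}")
print(f"sigma^2     = {sigma2:.3f}")
```

Averaged over all samples, the n-1 version matches the population variance, while the n version falls short of it.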
Degrees of Freedom
Why we learn statistics

Data are numerical information.

Data
Information
Analysis
Knowledge
Data alone are useless; they have to be organized, summarized, and presented,
and then analyzed or used for estimation. These are the functions of statistics.
Measurement is either quantitative or qualitative.
In business we have to take decisions
There is risk associated with future
Decisions are either intuitional or calculated
(1) Carefully defining the situation, (2) gathering data, (3) accurately summarizing the
data, and (4) deriving and communicating meaningful conclusions.
Statistics: The science of collecting, describing, and interpreting data.
Population: A collection, or set, of individuals, objects, or events whose
properties are to be analyzed.
Sample: A subset of a population.
Variable (or response variable): A characteristic of interest about each
individual element of a population or sample.
Data value: The value of the variable associated with one element of a
population or sample. This value may be a number, a word, or a symbol.
Data: The set of values collected from the variable from each of the elements
that belong to the sample.
Experiment: A planned activity whose results yield a set of data.
Parameter: A numerical value summarizing all the data of an entire population.
Statistic: A numerical value summarizing the sample data.
Qualitative, or attribute, or categorical, variable: A variable that describes or
categorizes an element of a population.
Quantitative, or numerical, variable: A variable that quantifies an element of
a population.
A variable is simply something that can vary: that is, it can take on many different
values or categories. Examples of variables are gender, typing speed, top speed of
a car, number of reported symptoms of an illness, temperature, attendances at rock
festivals (e.g. the Download festival), level of anxiety, number of goals scored in
football matches, intelligence, number of social encounters while walking your dog,
amount of violence on television, occupation and favourite colours. These are all
things that we can measure and record and that vary. We are generally interested
in variables because we want to understand why they vary as they do.
Ratio-level scales have equal intervals between adjacent scores on the scale and an
absolute zero.
Interval scales have equal intervals between adjacent scores but do not have an
absolute zero.
Ordinal scales have some sort of order to the categories (e.g. in terms of magnitude)
but the intervals between adjacent points on the scale are not necessarily equal.
Nominal-level scales consist of categories that are not ordered in any particular way.
Nominal variable: A qualitative variable that characterizes (or describes, or
names) an element of a population. Not only are arithmetic operations not
meaningful for data that result from a nominal variable, but an order cannot be
assigned to the categories.
Ordinal variable: A qualitative variable that incorporates an ordered
position, or ranking.
Discrete variable: A quantitative variable that can assume a countable
number of values. Intuitively, the discrete variable can assume any values
corresponding to isolated points along a line interval. That is, there is a gap
between any two values.
Continuous variable: A quantitative variable that can assume an
uncountable number of values. Intuitively, the continuous variable can assume
any value along a line interval, including every possible value between any two
values.
Biased sampling method: A sampling method that produces data that
systematically differ from the sampled population. An unbiased sampling method
is one that is not biased.
Sampling frame: A list, or set, of the elements belonging to the population
from which the sample will be drawn.
Scales used

Nominal Scale
Ordinal Scale.
Interval Scale.
Ratio Scale
Probability is a numerical measure between 0 and 1 that describes the
likelihood that an event will occur. Probabilities closer to 1 indicate that the
event is more likely to occur. Probabilities closer to 0 indicate that the event
is less likely to occur.
P(A), read P of A, denotes the probability of event A.
If P(A) = 1, the event A is certain to occur.
If P(A) = 0, the event A is certain not to occur.
Probability is the basis for inferential statistics.
An event is an outcome, or a set of outcomes, of an experiment.
The sample space is the collection of all possible outcomes (sample points).

1. All sample point probabilities lie between 0 and 1.
2. The sum of the probabilities of all sample points within the sample space = 1.

Probability
Mutually exclusive events cannot occur together, so they are not statistically
independent (unless one of them has probability 0).
When two events are mutually exclusive, the probability of A or B occurring
can be expressed by the following addition rule for mutually exclusive
events: P(A or B) = P(A) + P(B)
For example, when one card is drawn from a deck of 52, the ace of spades and the
queen of spades are mutually exclusive, so P(As or Qs) = 1/52 + 1/52 = 2/52.
If two events are not mutually exclusive, the addition rule becomes
P(A or B) = P(A) + P(B) - P(AB), where P(AB) is the joint probability.
For independent events the joint probability is the product of the individual
marginal probabilities: P(AB) = P(A) * P(B)
The concept of statistical dependence implies that the probability of a
certain event is dependent on the occurrence of another event
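The addition rules can be checked by counting cards in a standard 52-card deck. This is a minimal sketch using exact fractions; the events chosen (ace, spade) and the helper `prob` are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Probability of an event for one card drawn at random: favourable / total."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_spade(card):
    return card[1] == "spades"


# Mutually exclusive events: ace of spades or queen of spades.
p_as_or_qs = prob(lambda c: c == ("A", "spades") or c == ("Q", "spades"))

# Events that are not mutually exclusive: ace or spade.
p_ace_or_spade = prob(lambda c: is_ace(c) or is_spade(c))
p_joint = prob(lambda c: is_ace(c) and is_spade(c))
by_addition_rule = prob(is_ace) + prob(is_spade) - p_joint

# Rank and suit are independent, so the joint probability is the product.
product_of_marginals = prob(is_ace) * prob(is_spade)

print(p_as_or_qs, p_ace_or_spade, by_addition_rule, p_joint, product_of_marginals)
```

Counting directly and applying the addition rule give the same probability, and for these two independent events the joint probability equals the product of the marginals.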
Venn diagram
The Venn diagram is named after John Venn, an English mathematician (1834-1923).
Classic Theory: The classic theory of probability underlies much of probability
in statistics. Briefly, this theory states that the chance of a particular outcome
occurring is determined by the ratio of the number of favourable outcomes
(or successes) to the total number of outcomes. Expressed as a formula,
P(outcome) = (number of favourable outcomes) / (total number of outcomes)
The classic theory assumes that all outcomes have equal likelihood of
occurring. In the example just cited, each card must have an equal chance
of being chosen: no card is larger than any other or in any way more likely
to be chosen than any other card. The classic theory pertains only to outcomes that
are mutually exclusive (or disjoint), which means that those outcomes may
not occur at the same time. For example, one coin flip can result in a head or a
tail, but one coin flip cannot result in a head and a tail. So the outcome of a head
and the outcome of a tail are said to be mutually exclusive in one coin flip, as are
the outcome of an ace and the outcome of a king when one card is drawn.
A probability assignment based on equally likely outcomes uses the formula
P(A) = (number of outcomes favourable to A) / (total number of outcomes in the sample space)
Chapter 11
Introduction to Hypothesis Testing
Nonstatistical Hypothesis Testing
A criminal trial is an example of hypothesis
testing without the statistics.
In a trial a jury must decide between two
hypotheses. The null hypothesis is
H0: The defendant is innocent

The alternative hypothesis or research
hypothesis is
H1: The defendant is guilty

The jury does not know which hypothesis
is true. They must make a decision on the
basis of evidence presented.
Nonstatistical Hypothesis Testing
In the language of statistics convicting the defendant is
called rejecting the null hypothesis in favor of the
alternative hypothesis. That is, the jury is saying that
there is enough evidence to conclude that the defendant
is guilty (i.e., there is enough evidence to support the
alternative hypothesis).

If the jury acquits it is stating that there is not enough
evidence to support the alternative hypothesis. Notice
that the jury is not saying that the defendant is innocent,
only that there is not enough evidence to support the
alternative hypothesis. That is why we never say that we
accept the null hypothesis, although most people in
industry will say "We accept the null hypothesis."


Nonstatistical Hypothesis Testing
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. We would want the probability of this type of error to be very small [maybe 0.001, "beyond a reasonable doubt"] for a criminal trial where a conviction results in the death penalty, whereas for a civil trial, where conviction might result in someone having to pay for damages to a wrecked auto, we would be willing for the probability to be larger [0.49, "preponderance of the evidence"].
P(Type I error) = α [usually 0.05 or 0.01]



Nonstatistical Hypothesis Testing
A Type II error occurs when we don't reject a false null hypothesis [accept the null hypothesis]. That occurs when a guilty defendant is acquitted.
In practice, this type of error is by far the most serious mistake we normally make. For example, suppose we test the hypothesis that the amount of medication in a heart pill is equal to a value which will cure your heart problem, and we accept the null hypothesis that the amount is OK. Later on we find out that the average amount is WAY too large and people die from too much medication [I wish we had rejected the hypothesis and thrown the pills in the trash can]. It's too late, because we shipped the pills to the public.
Nonstatistical Hypothesis Testing
The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta).

The two probabilities are inversely related. Decreasing one increases the other, for a fixed sample size.

In other words, you can't have α and β both real small for any old sample size. You may have to take a much larger sample size, or in the court example, you need much more evidence.


Types of Errors
A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).

A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE).

Decision            H0 is true          H0 is false
Reject H0           Type I error        Correct decision
Do not reject H0    Correct decision    Type II error
Nonstatistical Hypothesis Testing
The critical concepts are these:
1. There are two hypotheses, the null and the
alternative hypotheses.
2. The procedure begins with the assumption that the
null hypothesis is true.
3. The goal is to determine whether there is enough
evidence to infer that the alternative hypothesis is true,
or the null is not likely to be true.
4. There are two possible decisions:
Conclude that there is enough evidence to support
the alternative hypothesis. Reject the null.
Conclude that there is not enough evidence to
support the alternative hypothesis. Fail to reject the
null.
Concepts of Hypothesis Testing (1)
The two hypotheses are called the null hypothesis and the alternative or research hypothesis. The usual notation is:

H0 (pronounced "H nought"): the null hypothesis

H1: the alternative or research hypothesis

The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis (H1).
Concepts of Hypothesis Testing
Consider mean demand for computers during assembly lead time. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. In other words, someone is claiming that the mean demand is 350 units and we want to check this claim out to see if it appears reasonable. We can rephrase this request into a test of the hypothesis:
H0: μ = 350
Thus, our research hypothesis becomes:
H1: μ ≠ 350
Recall that the standard deviation [σ] was assumed to be 75, the sample size [n] was 25, and the sample mean [x̄] was calculated to be 370.16.
Concepts of Hypothesis Testing
For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence.

If x̄ is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.
Concepts of Hypothesis Testing (4)
The two possible decisions that can be made:

Conclude that there is enough evidence to support the
alternative hypothesis
(also stated as: reject the null hypothesis in favor of the
alternative)

Conclude that there is not enough evidence to support
the alternative hypothesis
(also stated as: failing to reject the null hypothesis in favor
of the alternative)
NOTE: we do not say that we accept the null hypothesis if
a statistician is around
Concepts of Hypothesis Testing (2)
The testing procedure begins with the assumption that the null hypothesis is true.

Thus, until we have further statistical evidence, we will assume:

H0: μ = 350 (assumed to be TRUE)
The next step will be to determine the sampling distribution of the sample mean x̄ assuming the true mean is 350.
x̄ is normal with mean μ_x̄ = 350 and standard deviation σ_x̄ = 75/SQRT(25) = 15
Is the Sample Mean in the Guts of the Sampling Distribution?
Three ways to determine this: First way
1. Unstandardized test statistic: Is x̄ in the guts of the sampling distribution? It depends on what you define as the "guts" of the sampling distribution.

If we define the guts as the center 95% of the distribution [this means α = 0.05], then the critical values that define the guts will be 1.96 standard deviations of x̄ on either side of the mean of the sampling distribution [350], or
UCV = 350 + 1.96*15 = 350 + 29.4 = 379.4
LCV = 350 - 1.96*15 = 350 - 29.4 = 320.6

1. Unstandardized Test Statistic Approach
Three ways to determine this: Second way
2. Standardized test statistic: Since we defined the guts of the sampling distribution to be the center 95% [α = 0.05]:
If the Z-score for the sample mean is greater than 1.96, we know that x̄ will be in the reject region on the right side, or
if the Z-score for the sample mean is less than -1.96, we know that x̄ will be in the reject region on the left side.

Z = (x̄ - μ)/σ_x̄ = (370.16 - 350)/15 = 1.344

Is this Z-score in the guts of the sampling distribution?
2. Standardized Test Statistic Approach
Three ways to determine this: Third way
3. The p-value approach (which is generally used with a computer and statistical software): Increase the rejection region until it "captures" the sample mean.

For this example, since x̄ is to the right of the mean, calculate
P(x̄ > 370.16) = P(Z > 1.344) = 0.0901
Since this is a two-tailed test, you must double this area for the p-value.
p-value = 2*(0.0901) = 0.1802
Since we defined the guts as the center 95% [α = 0.05], the reject region is the other 5%. Since our sample mean, x̄, is in the 18.02% region, it cannot be in our 5% rejection region [α = 0.05].


3. p-value approach
Statistical Conclusions:
Unstandardized Test Statistic:
Since LCV (320.6) < x̄ (370.16) < UCV (379.4), we fail to reject the null hypothesis at a 5% level of significance.

Standardized Test Statistic:
Since -Z_α/2 (-1.96) < Z (1.344) < Z_α/2 (1.96), we fail to reject the null hypothesis at a 5% level of significance.

P-value:
Since p-value (0.1802) > 0.05 [α], we fail to reject the null hypothesis at a 5% level of significance.
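The three approaches can be reproduced side by side for the computer-demand example (μ = 350 under the null, σ = 75, n = 25, x̄ = 370.16, α = 0.05). This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative. Software evaluates the tail area at Z = 1.344 exactly, so the p-value comes out near 0.179 rather than the table-based 0.1802, which does not change the conclusion.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 350, 75, 25, 370.16, 0.05

std_normal = NormalDist()          # standard normal: mean 0, standard deviation 1
se = sigma / sqrt(n)               # standard deviation of the sample mean (15)
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test

# 1. Unstandardized approach: compare the sample mean with the critical values.
lcv = mu0 - z_crit * se
ucv = mu0 + z_crit * se
reject_unstandardized = not (lcv < x_bar < ucv)

# 2. Standardized approach: compare the Z-score with the critical z value.
z = (x_bar - mu0) / se
reject_standardized = abs(z) > z_crit

# 3. p-value approach: double the tail area beyond |z| for a two-tailed test.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_p_value = p_value < alpha

print(f"LCV = {lcv:.1f}, UCV = {ucv:.1f}, reject: {reject_unstandardized}")
print(f"Z = {z:.3f}, critical z = {z_crit:.2f}, reject: {reject_standardized}")
print(f"p-value = {p_value:.4f}, reject: {reject_p_value}")
```

All three approaches answer the same question, so they always agree: here none of them rejects the null hypothesis.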

Example 11.1
A department store manager determines
that a new billing system will be cost-
effective only if the mean monthly account
is more than $170.

A random sample of 400 monthly accounts
is drawn, for which the sample mean is
$178. The accounts are approximately
normally distributed with a standard
deviation of $65.

Can we conclude that the new system
will be cost-effective?
Example 11.1
The system will be cost-effective if the mean account balance for all customers is greater than $170.

We express this belief as our research hypothesis, that is:

H1: μ > 170 (this is what we want to determine)

Thus, our null hypothesis becomes:

H0: μ = 170 (this specifies a single value for the parameter of interest)
Actually H0: μ ≤ 170
Example 11.1
What we want to show:
H1: μ > 170
H0: μ ≤ 170 (we'll assume this is true)
Normally we put H0 first.
We know:
n = 400,
x̄ = 178, and
σ = 65
σ_x̄ = 65/SQRT(400) = 3.25
α = 0.05
Example 11.1: Rejection Region
The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.
The critical value of x̄ is the value beyond which we reject H0.
Example 11.1
At a 5% significance level (i.e. α = 0.05), we get [all α in one tail]
Z_α = Z_0.05 = 1.645
Therefore, UCV = 170 + 1.645*3.25 = 175.35
Since our sample mean (178) is greater than the critical value we calculated (175.35), we reject the null hypothesis in favor of H1.
OR
Z = (x̄ - μ)/σ_x̄ = (178 - 170)/3.25 = 2.46 (> 1.645)
Reject null

OR
p-value = P(x̄ > 178) = P(Z > 2.46) = 0.0069 < 0.05
Reject null
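Example 11.1: The Big Picture
[Figure: sampling distribution of x̄ under H0: μ = 170, marking the critical value 175.35 and the sample mean x̄ = 178 in the rejection region. Reject H0 in favor of H1: μ > 170.]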
Interpreting the p-value
The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
We observe a p-value of .0069, hence there is overwhelming evidence to support H1: μ > 170.
Interpreting the p-value
p-value range    Evidence for H1           Conclusion
0 to .01         Overwhelming evidence     Highly significant
.01 to .05       Strong evidence           Significant
.05 to .10       Weak evidence             Not significant
above .10        No evidence               Not significant
The observed p = .0069 falls in the first range.
Conclusions of a Test of Hypothesis
If we reject the null hypothesis, we
conclude that there is enough evidence to
infer that the alternative hypothesis is true.

If we fail to reject the null hypothesis, we
conclude that there is not enough
statistical evidence to infer that the
alternative hypothesis is true. This does
not mean that we have proven that the null
hypothesis is true!

Keep in mind that committing a Type I
error OR a Type II error can be VERY
bad depending on the problem.
One-tail test with rejection region on right
The last example was a one-tail test, because the rejection region is located in only one tail of the sampling distribution.

More correctly, this was an example of a right-tail test.
H1: μ > 170
H0: μ ≤ 170
One-tail test with rejection region on left
The rejection region will be in the left tail.
Two-tail test with rejection region in both tails
The rejection region is split equally between the two tails.
Example 11.2 (students' work)
AT&T argues that its rates are such that customers won't see a difference in their phone bills between AT&T and its competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively). Note: we don't know the true value for σ, so we estimate σ from the data [σ ≈ s = 3.87]; the sample is large, so don't worry.
They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.
Our null and alternative hypotheses are
H1: μ ≠ 17.09. We test this by assuming that:
H0: μ = 17.09
Example 11.2
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.

That is, we set up a two-tail rejection region. The total area in the rejection region must sum to α, so we divide α by 2.
[Figure: two-tail rejection region, with one tail where the test statistic is small and one where it is large.]
Example 11.2
At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z_.025 = 1.96 and our rejection region is:

z < -1.96 or z > 1.96
[Figure: standard normal curve centered at 0, with rejection regions below -z_.025 and above +z_.025.]
Example 11.2
From the data, we calculate x̄ = 17.55.

Using our standardized test statistic:
z = (x̄ - μ)/(σ/SQRT(n))

We find that:
z = (17.55 - 17.09)/(3.87/SQRT(100)) = 1.19

Since z = 1.19 is not greater than 1.96, nor less than -1.96, we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
Summary of One- and Two-Tail Tests
[Figure: three panels showing the rejection region for a One-Tail Test (left tail), a Two-Tail Test, and a One-Tail Test (right tail).]
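The one-tail and two-tail cases differ only in how the p-value is computed from the Z-score. The sketch below wraps that choice in a single hypothetical helper, `z_test` (the name and signature are assumptions for illustration), and applies it to the numbers from Example 11.1 (right tail) and Example 11.2 (two tail).

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Z test for a population mean with known (or large-sample) sigma.

    tail is "left", "right", or "two". Returns (z, p_value, reject_h0).
    """
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "right":
        p_value = 1 - std_normal.cdf(z)            # area to the right of z
    elif tail == "left":
        p_value = std_normal.cdf(z)                # area to the left of z
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value, p_value < alpha


# Example 11.1: H1: mu > 170, n = 400, sample mean 178, sigma = 65.
z1, p1, reject1 = z_test(178, 170, 65, 400, tail="right")

# Example 11.2: H1: mu != 17.09, n = 100, sample mean 17.55, s = 3.87.
z2, p2, reject2 = z_test(17.55, 17.09, 3.87, 100, tail="two")

print(f"Example 11.1: z = {z1:.2f}, p = {p1:.4f}, reject H0: {reject1}")
print(f"Example 11.2: z = {z2:.2f}, p = {p2:.4f}, reject H0: {reject2}")
```

The first call rejects the null hypothesis and the second does not, matching the conclusions reached in the two examples.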
Probability of a Type II Error
A Type II error occurs when a false null hypothesis is not rejected, or "you accept the null when it is not true" (but don't say it this way if a statistician is around).

In practice, this is by far the most serious error you can make in most cases, especially in the quality field.

Judging the Test
A statistical test of hypothesis is effectively defined by the significance level (α) and the sample size (n), both of which are selected by the statistics practitioner.

Therefore, if the probability of a Type II error (β) is too large [we have insufficient power], we can reduce it by
increasing α, and/or
increasing the sample size, n.

Judging the Test
The power of a test is defined as 1 - β.
It represents the probability of rejecting the null hypothesis when it is false and the true mean is something other than the null value for the mean.

If we are testing the hypothesis that the average amount of medication in blood pressure pills is equal to 6 mg (which is good), and we fail to reject the null hypothesis, ship the pills to patients worldwide, only to find out later that the true average amount of medication is really 7 mg and people die, we get in trouble. This occurred because P(reject the null | true mean = 7 mg) = 0.32, which would mean that we have a 68% chance of not rejecting the null for these BAD pills and shipping them to patients worldwide.
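The slides do not give the standard deviation or sample size behind the pill figures, so the sketch below illustrates β and power with the earlier computer-demand test instead (H0: μ = 350, σ = 75, n = 25). The true mean of 380 is a hypothetical value chosen only for illustration, and `type_ii_error` is an assumed helper name.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a two-tailed z test of H0: mu = mu0 when the true mean is mu_true.

    Beta is the probability that the sample mean lands between the two
    critical values (so H0 is not rejected) even though H0 is false.
    """
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lcv = mu0 - z_crit * se
    ucv = mu0 + z_crit * se
    # Sampling distribution of the sample mean under the true mean.
    sampling = NormalDist(mu=mu_true, sigma=se)
    return sampling.cdf(ucv) - sampling.cdf(lcv)


beta_base = type_ii_error(350, 380, 75, 25)                    # alpha = 0.05, n = 25
beta_bigger_alpha = type_ii_error(350, 380, 75, 25, alpha=0.10)
beta_bigger_n = type_ii_error(350, 380, 75, 100)

print(f"n = 25,  alpha = 0.05: beta = {beta_base:.3f}, power = {1 - beta_base:.3f}")
print(f"n = 25,  alpha = 0.10: beta = {beta_bigger_alpha:.3f}")
print(f"n = 100, alpha = 0.05: beta = {beta_bigger_n:.3f}")
```

Under these assumed numbers, raising α or the sample size each lowers β, which is the trade-off described under Judging the Test.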


[Figure: probability you ship pills whose mean amount of medication is 7 mg: approximately 67%.]
Definition
When we select a sample from a population and then try to estimate the
population parameter from the sample, we will not be entirely accurate. The
difference between the population parameter and the sample statistic is the
sampling error.
Data collection
Statistics is the study of how to collect, organize, analyze, and interpret
numerical information from data.
The goal of statistics is to gain understanding from data.
Individuals are the people or objects included in the study. A variable is
a characteristic of the individual to be measured or observed.
A quantitative variable has a value or numerical measurement for which
operations such as addition or averaging make sense.
A qualitative variable describes an individual by placing the individual into a
category or group, such as male or female; it is also called a categorical variable.
In population data, the data are from every individual of interest.
In sample data, the data are from only some of the individuals of
interest.
A parameter is a numerical measure that describes an aspect of a
population.
A statistic is a numerical measure that describes an aspect of a sample.
DATA
Summarizing the data: Summarization is a process in which the data are
reduced for interpretation without sacrificing any important information.
Data analysis tasks: finding hidden relationships, anomalies, and trends; estimating; predicting.
Basic concept of Probability
[Diagram: Data -> Element -> Variable. A variable is qualitative (nominal or ordinal scale for measurement) or quantitative (discrete or continuous; interval or ratio scale for measurement).]
The population consists of the set of all measurements in which the
investigator is interested. The population is also called the universe.
A sample is a subset of measurements selected from the population.
Sampling from the population is often done randomly, such that every
possible sample of n elements will have an equal chance of being
selected. A sample selected in this way is called a simple random sample,
or just a random sample. A random sample allows chance to determine
its elements.
Samples and Populations
A survey by an electric company contains questions on the following:
1. Age of household head.
2. Sex of household head.
3. Number of people in household.
4. Use of electric heating (yes or no).
5. Number of large appliances used daily.
6. Thermostat setting in winter.
7. Average number of hours heating is on.
8. Average number of heating days.
9. Household income.
10. Average monthly electric bill.
11. Ranking of this electric company as compared with two previous electricity
suppliers.
Describe the variables implicit in these 11 items as quantitative or qualitative, and
describe the scales of measurement
Given a set of numerical observations, we may order them according to magnitude. Once we have done this, it is possible to define the boundaries of the set. Any student who has taken a nationally administered test, such as the Scholastic Aptitude Test (SAT), is familiar with percentiles. Your score on such a test is compared with the scores of all people who took the test at the same time, and your position within this group is defined in terms of a percentile. If you are in the 90th percentile, 90% of the people who took the test received a score lower than yours. We define a percentile as follows.
The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points.
The magazine Forbes publishes annually a list of the world's wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in billions of dollars, in no particular order, is as follows:
33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18
Find the 50th and 80th percentiles of this set of the world's top 20 net worths.
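One way to work this exercise is to sort the data, compute the position (n + 1)P/100, and read off the value at that position. The text gives only the position rule; interpolating linearly between the two neighbouring ordered values when the position is fractional is an assumption of this sketch, as is the function name `percentile`.

```python
def percentile(data, p):
    """Pth percentile using the position rule (n + 1) * P / 100.

    Positions are 1-based in the sorted data. A fractional position is
    resolved by linear interpolation between neighbours (an assumption).
    """
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below = ordered[lower - 1]       # value at the whole-number position
    above = ordered[lower]           # next value up
    return below + fraction * (above - below)


net_worths = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
              27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

p50 = percentile(net_worths, 50)   # position 10.5
p80 = percentile(net_worths, 80)   # position 16.8

print(f"50th percentile: {p50:.1f}")
print(f"80th percentile: {p80:.1f}")
```

With n = 20 the positions are 10.5 and 16.8, giving 22 and 32.8 billion dollars under the interpolation assumption.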
"It is better to be roughly right than precisely wrong."
John Maynard Keynes
You all have probably heard the story about Malcolm Forbes, who once got lost floating for miles in one of his famous balloons and finally landed in the middle of a cornfield. He spotted a man coming toward him and asked, "Sir, can you tell me where I am?" The man said, "Certainly, you are in a basket in a field of corn." Forbes said, "You must be a statistician." The man said, "That's amazing, how did you know that?" "Easy," said Forbes, "your information is concise, precise, and absolutely useless!"
Nominal scale: A scale of measurement for a variable that uses a label or name to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
Ordinal scale: A scale of measurement for a variable that has the properties of nominal data and can be used to rank or order the data. Ordinal data may be nonnumeric or numeric.
Interval scale: A scale of measurement for a variable that has the properties of ordinal data and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Ratio scale: A scale of measurement for a variable that has all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
Measure of variation








Types of data
Qualitative or attribute: type of car owned, color of pens.
Discrete (quantitative): number of children.
Continuous (quantitative): time taken for an exam.
Presentation
(a) The pineapples are the objects (individuals) of the study. If the researchers are interested in the individual weights of pineapples in the field, then the variable consists of weights. At this point, it is important to specify units of measurement and degree of accuracy of measurement. The weights could be measured to the nearest ounce or gram. Weight is a quantitative variable because it is a numerical measure. If weights of all the ready-to-harvest pineapples in the field are included in the data, then we have a population. The average weight of all ready-to-harvest pineapples in the field is a parameter.
(b) Suppose the researchers also want data on taste. A panel of tasters rates the pineapples according to the categories "poor," "acceptable," and "good." Only some of the pineapples are included in the taste test. In this case, the variable is taste. This is a qualitative or categorical variable. Because only some of the pineapples in the field are included in the study, we have a sample. The proportion of pineapples in the sample with a taste rating of "good" is a statistic.
Numerical (Quantitative) Data Presentation
[Diagram: numerical data can be presented as an ordered array, a stem-and-leaf display, or frequency distributions (histogram, polygon, ogive).]
Descriptive statistics involves methods of organizing, picturing, and summarizing information from samples or populations.
Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population.
A simple random sample of n measurements from a population is a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected.
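As an illustration of the simple random sample definition, the sketch below draws a sample with Python's standard `random` module. The population of 100 numbered units and the sample size of 10 are made up for the example; `random.sample` draws without replacement, so each subset of the requested size is equally likely to be chosen.

```python
import random

# Hypothetical population: 100 units labeled 1..100 (not from the text).
population = list(range(1, 101))

# A seeded generator makes the draw reproducible; omit the seed in real use.
rng = random.Random(42)

# Draw a simple random sample of n = 10 units without replacement.
sample = rng.sample(population, k=10)
print(sorted(sample))
```

A statistic computed from `sample`, such as its mean, would then be used to draw conclusions about the corresponding population parameter.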
Central tendency
The mean summarizes the data in one figure: it represents a wide range of measurements with a single value.
Mean × number of observations = total.
When there is no trend and the values merely fluctuate, the arithmetic mean is the best representative, provided the distribution is roughly normal and not skewed.
For positive values, arithmetic mean ≥ geometric mean ≥ harmonic mean, with equality only when all the values are equal.

Probably the least understood of the three, the harmonic mean is best used for averaging rates and ratios, and in situations where extremely large outliers exist, since it is less affected by them than the arithmetic mean. The harmonic mean can be calculated manually; however, most people will find it much easier to use Excel, where it is available as the HARMEAN() function.
The arithmetic mean is best used in situations where:
1. the data are not skewed (no extreme outliers);
2. the individual data points are not dependent on each other (for example, data in financial analysis are often interrelated).
Geometric means are often useful summaries for highly skewed data. When growth or a trend is observed, the geometric mean is the best choice.
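The relationship between the three means can be checked with Python's standard `statistics` module. This is a minimal sketch; the values and growth factors below are made up for illustration and do not come from the text.

```python
import math
import statistics

# Hypothetical positive observations (not from the text).
values = [2.0, 4.0, 8.0]

arithmetic = statistics.mean(values)            # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(values)   # cube root of 2 * 4 * 8
harmonic = statistics.harmonic_mean(values)     # 3 / (1/2 + 1/4 + 1/8)

# Mean x number of observations = total.
total_matches = math.isclose(arithmetic * len(values), sum(values))

# For positive values: arithmetic >= geometric >= harmonic.
ordering_holds = arithmetic >= geometric >= harmonic

# Growth example: hypothetical yearly growth factors of +10% and +50%.
growth_factors = [1.10, 1.50]
average_factor = statistics.geometric_mean(growth_factors)
# Applying the average factor once per year reproduces the overall growth.
overall_growth = average_factor ** len(growth_factors)

print(arithmetic, geometric, harmonic, ordering_holds)
```

The growth example shows why the geometric mean suits compounding: the arithmetic mean of the two factors (1.30) would overstate the overall growth of 1.10 × 1.50 = 1.65.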

Statistics
Functions of statistics
Some important functions of statistics are as follows:
1. Helps to collect and present facts in a systematic manner.
2. Helps in the formulation and testing of hypotheses.
3. Helps in facilitating the comparison of data.
4. Helps in predicting future trends.
5. Helps to find the relationship between variables.
6. Simplifies the mass of complex data.
7. Helps to formulate policies.
8. Helps the government to take decisions.
Limitations of statistics
1. Does not study qualitative phenomena.
2. Does not deal with individual items.
3. Statistical results are true only on average.
4. Statistical data should be uniform and homogeneous.
5. Statistical results depend on the accuracy of the data.
6. Statistical conclusions are not universally true.
7. Statistical results can be interpreted only by a person with sound knowledge of statistics.
Data collection
Central tendency and Dispersion
Central tendency is the middle point of a distribution; measures of central tendency are also called measures of location.
Dispersion is the spread of the data in a distribution, that is, the extent to which the data are scattered.
There are two more characteristics of a distribution: skewness and kurtosis.

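Mean of individual data: $\bar{x} = \sum x / n$
Mean for grouped data: $\bar{x} = \sum (f \cdot x) / n$, where x is the midpoint of each class, f is the class frequency, and $n = \sum f$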
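The grouped-data formula can be sketched as follows. The class intervals and frequencies are invented for illustration, and the helper name `grouped_mean` is hypothetical.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    total_frequency = sum(f for _, _, f in classes)
    weighted_sum = sum(f * (low + high) / 2 for low, high, f in classes)
    return weighted_sum / total_frequency


# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [
    (0, 10, 2),    # midpoint 5,  f * x = 10
    (10, 20, 5),   # midpoint 15, f * x = 75
    (20, 30, 3),   # midpoint 25, f * x = 75
]

mean_grouped = grouped_mean(classes)   # 160 / 10
print(mean_grouped)
```

Because each observation is replaced by its class midpoint, the result is an approximation of the mean of the underlying individual data.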
The arithmetic mean has the following advantages:
1. It is simple to understand.
2. It is unique: a data set has one and only one mean.
3. It is suitable for further statistical procedures.
Disadvantages:
1. It is affected by extreme observations.
2. It may not be representative of the whole data set.
Weighted average mean
Geometric mean
Median


Basic concept of Probability
Examples: rolling a die; drawing a card from a deck of cards.
Probability of a 5 or a 6 on one roll of a die: P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 1/3
Probability of a spade or a queen: P(s or q) = P(s) + P(q) - P(s and q) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13
"And" and "or" are called operators.
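Both calculations above can be verified by counting outcomes directly. The sketch below builds a standard 52-card deck and uses exact fractions; the helper name `prob` and the card encoding are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))   # 52 equally likely (rank, suit) cards


def prob(event):
    """Probability of an event as favorable outcomes over total outcomes."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_spade(card):
    return card[1] == "spades"


def is_queen(card):
    return card[0] == "Q"


p_spade = prob(is_spade)                                  # 13/52
p_queen = prob(is_queen)                                  # 4/52
p_both = prob(lambda c: is_spade(c) and is_queen(c))      # 1/52
p_either = prob(lambda c: is_spade(c) or is_queen(c))     # counted directly

# Addition rule: P(s or q) = P(s) + P(q) - P(s and q).
addition_rule_holds = p_either == p_spade + p_queen - p_both

# A 5 and a 6 cannot occur on the same roll, so nothing is subtracted.
p_five_or_six = Fraction(1, 6) + Fraction(1, 6)

print(p_either, p_five_or_six)
```

The queen of spades belongs to both events, which is why 1/52 is subtracted for the cards but no correction is needed for the die.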