
Introduction to Statistical Inference

By Dr. Saddam Hussain
Objectives
To define statistics
To discuss the wide range of applications of statistics in business
To understand the branches of statistics
To describe the levels of measurement of data
What is Statistics?
A collection of tools used for converting raw data into
information to help decision makers in their work.
Science of collecting, organizing, presenting,
analyzing, and interpreting data for the purpose of
assisting in making more effective decisions
Branch of mathematics
Facts and figures
What is Statistics?

“Statistics is a way to get information from data”

Data → Statistics → Information

Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.

Statistics is a tool for creating new understanding from a set of numbers.
Applications of Statistics in Business
• Accounting – auditing and cost estimation
• Finance – investments and portfolio management
• Human resources – compensation, job satisfaction,
performance measurement
• Operations – quality management, forecasting, MIS,
capacity planning, materials control
• Marketing – market analysis, consumer research,
pricing
• Economics – regional, national, and international
economic performance
• International business – market and demographic
analysis
Key Statistical Concepts…
Population
• A population is the group of all items of interest to a
statistics practitioner.
• It is frequently very large; sometimes infinite.
e.g. all blue-collar workers in Pakistan
Sample
• A sample is a set of data drawn from the population.
• It is potentially very large, but smaller than the population.
e.g. a sample of 765 blue-collar workers
Key Statistical Concepts…

Parameter
• A descriptive measure of a population.

Statistic
• A descriptive measure of a sample.
Key Statistical Concepts…
Population (Parameter) → Subset → Sample (Statistic)

• Populations have Parameters,
• Samples have Statistics.
Branches of Statistics

Statistics
• Descriptive Statistics
• Inferential Statistics
  • Parametric Statistics
  • Non-Parametric Statistics


Descriptive Statistics…
…are methods of organizing, summarizing, and
presenting data in a convenient and informative way.
These methods include:
• Graphical techniques
• Numerical techniques
• The actual method used depends on what information
we would like to extract. Are we interested in…
  • measure(s) of central location? and/or
  • measure(s) of variability (dispersion)?
• Descriptive Statistics helps to answer these
questions…
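As a minimal sketch of the numerical techniques listed above, the snippet below uses Python's standard `statistics` module on the student marks {67, 74, 71, 83, 93, 55, 48} to compute two measures of central location (mean, median) and two measures of variability (range, sample standard deviation). The choice of these four measures is an illustrative assumption; other summaries would serve equally well.

```python
import statistics

# Student marks: a small example data set
marks = [67, 74, 71, 83, 93, 55, 48]

# Measures of central location
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)

# Measures of variability (dispersion)
mark_range = max(marks) - min(marks)
sample_sd = statistics.stdev(marks)  # sample standard deviation (n - 1 divisor)

print(f"mean   = {mean_mark:.2f}")   # mean   = 70.14
print(f"median = {median_mark}")     # median = 71
print(f"range  = {mark_range}")      # range  = 45
print(f"sd     = {sample_sd:.2f}")
```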
Inferential Statistics…
• Descriptive statistics describe the data set that’s
being analyzed, but don’t allow us to draw any
conclusions or make any inferences beyond the data.
Hence we need another branch of statistics: inferential
statistics.

• Inferential statistics is also a set of methods, but it is
used to draw conclusions or inferences about
characteristics of populations based on data from a
sample.
Statistical Inference…
Statistical inference is the process of making an
estimate, prediction, or decision about a population
based on a sample.
Population (Parameter) → Sample (Statistic)
The Statistic is used to make an Inference about the Parameter.

What can we infer about a Population’s Parameters
based on a Sample’s Statistics?
Population Vs Sample
Population
• A population is a collection of all the elements
we are studying and about which we are trying
to draw conclusions.
• All items of interest
• Group of interest to the investigator
Sample
• A sample is a collection of some, but not all, of
the elements of the population.
• Portion of the population
• Will be used to reach conclusions about the population
Statistical Inference…
We use statistics to make inferences about
parameters.

Therefore, we can make an estimate,
prediction, or decision about a population
based on sample data.

Thus, we can apply what we know about a
sample to the larger population from which it
was drawn!
Statistical Inference…
• Rationale:
  • Large populations make investigating each member
impractical and expensive.
  • It is easier and cheaper to take a sample and make
estimates about the population from the sample.
• However:
Such conclusions and estimates are not always
going to be correct.
For this reason, we build “measures of reliability”,
namely the confidence level and the significance level,
into statistical inference.
Confidence & Significance Levels…
The confidence level is the proportion of times that an
estimating procedure will be correct.
E.g. a confidence level of 95% means that, in the long run,
estimates based on this form of statistical
inference will be correct 95% of the time.
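The long-run meaning of a confidence level can be checked by simulation. This is a minimal sketch under stated assumptions: a normal population with known σ, hypothetical values μ = 50, σ = 10, n = 25, and the standard z-interval x̄ ± 1.96·σ/√n. The function name `ci_coverage` is illustrative only. It repeats the estimating procedure many times and reports the fraction of intervals that contain the true μ.

```python
import math
import random

def ci_coverage(mu=50.0, sigma=10.0, n=25, reps=2000, z=1.96, seed=1):
    """Fraction of 95% z-intervals (sigma assumed known) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample from the (hypothetical) normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The procedure is "correct" when the interval captures mu
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())  # close to 0.95; the exact value depends on the seed
```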
When the purpose of the statistical inference is to
draw a conclusion about a population, the
significance level measures how frequently the
conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the
long run, this type of conclusion will be wrong
5% of the time.
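A similar sketch illustrates the significance level. It assumes (hypothetically) that the null hypothesis H0: μ = 50 is actually true, with σ = 10 known and n = 25, and applies a two-sided z-test at the 5% level, rejecting H0 when |z| > 1.96. Because every simulated sample comes from a population where H0 holds, each rejection is a wrong conclusion, so the rejection rate estimates how often this type of conclusion is wrong in the long run. The name `rejection_rate` is illustrative.

```python
import math
import random

def rejection_rate(mu0=50.0, sigma=10.0, n=25, reps=4000, z_crit=1.96, seed=2):
    """Long-run fraction of two-sided z-tests that wrongly reject a TRUE H0: mu = mu0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Data generated with H0 true, so any rejection is an error
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

print(rejection_rate())  # close to 0.05
```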
Process of Inferential Statistics

Population (parameter μ) → Select a random sample → Sample (statistic x̄)

Calculate x̄ to estimate μ
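The process can be sketched in a few lines. This assumes a hypothetical population of 10,000 made-up values (uniform between 200 and 800) so that the parameter μ can actually be computed and compared with the statistic x̄; in real applications μ is unknown and only the sample is observed. A simple random sample of 100 is drawn without replacement with `random.Random.sample`.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 made-up values, for illustration only
population = [rng.uniform(200, 800) for _ in range(10_000)]
mu = statistics.mean(population)      # parameter (normally unknown)

# Select a simple random sample without replacement
sample = rng.sample(population, k=100)
x_bar = statistics.mean(sample)       # statistic used to estimate mu

print(f"population mean mu = {mu:.1f}")
print(f"sample mean x_bar  = {x_bar:.1f}")
```

Rerunning with a different seed gives a different x̄, which is exactly why inference needs the measures of reliability described earlier.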
Types of Data and Information
Definitions…
A variable is some characteristic of a population or
sample.
E.g. student grades; workers’ salaries
Typically denoted with a capital letter such as X, Y, Z…
(the variable “student grade”, for instance, takes the values A, A-, B+, B, B-…)
The values of the variable are the range of possible
values for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Types of Data & Information
Data (at least for purposes of Statistics) fall into
three main groups:

• Interval Data
• Nominal Data
• Ordinal Data
Interval Data…
Interval data
• Real numbers, e.g. heights, weights, prices,
etc.
• Also referred to as quantitative or numerical.

Arithmetic operations can be performed on
interval data, so it’s meaningful to talk about
2*Height, or Price + $1, and so on.
Nominal Data…
Nominal Data
• The values of nominal data are categories.
E.g. responses to questions about marital status,
coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4

Because the numbers are arbitrary, arithmetic
operations don’t make any sense (e.g. does Widowed
÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
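Since the only valid calculation on nominal data is counting observations per category, a minimal sketch uses `collections.Counter`. The coding scheme is the marital-status example above; the list of responses is hypothetical, made up for illustration.

```python
from collections import Counter

# Codes from the marital-status example: the numbers are labels, not quantities
MARITAL_STATUS = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical coded survey responses
responses = [2, 1, 2, 4, 3, 2, 1, 2]

# Valid for nominal data: count the observations in each category
counts = Counter(MARITAL_STATUS[code] for code in responses)

print(counts.most_common(1))  # [('Married', 4)]  (the most frequent category)
```

Note that averaging the raw codes (e.g. `sum(responses) / len(responses)`) would run without error but produce a number with no meaning.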
Ordinal Data…
Ordinal Data appear to be categorical in nature, but their
values have an order; a ranking to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While it’s still not meaningful to do arithmetic on these data
(e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric
values are assigned to each category.
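A minimal sketch of ordering-based calculations on ordinal data, using the course-rating scale above. The ranks here start at 0 rather than 1, which, as the slide says, makes no difference because only the order matters; `RANK` is a hypothetical helper and the list of ratings is made up. `statistics.median_low` is used so the median is always an actual category, even for an even number of ratings.

```python
import statistics

# Course-rating scale from the example, listed from lowest to highest
SCALE = ["poor", "fair", "good", "very good", "excellent"]
RANK = {label: i for i, label in enumerate(SCALE)}  # hypothetical helper: any increasing codes work

ratings = ["good", "excellent", "fair", "good", "very good"]  # hypothetical responses

# Comparisons based on order are meaningful
print(RANK["excellent"] > RANK["poor"])  # True

# The median is an ordering-based statistic, so it is valid for ordinal data
ranks = sorted(RANK[r] for r in ratings)
median_label = SCALE[statistics.median_low(ranks)]
print(median_label)  # good
```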
E.g. Representing Student Grades…

Is the data categorical?
• No → Interval Data, e.g. {0..100}
• Yes → Is there a rank order to the data?
  • Yes → Ordinal Data, e.g. {F, D, C, B, A}
  • No → Nominal Data, e.g. {Pass | Fail}
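The decision chart for student grades can be written as a small function. This is a sketch only; the name `level_of_measurement` and its boolean parameters are hypothetical, chosen to mirror the two questions in the chart.

```python
def level_of_measurement(categorical: bool, ordered: bool = False) -> str:
    """Follow the decision chart: is the data categorical? If so, is it ordered?"""
    if not categorical:
        return "interval"   # e.g. marks on 0..100
    if ordered:
        return "ordinal"    # e.g. letter grades F, D, C, B, A
    return "nominal"        # e.g. Pass | Fail

# Usage: the three ways of representing student grades
for description, categorical, ordered in [
    ("marks 0..100", False, False),
    ("letter grades F..A", True, True),
    ("Pass | Fail", True, False),
]:
    print(description, "->", level_of_measurement(categorical, ordered))
```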


Calculations for Types of Data
As mentioned above,
• All calculations are permitted on interval
data.
• Only calculations involving a ranking process
are allowed for ordinal data.
• No calculations are allowed for nominal
data; only counting the number of
observations in each category is possible.
This lends itself to the following “hierarchy
of data”…
Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent
categories.
Only calculations based on the frequencies of
occurrence are valid.
Data may not be treated as ordinal or interval.
