
Business Statistics

Course Index

1. Chapter 1: Introduction to Business Statistics
2. Chapter 2: Descriptive Statistics: Collection, Processing and Presentation of Data
3. Chapter 3: Measures of Central Tendency
4. Chapter 4: Measures of Dispersion
5. Chapter 5: Skewness and Kurtosis
6. Chapter 6: Correlation Analysis
7. Chapter 7: Regression Analysis
8. Chapter 8: Theory of Probability
9. Chapter 9: Probability Distribution
10. Chapter 10: Use of Excel Software for Statistical Analysis

Course Introduction

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools not only have application in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.

The word statistics is derived from the Italian word Stato, which means state; Statista refers to a person involved with the affairs of state. Thus, statistics originally meant the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.

Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis was the first to pioneer the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

Statistics are the classified facts representing the conditions of the people in the state, specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.
Webster

Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.

Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.


Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions.

Functions of statistics are Condensation, Comparison, Forecast, Testing of Hypotheses, Preciseness and Expectation.

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.

Introduction to Business Statistics

1. Learning Objectives
2. Topic 1: Introduction
3. Topic 2: Development of Statistics
4. Topic 3: Definitions of Statistics
5. Topic 4: Importance of Statistics
6. Topic 5: Classification of Statistics
7. Topic 6: Role of Statistics
8. Topic 7: Functions of Statistics
9. Topic 8: Limitations of Statistics
10. Topic 9: Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the development, importance and role of statistics

Explain the basic concept of statistical studies

Understand the application of statistics in business and management

Learn about functions and limitations of statistics

Introduction

Information derived from good statistical analysis is always precise and never useless.

One of the primary tasks of a manager is decision-making.

Statistical techniques offer powerful tools in the decision-making process.

These tools have the power to interpret quantitative information in a scientific and objective manner.

Development of Statistics

The word statistics is derived from the Italian word Stato, which means state; Statista refers to a person involved with the affairs of state.

Statistics originally meant the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.

During ancient times, even before 300 BC, rulers and kings like Chandragupta Maurya used statistics to maintain land and revenue records, collect taxes and register births and deaths.

Definitions of Statistics

Statistics are the classified facts representing the conditions of the people in the state, specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.
Webster

By statistics we mean quantitative data affected to a marked extent by multiplicity of causes.
Yule and Kendall

Statistics may be defined as the science of collection, presentation, analysis and interpretation of data.
Croxton and Cowden

Importance of Statistics

Statistics helps a manager to:

Identify what information or data is worth collecting,

Decide when and how judgments may be made on the basis of partial information, and

Measure the extent of doubt and risk associated with the use of partial information and stochastic processes.

Classification of Statistics

Descriptive Statistics
Analytical Statistics
Inductive Statistics
Inferential Statistics
Applied Statistics

Role of Statistics

Role of Statistics in Business
Role of Statistics in Decision Making
Role of Statistics in Research

Functions of Statistics

Condensation
Comparison
Forecast
Testing of Hypotheses
Preciseness
Expectation

Laws of Statistics

The Law of Statistical Regularity
The Law of Inertia of Large Numbers

Limitations of Statistics

COMMON STATISTICAL ISSUES

DISTRUST OF STATISTICS

MISUSE OF STATISTICS

Summary

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools not only have application in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.

The word statistics is derived from the Italian word Stato, which means state; Statista refers to a person involved with the affairs of state. Thus, statistics originally meant the collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.


Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis was the first to pioneer the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

Statistics are the classified facts representing the conditions of the people in the state, specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.

Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.

Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.

Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions.

Functions of statistics are Condensation, Comparison, Forecast, Testing of Hypotheses, Preciseness and Expectation.

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.

More dangerous than distrust is misuse of statistics to draw convenient conclusions to satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures, charts, graphs, index numbers, etc. are indeed very appealing and convincing. They can be used to intimidate opposing views. Hence, statistics is open to manipulation.

Descriptive Statistics: Collection, Processing and Presentation of Data

1. Learning Objectives
2. Topic 1: Introduction
3. Topic 2: Descriptive and Inferential Statistics
4. Topic 3: Collection of Data
5. Topic 4: Editing and Coding of Data
6. Topic 5: Classification of Data
7. Topic 6: Tabulation of Data
8. Topic 7: Diagrammatic and Graphical Presentation of Data
9. Topic 8: Summary

Learning Objectives
After studying this chapter, you should be able to:

Describe descriptive and inferential statistics

Explain collection, editing and classification of primary and secondary data

Define tabulation and presentation of data

Understand diagrammatic and graphical presentation

Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and Ogives

Introduction

Success of any statistical investigation depends on the availability of accurate and reliable data.

These depend on the appropriateness of the method chosen for data collection.

Data collection is a very basic activity in decision-making.

Data may be classified either as primary data or secondary data.

Successful use of the collected data depends to a great extent upon the way it is arranged, displayed and summarized.

Descriptive and Inferential Statistics

Descriptive Statistics

Descriptive statistics is the type of statistics that probably comes to the minds of most people when they hear the word statistics.

Inferential Statistics

Inferential statistics studies a statistical sample, and from this analysis we are able to say something about the population from which the sample came.

Collection of Data

Types of Data: Primary and Secondary

Methods of Collecting Primary Data

Merits and Demerits of Collecting Primary Data

Methods of Collecting Secondary Data

Designing Questionnaire

Editing and Coding of Data

Editing Primary Data

Completeness
Consistency
Accuracy
Homogeneity

Field Editing

Central Editing

Editing Secondary Data

Coding of Data

Coding is the process of assigning some symbols, either alphabetical or numeral or both, to the answers so that the responses can be recorded into a limited number of classes or categories.
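As a small illustration of coding, the sketch below assigns numeric symbols to survey answers so that the responses fall into a limited number of classes. The code book, the answer labels and the catch-all code 9 are hypothetical choices made for this example and do not come from the course material.

```python
from collections import Counter

# Hypothetical code book: each expected answer gets a numeric code.
CODE_BOOK = {"yes": 1, "no": 2, "undecided": 3}
OTHER = 9  # assumed catch-all code for answers outside the code book


def code_responses(responses):
    """Return the numeric code for each raw answer."""
    return [CODE_BOOK.get(r.strip().lower(), OTHER) for r in responses]


raw = ["Yes", "no", " YES ", "Undecided", "maybe"]
codes = code_responses(raw)
tally = Counter(codes)  # number of responses in each coded class
```

Normalising the answer text before the lookup is itself an editing step; in practice the code book would be fixed before the data are collected.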

Classification of Data

Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

Bases of Classification
Rules of Classification
Frequency Distribution

Tabulation of Data

Tabulation is arranging the data in flat table (two-dimensional array) format by grouping the observations.

A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.

Types of Tabulation
One Way Tabulation
Two Way Tabulation
Multi Way Tabulation
Advantages of Tabulation

Diagrammatic and Graphical Presentation of Data

Difference between Diagrams and Graphs

Diagram | Graph
1. Can be drawn on an ordinary paper. | 1. Can be drawn on a graph paper.
2. Easy to grasp. | 2. Needs some effort to grasp.
3. Not capable of analytical treatment. | 3. Capable of analytical treatment.
4. Can be used only for comparisons. | 4. Can be used to represent a mathematical relation.
5. Data are represented by bars, rectangles, pictures, etc. | 5. Data are represented by lines and curves.

Types of Diagrams

Bar Diagram
Histogram
Pie Diagram
Frequency Polygon
Ogives

Summary

There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both segments of statistics are important, and accomplish different objectives.

Data can be obtained through primary source or secondary source according to need, situation, convenience, time, resources and availability. The most important method for primary data collection is through questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.

Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, data may constitute a sample from some population.


The type of research, its purpose and the conditions under which the data are obtained will determine the method of collecting the data. If relatively few items of information are required quickly, and funds are limited, telephonic interviews are recommended. If respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect data.

The quality of information collected through the filling of a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be designed or drafted very carefully and in a tactful manner.

Before any processing of the data, editing and coding of data is necessary to ensure the correctness of data. In any research study, the voluminous data can be handled only after classification. Data can be presented through tables and charts.

Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

A frequency distribution is the principal tabular summary of either discrete data or continuous data. The frequency distribution may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the less than ogive and the more than ogive.
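The three kinds of frequency mentioned above can be tabulated with a few lines of Python. This is only a sketch for discrete data; the function name `frequency_table` and the defect counts are made up for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, actual, relative, less-than cumulative) frequencies."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]  # cumulative count up to this value
        rows.append((value, counts[value], counts[value] / total, running))
    return rows


# Hypothetical discrete data: defects found in 10 inspected lots.
defects = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]
table = frequency_table(defects)
```

Plotting the last column against the values would give the less than ogive; subtracting it from the total, shifted by one row, would give the more than ogive.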

Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data, but also makes it easy to understand. Tabulation is the fastest way to extract information from the mass of data and is hence popular even among those not exposed to the statistical method.

The charts also help in grasping the data and analyzing it qualitatively. This helps managers to effectively present the data as a part of reports. Various types of chart are bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding bar diagram, histogram and pie chart.

A graphic presentation is another way of representing the statistical data in a simple and intelligible form. There are two types of graphs which we have discussed, line graphs and ogives.

Measures of Central Tendency

1. Learning Objectives
2. Topic 1: Introduction
3. Topic 2: Characteristics of Central Tendency
4. Topic 3: Arithmetic Mean
5. Topic 4: Median
6. Topic 5: Mode
7. Topic 6: Empirical Relationship between Mean, Median and Mode
8. Topic 7: Limitations of Central Tendency
9. Topic 8: Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the concept and characteristics of central tendency

Describe all the measures of central tendency: mean, median and mode

Explain merits and demerits of all measures of central tendency

Discuss partition values or positional measures like quartiles, deciles and percentiles

Introduction

The concept of central tendency plays a dominant role in the study of statistics.

In many frequency distributions, the tabulated values show a distinct tendency to cluster or to group around a typical central value.

This behaviour of the data to concentrate the values around a central part of the distribution is called the Central Tendency of the data.

Characteristics of Central Tendency

A good measure of central tendency should possess, as far as possible, the following characteristics:

Easy to understand.

Simple to compute.

Based on all observations.

Uniquely defined.

Possibility of further algebraic treatment.

Not unduly affected by extreme values.

Common Measures of Central Tendency

Mean
Median
Mode

Arithmetic Mean

The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if X1, X2, X3, ..., Xn are the n values of a variate X, then the arithmetic mean is (X1 + X2 + X3 + ... + Xn) / n.

Properties of Arithmetic Mean
Calculation of Simple Arithmetic Mean
Merits and Demerits of Arithmetic Mean
Weighted Arithmetic Mean
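The simple and weighted arithmetic means can be sketched as follows. The marks and credit weights are hypothetical figures chosen so the arithmetic is easy to follow by hand.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of items."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: marks in three subjects and assumed credit weights.
marks = [60, 70, 80]
credits = [1, 2, 3]

simple = arithmetic_mean(marks)            # (60 + 70 + 80) / 3
weighted = weighted_mean(marks, credits)   # (60 + 140 + 240) / 6
```

With equal weights the two functions return the same value, which is one way to check a weighted calculation.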

Median

Median is the value which divides the distribution of data, arranged in ascending or descending order, into two equal parts. Thus, the median is the value of the middle observation.

Calculation of Median

Merits and Demerits of Median

Partition Values or Positional Measures

Quartiles
Deciles
Percentiles
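A minimal sketch of the median and the partition values, using Python's standard `statistics` module. The data set is hypothetical, and the positions used by `statistics.quantiles` (its default "exclusive" method) are one convention among several, so results may differ slightly from a hand calculation that follows a different rule.

```python
import statistics

# Hypothetical data set of 11 observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]

median = statistics.median(data)

# quantiles() returns the n - 1 cut points that split the data into n parts.
# Its default "exclusive" method puts the k-th cut point at position
# k * (len(data) + 1) / n in the ordered data.
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles
deciles = statistics.quantiles(data, n=10)     # D1 ... D9
```

Here the second quartile equals the median, as expected for any data set.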

Mode

Mode is the value which has the greatest frequency density. Mode is denoted by Z.

Merits and Demerits of Mode

Calculation of Mode

Graphic Location of Mode
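For ungrouped data the mode is simply the most frequent value, which the standard `statistics` module can find directly. The shoe sizes below are hypothetical; grouped data would need the class-interval formula instead, which this sketch does not cover.

```python
import statistics

# Hypothetical data: shoe sizes sold in one day.
sizes = [7, 8, 8, 9, 8, 10, 9, 8, 7]

z = statistics.mode(sizes)  # the single most frequent value

# multimode() lists every value tied for the highest frequency.
tied = statistics.multimode([1, 1, 2, 2, 3])
```

When `multimode` returns more than one value the distribution is multimodal, and a single mode is not uniquely defined.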

Empirical Relationship between Mean, Median and Mode

A distribution in which the mean, the median and the mode coincide is known as a symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used.

If the distribution is skewed, the mean, the median and the mode are not equal. In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:

Mean - Median = (Mean - Mode) / 3

Mode = 3 * Median - 2 * Mean
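The empirical relation Mode = 3 * Median - 2 * Mean can be checked numerically. The summary figures below are hypothetical, and the relation is only an approximation for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary figures for a moderately skewed distribution.
mean, median = 50.0, 48.0
mode = empirical_mode(mean, median)

# The same relation written as Mean - Median = (Mean - Mode) / 3.
gap_mean_median = mean - median
gap_mean_mode = mean - mode
```

In this example the mean exceeds the median, which exceeds the estimated mode, so the distribution would be positively skewed.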

Limitations of Central Tendency

The arithmetic mean (AM) is not a suitable measure:

In case of highly skewed data.

In case of uneven or irregular spread of the data.

In open-end distributions.

When average growth or average speed is required.

When there are extreme values in the data.

Except in these cases, AM is widely used in practice.

Summary

Measures of central tendency give one of the very important characteristics of the data. According to the situation, one of the various measures of central tendency may be chosen as the most representative. Arithmetic mean is widely used and understood. What characterizes the three measures of centrality, and what are the relative merits of each in the given situation, is the question.

Mean summarizes all the information in the data. Mean can be visualized as a single point where all the mass (the weight) of the observations is concentrated. It is like a centre of gravity in physics. Mean also has some desirable mathematical properties that make it useful in the context of statistical inference.

To simplify the manual calculation, we may sometimes use shift of origin and change of scale. Shifting of origin is achieved by adding or subtracting a constant to all observations. In case of discrete data we add or subtract (usually subtract) a constant to the individual observations, whereas for grouped data we add or subtract (usually subtract) the constant to the class mark values.

There are cases where the relative importance of the different items is not the same. In such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the grouped data calculations studied earlier, when we consider frequency as a weight associated with the class mark.

Median is the middle value when the data is arranged in order. The median is resistant to the extreme observations. Median is like the geometric centre in physics. In case we want to guard against the influence of a few outlying observations (called outliers), we may use the median.

Quantiles are related positional measures of central tendency. These are useful and frequently employed measures. The most familiar quantiles are quartiles, deciles and percentiles.

Quartiles are position values similar to the median. There are three quartiles, denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second quartile Q2 is nothing but the median. In a distribution, one fourth of the items are less than Q1 and the other three fourths of the items are greater than Q1. Q3 is called the upper quartile or the third quartile. Inter-quartile range is defined as the difference between the first and third quartiles. It is a measure of spread of the data.

D1, D2, D3, ..., D9 are the nine deciles. They divide a series into 10 equal parts. One tenth of the items are less than or equal to D1, one tenth of the items are more than or equal to D9, and one tenth of the items lie between any successive pair of deciles when all the items are in ascending order.

The Pth percentile of a group of observations is that observation below which lie P% (P percent) of the observations. The position of the Pth percentile is given by (n + 1) * P / 100, where n is the number of data points. If the value of (n + 1) * P / 100 is a fraction, we need to interpolate the value.
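The interpolation step can be sketched as below, assuming the common textbook position rule (n + 1) * P / 100 for the Pth percentile in the ordered data. The function name, the clamping of out-of-range positions and the data are assumptions made for this illustration.

```python
def percentile(values, p):
    """Value at position (n + 1) * p / 100 in the sorted data, interpolated."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100          # 1-based position in ordered data
    position = min(max(position, 1), n)   # assumption: clamp to the data range
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # fractional part, used to interpolate
    if lower == n:
        return float(data[-1])
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


# Hypothetical data set of 9 observations.
data = [12, 15, 18, 20, 22, 25, 28, 30, 35]
p50 = percentile(data, 50)  # position 5.0, the middle observation
p25 = percentile(data, 25)  # position 2.5, halfway between 15 and 18
p80 = percentile(data, 80)  # position 8.0, the eighth observation
```

Other software may use a different position rule, so small differences from spreadsheet results are to be expected.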

The mode of a data set is the value that occurs most frequently. There are many situations in which the arithmetic mean and median fail to reveal the true characteristics of the data (the most representative figure), for example, the most common size of shoes, the most common size of garments, etc. In such cases, the mode is the best-suited measure of central tendency.

A distribution in which the mean, the median and the mode coincide is known as a symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric distribution, which is very commonly used.

In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:

Mean - Median = (Mean - Mode) / 3

Mode = 3 * Median - 2 * Mean

No single average can be regarded as the best or most suitable under all circumstances. Each average has its merits and demerits and its own particular field of importance and utility. A proper selection of an average depends on (1) the nature of the data and (2) the purpose of the enquiry or requirement of the data.

Measures of Dispersion

1. Learning Objectives
2. Topic 1: Introduction
3. Topic 2: Characteristics of Measures of Dispersion
4. Topic 3: Absolute and Relative Measures of Dispersion
5. Topic 4: Range
6. Topic 5: Inter-quartile Range and Deviations
7. Topic 6: Variance and Standard Deviation
8. Topic 7: Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand absolute and relative measures of variation

Learn about range and inter-quartile range

Discuss variance, standard deviation, mean deviation and coefficient of variation

Study the empirical relationship between different measures of variation

Introduction

A measure of dispersion or variation in any data shows the extent to which the numerical values tend to spread about an average.

A measure of dispersion is useful:

To compare current results with the past results.

To compare two or more sets of observations.

To suggest methods to control the variation in the data.

Characteristics of Measures of Dispersion

It should be simple to understand.
It should be easy to compute.
It should be rigidly defined.
It should be based on each individual item of the distribution.
It should be capable of further algebraic treatment.
It should have sampling stability.
It should not be unduly affected by the extreme items.

Absolute and Relative Measures of Dispersion

A relative measure, or coefficient, of dispersion is the ratio or percentage of a measure of absolute dispersion to an appropriate average.

A precise measure of dispersion is one which gives the magnitude of the variation in a series, i.e. it measures, in numerical terms, the extent of the scatter of the values around the average.

Measures of Dispersion

The Range
The Quartile Deviation
The Mean Deviation
The Median Deviation
The Standard Deviation
Graphical Method

Measures of Relative Variability

Relative Range
Relative Quartile Deviation
Relative Mean Deviation
Coefficient of Variation

Range

The Range of the data is the difference between the largest value of data and the smallest value of data.

Merits and Demerits of Range

Merits

Range is the simplest method of studying dispersion.

It takes less time to compute the absolute and relative range.

Demerits

Range does not take into account all the values of a series, i.e. it considers only the extreme items, and middle items are not given any importance.

Range cannot be computed in the case of open-end distributions, i.e. distributions where the lower limit of the first group and the upper limit of the last group are not given.
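The absolute and relative range can be sketched as below. The coefficient of range is taken here as (L - S) / (L + S), with L the largest and S the smallest value; that formula is a common textbook form assumed for this example, and the height data are hypothetical.

```python
def value_range(values):
    """Absolute measure: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure, assumed here as (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Hypothetical data: the same heights in centimetres and in metres.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

range_cm = value_range(heights_cm)
coeff_cm = coefficient_of_range(heights_cm)
coeff_m = coefficient_of_range(heights_m)
```

The absolute range changes with the unit of measurement, while the coefficient stays the same, which is why the relative measure is preferred for comparing series.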

Inter-quartile Range and Deviations

Inter-quartile Range

Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile).

Quartile Deviation

Quartile Deviation is half of the difference between the upper quartile and the lower quartile.

Mean Deviation

Mean deviation is the arithmetic mean of the absolute deviations of the values about their arithmetic mean or median or mode.
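A short sketch of these three measures, using the standard `statistics` module for the quartiles. The helper names and the data are hypothetical, and the quartile positions follow the module's default "exclusive" convention, which may differ from other textbook rules.

```python
import statistics


def quartile_measures(values):
    """Return (inter-quartile range, quartile deviation)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return iqr, iqr / 2   # quartile deviation is half the inter-quartile range


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations about `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)


# Hypothetical data set of 11 observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]
iqr, qd = quartile_measures(data)
md_mean = mean_deviation(data, statistics.mean(data))
md_median = mean_deviation(data, statistics.median(data))
```

For this data the mean deviation about the median is no larger than the one about the mean, which is a general property of the median.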

Variance and Standard Deviation

Variance is defined as the average of the squared deviations of data points from their mean.
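The calculation can be sketched with Python's standard `statistics` module, which offers separate functions for population data (divide by N) and sample data (divide by n - 1). The data set is hypothetical, and the coefficient of variation is computed here from the population standard deviation as an assumption.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divides by N (population size)
pop_sd = statistics.pstdev(data)       # positive square root of pop_var
samp_var = statistics.variance(data)   # divides by n - 1 (sample data)
samp_sd = statistics.stdev(data)       # positive square root of samp_var

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = 100 * pop_sd / statistics.mean(data)
```

The sample variance is always a little larger than the population variance for the same figures, because the divisor is smaller.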

Different Formulae for Calculating Variance
Calculation of Standard Deviation
Properties of Standard Deviation
Merits and Demerits of Standard Deviation
Standard Deviation of Combined Means
Coefficient of Variation
Empirical Relationship Between Different Measures of Variation

Summary

Study of distribution is very important for decision-making. Usually, measures of central tendency and variability are adequate for taking decisions. However, if the data is quite different from the normal distribution, then measures of skewness and kurtosis need to be considered. We discussed measures of variability: Range, Variance and Standard Deviation.

A measure of dispersion gives an idea about the extent of lack of uniformity in the sizes and qualities of the items in a series. It helps us to know the degree of uniformity and consistency in the series. If the difference between items is large, the dispersion or variation is large, and vice versa.

The measures of dispersion can be either absolute or relative. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will provide the value in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of dispersion.

The Range of the data is the difference between the largest value of data and the smallest value of data. This is an absolute measure of variability. However, if we have to compare two sets of data, Range may not give a true picture. In such a case, a relative measure of range, called the coefficient of range, is used.

Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile). Quartile Deviation is half of the difference between the upper quartile and the lower quartile.

The average used for calculating deviation can be the mean, the median or the mode. However, usually the mean is used. There is also an advantage in taking deviations from the median, because the Mean Deviation from the median is the lowest as compared to any other Mean Deviation. Since absolute values of deviations, ignoring sign, are taken for calculating Mean Deviation, the mean deviation is not amenable to further algebraic treatment.

The variance is the average squared deviation of the data from their mean. For sample data, we take the average by dividing by (n - 1), where n is the sample size. This accounts for the degrees of freedom. For population data, we average by dividing by the population size N.

The Standard Deviation (SD) of a set of data is the positive square root of the variance of the set. This is also referred to as the Root Mean Square (RMS) value of the deviations of the data points. The SD of a sample is the square root of the sample variance.

There is no effect of shifting origin on standard deviation or variance.
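The effect of shifting the origin is easy to verify numerically. The sketch below uses hypothetical observations and also shows a change of scale for contrast, where the variance is multiplied by the square of the scale factor.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical observations
shifted = [x + 100 for x in data]    # shift of origin: add a constant
scaled = [3 * x for x in data]       # change of scale, shown for contrast

var_original = statistics.pvariance(data)
var_shifted = statistics.pvariance(shifted)
var_scaled = statistics.pvariance(scaled)
```

Here `var_shifted` equals `var_original`, while `var_scaled` is nine times larger because every deviation from the mean is tripled.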

The measures of deviation are very effective in reports and presentations made by business executives to present their data to the general public, who do not understand statistical methods.

Variance analysis also helps in managing budgets by controlling budgeted versus actual costs. Without the standard deviation, you can't compare two data sets effectively.

Skewness and Kurtosis

1. Learning Objectives
2. Topic 1: Introduction
3. Topic 2: Karl Pearson's Coefficient of Skewness (SKP)
4. Topic 3: Bowley's Coefficient of Skewness (SKB)
5. Topic 4: Kelly's Coefficient of Skewness (SKK)
6. Topic 5: Measures of Kurtosis
7. Topic 6: Moments
8. Topic 7: Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the concept and different types of skewness

Discuss various measures of kurtosis

Learn about moments, their properties and coefficients based on moments

Introduction
Skewness is a measure that studies the degree and direction of departure
from symmetry.

Nature of Skewness
Skewness can be positive or negative or zero.
When the values of mean, median and mode are equal, there is no
skewness.

When mean > median > mode, skewness will be positive.

When mean < median < mode, skewness will be negative.

Characteristic of a Good Measure of Skewness

It should be a pure number, in the sense that its value should be independent of the unit of the series and also of the degree of variation in the series.

It should have zero value when the distribution is symmetrical.

It should have a meaningful scale of measurement so that we can easily interpret the measured value.

Mathematical measures of skewness can be calculated by:

Karl Pearson's Method

Bowley's Method

Kelly's Method

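Karl Pearson's Coefficient of Skewness (SKP)

Karl Pearson has suggested two formulae:

Where the mode is well defined, so that the relationship between mean and mode is established:

$$SK_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

Where the mode is ill-defined, so that the relationship between mean and mode is not established, the median is used instead:

$$SK_P = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$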
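As an illustration, both forms of Karl Pearson's coefficient can be computed with Python's statistics module. This is a minimal sketch: the data are hypothetical, the function name pearson_skewness is introduced here for illustration, and the population standard deviation is assumed as the divisor.

```python
import statistics

def pearson_skewness(data, use_mode=True):
    """Karl Pearson's coefficient of skewness (illustrative helper).

    use_mode=True  -> (mean - mode) / SD
    use_mode=False -> 3 * (mean - median) / SD
    The population SD is assumed here.
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    if use_mode:
        return (mean - statistics.mode(data)) / sd
    return 3 * (mean - statistics.median(data)) / sd

# Hypothetical series with a long right tail
data = [2, 3, 3, 3, 4, 5, 8]
sk_mode = pearson_skewness(data, use_mode=True)
sk_median = pearson_skewness(data, use_mode=False)
print(sk_mode, sk_median)
```

Both values are positive for this series, which agrees with the rule that mean > median or mode indicates positive skewness.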

Bowley's Coefficient of Skewness (SKB)

Bowley's method of skewness is based on the values of the median and the lower and upper quartiles. This method suffers from the same limitations as the median and the quartiles themselves.

Wherever positional measures are given, skewness should be measured by Bowley's method. This method is also used in the case of open-end series, where the importance of extreme values is ignored.

Absolute skewness = Q3 + Q1 - 2 Median

Coefficient of Skewness,

$$SK_B = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$

Where Q is a quartile.
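A minimal sketch of Bowley's coefficient, taken as absolute skewness divided by the interquartile range (Q3 - Q1), is shown below. The data are hypothetical, and the quartile convention (statistics.quantiles with method="inclusive") is an assumption; other quartile conventions give slightly different values.

```python
import statistics

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical series with a few large values on the right
data = [1, 2, 3, 4, 5, 7, 9, 12, 20]
sk_b = bowley_skewness(data)
print(sk_b)
```

Because only the quartiles enter the formula, the extreme value 20 has no influence on the result, which is why the method suits open-end series.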

Kelly's Coefficient of Skewness (SKK)

Kelly's coefficient of skewness is defined as:

$$SK_K = \dfrac{P_{90} + P_{10} - 2 P_{50}}{P_{90} - P_{10}}$$

Where P is a percentile.
Example: Calculate Kelly's coefficient of skewness from the following data:

Measures of Kurtosis

Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. Kurtosis is calculated either as an absolute or as a relative value. Absolute kurtosis is always a positive number.

Negative relative kurtosis indicates a flatter distribution than the normal distribution, which is called platykurtic.
Positive relative kurtosis means a more peaked curve, called leptokurtic.
The peakedness of the normal distribution is called mesokurtic.

Moments

The arithmetic mean of the various powers of the deviations in any distribution is called the moments of the distribution about the mean.

PROPERTIES OF MOMENTS

COEFFICIENTS BASED ON MOMENTS

Summary

Measures of Skewness and Kurtosis, like measures of central tendency and dispersion, study the characteristics of a frequency distribution. Averages tell us about the central value of the distribution and measures of dispersion tell us about the concentration of the items around a central value.

When two or more symmetrical distributions are compared, the difference between them is studied with Kurtosis. On the other hand, when two or more asymmetrical distributions are compared, they will give different degrees of Skewness. These measures are treated here as mutually exclusive, i.e. kurtosis is studied for symmetrical distributions and skewness for asymmetrical ones.

Bowley's method of skewness is based on the values of the median and the lower and upper quartiles. This method suffers from the same limitations as the median and the quartiles themselves. Wherever positional measures are given, skewness should be measured by Bowley's method. This method is also used in the case of open-end series, where the importance of extreme values is ignored.

Kelly's coefficient of skewness is defined as:

$$SK_K = \dfrac{P_{90} + P_{10} - 2 P_{50}}{P_{90} - P_{10}}$$

Where P is a percentile.

Kurtosis is a measure of the peakedness of a distribution. The larger the kurtosis, the more peaked the distribution. Kurtosis is calculated either as an absolute or as a relative value. Absolute kurtosis is always a positive number. The absolute kurtosis of a normal distribution (symmetric bell-shaped distribution) is taken as 3. It is taken as the datum to calculate relative kurtosis as follows:

$$\text{Absolute kurtosis} = \beta_2 = \dfrac{\mu_4}{\mu_2^2}$$

Relative kurtosis = Absolute kurtosis - 3
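The calculation can be sketched in a few lines, assuming absolute kurtosis is the ratio of the fourth central moment to the square of the second central moment. The helper central_moment and the data are hypothetical and used only for illustration.

```python
def central_moment(data, k):
    """k-th moment about the mean: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

# Hypothetical, evenly spread series (flatter than a normal curve)
data = [1, 2, 3, 4, 5]

mu2 = central_moment(data, 2)            # second moment: variance
mu4 = central_moment(data, 4)            # fourth moment
absolute_kurtosis = mu4 / mu2 ** 2
relative_kurtosis = absolute_kurtosis - 3

print(absolute_kurtosis, relative_kurtosis)
```

The relative kurtosis is negative for this series, so it would be classed as platykurtic.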

Moments about the mean are generally used in statistics. We use the Greek letter μ, read as mu, for these moments. Consider a mass attached at each point proportional to its frequency and take moments about the mean. The first, second, third and fourth moments can be used as measures of Central Tendency, Variation (dispersion), asymmetry and peakedness of the curve respectively.

Correlation Analysis

S. No.   Reference No.   Particulars
1.                       Learning Objectives
2.       Topic 1         Introduction
3.       Topic 2         Types of Correlation
4.       Topic 3         Methods of Calculating Correlation
5.       Topic 4         Scatter Diagram Method
6.       Topic 5         Covariance Method: The Karl Pearson's Correlation Coefficient
7.       Topic 6         Rank Correlation Method
8.       Topic 7         Correlation Coefficient using Concurrent Deviation
9.       Topic 8         Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the concept of correlation

Study about different types of correlation

Describe various methods of calculating correlation, such as the scatter diagram method

Discuss various types of correlation coefficients, viz. the Karl Pearson correlation coefficient, rank correlation and the coefficient based on concurrent deviations

Introduction

Croxton and Cowden say, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."

The study of correlation helps managers in the following ways:

To identify relationship of various factors and decision variables.

To estimate value of one variable for a given value of other if both are
correlated. E.g. estimating sales for a given advertising and promotion
expenditure.

To understand economic behaviour and market forces.

To reduce uncertainty in decision-making to a large extent.

Types of Correlation

Positive or Negative Correlation

Simple or Multiple Correlation

Partial or Total Correlation

Linear and Non-linear Correlation

Methods of Calculating Correlation

Scatter Diagram Method

Karl Pearson's Coefficient of Correlation

Rank Method

Concurrent Deviation Method

Scatter Diagram Method

It gives us two types of information:

Whether the variables are related or not.

If so, what kind of relationship or estimating equation describes the relationship.

The pattern of points obtained by plotting the observed points is known as a scatter diagram.

Covariance Method: The Karl Pearson's Correlation Coefficient

The correlation coefficient measures the degree of association between two variables X and Y.
Karl Pearson's formula for the correlation coefficient is given as,

$$r = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

Where r is the Correlation Coefficient or Product Moment Correlation Coefficient between X and Y.
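A minimal sketch of the product moment calculation follows, using the deviation form of the formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the advertising and sales figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical advertising expenditure (X) and sales (Y)
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(advertising, sales)
print(round(r, 4))
```

A value of r close to +1 would indicate a strong direct association; here the association is positive but not perfect.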
Assumptions Underlying Karl Pearson's Correlation Coefficient

Interpretation of r

Estimation of Probable Error

Rank Correlation Method

RANK CORRELATION WHEN RANKS ARE GIVEN

RANK CORRELATION WHEN RANKS ARE NOT GIVEN

RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN

Correlation Coefficient using Concurrent Deviation

This is the easiest method to find the correlation between two variables. Although the method is effective in giving the direction of the correlation as positive or negative, it fails to give the accurate strength of the correlation. In this method we check the fluctuation in each data series as increasing (+), decreasing (-) or equal. Then we count the number of items that increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as,

$$r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}}$$

where both signs are taken to be the same as the sign of (2c - n).

Where, n = total number of pairs of deviations compared (one less than the number of observations).

c = Number of concurrent changes
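The counting procedure can be sketched as follows, assuming the usual form of the coefficient, the signed square root of (2c - n)/n, with n taken as the number of pairs of deviations. The function name and both data series are hypothetical.

```python
import math

def concurrent_deviation_r(xs, ys):
    """Correlation by concurrent deviations (illustrative sketch)."""
    def signs(values):
        # +1 for an increase, -1 for a decrease, 0 for no change
        return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

    sx, sy = signs(xs), signs(ys)
    n = len(sx)                                   # pairs of deviations
    c = sum(1 for a, b in zip(sx, sy) if a == b)  # concurrent changes
    ratio = (2 * c - n) / n
    # Keep the sign of (2c - n) both inside and outside the root
    return math.copysign(math.sqrt(abs(ratio)), ratio)

# Hypothetical ten-year series
x = [10, 12, 11, 14, 15, 17, 16, 18, 20, 21]
y = [30, 33, 34, 36, 35, 40, 39, 42, 45, 44]
r_c = concurrent_deviation_r(x, y)
print(round(r_c, 3))
```

Only the direction of each year-to-year change is used, which is why the method indicates direction well but measures strength only roughly.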
Example: The data of advertisement expenditure (X) and sales (Y) of a company for the past 10-year period is given below. Determine the correlation coefficient between these variables and comment on the correlation.

Summary

In this chapter the concept of correlation, or the association between two variables, has been discussed. A scatter plot of the variables may suggest that the two variables are related, but the value of the Pearson correlation coefficient r quantifies this association.

Correlation is the degree of linear association between two random variables. We do not differentiate these two variables as dependent and independent variables. It may be the case that one is the cause and the other is an effect, i.e. independent and dependent variables respectively. On the other hand, both may be dependent on a third variable.

In business, correlation analysis often helps managers to take decisions by estimating the effects of changing the values of decision variables like promotion, advertising, price and production processes on objective parameters like costs, sales, market share, consumer satisfaction and competitive price. The decision becomes more objective by removing subjectivity to a certain extent.

The correlation coefficient r may assume values between -1 and 1. The sign indicates whether the association is direct (+ve) or inverse (-ve). A numerical value of r equal to unity indicates perfect association, while a value of zero indicates no linear association.

The correlation is said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable. Negative or inverse correlation refers to the movement of the variables in opposite directions. Correlation is said to be negative if an increase (decrease) in the value of one variable is accompanied by a decrease (increase) in the value of the other.

In simple correlation the variation is between only the two variables under study, and the variation is hardly influenced by any external factor. In other words, if one of the variables remains the same, there won't be any change in the other variable.

In the case of multiple correlation analysis there are two approaches to studying the correlation. In the case of partial correlation, we study the variation of two variables while excluding the effects of other variables by keeping them under controlled conditions.

When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear. But if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.

Correlation analysis may also be necessary to eliminate a variable which shows low or hardly any correlation with the variable of our interest. In statistics, there are a number of measures to describe the degree of association between variables. These are Karl Pearson's correlation coefficient, Spearman's rank correlation coefficient, the coefficient of determination, Yule's coefficient of association, the coefficient of colligation, etc.

The correlation coefficient measures the degree of association between two variables X and Y. Karl Pearson's formula for the correlation coefficient is given as,

$$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

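When the data are given in the form of ranks, the purpose of computing a correlation coefficient is to determine the extent to which the two sets of rankings are in agreement. The coefficient that is determined from these ranks is known as Spearman's rank coefficient, rs. This is defined by the following formula:

$$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$$

where d is the difference between the two ranks of each item and n is the number of pairs of ranks.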
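As a small illustration, Spearman's coefficient for untied ranks can be computed from the rank differences d using 1 - 6Σd²/(n(n² - 1)). The function name spearman_rs and the two judges' rankings are hypothetical, and the sketch assumes no tied ranks.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank coefficient for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to the same five candidates
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rs = spearman_rs(judge_a, judge_b)
print(rs)
```

A value near +1 means the two rankings largely agree; a value of -1 means one ranking is the exact reverse of the other.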

Although the concurrent deviation method is effective in giving the direction of the correlation as positive or negative, it fails to give the accurate strength of the correlation. In this method we check the fluctuation in each data series as increasing (+), decreasing (-) or equal. Then we count the number of items that increase, decrease or remain equal concurrently and denote it as c. The correlation coefficient is then calculated as,

$$r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}}$$

Where, n = total number of pairs of deviations compared.

c = Number of concurrent changes

Regression Analysis

S. No.   Reference No.   Particulars
1.                       Learning Objectives
2.       Topic 1         Introduction
3.       Topic 2         Regression Analysis
4.       Topic 3         Simple Linear Regression
5.       Topic 4         Coefficient of Regression
6.       Topic 5         Non-linear Regression Models
7.       Topic 6         Correlation Analysis vs Regression Analysis
8.       Topic 7         Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the concept of regression analysis

Discuss the applicability of regression

Describe simple linear regression and nonlinear regression model.

Learn about coefficient of regression and linear regression equations

Introduction

In regression analysis we develop an equation, called an estimating equation, that relates the known and unknown variables.

Then correlation analysis is used to determine the degree of the relationship between the variables.

In this chapter we will learn how to calculate the regression line mathematically.

Regression Analysis

According to Morris Myers Blair, regression is the measure of the average relationship between two or more variables in terms of the original units of the data.

Applicability of Regression Analysis

Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life.

Simple Linear Regression

The highest power of x is called the order of the model.

This model is used if we have a bivariate distribution, i.e. only two variables are considered and the best-fit curve is approximated by a straight line.

Simple Linear Regression Model

The linear regression model uses a straight-line relationship. The equation of a straight line is of the form,

$$\hat{Y} = a + bX \qquad (1)$$

Where $\hat{Y}$ is the predicted value of Y corresponding to X, and a and b are constants. Now if we assume the error (deviation) in the Y direction is e, we can write the relationship of X and Y in the data points as,

$$Y = a + bX + e$$

Error e is the amount by which an observation falls off the regression line. Error e is due to random effects. a and b are called the parameters of the linear regression model, whose values are found from the observed data.

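Linear Regression Equation

Suppose the data points are (x1, y1), (x2, y2), ..., (xn, yn). Then we can write from the regression equation,

$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n \qquad (2)$$

Thus, the sum of squares of errors is,

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

To have the minimum sum of squares of errors (SSE) we must have the conditions,

$$\dfrac{\partial SSE}{\partial a} = 0 \quad \text{and} \quad \dfrac{\partial SSE}{\partial b} = 0$$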
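Solving the two minimising conditions gives the familiar closed-form estimates: the slope b is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept is a = mean(y) - b * mean(x). The sketch below applies them to hypothetical data; the function name fit_line is introduced only for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data points (x_i, y_i)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

errors = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in errors)
print(a, b, sse)
```

The errors about the fitted line sum to zero (up to rounding), which is a direct consequence of the first minimising condition.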

Coefficient of Regression

The coefficients of regression are bYX and bXY. They have the following implications:

The slopes of the regression lines of Y on X and X on Y, viz. bYX and bXY, must have the same sign (because their product equals $r^2$, which cannot be negative).

The correlation coefficient is the geometric mean of bYX and bXY: $r = \pm\sqrt{b_{YX}\, b_{XY}}$.

If both slopes bYX and bXY are positive, the correlation coefficient r is positive. If both bYX and bXY are negative, the correlation coefficient r is negative.

If $b_{YX}\, b_{XY} = 1$, then $r = \pm 1$, indicating perfect correlation.

Both regression lines intersect at the point $(\bar{X}, \bar{Y})$.

Properties of Regression Coefficients

The coefficient of correlation is the geometric mean of the two regression coefficients.

Both the regression coefficients are either positive or negative. It means that they always have identical signs, i.e. either both have a positive sign or both have a negative sign.

The coefficient of correlation and the regression coefficients will also have the same sign.

Regression coefficients are independent of a change in the origin but not of the scale.

Non-linear Regression Models

Second Degree Model

Other Regression Models
Seasonal Model
Seasonal Model with Trend
Coefficient of Determination

Correlation Analysis vs Regression Analysis

Degree and Nature of Relationship

Cause and Effect Relationship

Like correlation, regression analysis can also be studied as simple and multiple, total and partial, linear and non-linear, etc.

In correlation, there is no distinction between independent and dependent variables.

Summary

In this chapter, the concept of regression between dependent and independent variables has been discussed. Regression provides us a measure of the relationship and also helps to predict one variable for a given value of the other variable.

Unlike correlation analysis, in regression analysis one variable is independent and the other dependent. Please note that this relationship need not be a cause-effect relationship.

Regression analysis is a branch of statistical theory which is widely used in all the scientific disciplines. It is a basic technique for measuring or estimating the relationship among economic variables that constitute the essence of economic theory and economic life. The uses of regression analysis are not confined to economic and business activities. Its applications extend to almost all the natural, physical and social sciences.

The simple linear regression model is used if we have a bivariate distribution, i.e. only two variables are considered and the best-fit curve is approximated by a straight line. This describes the linear relationship between two variables. Although it appears to be too simplistic, in many business situations it is adequate. At least, the initial study can be based on this model for any decision-making situation.

We have studied simple linear, non-linear and multiple regression models. For multiple regression and non-linear regression models, MS Excel or any other computer package would help in reducing voluminous calculations. We also discussed the coefficient of determination as a measure of the strength of the relationship.
The least squares principle can also be applied to the fitting of a second degree polynomial, which may be useful in a business situation if we have some idea that the relationship between two variables is parabolic. In any case, a second degree polynomial fit is more likely to be a better approximation of the actual relationship. We may use the second order model (parabolic trend) if we feel that the variation is parabolic.

The least squares approximation can be calculated easily for low degree polynomials, like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of normal equations becomes ill-conditioned. This causes large errors in the values of the coefficients, and the approximation becomes incorrect. To avoid these problems, orthogonal polynomials are used for approximation.

Mean Square Error (MSE) is an estimate of the variance of the regression error. MSE depends on the values of the data and their scales. Hence we need a measure that calculates the relative degree of variation, so that it can be compared for the fits obtained from different models and for different data sets. The coefficient of determination is such a measure.

The coefficient of determination is a measure of the strength of the regression fit. It is an estimator of the population parameter of correlation and can be obtained directly from a decomposition of the variation in Y into two components, viz. due to error and due to regression. Error is the deviation of a data point from its respective group mean. Thus, error is the deviation of a data point from its predicted value given by the regression line.
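The decomposition can be illustrated with a short sketch. It assumes the common definition of the coefficient of determination as 1 - SSE/SST, where SST is the total variation of Y about its mean and SSE is the variation due to error about a fitted least squares line. The function name r_squared and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # due to error
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total variation
    return 1 - sse / sst

# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 4))
```

Unlike MSE, this ratio has no units, so it can be compared across models and data sets measured on different scales.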

Theory of Probability

S. No.   Reference No.   Particulars
1.                       Learning Objectives
2.       Topic 1         Introduction
3.       Topic 2         Important Terms in Probability
4.       Topic 3         Kinds of Probability
5.       Topic 4         Simple Propositions of Probability
6.       Topic 5         Addition Theorem of Probability
7.       Topic 6         Multiplication Theorem of Probability
8.       Topic 7         Conditional Probability
9.       Topic 8         Law of Total Probability
10.      Topic 9         Independence of Events
11.      Topic 10        Combinatorial Concept
12.      Topic 11        Summary

Learning Objectives
After studying this chapter, you should be able to:

Understand the meaning and important terms of probability

Learn about addition theorem and multiplicative theorem of probability

Understand the concept of independence of events and combinatorial concepts like permutation and combination

Solve problems of conditional probability and Bayes' Theorem and other concepts of probability

Introduction

A probability is a quantitative measure of risk.

This chapter provides exposure to fundamental concepts, since probability is inseparable from statistical methods.

Important Terms in Probability

Probability and sampling are inseparable parts of statistics.

Random Experiment
A random experiment is an experiment whose outcome is not predictable in advance.

Sample Space

Event

Event Space

Union of events

Intersection of events

Mutually exclusive events

Collectively exhaustive events

Complement of event

Kinds of Probability

Classical Probability

Relative Frequency Probability

Axiomatic Probability

Subjective Probability

Simple Propositions of Probability

Proposition 1
P(E^C) = 1 - P(E)
Probability of complement: Let event E^C denote the complement of the event E. By the definition of complement, E^C has all elements from the sample space S that are not in E. Thus, E and E^C are mutually exclusive and collectively exhaustive. Therefore, by axioms 2 and 3 we have,
1 = P(S) = P(E ∪ E^C) = P(E) + P(E^C)
or, P(E^C) = 1 - P(E)

Proposition 2
If E ⊂ F, then P(E) ≤ P(F)
If the event E is contained in event F, that is, E ⊂ F, then we can express,
F = E ∪ (E^C ∩ F).
However, as events E and (E^C ∩ F) are mutually exclusive, we get,
P(F) = P(E) + P(E^C ∩ F)
But, by axiom 1, P(E^C ∩ F) ≥ 0. Therefore, we have proved the proposition,
P(E) ≤ P(F)

Proposition 3
P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
Probability of unions: Event E ∪ F can be written as the union of the two disjoint events E and (E^C ∩ F). Thus, from axiom 3,
P(E ∪ F) = P[E ∪ (E^C ∩ F)] = P(E) + P(E^C ∩ F) (1)
Also, F = (E ∩ F) ∪ (E^C ∩ F), hence,
P(F) = P(E ∩ F) + P(E^C ∩ F) (2)
From (1) and (2) we get proposition 3 as,
P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
The extended statement of this proposition for n events is also called the inclusion-exclusion principle. For three events,

P(E ∪ F ∪ G) = P(E) + P(F) + P(G) - P(E ∩ F) - P(F ∩ G) - P(E ∩ G) + P(E ∩ F ∩ G)
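These propositions can be checked on a small equally likely sample space. The sketch below assumes one roll of a fair die and classical probability (favourable outcomes divided by total outcomes); the events E, F and G are hypothetical.

```python
from fractions import Fraction

S = set(range(1, 7))   # sample space: one roll of a fair die (assumed)

def P(event):
    """Classical probability on the equally likely sample space S."""
    return Fraction(len(event & S), len(S))

E = {2, 4, 6}   # even number
F = {4, 5, 6}   # number greater than 3
G = {1, 2}      # number less than 3

# Proposition 1: probability of the complement
p_complement = P(S - E)

# Proposition 3: probability of the union of two events
lhs_two = P(E | F)
rhs_two = P(E) + P(F) - P(E & F)

# Inclusion-exclusion for three events
lhs_three = P(E | F | G)
rhs_three = (P(E) + P(F) + P(G)
             - P(E & F) - P(F & G) - P(E & G)
             + P(E & F & G))

print(p_complement, lhs_two, rhs_two, lhs_three, rhs_three)
```

Exact fractions are used so that both sides of each identity can be compared without rounding error.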
Proposition 4
Mutually exclusive events: When the sets corresponding to two events are disjoint (have no common elements, or the intersection is null), the two events are called mutually exclusive.

E ∩ F = ∅. Therefore,
P(E ∩ F) = P(∅) = 0
Also, for mutually exclusive events E and F,
P(E ∪ F) = P(E) + P(F)

Proposition 5
P(E^C ∩ F) = P(F) - P(E ∩ F)
From set theory, F can be written as a union of two disjoint events E ∩ F and E^C ∩ F. Hence, by axiom 3, we have P(F) = P(E ∩ F) + P(E^C ∩ F). By rearranging the terms we get the result.

Addition Theorem of Probability

The addition theorem in probability is the process of determining the probability that either event A or event B occurs, or both occur. For two events A and B, the addition is denoted by ∪ and pronounced as Union.

Let A and B be two events defined in a sample space. The union of events A and B is the collection of all outcomes that belong either to A or to B or to both A and B, and is denoted by A ∪ B (A or B).

The result of this addition theorem is generally written using set notation,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B),
Where, P(A) = probability of occurrence of event A
P(B) = probability of occurrence of event B
P(A ∪ B) = probability of occurrence of event A or event B
P(A ∩ B) = probability of occurrence of both event A and event B.
The addition theorem of probability can be defined and proved as follows: Let A and B be subsets of a finite non-empty set S. Then, according to the addition rule for the number of elements,
n(A ∪ B) = n(A) + n(B) - n(A ∩ B).
On dividing both sides by n(S), we get
n(A ∪ B) / n(S) = n(A) / n(S) + n(B) / n(S) - n(A ∩ B) / n(S) (1),
which is P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

Multiplication Theorem of Probability

Probability is the branch of mathematics which deals with the chance of occurrence of events. The basic form of the multiplication theorem of probability for two events x and y can be stated as,

P(x ∩ y) = P(y) · P(x / y)

Here P(x) and P(y) are the probabilities of occurrence of events x and y respectively.
P(x / y) is the conditional probability of x, the condition being that y has occurred before x.

P(x / y) is always calculated after y has occurred. Here, the occurrence of x depends on y: y has already changed the set of possible outcomes, so the probability of occurrence of x also changes.

Conditional Probability

Conditional probability is the probability that an event will occur given that another event has already occurred. If A and B are two events, then the conditional probability of A given B is written as P(A/B) and read as the probability of A given that B has already occurred.

Law of Total Probability

Consider two events, E and F. Whatever the events may be, we can always say that the probability of E is equal to the probability of the intersection of E and F, plus the probability of the intersection of E and the complement of F. That is,

P(E) = P(E ∩ F) + P(E ∩ F^C)

Bayes' Formula
Let E and F be events.
E = (E ∩ F) ∪ (E ∩ F^C)
Any element of E must be either in both E and F, or in E but not in F. (E ∩ F) and (E ∩ F^C) are mutually exclusive, since the former must be in F and the latter must not be in F. So we have, by axiom 3,
P(E) = P(E ∩ F) + P(E ∩ F^C) = P(E/F) P(F) + P(E/F^C) P(F^C) = P(E/F) P(F) + P(E/F^C) [1 - P(F)]
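A short numerical sketch shows how the law of total probability feeds into Bayes' formula, taken here in its standard form P(F/E) = P(E/F) P(F) / P(E). The scenario and all figures are hypothetical: F is "the item came from machine A" and E is "the item is defective".

```python
from fractions import Fraction

# Hypothetical figures, for illustration only
p_f = Fraction(3, 5)                 # P(F): share of output from machine A
p_e_given_f = Fraction(1, 50)        # P(E/F): defect rate of machine A
p_e_given_not_f = Fraction(1, 20)    # P(E/F^C): defect rate of other machines

# Law of total probability: P(E) = P(E/F) P(F) + P(E/F^C) [1 - P(F)]
p_e = p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

# Bayes' formula: probability that a defective item came from machine A
p_f_given_e = p_e_given_f * p_f / p_e

print(p_e, p_f_given_e)
```

Although machine A produces most of the output in this example, its lower defect rate means that fewer than half of the defective items are attributed to it.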

Independence of Events

Two events are said to be independent of each other if and only if the following three conditions hold:

P(E ∩ F) = P(E) P(F) (This is the most useful result.)
P(E|F) = P(E)
P(F|E) =  P(F)
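The conditions can be verified on a small example. The sketch below assumes two tosses of a fair coin with equally likely outcomes; the events E and F are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin (assumed equally likely)
S = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(S))

E = {s for s in S if s[0] == "H"}   # head on the first toss
F = {s for s in S if s[1] == "H"}   # head on the second toss

product_rule_holds = P(E & F) == P(E) * P(F)
p_e_given_f = P(E & F) / P(F)       # conditional probability of E given F
p_f_given_e = P(E & F) / P(E)       # conditional probability of F given E

print(product_rule_holds, p_e_given_f, p_f_given_e)
```

Knowing the result of the second toss does not change the probability of a head on the first toss, which is exactly what independence means.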

Combinatorial Concept

Product Rule of Counting

Sum Rule of Counting

Permutation

Combination
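The four counting ideas can be illustrated with Python's math module (math.perm and math.comb are available from Python 3.8). The counts of shirts, trousers and candidates below are hypothetical.

```python
import math

shirts, trousers = 3, 4   # hypothetical wardrobe

# Product rule: choose a shirt AND a pair of trousers
outfits = shirts * trousers

# Sum rule: choose exactly one item, a shirt OR a pair of trousers
single_item_choices = shirts + trousers

# Permutation: ordered selection of 3 out of 5 candidates
ordered_selections = math.perm(5, 3)     # 5! / (5 - 3)!

# Combination: unordered selection of 3 out of 5 candidates
unordered_selections = math.comb(5, 3)   # 5! / (3! * 2!)

print(outfits, single_item_choices, ordered_selections, unordered_selections)
```

Each combination of 3 candidates corresponds to 3! = 6 permutations, which is why the ordered count is six times the unordered one.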

Summary

In this chapter, we discussed the basic idea of probability. We defined probability in different ways and pointed out serious limitations of each definition.

Then we discussed the axioms of probability, which are the backbone of the theory of probability. Then we studied a number of useful propositions of probability.

We also defined conditional probability, the law of total probability, and Bayes' Theorem. We also defined mutually exclusive events and independence of events.

Lastly, we discussed a few important concepts of combinatorial analysis, which come in very handy while calculating the probability of an event.

Probability Distribution

S. No.   Reference No.   Particulars
1.                       Learning Objectives
2.       Topic 1         Introduction
3.       Topic 2         Random Variable
4.       Topic 3         Probability Distributions of Standard Random Variables
5.       Topic 4         Bernoulli Distribution
6.       Topic 5         Binomial Distribution
7.       Topic 6         Poisson Distribution
8.       Topic 7         Normal Distribution
9.       Topic 8         Summary

Learning Objectives
After studying this chapter, you should be able to:

Differentiate between discrete and continuous random variables

Discuss probability distributions of standard random variable

Understand discrete probability distributions, which include the Binomial and Poisson distributions

Explain continuous probability distributions, which include the Normal distribution

Introduction

We will study a few common distributions in this chapter.

The normal distribution has extensive use in statistical tools and therefore readers are advised to study it in detail.

Knowledge of sequences, series and calculus is expected.

Random Variable
A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.

Discrete and Continuous Random Variables

Probability Mass Function (P.M.F.)
Probability Density Function
Cumulative Distribution Function

Expectation Value of Random Variables

Expected Value of a Function of a Random Variable

Variance and Standard Deviation of Random Variable

Probability Distributions of Standard Random Variables

Bernoulli Distribution

Binomial Distribution

Poisson Distribution

Normal Distribution

Bernoulli Distribution

Application of Bernoulli Distribution

The Bernoulli trial is fundamental to many discrete distributions like the Binomial, Poisson, Geometric, etc. Situations where the Bernoulli distribution is commonly used are:

Sex of a newborn child; Male = 0, Female = 1, say.

Items produced by a machine are defective or non-defective.

During the next flight an engine will fail or remain serviceable.

A student appearing for an examination will pass or fail.

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. The Bernoulli distribution is its special case with n = 1.

Applications of Binomial Distribution

The binomial distribution applies when:

Trials are finite in number (and not very many), performed repeatedly n times.

Each trial (random experiment) is a Bernoulli trial, one that results in either success or failure.

The probability of success in any trial is p and is constant for each trial.

All trials are independent.
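When these conditions hold, the probability of exactly x successes is given by the standard binomial formula C(n, x) p^x (1 - p)^(n - x). The sketch below evaluates it for a hypothetical lot of n = 10 items with defect probability p = 0.1; the function name binomial_pmf is introduced only for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent Bernoulli trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical lot: 10 items, each defective with probability 0.1
n, p = 10, 0.1
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(probs))   # should equal n * p
p_at_most_one = probs[0] + probs[1]              # P(X <= 1)

print(round(mean, 6), round(p_at_most_one, 4))
```

The probabilities over x = 0, 1, ..., n add up to 1, and the mean of the distribution equals n * p.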

The following are some real-life examples of applications of the binomial distribution.

Number of defective items in a lot of n items produced by a machine.

Number of male births out of n births in a hospital.

Number of correct answers in a multiple-choice test.

Number of seeds germinated in a row of n planted seeds.

Number of re-captured fish in a sample of n fishes.

Number of missiles hitting the targets out of n fired.

Poisson Distribution

A random variable X, taking one of the values 0, 1, 2, ..., is said to be a Poisson random variable with parameter λ, if for some λ > 0,

$$P(X = i) = \dfrac{e^{-\lambda} \lambda^{i}}{i!}, \quad i = 0, 1, 2, \ldots$$

P(X = i) is the probability mass function (p.m.f.) of the Poisson random variable. Its expected value and variance are,
μ = E[X] = λ
Var[X] = λ
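The p.m.f. e^(-λ) λ^i / i! and the equality of mean and variance can be checked numerically. The rate λ = 2 (say, two misprints per page on average) is a hypothetical value, and the infinite sum is truncated at 60 terms, which is an approximation that is negligible for this λ.

```python
from math import exp, factorial

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

lam = 2.0   # hypothetical average number of occurrences

# Truncate the infinite support; terms beyond i = 59 are negligible here
probs = [poisson_pmf(i, lam) for i in range(60)]

mean = sum(i * q for i, q in enumerate(probs))
variance = sum((i - mean) ** 2 * q for i, q in enumerate(probs))

print(round(mean, 6), round(variance, 6), round(probs[0], 4))
```

Both the mean and the variance come out equal to λ, as stated above.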

1 146

Cont.

Some of the common examples where Poisson random


variable can be used to define the probability distribution
are:

Number of accidents per day on expressway.

Number of earthquakes occurring over fixed time span.

Number of misprints on a page.

Number of arrivals of calls on telephone exchange per minute.

Number of interrupts per second on a server.
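
A minimal sketch of how such probabilities could be computed from the Poisson p.m.f., e^(-lambda) lambda^i / i!. The rate lambda = 2 accidents per day is an assumed value for illustration, not a figure from the course.

```python
from math import exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam): exp(-lam) * lam^i / i!."""
    if i < 0:
        return 0.0
    return exp(-lam) * lam**i / factorial(i)

# Hypothetical rate: on average lam = 2 accidents per day.
lam = 2.0
p_none = poisson_pmf(0, lam)                                # no accident in a day
p_at_most_two = sum(poisson_pmf(i, lam) for i in range(3))  # 0, 1 or 2 accidents
# Mean computed from the p.m.f., truncated at i = 60 where terms are negligible.
mean = sum(i * poisson_pmf(i, lam) for i in range(61))
print(f"P(X = 0) = {p_none:.4f}")
print(f"P(X <= 2) = {p_at_most_two:.4f}")
print(f"E[X] = {mean:.4f}")
```

The computed mean agrees with the parameter, which is the property E[X] = lambda.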

1 147

Normal Distribution

Equation For Normal Probability Curve

Standard Normal Distribution

Properties Of Normal Distribution

Areas Under Standard Normal Probability Curve

Importance Of Normal Distribution
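
For reference, the normal probability curve with mean $\mu$ and standard deviation $\sigma$ is usually written as below. The notation is the standard textbook one and is assumed here, since the slide lists the topic without showing the equation.

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, \exp\!\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right), \qquad -\infty < x < \infty
```

Setting $\mu = 0$ and $\sigma = 1$ gives the standard normal curve.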


1 148

Cont.

Area under the Normal Curve

1 149

Summary

Random variable is a real valued function defined over a sample space with probability associated with it. The value of the random variable is the outcome of an experiment. Random variables are neither random nor variable.

In this chapter we discussed several important random variables, the associated formulae, and problem solving using formulae. A discrete random variable is the one that takes at most countably many values. A continuous random variable can take any real value.

1 150

Cont.

We also discussed probability distributions of random variables. Binomial distribution is used if an experiment is carried out for finite number of n independent trials; all trials being Bernoulli trials with constant probability of success p.

Random variable will follow Poisson distribution if it is the number of occurrences of a rare event during a finite period. Waiting time for a rare event is exponentially distributed. Negative binomial distribution is used when Bernoulli trials are repeated until a desired number of successes is achieved.

1 151

Cont.

One of the continuous random variables required often is the uniform random variable. Waiting time for an event that occurs periodically follows uniform distribution.

Normal probability distribution is the most important distribution in statistics. We defined normal distribution with parameters ($\mu$, $\sigma$) where $\mu$ is mean and $\sigma$ is standard deviation.

Further, we defined standard normal distribution, which is a special case of normal distribution with parameters (0, 1).

1 152

Cont.

We also discussed transformation of normal random variable X to standard normal random variable Z using $z = \frac{x - \mu}{\sigma}$. Z distribution is very convenient for manual calculation as we can use standard normal tables, which are extensively tabulated, to find probability and interval.

Normal distribution is used as a model in many real world situations, both as a continuous distribution and as an approximation to discrete distributions like binomial or Poisson.

1 153
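
A short worked example of the transformation to Z may help. The values $\mu = 50$, $\sigma = 10$ and $x = 65$ are assumed purely for illustration.

```latex
z = \frac{x - \mu}{\sigma} = \frac{65 - 50}{10} = 1.5,
\qquad
P(X \le 65) = P(Z \le 1.5) \approx 0.9332
```

The value 0.9332 is the entry for z = 1.5 in a standard normal table.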

Use of Excel Software for Statistical Analysis

S. No.  Reference No.  Particulars                                                         Slide From  To
1.                     Learning Objectives                                                 155         155
2.      Topic 1        Introduction                                                        156         157
3.      Topic 2        Introduction to Excel                                               158         168
4.      Topic 3        Entering Data in Excel                                              169         169
5.      Topic 4        Descriptive Statistics                                              170         172
6.      Topic 5        Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)  173         177
7.      Topic 6        Statistical Analysis                                                178         182
8.      Topic 7        Normal Distribution                                                 183         183
9.      Topic 8        Brief about SPSS                                                    184         189
10.     Topic 9        Summary                                                             190         194

1 154


Learning Objectives
After studying this chapter, you should be able to:

Understand the basic concepts of using Microsoft Excel

Discuss how to enter data in Excel and use basic built-in functions

Gain knowledge about SPSS

1 155

Introduction
The most popular software in the MS Office Suite includes the following:

Microsoft Word

Microsoft Excel

Microsoft PowerPoint

Microsoft Access

Microsoft Project Plan

Microsoft Outlook

1 156

Cont.

MICROSOFT OFFICE SUITE

Suite Product    Home and Student  Home and Business  Professional
Word 2010        Included          Included           Included
Excel 2010       Included          Included           Included
PowerPoint 2010  Included          Included           Included
OneNote 2010     Included          Included           Included
Outlook 2010     -                 Included           Included
Access 2010      -                 -                  Included
Publisher 2010   -                 -                  Included

1 157

Introduction to Excel
Opening A Document

Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the directory area or drive to look for files in other locations.

To create a new workbook, click on File-New-Blank Document.

1 158

Cont.

Saving And Closing A Document

To save your document with its current filename, location and file format, click on File - Save.

When you have finished working on a document you should close it. Go to the File menu and click on Close.

1 159

Cont.

Excel Screen
Menu Bar in Excel

1 160

Cont.

Excel Screen

1 161

Cont.

Workbooks and Worksheets

Diagram labels: Cell, Row, Column, Spreadsheet, Workbook

1 162

Cont.

Cell Name Box

Spreadsheet Tabs in Excel

1 163

Cont.

Moving Around the Worksheet

Margins

Orientation

Paper Size

Print Area

1 164

Cont.

Margin Options in Excel

1 165

Cont.

Orientation Options in Excel

1 166

Cont.

Print Area Selection

1 167

Cont.

Moving between Cells

While working with any Office productivity tool, the clipboard functions

are invaluable.

The most common clipboard functions are Cut, Copy and Paste.

In the Microsoft Office suite, there are keyboard shortcuts for these
functions.

KEYBOARD SHORTCUTS

Cut    Ctrl + X
Copy   Ctrl + C
Paste  Ctrl + V

1 168

Entering Data in Excel

A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the columns are labeled with letters. Each intersection of a row and a column is a cell.

Entering Labels

Entering Values

Rounding Numbers that Meet Specified Criteria

Sorting by Columns

1 169

Descriptive Statistics

Excel includes elaborate and customisable toolbars, for example the standard toolbar shown here:

Some of the icons are useful for mathematical computation:

The Autosum icon enters the formula =SUM() to add up a range of cells.

The Function Wizard icon gives you access to all the functions available.

1 170

Cont.

The Graph Wizard icon gives access to all graph types available, as shown in this display:

1 171

Cont.

Excel can be used to generate measures of location and variability for a variable. Suppose we wish to find descriptive statistics for a sample data: 2, 4, 6, and 8.

Step 1: Select the Tools pull-down menu. If you see Data Analysis, click on this option; otherwise, click on the Add-Ins option to install the Analysis ToolPak.

Step 2: Click on the Data Analysis option.

Step 3: Choose Descriptive Statistics from Analysis Tools list.

Step 4: When the dialog box appears, enter A1:A4 in the input range box. A1 is a value in column A and row 1; in this case this value is 2. Using the same technique, enter the other values until you reach the last one.

Step 5: Select an output range, in this case B1. Click on summary statistics to see the results. Select OK.

1 172
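
To cross-check a few of the summary measures outside Excel, the same sample can be run through Python's standard `statistics` module. This is only a sketch for comparison and covers a subset of the measures; the dictionary labels are chosen here for readability and are not Excel's exact output labels.

```python
import statistics

data = [2, 4, 6, 8]  # the sample used in the steps above

summary = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Sample variance": statistics.variance(data),  # divides by n - 1
    "Standard deviation": statistics.stdev(data),  # square root of the sample variance
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": len(data),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean and median are both 5 and the sample variance is 20/3, about 6.67.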

Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)
Manual Equation Entry

1 173

Cont.

Arithmetic Functions in Excel

Function, Syntax and Description

1 174

Cont.

SUM Function
The SUM function is probably the most commonly used function in Excel. It comes in three flavours in Excel, namely:

SUM()
SUMIF()
SUMIFS()

1 175
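
The difference between the three flavours can be sketched in Python: SUM() adds every value, SUMIF() adds only the values that meet one criterion, and SUMIFS() adds the values that meet several criteria together. The sales records below are hypothetical, and the code is an analogy rather than Excel syntax.

```python
# Hypothetical sales records: (region, product, amount).
sales = [
    ("North", "Pens", 120),
    ("South", "Pens", 80),
    ("North", "Books", 200),
    ("South", "Books", 150),
    ("North", "Pens", 50),
]

# SUM(): add every amount.
total = sum(amount for _, _, amount in sales)

# SUMIF(): add amounts that satisfy one criterion (region is "North").
north_total = sum(amount for region, _, amount in sales if region == "North")

# SUMIFS(): add amounts that satisfy several criteria at once.
north_pens_total = sum(
    amount
    for region, product, amount in sales
    if region == "North" and product == "Pens"
)

print(total, north_total, north_pens_total)
```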
Cont.

Logical Functions
AND ()

FALSE

IF ()

TRUE

IFERROR ()

OR ()

NOT

1 176

Cont.

Statistical Functions

Statistical functions are invaluable in any mathematical calculations. They can provide insights into trends, provide data for detailed analysis, as well as help identify gaps that need to be plugged.

Excel provides a wide range of functions that can be used to perform basic statistical analyses.

1 177

Statistical Analysis
Creating Charts

Select the data range (only numbers) for which the chart needs to be created.

Under the Insert Ribbon, in the Chart section, click on the type of chart you want to create and the category. Here the clustered chart has been used.

Select the chart and click on Select Data button in Data section of the Design Layout.

1 178

Cont.

In the Select Data Source dialog, select Series 1 and click on Edit

Select Data Source

1 179

Cont.

This opens the Edit Series dialog that allows you to change the range of values in series and provide a Series name. For the series name, click on the icon to select the column title of Series 1.


Edit Series

1 180

Cont.

Histogram
Now follow the steps given below to draw histogram.

Select the first two columns i.e. class interval and frequency in the Excel sheet.

Click on Chart Wizard icon on tool bar or select [Insert - Chart..] from the Insert drop down menu. A dialogue box with title Chart Wizard - Step 1 of 4 - Chart Type will appear.

In the menu Standard Type, select Column. Click on Next button.

Now the next menu with title Chart Wizard - Step 2 of 4 - Chart Source Data will appear. Since we have already selected the source data, select Next. Don't forget to check that column is selected in data series.

Now the next menu with title Chart Wizard - Step 3 of 4 - Chart Options will appear.

1 181
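
The class interval and frequency columns that the histogram is drawn from can be built from raw data first. The sketch below shows one way to do that in Python; the observations, the lower limit and the class width are all hypothetical.

```python
# Hypothetical raw observations and class layout.
values = [12, 15, 21, 22, 27, 31, 33, 34, 38, 45]
lower, width, classes = 10, 10, 4  # classes: 10-20, 20-30, 30-40, 40-50

# Count how many observations fall in each class [start, start + width).
frequencies = [0] * classes
for v in values:
    index = (v - lower) // width
    if 0 <= index < classes:
        frequencies[index] += 1

# Print one bar per class interval, as a text-only histogram.
for k, freq in enumerate(frequencies):
    start = lower + k * width
    print(f"{start:>3}-{start + width:<3} {'#' * freq} ({freq})")
```

Each printed row corresponds to one column of the histogram, with the bar length equal to the class frequency.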

Cont.

Correlation Plot and Regression Analysis


Using MS Excel for calculating Karl Pearson's correlation coefficient

Calculating Karl Pearson's correlation coefficient using MS Excel is very simple. The steps are as follows:

Open an Excel worksheet and enter the data values of X and Y variables as two arrays (columns or rows). Keep these contiguous if possible.

Select the cell where you want to store the result r. Enter the formula with syntax as,
=CORREL(array1, array2)
array1 is a cell range of values and array2 is a second cell range of values.

1 182
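
For comparison, Karl Pearson's coefficient can also be computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The sketch below does this in Python; the function name `pearson_r` and the paired data are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Entering the same two columns in a worksheet and applying =CORREL to them should give the same r, here about 0.7746.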

Normal Distribution
NORMDIST returns the normal distribution for the specified mean and
standard deviation. This function has a very wide range of applications in
statistics, including hypothesis testing.
Syntax: NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

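Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value: TRUE returns the cumulative distribution function, FALSE returns the probability density function.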

1 183
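
What NORMDIST returns can be sketched in Python with the standard `math` module. This is an approximation of the function's behaviour written for illustration, not Excel's own implementation, and the mean of 50 and standard deviation of 10 are assumed values.

```python
from math import erf, exp, pi, sqrt

def normdist(x: float, mean: float, standard_dev: float, cumulative: bool) -> float:
    """Sketch of NORMDIST: the CDF if cumulative is True, else the density."""
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        # Cumulative probability P(X <= x), via the error function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    # Height of the normal curve at x.
    return exp(-0.5 * z * z) / (standard_dev * sqrt(2.0 * pi))

# Hypothetical example: mean 50, standard deviation 10.
p_below_65 = normdist(65, 50, 10, True)        # P(X <= 65)
density_at_mean = normdist(50, 50, 10, False)  # curve height at the mean
print(f"P(X <= 65) = {p_below_65:.4f}")
print(f"f(50) = {density_at_mean:.4f}")
```

With cumulative set to TRUE the result is a probability between 0 and 1; with FALSE it is the height of the curve, which is not itself a probability.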

Brief about SPSS


SPSS Statistics is a software package used for statistical analysis.

SPSS Files

SPSS uses several types of files. First, there is the file that contains data view and variable view. These have been entered using SPSS Data Editor Window. It is known as an SPSS system file.

1 184

Cont.

SPSS Data Editor Window Data View

1 185

Cont.

Data Editor Window Variable View

1 186

Cont.

Define Variable Dialog Box

Student Motivation: Not willing, Undecided, Willing

1 187

Cont.

Value Labels Dialog Box

Value Labels Coded with Value and Value Label

1 188

Cont.

SPSS Data Editor Window with all Records Entered

1 189

Summary

Microsoft Office is one of the most powerful office productivity tools in the market today. The entire suite is vast and covers a wide range of software solutions catering to various aspects of modern businesses.

Microsoft Excel is a powerful accounting and calculation solution. It has a standard tabular layout and it supports a wide range of arithmetic, accounting and statistical functions.

Microsoft Outlook is the mail client that can be set up to download mails from a mail server as well as send and receive emails as desired. Being a part of the Microsoft Office suite, this tool is compatible with other applications in the suite.

1 190

Cont.

One of the most popular and widely used Microsoft Office Suites is MS Office 2003. Later Microsoft released two other versions of Office, namely Office 2007 and Office 2010. Although Office 2010 is the latest version, many businesses still continue to use Office 2003. From Office 2003 to Office 2007, Microsoft radically changed the overall look and feel of the office suite.

Excel is built on the concept of cells, rows, columns, spreadsheets and workbooks. The entire structure is hierarchical, and this allows it to be scalable and versatile enough to adapt to varying needs for users from different specialisations. Understanding these concepts is pretty useful in developing complex reports and models.

1 191

Cont.

As long as you work on the soft copies, page layouts are not really important; you can scroll a spreadsheet to view the contents. However, when it comes to printouts it is important that one gets the page layouts sorted out. Excel 2010 has all the page layout options under Page Layout menu item.

While working with any Office productivity tool, the clipboard functions are invaluable. The most common clipboard functions are Cut, Copy and Paste. In the Microsoft Office suite, there are keyboard shortcuts for these functions.

Once you become conversant with the Excel functions, you would prefer to use the keyboard shortcuts as they are faster and easier to use than the mouse.

1 192


Cont.

A new worksheet is a grid of rows and columns. The rows are labelled with numbers, and the columns are labelled with letters. Each intersection of a row and a column is a cell. Each cell has an address, which is the column letter and the row number. The arrow on the worksheet to the right points to cell A1, which is currently highlighted, indicating that it is an active cell. A cell must be active to enter information into it.
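
The addressing scheme can be illustrated with a short sketch that builds an address from a column number and a row number. The function names are hypothetical, and the handling of columns beyond Z (AA, AB and so on) follows Excel's usual lettering, which the text above does not cover.

```python
def column_letters(column_number: int) -> str:
    """Convert a 1-based column number to letters: 1 -> A, 26 -> Z, 27 -> AA."""
    if column_number < 1:
        raise ValueError("column_number must be 1 or greater")
    letters = ""
    while column_number > 0:
        column_number, remainder = divmod(column_number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def cell_address(row: int, column_number: int) -> str:
    """Build an address such as A1 from a row number and a column number."""
    if row < 1:
        raise ValueError("row must be 1 or greater")
    return f"{column_letters(column_number)}{row}"

print(cell_address(1, 1))   # first row, first column
print(cell_address(4, 2))   # fourth row, second column
```

The first call gives the address of the active cell described above, A1.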

Excel is a very powerful accounting tool, but before going to the real complex functions, let us see how to use Excel for simple calculations. There are two ways of using Excel for simple calculations: you can enter the actual arithmetic equations in the cell or use pre-defined Excel formulas to do the same.

1 193

Cont.

Statistical calculations for exponential random variables can be carried out using statistical functions available in MS Excel. NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax: NORMDIST(x,mean,standard_dev,cumulative)

SPSS Statistics is a software package used for statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, and collaboration and deployment (batch and automated scoring services).

1 194

1 195
