
Dr. Md. Abdus Salam Akanda
Associate Professor of Statistics, DU
Web: http://statdu.ac.bd/akanda/
E-mail: akanda@du.ac.bd

Statistics: Statistics is concerned with scientific methods for collecting, organizing,
summarizing, presenting and analyzing sample data as well as drawing valid conclusions
about population characteristics and making reasonable decisions on the basis of such
analysis.
Population: An aggregate of all individuals or items (actual or possible) defined on some
common characteristics is called a population. For example,, first semester MBA students in
Bangladesh University of Professionals constitute a population. Here common
ommon characteristics
are:
(i) students of Bangladesh University of Professionals
Professionals;
(ii) students of first semester in MBA
Sample: A small but representative part with finite number of individuals or items of a
population is called a sample. For example, a group of students, representing the first
semester MBA students (a population), is called a sample.

Difference between population and sample:


Population
(1) An aggregate of all individuals or items
(actual or possible) defined on some
common characteristics is called a
population.
(2) It may be finite or infinite.
(3) The statistical constants of population
are usually referred to as parameters.
(4) Population size is always greater than
the sample size.
(5) Census survey deals with the
population.
(6) Population is considered as a universal
set.
(7) Capital letters are used to denote
population size, usually N.

Sample
(1) A small but representative part with
finite number of individuals or items of
a population is called a sample.
(2) A sample is always finite.
(3) The statistical measures obtained from
the sample observations are
termed statistics.
(4) Sample size is always smaller than the
Population size.
(5) Sample survey deals with the sample.
(6) Sample is a subset of the population.
(7) Small letters are used to denote sample
size, usually n.

Identifying Data Sets


In a recent survey, 1500 adults in the United States were asked if they thought there was solid
evidence of global warming. Eight hundred fifty-five of the adults said yes. Identify the
population and the sample. Describe the sample data set.
Solution
The population consists of the responses of all adults in the United States, and the sample
consists of the responses of the 1500 adults in the United States in the survey. The sample is a
subset of the responses of all adults in the United States. The sample data set consists of 855
"yes" responses and 645 "no" responses.

Parameter: A constant that is a function of population values, characterizes the
variable of the underlying population to some extent, and is usually unknown is called a
parameter.
For example: population mean, variance etc.
Statistic: Any function or any numerical expression of sample values which is an estimate of
the parameter and which is a known value is called a statistic.
For example: Sample mean, variance etc.
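The distinction between a parameter and a statistic can be illustrated with a small sketch. The population values and the hand-picked sample below are hypothetical numbers chosen only for illustration; they do not come from these notes.

```python
from statistics import mean, pvariance, variance

# Hypothetical population: marks of all 10 students in a small class.
population = [52, 61, 48, 70, 65, 58, 73, 44, 66, 63]

# A sample of 4 of those students (picked by hand so the result is reproducible).
sample = [61, 70, 44, 63]

# Parameters: functions of every population value.
mu = mean(population)              # population mean
sigma_sq = pvariance(population)   # population variance (divides by N)

# Statistics: functions of the sample values only, used to estimate the parameters.
x_bar = mean(sample)               # sample mean
s_sq = variance(sample)            # sample variance (divides by n - 1)

print(f"Parameters: mean = {mu}, variance = {sigma_sq}")
print(f"Statistics: mean = {x_bar}, variance = {s_sq}")
```

In practice the population values are usually not all available, which is why the parameter is unknown and the statistic is used as its estimate.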

Distinguishing Between a Parameter and a Statistic


Decide whether the numerical value describes a population parameter or a sample statistic.
Explain your reasoning.
1) A recent survey of 200 college career centers reported that the average starting salary
for petroleum engineering majors is $83,121.
2) The 2182 students who accepted admission offers to Northwestern University in 2009
have an average SAT score of 1442.
3) In a random check of a sample of retail stores, the Food and Drug Administration
found that 34% of the stores were not storing fish at the proper temperature.
Solution
1) Because the average of $83,121 is based on a subset of the population, it is a sample
statistic.
2) Because the SAT score of 1442 is based on all the students who accepted admission
offers in 2009, it is a population parameter.
3) Because the percent of 34% is based on a subset of the population, it is a sample
statistic.
Difference between Parameter and Statistic:
Parameter
(1) Any function of the population values is
called parameter.
(2) Parameter is an unknown constant.
(3) Parameters are not used to estimate
population characteristics.
(4) Parameters are free from sampling and
other errors.
(5) There is no distribution of parameter.
(6) The population mean $\mu$, variance $\sigma^2$ etc.
are called parameters.

Statistic
(1) Any function of the sample observation
is called statistic.
(2) Statistic does not contain unknown
constant.
(3) Statistics are used to estimate population
characteristics (such as parameters).
(4) Statistics are subject to sampling and
non-sampling error.
(5) Statistic has distribution, which is called
sampling distribution.
(6) The sample mean $\bar{x}$, variance $s^2$ etc. are
called statistics.

Variable: A variable is a measurable quantity or quality which can assume any of a
prescribed set of values, called the domain of the variable. Thus the height of a person, the
yield of a crop, the price of a commodity, and the number of children in a family are some
examples of variables.
Constant: The term constant refers to a property whereby the members of a group or
category remain fixed and do not differ one from another.

Types of variable: Variables may be either qualitative or quantitative.

Qualitative Variable: A qualitative variable is one for which numerical measurement is not
possible, such as hair color (brown, black, white etc.) or religion (Muslim, Hindu, Christian,
etc.).
Quantitative Variable: A quantitative variable is one for which the resulting observations
are numeric and thus possess a natural ordering. Examples of such variables include height,
weight, family size, age, number of accidents, etc.
Difference between quantitative and qualitative variables:
Qualitative variable
(1) A qualitative variable is one for which
numerical measurement is not possible.
(2) Qualitative variable can only be counted.
(3) These types of variable are always
discrete.
(4) The average of qualitative variable can
be measured by median and mode.
(5) Intelligence, beauty are the examples of
qualitative variable.

Quantitative variable
(1) Quantitative variable can be expressed
numerically.
(2) Quantitative variable can be counted and
measured.
(3) These types of variable may be discrete
or continuous.
(4) The average of quantitative variable can
be measured by any measure of central
tendency.
(5) Heights, weights, price of commodity are
the examples of quantitative variable.

Attribute: The distinct categories of a qualitative variable are sometimes called attributes. In
other words, a characteristic used to classify an individual into different categories is
called an attribute. A worker reported to be smoking is assigned to the category
smoker. His smoking behavior is used to classify him as a smoker, and thus it is an attribute.
Types of Quantitative Variable: Quantitative Variable may be either discrete or continuous.
Discrete Variable: When a variable can assume only isolated values, it is called a discrete
variable. For example, if the number of children in a family is the variable of interest, it is
obvious that it cannot assume fractional values and hence it is a discrete variable.
Continuous Variable: A variable is said to be continuous if it can theoretically assume any
value within a given range or ranges. Such variables, for instance, are height of a person,
price of a commodity and time.
Difference between discrete and continuous variables:
Discrete variable
(1) When a variable can assume only isolated values, it is called a discrete variable.
(2) Discrete variable can only assume integral values.
(3) Discrete variables are countable.
(4) The number of children in a family is an example of discrete variable.

Continuous variable
(1) A variable is said to be continuous if it can theoretically assume any value
within a given range or ranges.
(2) Continuous variable can assume both integral and fractional values.
(3) Continuous variables are measurable.
(4) Height of students of a class is an example of continuous variable.

Measurement: Measurement is a process of assigning numbers to some characteristics or
variables or events according to scientific rules. There are four types of measurement scale:
Nominal scale: The measurement scale, in which numbers are assigned to the categories or
variable values for identification only, is called a nominal scale. The arithmetic operation for
nominal scale data is counting only.
Example: In the measurement of sex, we are assigning the numbers 1 and 2 nominally which
identify only the categories of sex (male-1 and female-2) with no further numerical
implication. The variables region, religion, color, race etc. are appropriate to measure by this
scale.
Ordinal Scale: The measurement scale, in which numbers are assigned to the categories or
variable values for identification as well as ranking is called an ordinal scale. The arithmetic
operations for ordinal scale data are counting and ranking.
Example: Consider the variable economic status, which can be categorized as rich (1),
middle class (2) and poor (3). Clearly, the rich category belongs to higher economic status
than both the categories middle class (2) and poor (3). As another example, we can say that
the variable beauty can be measured assigning numbers by this scale.
Interval scale: The measurement scale, in which numbers are assigned to the variable values
in such a way that the level of measurement is broken down on a scale of equal units and zero
value on the scale is not absolutely zero, is called an interval scale. The arithmetic operations
for interval scale data are counting, ranking, addition and subtraction.
Example: The variable temperature can have values 0°C, 10°C, 20°C etc. Here, the value 0°C
does not mean the absence of temperature. Thus, the value zero in interval scale is not
absolutely zero. Note that we cannot say 40°C is double the temperature of 20°C.
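A short numeric sketch may help show why ratios are not meaningful on an interval scale. The Kelvin conversion used here is not part of these notes; it is brought in only as an example of a temperature scale whose zero is absolute.

```python
# Celsius is an interval scale: its zero point is arbitrary.
t1_c, t2_c = 20.0, 40.0
ratio_celsius = t2_c / t1_c   # arithmetically 2.0, but not a meaningful ratio


def to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale with an absolute zero."""
    return celsius + 273.15


# On the Kelvin scale the same two temperatures are far from a 2:1 ratio.
ratio_kelvin = to_kelvin(t2_c) / to_kelvin(t1_c)

# Differences, by contrast, agree on both scales (addition and subtraction are valid).
diff_celsius = t2_c - t1_c
diff_kelvin = to_kelvin(t2_c) - to_kelvin(t1_c)

print(f"Ratio in Celsius: {ratio_celsius:.3f}")
print(f"Ratio in Kelvin:  {ratio_kelvin:.3f}")
print(f"Difference: {diff_celsius:.1f} (Celsius), {diff_kelvin:.1f} (Kelvin)")
```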
Ratio Scale: The measurement scale, in which numbers are assigned to the variable values in
such a way that the level of measurement is broken down on a scale of equal units and the
zero value on the scale is absolutely zero, is called a ratio scale. The arithmetic operations for
ratio scale data are counting, ranking, addition, subtraction, multiplication and division.
Example: Age, height, weight, pulse rate, parity etc. can be measured most appropriately by
this scale.
Comparative study of scales of measurement:
Nominal
Mathematical operations: Classification & Counting
Examples: Gender, Religion
Ordinal
Mathematical operations: Classification, Counting & Ranking
Examples: Economic Status
Interval
Mathematical operations: Classification, Counting, Ranking, Addition, Subtraction & Zero is not absolutely zero
Examples: Temperature, IQ score
Ratio
Mathematical operations: Classification, Counting, Ranking, Addition, Subtraction, Multiplication, Division & Zero is absolutely zero
Examples: Age, Family size
Criteria for selection of Measurement Scale:

The important points in the selection of measurement scale for a variable are:
(i) Scale selected should be appropriate for the variables one wishes to categorize.
(ii) Scale should be of practical use.
(iii) Scale should be clearly defined.
(iv) The number of categories created (when necessary) should cover all possible values.
(v) The categories created (when necessary) should not overlap; that is, they
should be mutually exclusive.
(vi) The scale should be sufficiently powerful.
Data: Data are the raw, disorganized facts and figures collected from any field of inquiry.
Types of data: Statistical data depending upon the sources are of two types
(a) Primary data
(b) Secondary data
Primary data: The data which are originally collected by an investigator or an agent for the
first time for the purpose of statistical enquiry are known as primary data.
Example: An investigator wants to study the salaries of teachers working in the campuses.
Then the data collected for this purpose by the investigator himself or with the help of his
representative, are primary data.
Secondary data: Data which were originally collected by someone else and are obtained from
some published or unpublished sources are secondary data.
Example: The reports and publications made by Bangladesh Bureau of Statistics are primary
for that organization but secondary for those who use it.
Difference between the primary and secondary data:
The difference between primary and secondary data is only one of degree. Data which
are primary in the hands of one become secondary in the hands of another. That is, primary data
once collected and published become secondary data for other investigators. For example,
the data relating to the population of Bangladesh published by Bangladesh Bureau of Statistics
are primary for that organization but secondary for those who use them.
The differences between primary and secondary data are as follows:
(1) Definition
Primary data: The data which are obtained by direct observations from the population or sample are called primary data.
Secondary data: The data which are already obtained by some other persons or organizations and are already published or utilized are called secondary data.
(2) Originality
Primary data: It is original. Primary data are collected from the original sources.
Secondary data: It is not original. Secondary data are collected from some organizations, journals, newspapers etc.
(3) Expenses
Primary data: It involves large expenses in terms of time, energy and money.
Secondary data: It is relatively a less costly method.
(4) Suitability
Primary data: If the data has been collected in a systematic manner, its suitability will be positive.
Secondary data: It may or may not suit the objective of the survey.
(5) Reliable
Primary data: Primary data is more reliable than secondary data.
Secondary data: Secondary data is less reliable than primary data.
(6) Dependency
Primary data: Primary data is completely independent.
Secondary data: Secondary data depend on the primary data.
(7) Precautions
Primary data: No extra precautions need be taken in making use of primary data.
Secondary data: It should be used with care.
(8) Qualified interviewers
Primary data: Any reliable primary data can be obtained only by qualified interviewers.
Secondary data: In case of secondary data there is no need of qualified interviewers.

Classification: Classification is the process of arranging data into sequences and groups
according to their common characteristics, or separating them into different but related parts.
Types of Classification:
Broadly, the data can be classified on the following four bases:
(i) Geographical, i.e., area-wise, e.g., cities, districts, etc.
(ii) Chronological, i.e., on the basis of time.
(iii) Qualitative, i.e., according to some attributes.
(iv) Quantitative, i.e., in terms of magnitudes.
(i) Geographical classification: In geographical classification data are classified on the
basis of geographical or location differences between the various items. For example,
when we present the production of sugarcane, wheat, rice, etc., for various districts,
this would be called geographical classification.
(ii) Chronological classification: When data are observed over a period of time the type
of classification is known as chronological classification. Time series are usually listed
in chronological order, normally starting with the earliest period. When the major
emphasis falls on the most recent events, a reverse time order may be used.
(iii) Qualitative classification: In qualitative classification, data are classified on the basis
of some attribute or quality such as sex, color of hair, literacy, religion, etc. The point
to note in this type of classification is that the attribute under study cannot be
measured: one can only find out whether it is present or absent in the units of the
population under study.
(iv) Quantitative classification: Quantitative classification refers to the classification of
data according to some characteristics that can be measured, such as height, weight,
income, sales, etc.
Tabulation: Tabulation is a scientific process involving the presentation of classified data
in an orderly manner so as to bring out their essential features and chief characteristics. The
purpose of tabulation is to simplify the presentation of data and to facilitate comparison
between related information.
Different parts of a table: The different parts of a table depend upon the nature of the data
and the purpose of investigation. Generally, the main parts of a table are mentioned below:

1. Title of the table: Every table should have a suitable title. Title should be brief, clear,
simple and self-explanatory.
2. Table number: Every table must have a number for proper identification and for easy and
ready reference for future.
3. Caption: The title of a column is known as caption.
4. Stubs: Stubs mean the row headings.
5. Body of the Table: This is the important part of the table. The body of table is formed by
the arrangement of the data according to the description given in the captions and stubs.
6. Head note: Head note is a statement written below the title centered and enclosed in
brackets. It helps to clarify the points relating the content of the table that have not been
included in the title nor in caption and stub.
7. Footnote: Anything in table which cannot be understood by the reader from the title,
captions and stubs should be explained in footnotes. Footnotes are written directly below the
body of the table whenever necessary.
8. Source: The source from which the data have been taken should be mentioned.
Frequency distribution: A set of values together with the frequencies of occurrence of values
in each class in a given set of data, presented in a tabular form, is referred to as a frequency
distribution.
Principle of frequency distribution:
In statistics, the most important form of tabulation is the frequency distribution. In a frequency
distribution the following principles should be taken into account:
(i) Raw data are grouped into classes or groups of appropriate size.
(ii) The number of observations belonging to each class is recorded.
(iii) The number of observations in a particular class is called the class frequency or frequency.
Construction of a frequency distribution: The first step in the construction of a frequency
distribution is to decide on the size of the groups or the class intervals. Generally, we tend to
use about 5 to 25 classes. The exact number of classes to be used will depend on the nature
and characteristics of data, the accuracy desired and the purpose of grouping. In particular, it
will depend on
(i) the range of the data and
(ii) the total number of observations
Suggested below are some useful rules for the construction of a frequency distribution:
(1) Find the range of the variable by subtracting the lowest value from the highest value.
(2) Divide the range by 5 and by 25, and round the results to the same degree of accuracy
as found in the original data. The class interval should normally lie between these two
numbers. By a little trial and error, determine a suitable interval
and the starting points of the class intervals.

(3) Arrange a sheet with the headings: class interval, mid value, tally marks, frequency
and cumulative frequency. Begin at the top with the class interval which contains the
smallest value, and continue until the interval with the highest value is reached.
(4) Read off the items on the original table and put, for each value, a tally mark against
the appropriate class interval. It is convenient to mark each fifth by a diagonal.
(5) Count the number of tally marks opposite each interval, and write the result in the
frequency column.
Some important terms involved in a frequency distribution:
Class: In the process of condensation, raw data are assigned to some chosen groups of
appropriate size. These groups are called classes.
Frequency: The number of observations or values falling into each group or class is called
class frequency or simply frequency. For example, if in a set of data, a value 10 occurs 6
times, then 6 is the frequency of 10.
Relative frequency: The relative frequency of a class is the portion or percentage of the data
that falls in that class. To find the relative frequency of a class, divide the frequency f by the
sample size n.

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$$

Cumulative frequency: The cumulative frequency of a class is the sum of the frequencies of
that class and all previous classes. The cumulative frequency of the last class is equal to the
sample size n.
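The three quantities defined above (frequency, relative frequency and cumulative frequency) can be computed together as in the following sketch. The raw data are hypothetical values (number of children in 10 families) made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical raw data: number of children in 10 families.
data = [2, 0, 1, 2, 3, 2, 1, 0, 2, 1]
n = len(data)                          # sample size

counts = Counter(data)
values = sorted(counts)                # distinct values in ascending order
freq = [counts[v] for v in values]     # frequency f of each value
rel_freq = [f / n for f in freq]       # relative frequency f / n
cum_freq = list(accumulate(freq))      # running total of the frequencies

print("value  f   r.f.  c.f.")
for v, f, rf, cf in zip(values, freq, rel_freq, cum_freq):
    print(f"{v:>5} {f:>2} {rf:>6.2f} {cf:>5}")
```

The relative frequencies add up to 1, and the last cumulative frequency equals the sample size, as stated in the definitions.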
Class Interval: Ordinarily, for numerical data, the frequencies of a particular class are
bounded by two values. The width or length of the class formed by these two boundary
values is known as the class interval.
Class width: The size of the class is referred to as the class width and is the distance between
lower (or upper) limits of consecutive classes. For a class with 45 as lower limit and 50 as
upper limit, the interval 45-50 has a class-width 5 and a mid-point $\frac{1}{2}(45 + 50) = 47.5$.

Class limits: The smallest value of a class is technically known as the lower limit of the
interval, while the largest value is known as the upper class limit of the interval. Thus for a
class interval 15-19, 15 is the lower limit and 19 is the upper limit.
Midpoint: The midpoint of a class is the sum of the lower and upper limits of the class
divided by two. The midpoint is sometimes called the class mark.

$$\text{Midpoint} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$
Number of class intervals: This is the number of classes into which the total range of the data
is divided. The number should be chosen so that it gives a good description of the data presented in
the frequency table. Sturges' rule for deciding the number (k) is given by the following
formula:

$$k = 1 + 3.322 \log_{10} N$$
where N = total number of observations.
Using this formula, the size of the class interval C can be written as
$$C = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$$
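As a rough sketch, the two formulas above can be evaluated as follows. The function names and the sample inputs (100 observations, range 50) are illustrative assumptions, not values from these notes.

```python
import math


def sturges_classes(n_obs: int) -> float:
    """Number of classes k suggested by Sturges' rule."""
    return 1 + 3.322 * math.log10(n_obs)


def class_interval(data_range: float, n_obs: int) -> float:
    """Size of the class interval C = Range / k."""
    return data_range / sturges_classes(n_obs)


# Hypothetical inputs: 100 observations with a range of 50.
k = sturges_classes(100)
c = class_interval(50, 100)
print(f"k = {k:.3f}, C = {c:.3f}")
```

The computed value of C is only a starting point; in practice it is rounded to a convenient width.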
Methods of forming class-intervals:
Here we classify the data according to class interval. There are two ways of forming class
interval:
1. Exclusive method
2. Inclusive method
1. Exclusive method: In this method the upper limit of one class is the lower limit of the
next class, so that the classes are continuous without any gap. This method is mostly
useful for continuous variables. The class interval is obtained by taking the difference
between the lower and upper limits. If a value is exactly equal to the upper limit of a
class, it is included in the next class. For example, a value of 20 is included in the
class 20-30.
Rainfall (in mm): 10-20 20-30 30-40 40-50 50-60
2. Inclusive method: In this method both the lower and the upper limits are included in a
particular class. This method is mostly used for discrete variables. The class interval is
obtained by taking the difference between the upper limits of two consecutive classes.
Number of students: 10-19 20-29 30-39 40-49 50-59
Some notes:
As far as possible one should avoid odd values of class intervals e.g. 3,11,26,39 etc.
Preferably, one should have class intervals of either five or multiples of five like
5,10,20,25,100 etc.
The starting point, i.e. lower limit of the first class, should either be zero or 5 or
multiple of 5.
For the inclusive method, class boundaries are necessary; they are obtained by
$$x_{iL} = x_{il} - \frac{s}{2}, \qquad x_{iH} = x_{ih} + \frac{s}{2}$$
where $x_{il}$-$x_{ih}$ is the ith class interval, $x_{iL}$-$x_{iH}$ is the ith class boundary and s = smallest
unit of scale of measurement.
Nearest unit (s = 1)              Nearest 10th (s = 0.1)            Nearest 100th (s = 0.01)
Class interval  Class boundary    Class interval  Class boundary    Class interval  Class boundary
25-29           24.5-29.5         25.0-29.9       24.95-29.95       25.00-29.99     24.995-29.995
30-34           29.5-34.5         30.0-34.9       29.95-34.95       30.00-34.99     29.995-34.995
35-39           34.5-39.5         35.0-39.9       34.95-39.95       35.00-39.99     34.995-39.995
40-44           39.5-44.5         40.0-44.9       39.95-44.95       40.00-44.99     39.995-44.995
45-49           44.5-49.5         45.0-49.9       44.95-49.95       45.00-49.99     44.995-49.995
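The class boundaries in the table can be reproduced with a small sketch. The helper name `class_boundary` is hypothetical, and `Decimal` is used here only to avoid floating-point rounding noise in values such as 24.995.

```python
from decimal import Decimal


def class_boundary(lower: str, upper: str, s: str):
    """Return (lower boundary, upper boundary) for an inclusive class.

    lower, upper: class limits; s: smallest unit of the scale of measurement.
    Half of s is subtracted from the lower limit and added to the upper limit.
    """
    half = Decimal(s) / 2
    return Decimal(lower) - half, Decimal(upper) + half


# First class of each column in the table above.
for lo, hi, unit in [("25", "29", "1"), ("25.0", "29.9", "0.1"), ("25.00", "29.99", "0.01")]:
    low_b, up_b = class_boundary(lo, hi, unit)
    print(f"{lo}-{hi} (s = {unit}): {low_b}-{up_b}")
```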
Example: The following are the marks obtained by the candidates for the selection to a post of a
reputed Pharmaceutical company.

20 18 25 22 16 29 35 23 58 42
37 65 37 35 42 49 48 63 53 65
49 55 65 45 39 58 48 57 67 69
Construct a frequency distribution table by taking a suitable class interval.

Solution:
Let us determine a suitable class interval with the help of Sturges' rule:
$$C = \frac{R}{1 + 3.322 \log_{10} n}$$
Here R = Range = 69 - 16 = 53 and n = 30, so
$$C = \frac{53}{1 + 3.322 \times 1.4771} = 8.97 \approx 9$$
Since values like 3, 7, 9 etc. should be avoided, we take 10 as the class interval and
15-25 as the first class.
Frequency distribution table of the marks of 30 candidates
(Exclusive Method)
Class interval   Tally marks   Frequency (f)   Relative frequency (r.f.)   Cumulative frequency (c.f.)
15-25            |||||         5               5/30                        5
25-35            ||            2               2/30                        7
35-45            ||||| ||      7               7/30                        14
45-55            ||||| |       6               6/30                        20
55-65            |||||         5               5/30                        25
65-75            |||||         5               5/30                        30
Total                          n = 30
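The tallying step for the exclusive method can be sketched in code as follows. The marks are the 30 values given in the example; the variable names and the integer-division trick used to locate the class are illustrative choices, not part of the notes.

```python
# Marks of the 30 candidates from the example above.
marks = [20, 18, 25, 22, 16, 29, 35, 23, 58, 42,
         37, 65, 37, 35, 42, 49, 48, 63, 53, 65,
         49, 55, 65, 45, 39, 58, 48, 57, 67, 69]

start, width, n_classes = 15, 10, 6    # classes 15-25, 25-35, ..., 65-75
n = len(marks)

freq = [0] * n_classes
for x in marks:
    # Exclusive method: a value equal to an upper limit falls in the next class,
    # which is exactly what integer division by the width does.
    freq[(x - start) // width] += 1

cumulative = 0
cum_freq = []
print("Class    f   r.f.   c.f.")
for i, f in enumerate(freq):
    lower = start + i * width
    cumulative += f
    cum_freq.append(cumulative)
    print(f"{lower}-{lower + width}  {f:>3}  {f}/{n}  {cumulative:>4}")
```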

Example: The daily room rents in taka of 25 hotels in Dhaka City in June 1995 were 115,
160, 170, 80, 60, 90, 90, 80, 70, 70, 80, 80, 100, 90, 100, 90, 110, 120, 110, 100, 110, 130,
120, 140, and 105. Represent the data in a suitable frequency table and find:
(a) the highest rent
(b) the lowest rent
(c) the five highest ranking rents
(d) how many hotels charged Tk. 90 or more as daily rent
(e) what percentage of hotels charged above Tk. 100 but less than Tk. 110 per day
Solution: Here we form a frequency table by using frequency array in which each item is
written against the class in which it lies:
Rent (in Tk.)   Mid value   Tally marks              Frequency (f)   Relative frequency (r.f.)   Cumulative frequency (c.f.)
60-70           65          | (60)                   1               1/25                        1
70-80           75          || (70, 70)              2               2/25                        3
80-90           85          |||| (80, 80, 80, 80)    4               4/25                        7
90-100          95          |||| (90, 90, 90, 90)    4               4/25                        11
100-110         105         |||| (100, 100, 100, 105) 4              4/25                        15
110-120         115         |||| (115, 110, 110, 110) 4              4/25                        19
120-130         125         || (120, 120)            2               2/25                        21
130-140         135         | (130)                  1               1/25                        22
140-150         145         | (140)                  1               1/25                        23
150-160         155                                  0               0                           23
160-170         165         | (160)                  1               1/25                        24
170-180         175         | (170)                  1               1/25                        25

From the frequency table given above, it is now easy to answer the remaining questions:
(a) The highest rent = Tk. 170
(b) The lowest rent = Tk. 60
(c) The five highest ranking rents are Tk. 170, Tk. 160, Tk. 140, Tk. 130 and Tk. 120.
(d) The number of hotels charging daily rent of Tk. 90 or more =
4+4+4+2+1+1+1+1 = 18
(e) The number of hotels charging daily rent above Tk. 100 but less than Tk. 110 = 1
The required percentage = $\frac{1}{25} \times 100 = 4\%$
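The answers above can be cross-checked directly from the raw data with a short sketch. The variable names are illustrative; treating "the five highest ranking rents" as the five highest distinct values is an assumption that matches the worked answer.

```python
# Daily room rents (in Tk.) of the 25 hotels from the example.
rents = [115, 160, 170, 80, 60, 90, 90, 80, 70, 70, 80, 80, 100,
         90, 100, 90, 110, 120, 110, 100, 110, 130, 120, 140, 105]

highest = max(rents)                                   # (a)
lowest = min(rents)                                    # (b)
top_five = sorted(set(rents), reverse=True)[:5]        # (c) five highest distinct rents
at_least_90 = sum(1 for r in rents if r >= 90)         # (d)
between = sum(1 for r in rents if 100 < r < 110)       # (e) strictly between 100 and 110
percent_between = between / len(rents) * 100

print("Highest:", highest, "Lowest:", lowest)
print("Five highest rents:", top_five)
print("Charging Tk. 90 or more:", at_least_90)
print("Percentage between Tk. 100 and Tk. 110:", percent_between)
```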

Example:
Form a frequency distribution by taking a suitable class interval for the following data giving the
ages of 52 employees in a pharmaceutical company.
67 34 36 48 49 31 61 34 43 45 38 32 28
61 29 47 36 50 46 34 46 32 30 33 45 49
48 41 53 36 37 47 47 30 46 50 28 35 35
38 46 43 34 36 62 69 50 28 44 43 60 39
Solution:
The lowest value is 28 and the largest value is 69.
So, Range = 69 - 28 = 41.
Therefore, the class interval is
$$C = \frac{41}{1 + 3.322 \times 1.716} = 6.119 \approx 6$$
Since the class interval should preferably be a multiple of 5, we have taken 5 as the class interval.

Frequency distribution table for ages of 52 employees
Class interval (Ages)   Tally marks        Frequency (f)   Relative frequency (r.f.)   Cumulative frequency (c.f.)
25-29                   ||||               4               4/52                        4
30-34                   ||||| |||||        10              10/52                       14
35-39                   ||||| |||||        10              10/52                       24
40-44                   |||||              5               5/52                        29
45-49                   ||||| ||||| |||    13              13/52                       42
50-54                   ||||               4               4/52                        46
55-59                                      0               0/52                        46
60-64                   ||||               4               4/52                        50
65-69                   ||                 2               2/52                        52
Total                                      n = 52
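For comparison with the exclusive method, the inclusive-method table above can be reproduced with the following sketch. The ages are the 52 values from the example; the variable names are illustrative.

```python
import math

# Ages of the 52 employees from the example above.
ages = [67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 28,
        61, 29, 47, 36, 50, 46, 34, 46, 32, 30, 33, 45, 49,
        48, 41, 53, 36, 37, 47, 47, 30, 46, 50, 28, 35, 35,
        38, 46, 43, 34, 36, 62, 69, 50, 28, 44, 43, 60, 39]
n = len(ages)

# Class interval suggested by Sturges' rule (about 6.12, taken as 5 in the notes).
data_range = max(ages) - min(ages)
c = data_range / (1 + 3.322 * math.log10(n))

# Inclusive method: both limits belong to the class, e.g. 25-29, 30-34, ...
start, width = 25, 5
classes = [(lo, lo + width - 1) for lo in range(start, 70, width)]
freq = [sum(1 for a in ages if lo <= a <= hi) for lo, hi in classes]

print(f"Range = {data_range}, suggested C = {c:.3f}")
running = 0
for (lo, hi), f in zip(classes, freq):
    running += f
    print(f"{lo}-{hi}  f = {f:>2}  r.f. = {f}/{n}  c.f. = {running}")
```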

Exercise-1: The following is a record of grades gained by 40 examinees at an examination:

30 90 54 57 64 58 69 78 28 44
83 88 17 70 93 33 59 55 20 46
23 91 51 63 18 53 38 69 27 65
61 73 85 41 87 95 67 75 15 40
Tabulate the results in the form of a frequency distribution grouping by intervals of 10
grades. Also with reference to the records of grades, find
(a) the highest grade
(b) the lowest grade
(c) the range
(d) the grades of five highest ranking students
(e) the grades of five lowest ranking students
(f) how many students received grades of 70 or higher
(g) how many students received grades below 70
(h) what percentage of students received grades higher than 70 but less than 95.
Exercise-2: The population of villages in a district is given below in hundreds. Taking a
suitable class interval, prepare a frequency distribution:
42 34 33 29 27 37 51 39 21 31
42 21 38 42 49 52 38 53 39 71
17 33 61 59 27 19 54 61 59 43
59 51 14 47 42 53 37 39 57 16
41 42 42 57 37 53 37 44 7 66

Example 3: The following are the ages of 48 patients admitted to the emergency room of a
hospital. Construct a frequency distribution using a suitable class interval.
32 43 25 63 46 23 27 33 61 23 21 57
53 23 24 35 12 21 22 54 13 17 23 38
16 13 61 53 16 30 55 42 31 14 34 51
30 29 42 44 28 16 13 48 28 28 26 37

Example 4: The following are the number of babies born during a year in 60 community
hospitals.
30 37 32 39 55 52 55 26 56 57 27 52
40 59 43 45 34 29 58 46 56 54 53 49
54 48 42 54 53 31 45 32 29 30 22 49
59 42 53 31 32 35 42 21 24 57 46 54
34 24 47 24 53 28 57 56 57 59 50 29

Construct a frequency distribution with a suitable class interval.

Graphical Representation of Data:


In addition to presenting statistical data through tabular form, one can present the same
through some visual aids. This refers to graphs and diagrams. This is one of the most
convincing and appropriate ways in which statistical data may be presented.
Uses:
(i) Displays the main features of a set of data.
(ii) Suggests appropriate methods of analysis.
(iii) Explains the main features so that conclusions can be drawn easily.
(iv) Helps detect gross errors contained in the data.

Limitations:
(i) Graphs are sometimes misleading unless drawn and studied carefully.
(ii) Conclusions from graphs are crude.
Difference between diagrams and graphs:
1. Diagrams are constructed on plain paper whereas the graphs are on graph paper.
2. Diagrams are used only for the comparison but graphs help in studying the
mathematical relationship between two variables.
3. In diagrams, the numerical data are presented by bars, rectangles, circles, cubes etc.
whereas in graphs, the data are presented in terms of points and lines.
4. Diagrams are not used to present frequency distributions, whereas graphs are more
appropriate for presenting frequency distributions and time series.
Some graphs and diagrams:
(i) Bar diagram
(ii) Pie diagram
(iii) Line diagram
(iv) Histogram
(v) Frequency Polygon
(vi) Cumulative frequency polygon/ Ogive
(vii) Scatter diagram

Preliminary concept of construction of graphs and diagrams:


Name of diagram                        Setting in X-axis             Setting in Y-axis
Histogram                              Class interval of the class   Frequency
Line diagram                           Midpoint of the class         Frequency
Frequency polygon                      Midpoint of the class         Frequency
Ogive or cumulative frequency curve    Upper limit of the class      Cumulative frequency
Historigram                            Time                          Value of the variable

(i) Bar diagram:


Bar diagrams are used mainly for portraying qualitative data (nominal or ordinal data) and the
bars are arranged vertically.
Guide lines:
(i) Label frequencies along one axis and categories of variable along the other axis.

(ii) Construct a rectangle at each category of the variable with a height equal to the
frequency in the category.
(iii) Leave a space between each category to provide distinct, separate categories and to
clarify the presentation.

Example: Consider the health professional data. The number of responses in each category
was totaled to give the following distribution.
Response        Frequency
Frequently      49
Occasionally    71
Rarely          24
Never           6

Figure: Vertical bar diagram for health centre visit data (vertical axis: frequency, 0 to 80;
categories: Frequently, Occasionally, Rarely, Never)
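
The guide lines above can be tried out with a small Python sketch that draws one bar per
category, with length proportional to the frequency. It is only a text approximation of the
figure: the bars are drawn horizontally, and the scale of one mark per 2 responses and the
function name text_bars are assumptions made here for illustration.

```python
# Frequencies of the health centre visit data from the example above.
responses = {"Frequently": 49, "Occasionally": 71, "Rarely": 24, "Never": 6}


def text_bars(freqs, per_mark=2):
    """Return one text bar per category; each '#' stands for per_mark responses."""
    lines = []
    for category, freq in freqs.items():
        bar = "#" * (freq // per_mark)  # bar length proportional to frequency
        lines.append(f"{category:<13}{bar} ({freq})")
    return lines


# Each category is printed on its own line, so the bars stay separate.
for line in text_bars(responses):
    print(line)
```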

(ii) Pie diagram:


The pie diagram is intended to compare the distinct components which together constitute a
whole. The whole is represented by a circle of arbitrary radius and the segments of the circle
represent the component parts.
Guide lines:
(i) Convert the frequencies or percentages into angles.
(ii) A circle is drawn on a plain paper. As a circle consists of 360°, the whole quantity to
be represented is equated to 360°.
(iii) This 360° is then proportionately divided among the various components of the
whole.
Example: Draw pie diagram using the following table:
Table: Forest Types (in sq. km) of Bangladesh, 1991
Forest type        Sq. km.     Angle (k°)
Evergreen          7820.45     (7820.45 / 16261.99) × 360 = 173.13
Moist deciduous    1029.14     (1029.14 / 16261.99) × 360 = 22.78
Mangrove           7412.40     (7412.40 / 16261.99) × 360 = 164.09
Total              16261.99    360.00

Figure: Pie diagram (segments: Evergreen, 7820.45; Moist deciduous, 1029.14; Mangrove, 7412.40)
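
The conversion of areas into angles in the forest table can be checked with a short Python
sketch. The function name pie_angles is hypothetical; the figures are those of the table above.

```python
# Forest areas in sq. km, taken from the forest types table.
areas = {"Evergreen": 7820.45, "Moist deciduous": 1029.14, "Mangrove": 7412.40}


def pie_angles(components):
    """Divide 360 degrees among the components in proportion to their size."""
    total = sum(components.values())
    return {name: value / total * 360 for name, value in components.items()}


angles = pie_angles(areas)
for name, angle in angles.items():
    print(f"{name}: {angle:.2f} degrees")
```

Because every angle is the same fraction of 360° that the component is of the total, the angles
always add up to 360°, which is a quick check on the arithmetic.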

(iii) Line diagram


If we are given the values of a variable at different points of time, the set of values is known
as a time series. The line diagram is used to represent this type of data.
Guide lines:
(i) On graph paper, two axes are drawn crossing at the origin.
(ii) Appropriate scales are chosen for both axes, and the co-ordinate points obtained
from the given data are plotted.
(iii) Each co-ordinate point corresponds to a paired value: for a time series, the time point
and the value of the variable; for a frequency distribution, the variable value and its frequency.
(iv) The consecutive points are joined by straight lines or a smooth curve. The resulting
graph is called a line diagram or line chart.
Example: Draw line diagram using the following table:
Table: Census population of Bangladesh in Million: 1901-1991
Year    Population    Year    Population
1901    28.9          1951    44.2
1911    31.6          1961    55.2
1921    33.2          1971    76.4
1931    35.6          1981    89.9
1941    42.0          1991    111.5

Figure: Line diagram of the census population of Bangladesh (X-axis: Census year; Y-axis:
Population in million)

(iv) Frequency Polygon:


A frequency polygon is a diagram used to represent a frequency distribution.
Guide lines:
(i) The mid-values of the class intervals of the frequency distribution are placed on the
X-axis and the corresponding frequencies are represented on the Y-axis.
(ii) The co-ordinate points thus obtained are joined by straight lines.
(iii) The leftmost co-ordinate point is joined to the mid-value of the class immediately to
its left, on the X-axis.
(iv) The rightmost co-ordinate point is joined to the mid-value of the next class on the right,
on the X-axis. Thus we obtain a polygon, which is called a frequency polygon.
Example (Temperature Data): These data represent the record high temperatures in degrees
Fahrenheit (°F) for each of the 50 states. Construct a grouped frequency distribution for the
data using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Source: The World Almanac and Book of Facts.
The completed frequency distribution is:
Class limits    Class boundaries    Tally    Frequency    Cumulative frequency
100-104         99.5-104.5                   2            2
105-109         104.5-109.5                  8            10
110-114         109.5-114.5                  18           28
115-119         114.5-119.5                  13           41
120-124         119.5-124.5                  7            48
125-129         124.5-129.5                  1            49
130-134         129.5-134.5                  1            50
n = Σf = 50
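
The frequency column and the points of the frequency polygon for the temperature data can be
reproduced with the following Python sketch. The variable names are illustrative; the two extra
points with frequency zero follow guide lines (iii) and (iv), which close the polygon at the
mid-values of the neighbouring classes.

```python
# Record high temperatures (°F) for the 50 states, from the example above.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

width = 5
# Class limits 100-104, 105-109, ..., 130-134 (7 classes).
class_limits = [(100 + width * k, 104 + width * k) for k in range(7)]

# Frequency of each class: number of temperatures within its limits.
freqs = [sum(lo <= t <= hi for t in temps) for lo, hi in class_limits]

# Mid-value of each class: 102, 107, ..., 132.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# Polygon points: (mid-value, frequency), closed on both sides at frequency 0.
polygon = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)

for x, f in polygon:
    print(x, f)
```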

(v) Histogram:
A histogram is a graphical method of representing a frequency distribution in the form of a
diagram. It shows the pattern of the distribution, for example whether it is symmetrical or
not. The histogram is particularly important when the variable is continuous. A discrete
variable can also be treated as a continuous one while constructing a histogram.
Guide lines:
(i) The horizontal axis is divided into segments corresponding to the class boundaries of
the frequency distribution.
(ii) On each segment a rectangle is constructed with area proportional to the frequency of the
class.
(iii) The set of adjacent rectangles so constructed constitutes a histogram.
Note:
In a histogram it is the area, not the height, that represents the frequency of a class. Thus if a
histogram of a frequency distribution with unequal class widths is to be constructed, the vertical
height of each rectangle must be adjusted so that the area of the rectangle represents the
frequency.
Table 1: Data for constructing histogram with equal class intervals
Class interval    Class frequency    Class width    Height of the rectangles
4.5-9.5           8                  5              8
9.5-14.5          29                 5              29
14.5-19.5         27                 5              27
19.5-24.5         12                 5              12
24.5-29.5         4                  5              4
Total             80
Table 2: Data for constructing histogram with unequal class intervals
Class interval    Class frequency    Class width    Height of the rectangles    Col. 4 × 10
48.5-58.5         4                  10             4/10 = 0.4                  4
58.5-68.5         8                  10             8/10 = 0.8                  8
68.5-73.5         5                  5              5/5 = 1.0                   10
73.5-78.5         5                  5              5/5 = 1.0                   10
78.5-98.5         28                 20             28/20 = 1.4                 14
Total             50                 -              -                           -
Difference between histogram and bar diagram:


1) A histogram is used to present a frequency distribution, whereas a bar diagram is used to
present qualitative or time series data and not to represent a frequency distribution.
2) In a histogram the area of a rectangle is proportional to the relevant frequency, whereas in
a bar diagram it is the height of the bar that counts.
3) The rectangles of a histogram are adjacent, whereas the bars of a bar diagram are separated
by spaces.

Example: Constructing histogram for the temperature data

(vi) Cumulative Frequency Curve/ (Ogive):


In order to present a frequency distribution in the form of a cumulative frequency curve,
cumulative frequencies are plotted against the upper mathematical limits (upper class
boundaries) of the class intervals. Sometimes the relative cumulative frequency (cumulative
proportion) is used in this diagram instead of the cumulative frequency.
Guide lines:
(i) Cumulative frequencies are represented on the Y-axis and the upper limits of the
class intervals on the X-axis, taking appropriate scales in both cases.
(ii) The points are plotted by taking the upper limits of the class intervals as X-coordinates
and the cumulative frequencies as Y-coordinates.
(iii) Then the consecutive points are joined by straight lines. The resulting graph is called a
cumulative frequency polygon.

(iv) If the coordinate points are joined by a free hand smooth curve, the resulting graph is
called a cumulative frequency curve or ogive.
Example: Construct ogive curve for the temperature data:
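
The points to be plotted for this ogive can be obtained from the frequency distribution of the
temperature data, as in the following Python sketch. The variable names are illustrative; the
upper class boundaries and frequencies are those of the completed frequency distribution.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies of the temperature data.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(freqs))
n = cumulative[-1]

# X-coordinate: upper boundary; Y-coordinate: cumulative frequency.
ogive_points = list(zip(upper_boundaries, cumulative))

# Relative cumulative frequency (cumulative proportion), if preferred.
relative = [c / n for c in cumulative]

for (x, c), r in zip(ogive_points, relative):
    print(f"{x}: {c} ({r:.2f})")
```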

Difference between histogram and frequency polygon:


(1) For continuous variables the histogram is to be preferred, whereas for frequencies the
frequency polygon is to be preferred.
(2) A histogram can be used for unequal class intervals, whereas a frequency polygon is
admissible only for equal class intervals.
(3) To draw a histogram we consider the class boundaries on the horizontal axis, whereas to
draw a frequency polygon we consider the mid-points of the class intervals on the horizontal
axis.
(vii) Scatter diagram:
Sometimes the data consist of pairs of values of two related variables, x and y, and the
statistical problem is to investigate the inter-relationship between the variables. Thus x may
represent the height and y, the weight of a university student; x may be the price of a
commodity and y, the corresponding demand. When the given pairs of values are plotted on
ordinary graph paper, we get a dot diagram or scatter diagram. It is called a dot diagram
because it gives a series of dots, each of which has x and y as its co-ordinates. A set of n pairs
of observations thus provide n dots on the diagram and the scatter or clustering of the
points exhibits the relationship between the variables. Hence the alternative name scatter
diagram. This diagram is frequently useful in deciding whether the relationship between two
variables can be represented by, say, a straight line or a parabola.
Example: Construct a scatter plot for the data obtained in a study on the number of absences
and the final grades of seven randomly selected students from a statistics class.
The data are shown here.
Student    Number of absences x    Final grade y (%)
A          6                       82
B          2                       86
C          15                      43
D          9                       74
E          12                      58
F          5                       90
G          8                       78


Solution:
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in Figure
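
The two steps can be imitated in plain text with the Python sketch below, which places one
mark per student on a small character grid. The grid size and the function name text_scatter
are assumptions made here for illustration; a proper scatter diagram would be drawn on graph
paper or with a plotting package.

```python
# (number of absences x, final grade y) for students A to G.
points = [(6, 82), (2, 86), (15, 43), (9, 74), (12, 58), (5, 90), (8, 78)]


def text_scatter(points, width=16, height=10):
    """Return the rows of a character grid with a '*' at each (x, y) point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale x to a column and y to a row (row 0 is the top of the grid).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    return ["".join(row).rstrip() for row in grid]


for line in text_scatter(points):
    print("|" + line)
```

Read from left to right, the marks fall as the number of absences grows, which is the kind of
relationship a scatter diagram is meant to reveal.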

Stem and leaf plot:


Stem and leaf plots provide information regarding:
The range of the data set
The location of the highest concentration of measurements
The presence or absence of symmetry in the data
Guide lines:
1. Split each score or value into two sets of digits. The first set (leading set) of digits
is the stem and the second set (trailing set) of digits is the leaf.
2. List all possible stem digits from lowest to highest.
3. For each score or value in the mass of data, write down the leaf numbers on the line
labeled by the appropriate stem number.
4. The plot can be made neater by ordering the leaves within each row from lowest to
highest value.
Example:
The following data represent the marks obtained by 20 students of the Pharmacy department
in the Biostatistics course BPH-112.
84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 33 39 19 54 72
Use a stem and Leaf plot to display the data.
Solution:
Lowest score is 17 and the highest score is 84. Therefore the stem and leaf plot is

Stem | Leaf
1    | 79
2    | 2
3    | 839
4    | 57
5    | 345414
6    | 65
7    | 652
8    | 4
Key: 1|7 represents 17
Therefore the final figure is
Stem | Leaf
1    | 79
2    | 2
3    | 389
4    | 57
5    | 134445
6    | 56
7    | 256
8    | 4
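
The guide lines for the stem and leaf plot can be sketched in Python as follows. The function
name stem_and_leaf is hypothetical, and the third mark is taken to be 38, which is an assumption
based on the leaves shown for stems 3 and 7 in both displays.

```python
from collections import defaultdict

# Marks of the 20 students (third value assumed to be 38, see the note above).
marks = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
         66, 65, 55, 54, 51, 33, 39, 19, 54, 72]


def stem_and_leaf(values):
    """Return ordered 'stem | leaves' rows; stem = tens digit, leaf = units digit."""
    rows = defaultdict(list)
    for value in sorted(values):  # sorting puts the leaves of each stem in order
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # List every stem from lowest to highest, including any empty ones.
    return [
        f"{stem} | {''.join(str(leaf) for leaf in rows[stem])}"
        for stem in range(min(rows), max(rows) + 1)
    ]


for row in stem_and_leaf(marks):
    print(row)
```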

Example: Display the values 6, 8, 12, 14, 14, 15, 16, 18, 19, 19, 23, 23, 24, 26, 26 by a stem
and leaf diagram.
Solution:
Stem | Leaf
5    | 1 3
10   | 2 4 4
15   | 0 1 3 4 4
20   | 3 3 4
25   | 1 1
Key: 5|1 means 6, 15|3 means 18
Comment about symmetry and skewness:

Sample questions:
1. Define Classification and Tabulation. What is meant by a frequency distribution?
Describe how you would construct a frequency distribution from raw data.
2. What do you mean by Graphical representation of data? State the uses and limitations
of graphs and diagrams.
3. The following table shows the number of hours 45 hospital patients slept following
the administration of a certain anesthetic.
7 10 12 4 8 7 3 8 5 12 11 3 8 1 1
13 10 4 4 5 5 8 7 7 3 2 3 8 13 1
7 17 3 4 5 5 3 1 17 10 4 7 7 11 8
a) From these data construct:
(i) A frequency distribution
(ii) A relative frequency distribution
(iii)A histogram
(iv) A frequency polygon
(v) A cumulative frequency curve/ogive
b) Construct a stem and leaf display from these data. Describe these data relative to
symmetry and skewness.
4. The following are the numbers of babies born during a year in 60 community hospitals.
30 55 27 45 56 48 45 49 57 47 56
37 55 52 34 54 52 32 59 46 24 57
32 26 40 28 53 54 29 42 54 53 59
39 56 59 58 49 53 30 21 34 28 50
52 57 43 46 54 31 22 24 24 57 29
(a) From these data construct:
(i) A frequency distribution
(ii) A histogram
(iii) A relative frequency distribution
(iv) A frequency polygon
(v) A cumulative frequency curve/ogive
(b) Construct a stem and leaf display from these data. Describe these data relative to
symmetry and skewness.

5. The following are the ages of 30 patients seen in the emergency room of a hospital on
a Friday night. Construct a stem and leaf display from these data. Describe these data
relative to symmetry and skewness.
35 36 12 45 45 38 21 35 43 45
36 22 10 44 56 39 37 54 64 55
45 32 34 55 45 60 53 22 56 57
6. In a study of physical endurance levels of male college freshmen the following
composite endurance scores based on several exercise routines were collected. From
these data construct:
(i) A frequency distribution
(ii) A relative frequency distribution
(iii) A histogram
(iv) A frequency polygon
(v) A cumulative frequency polygon/ogive
7. Ellis et al. (A-3) conducted a study to explore the platelet imipramine binding
characteristics in manic patients and to compare the results with equivalent data for
healthy controls and depressed patients. As part of the study the investigators obtained
maximal receptor binding (Bmax) values on their subjects. The following are the values
for the 57 subjects in the study who had a diagnosis of unipolar depression.
1074 372 473 797 385 769 797 485 334 670 510 299 333 303 768
392 475 319 301 556 300 339 488 1114 761 571 306 80 607 1017
286 511 147 476 416 528 419 328 1220 438 238 867 1657 790 479
179 530 446 328 348 773 697 520 341 604 420 394
a) From these data construct:
(i) A frequency distribution
(ii) A histogram
(iii) A relative frequency distribution
(iv) A frequency polygon
(v) A cumulative frequency distribution
(vi) A cumulative relative frequency distribution
(vii) A cumulative frequency curve/ogive
b) What percentage of the measurements are less than 500?
c) What percentage of the measurements is between 500 and 999 inclusive?
d) What percentage of the measurements are greater than 749?

e) Describe these data relative to symmetry and skewness.


f) How many of the measurements are less than 1000?
