Q.1 What do you mean by sample survey? What are the different sampling methods?
Briefly describe them.
Answer: - In many fields of human activity, and in the ordinary actions of our daily life, the
decision-making process is based on the observation of a few units which form a portion of the
total population. Studying only a portion of the population and making decisions from it
involves risk: the risk of making wrong decisions.
Sample: It is a finite subset of a population drawn from it to estimate the characteristics of the
population. Sampling is a tool which enables us to draw conclusions about the characteristics
of the population.
Sample Survey: It can be described as the technique used to study a population
with the help of a sample. The population is the totality of all objects about which the study is
proposed. The sample is only a portion of this population, selected using certain statistical
principles called sampling designs.
Statistical inference is the process of using data from a sample to make estimates or test
hypotheses about a population. The field of sample survey methods is concerned with effective
ways of obtaining sample data. The three most common types of Sample Survey are mail
surveys, telephone surveys, and personal interview surveys. All of these involve the use of a
questionnaire, for which a large body of knowledge exists concerning the phrasing,
sequencing, and grouping of questions. There are other types of sample surveys that do not
involve a questionnaire. For example, the sampling of accounting records for audits and the
use of a computer to sample a large database are sample surveys that use direct observation
of the sampled units to collect the data.
A goal in the design of sample surveys is to obtain a sample that is representative of the
population so that precise inferences can be made. Sampling error is the difference between a
population parameter and a sample statistic used to estimate it. For example, the difference
between a population mean and a sample mean is sampling error. Sampling error occurs
because a portion, and not the entire population, is surveyed.
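The definition of sampling error above can be illustrated with a minimal sketch. The population values and the choice of sampled units are hypothetical, made up only for this illustration.

```python
from statistics import mean

# Hypothetical population of 10 measurements (illustrative values only).
population = [12, 15, 18, 20, 27, 34, 28, 48, 22, 16]

# A sample is a subset of the population; here four units are taken by position.
sample = [population[i] for i in (0, 3, 5, 8)]

population_mean = mean(population)  # the population parameter
sample_mean = mean(sample)          # the sample statistic that estimates it

# Sampling error: the difference between the statistic and the parameter.
sampling_error = sample_mean - population_mean
print(population_mean, sample_mean, sampling_error)
```

A different sample would generally give a different sampling error, which is why probability sampling is needed to make statements about its likely size.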
a. Probability Sampling
b. Non-Probability Sampling
Probability Sampling.
It provides a scientific technique of drawing samples from the population according to a law
of chance in which each unit has a predetermined probability of being included in the sample.
This probability can be assigned in different ways.
Probability sampling methods, where the probability of each unit appearing in the sample is
known, enable statisticians to make probability statements about the size of the sampling error.
Most governmental and professional polling surveys employ probability sampling. It can
generally be assumed that any survey that reports a plus or minus margin of error has been
conducted using probability sampling. Statisticians prefer probability sampling methods and
recommend that they be used whenever possible. A variety of probability sampling methods
are available.
Cluster sampling involves partitioning the population into separate groups called clusters.
Unlike in the case of stratified simple random sampling, it is desirable for the clusters to be
composed of heterogeneous units. In single-stage cluster sampling, a simple random sample
of clusters is selected, and data are collected from every unit in the sampled clusters. In two-
stage cluster sampling, a simple random sample of clusters is selected and then a simple
random sample is selected from the units in each sampled cluster. One of the primary
applications of cluster sampling is called area sampling, where the clusters are counties,
townships, city blocks, or other well-defined geographic sections of the population.
Simple Random Sampling
Under this technique sample units are drawn in such a way that each and every unit in the
population has an equal and independent chance of being included in the sample. If a sample
unit is replaced before the next unit is drawn, the method is known as Simple Random Sampling with
replacement [SRSWR]. If the sample unit is not replaced before the next unit is drawn, it is
called Simple Random Sampling without replacement [SRSWOR]. In the first case the probability of
drawing any given unit at each draw is 1/N, where N is the population size. In the second case the
probability of drawing any given remaining unit at the r-th draw is 1/(N - r + 1), and every
possible sample of n units has the same probability, 1/C(N, n), of being selected.
Simple random sampling provides the basis for many probability sampling methods. With
simple random sampling, every possible sample of size n has the same probability of being
selected. Stratified simple random sampling is a variation of simple random sampling in which
the population is partitioned into relatively homogeneous groups called strata and a simple
random sample is selected from each stratum. The results from the strata are then aggregated
to make inferences about the population. A side benefit of this method is that inferences about
the subpopulation represented by each stratum can also be made.
Selection of a simple random sample can be done by a) the lottery method or b) the use of a table of
random numbers.
a) In the lottery method we identify each and every unit with a distinct number written on one of a
set of identical cards. The cards are put in a drum and thoroughly shuffled before each unit is drawn.
b) There are several random number tables, such as Tippett's random number table,
Fisher and Yates' tables, Kendall and Babington Smith's random tables, the Rand Corporation
random numbers, etc. A specimen of Tippett's random numbers is given below.
Suppose we want to select 10 units from a population of size 100. We number the population
units from 00 to 99 and then read the table two digits at a time. Suppose we start with 41 (second row); then
the other numbers selected will be 67, 95, 24, 15, 45, 13, 96, 72, 03.
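In practice a pseudo-random number generator usually takes the place of a printed random number table. The following is a minimal sketch, assuming Python's standard `random` module is an acceptable source of randomness; the function names and the seed are hypothetical choices for this illustration.

```python
import random

def srs_with_replacement(units, n, rng):
    """SRSWR: every draw picks one of the N units with probability 1/N; repeats can occur."""
    return [rng.choice(units) for _ in range(n)]

def srs_without_replacement(units, n, rng):
    """SRSWOR: n distinct units; every possible sample of size n is equally likely."""
    return rng.sample(units, n)

rng = random.Random(2024)   # fixed seed so that the run is repeatable (arbitrary value)
units = list(range(100))    # population units numbered 00 to 99

srswr = srs_with_replacement(units, 10, rng)
srswor = srs_without_replacement(units, 10, rng)
print(srswr)
print(srswor)
```

The sample drawn without replacement never contains the same unit twice, whereas the sample drawn with replacement may.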
Stratified Random Sampling
This sampling design is most appropriate if the population is heterogeneous with respect
to the characteristic under study or the population distribution is highly skewed.
The criteria used for stratification are geographical, sociological, age, sex, income, etc.
The population of size N is divided into K relatively homogeneous strata of sizes N1,
N2, ..., Nk such that N1 + N2 + ... + Nk = N. Then we draw a simple random
sample from each stratum, either proportional to the size of the stratum or with an equal
number of units from each stratum.
Merits:-
Demerits:-
Example:-
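Suppose 200, 300 and 500 items are produced by factories located at three cities X, Y
and Z. We wish to draw a sample of 20 items under proportional stratified sampling. We
number the units from 0 to 999, then refer to a random number table and select numbers for each factory.
For Factory X, the sample size is 20 x (200/1000) = 4; likewise it is 20 x (300/1000) = 6 for
Factory Y and 20 x (500/1000) = 10 for Factory Z.
The ten numbers below all lie between 500 and 999, the range that Factory Z's items occupy when
the units are numbered factory by factory:
854, 772, 733, 741, 822, 853, 570, 802, 629, 525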
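The proportional allocation used in this example can be sketched as follows. The function name is hypothetical, and rounding to the nearest whole unit is an assumption that happens to be exact for these figures; in general the rounded allocations may need adjusting so that they add up to the required sample size.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size: n_i = n * N_i / N."""
    total = sum(stratum_sizes.values())
    # Rounding is an assumption of this sketch; totals can drift for awkward sizes.
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

allocation = proportional_allocation({"X": 200, "Y": 300, "Z": 500}, 20)
print(allocation)  # {'X': 4, 'Y': 6, 'Z': 10}
```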
Systematic Sampling
Suppose the population size is “N”. The population units are serially numbered 1 to N in
some systematic order and we wish to draw a sample of “n” units. We divide the units
from 1 to N into n groups such that each group has K units. This implies nK = N or K =
N/n. From the first group we select a unit at random. Suppose the unit selected is the 6th
unit; thereafter we select every Kth unit, that is, units 6, 6 + K, 6 + 2K and so on. If K = 20,
n = 5 and N = 100 then the units selected are 6, 26, 46, 66, 86.
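The selection rule can be written as a short function. This is a sketch under the assumption that N is an exact multiple of n; the function name is hypothetical, and the starting unit is passed in explicitly so that the worked example can be reproduced (in a real survey it would be chosen at random from 1 to K).

```python
def systematic_sample(N, n, start):
    """Pick `start` from the first K units, then every K-th unit after it (K = N / n)."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = N // n  # sampling interval K
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first group, 1..K")
    return [start + i * k for i in range(n)]

print(systematic_sample(100, 5, 6))  # [6, 26, 46, 66, 86]
```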
Merits:-
Demerits:-
Cluster Sampling
The total population is divided into recognizable sub-divisions, known as clusters, such that
within each cluster the units are heterogeneous while the clusters themselves are similar to one another.
A sample of clusters is drawn, and units are selected from each sampled cluster by a suitable sampling technique.
Multi-stage Sampling
The sampling process is carried out in several stages, with the units at each stage drawn from
those selected at the previous stage. For example, suppose we want to select 1000 colleges from the
southern states. In the first stage we may select any three states. In the second stage we may
select some districts in each selected state. In the third stage we may select the colleges in
each selected district. We may adopt any sampling technique at each stage.
Merits:-
Demerits:-
Non-Probability Sampling
Depending upon the object of enquiry and other considerations a predetermined number of
sample units is selected purposely so that they represent the true characteristics of the
population.
A serious drawback of this sampling design is that it is highly subjective in nature. The
selection of sample units depends entirely upon the personal convenience, biases, prejudices
and beliefs of the investigator. This method will be more successful if the investigator is
thoroughly skilled and experienced.
Non probability sampling methods, which are based on convenience or judgment rather than
on probability, are frequently used for cost and time advantages. However, one should be
extremely careful in making inferences from a nonprobability sample; whether or not the
sample is representative is dependent on the judgment of the individuals designing and
conducting the survey and not on sound statistical principles. In addition, there is no objective
basis for establishing bounds on the sampling error when a nonprobability sample has been
used.
Judgment Sampling
The choice of sample items depends exclusively on the judgment of the investigator. The
investigator’s experience and knowledge about the population will help to select the sample
units. It is the most suitable method if the population size is small.
Merits:-
Demerits:-
Convenience Sampling
It is also called “chunk” sampling; a chunk is a fraction of the population being investigated which is
selected neither by probability nor by judgment. There is a high chance of bias being introduced.
It is used for pilot studies. The sample units are selected according to the convenience of the
investigator.
Quota Sampling
It is a type of judgment sampling. Under this design Quotas are set up according to some
specified characteristic such as age group, income groups etc. From each group a specified
number of units are sampled according to the Quota allotted to the group. Within the group the
selection of sample units depends on personal judgment. It has a risk of personal prejudice
and bias entering the process. This method is often used in public opinion studies.
Advantages:-
Q.2 What is the difference between correlation and regression? What do you understand
by Rank Correlation? When do we use rank correlation and when do we use the Pearsonian
Correlation Coefficient? Fit a linear regression line to the following data –
X 12 15 18 20 27 34 28 48
Y 123 150 158 170 180 184 176 130
Answer: - Both correlation and regression are used to measure the strength of relationships
between variables.
Difference
Correlation: When two or more variables move in sympathy with each other, then they are said to
be correlated. If both variables move in the same direction then they are said to be positively
correlated. If the variables move in opposite direction then they are said to be negatively
correlated. If they move haphazardly then there is no correlation between them.
Regression: Regression is defined as, “the measure of the average relationship between two
or more variables in terms of the original units of the data.”
1. Correlation: Types
a. Simple correlation – Here the relationship between two variables is studied.
b. Partial correlation – Here the relationship between any two variables is studied,
keeping all others constant.
c. Multiple correlation – Here the relationships among three or more variables are studied
simultaneously.
d. Positive - Both the variables (X and Y) will vary in the same direction. If variable X
increases, variable Y also will increase; if variable X decreases, variable Y also
will decrease.
e. Negative - The given variables will vary in opposite direction. If one variable
increases, other variable will decrease.
f. Linear and Non-linear - It depends upon the constancy of the ratio of change
between the variables. In linear correlation a unit change in one variable is accompanied
by a constant change in the other variable, so the ratio of change is constant. It is not so
in non-linear correlation.
2. Regression: - Types
a. Simple regression
b. Multiple regression
3. Association of Attributes
Correlation measures the relationship (positive or negative, perfect or imperfect) between the two variables.
Regression analysis considers the relationship between variables and estimates the value of
one variable from a known value of the other. Association of Attributes attempts to
ascertain the extent of association between two attributes.
Both correlation and simple linear regression can be used to examine the presence of a linear
relationship between two variables providing certain assumptions about the data are satisfied.
The results of the analysis, however, need to be interpreted with care, particularly when looking
for a causal relationship or when using the regression equation for prediction. Multiple and
logistic regression will be the subject of future reviews.
Regression analysis
1. Used to estimate the values of the dependent variables from the values of the
independent variables.
2. Used to get a measure of the error involved while using the regression line as a basis
for estimation.
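3. Regression coefficients are used to calculate the correlation coefficient: the square of the
correlation coefficient equals the product of the two regression coefficients, and it measures
the degree of association that prevails between the given two variables.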
Correlation quantifies the degree to which two variables are related. Correlation does not fit
a line through the data, whereas regression does. You simply are computing a correlation coefficient (r) that
tells you how much one variable tends to change when the other one does.
With correlation you don't have to think about cause and effect. You simply quantify how well
two variables relate to each other.
With regression, you do have to think about cause and effect, as the regression line is
determined as the best way to predict Y from X.
With correlation, it doesn't matter which of the two variables you call "X" and which you call
"Y". You'll get the same correlation coefficient if you swap the two.
With linear regression, the decision of which variable you call "X" and which you call "Y"
matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts
Y from X is not the same as the line that predicts X from Y.
Correlation is almost always used when you measure both variables. It rarely is appropriate
when one variable is something you experimentally manipulate. With linear regression, the X
variable is often something you experimentally manipulate (time, concentration...) and the Y
variable is something you measure.
Correlation measures the STRENGTH of the linear association between paired variables, say
X and Y.
Regression tells us the FORM of the linear association that best predicts Y from the values of
X.
Correlation is calculated whenever:
• Both X and Y are measured in each subject and the aim is to quantify how much they are linearly
associated.
• In particular, Pearson's product moment correlation coefficient is used when the
assumption that both X and Y are sampled from normally distributed populations is
satisfied.
• Spearman's rank-order correlation coefficient is used if the assumption of
normality is not satisfied.
• Correlation is not used when the variables are manipulated, for example, in
experiments.
Linear regression is used whenever:
• At least one of the independent variables (Xi's) is used to predict the dependent variable Y.
Note: Some of the Xi's are dummy variables, i.e. Xi = 0 or 1, which are used to code
some nominal variables.
• If one manipulates the X variable, e.g. in an experiment.
Linear regression is not symmetric in terms of X and Y. That is, interchanging X and Y will
give a different regression model (i.e. X in terms of Y) from the original Y in terms of X.
On the other hand, if you interchange variables X and Y in the calculation of correlation
coefficient you will get the same value of this correlation coefficient.
The same underlying distribution is assumed for all variables in linear regression. Thus, linear
regression will underestimate the correlation of the independent and dependent when they (X's
and Y) come from different underlying distributions.
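Charles Spearman's rank correlation coefficient, denoted by the Greek letter ρ (rho) or by rs, is a
nonparametric correlation coefficient. When there are no tied ranks it is computed as
$$ \rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
where D is the difference between the ranks assigned to the two variables for each observation and
n is the number of paired observations. The value of ρ lies between – 1 and +1 and its
interpretation is the same as that of Karl Pearson's correlation coefficient.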
Spearman rank correlation is used when you have two measurement variables and one
"hidden" nominal variable. The nominal variable groups the measurements into pairs; if you've
measured height and weight of a bunch of people, "individual name" is a nominal variable. You
want to see whether the two measurement variables covary; whether, as one variable increases, the
other variable tends to increase or decrease. It is the non-parametric alternative to correlation,
and it is used when the data do not meet the assumptions about normality, homoscedasticity
and linearity. Spearman rank correlation is also used when one or both of the variables
consist of ranks.
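A minimal sketch of the rank-difference calculation follows, using the usual formula ρ = 1 − 6ΣD² / (n(n² − 1)). It assumes there are no tied values; the helper names and the height and weight figures are hypothetical.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho from rank differences D (valid only without tied ranks)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

heights = [150, 160, 170, 180, 190]  # hypothetical heights
weights = [52, 60, 58, 75, 80]       # hypothetical weights of the same people
rho = spearman_rho(heights, weights)
print(rho)
```

Only the ordering of the values matters here, which is why the coefficient can be used when the normality assumption behind Pearson's coefficient is doubtful.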
You will rarely have enough data in your own data set to test the normality and
homoscedasticity assumptions of regression and correlation; your decision about whether to do
linear regression and correlation or Spearman rank correlation will usually depend on your prior
knowledge of whether the variables are likely to meet the assumptions.
If tied ranks exist, classic Pearson's correlation coefficient between ranks has to be used.
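It is defined as
$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \text{(A)} $$
where x = X – X̄ and y = Y – Ȳ are the deviations from the means, σx and σy are the standard
deviations of X and Y, and N is the number of paired observations.
Σxy / N is called the covariance of x and y. Equivalent forms of this formula are
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} $$
For practical computation from raw data the last form is the most convenient. Whenever only summary
information is given, choose whichever of the other forms matches the quantities available.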
Regression Lines
For a set of paired observations there exist two straight lines. The line drawn such that the sum of
the vertical deviations is zero and the sum of their squares is a minimum is called the regression line of y on
x. It is used to estimate y-values for given x-values. The line drawn such that the sum of
the horizontal deviations is zero and the sum of their squares is a minimum is called the regression line of x
on y. It is used to estimate x-values for given y-values. The smaller the angle between these
lines, the higher is the correlation between the variables. The regression lines always intersect at
(X̄, Ȳ). Their equations are
$$ Y - \bar{Y} = b_{yx} (X - \bar{X}) $$
$$ X - \bar{X} = b_{xy} (Y - \bar{Y}) $$
where
$$ b_{yx} = r \frac{\sigma_y}{\sigma_x} $$
and
$$ b_{xy} = r \frac{\sigma_x}{\sigma_y} $$
Regression equations found under the above conditions are said to be fitted by the method of least
squares. byx and bxy are called regression coefficients.
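The two regression coefficients can be computed side by side to show that they differ and that their product equals r². This is a sketch using the data given in the question; the function and variable names are hypothetical.

```python
def regression_coefficients(x, y):
    """Return (mean_x, mean_y, byx, bxy, r) computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    byx = sxy / sxx               # slope of the regression line of y on x
    bxy = sxy / syy               # slope of the regression line of x on y
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return mean_x, mean_y, byx, bxy, r

X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]
mean_x, mean_y, byx, bxy, r = regression_coefficients(X, Y)

print(f"y on x: Y - {mean_y} = {byx:.5f} (X - {mean_x})")
print(f"x on y: X - {mean_x} = {bxy:.5f} (Y - {mean_y})")
print(f"byx * bxy = {byx * bxy:.6f}, r squared = {r ** 2:.6f}")
```

Both lines pass through the point of means, and the closer the product byx · bxy is to 1, the smaller the angle between them.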
For the given data,
X 12 15 18 20 27 34 28 48
Y 123 150 158 170 180 184 176 130
we want the values of Y predicted by the best-fit line (predicted values) to be as close to the actual values
(observed values) as possible. The differences between the predicted values and the
observed values are the vertical distances between the data points and the line; the line that
minimises the sum of the squares of these distances is the best-fit linear regression line.
The best fit line associated with the n points (x1, y1), (x2, y2), . . . , (xn, yn) has the form
y = b + mx
where the slope is
$$ m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{8(32254) - (202)(1271)}{8(6066) - (202)^2} = \frac{1290}{7724} = 0.167012 $$
and the intercept is
$$ b = \frac{\sum y - m \sum x}{n} = \frac{1271 - 0.16701(202)}{8} = 154.66 $$
Total numbers (n): 8
Slope (m): 0.16701
Y-intercept (b): 154.66
Regression equation: y = 154.66 + 0.17x
Here, Σ means "the sum of." Thus
Σxy = sum of products = x1y1 + x2y2 + . . . + xnyn
Σx = sum of x-values = x1 + x2 + . . . + xn
Σy = sum of y-values = y1 + y2 + . . . + yn
Σx² = sum of squares of x-values = x1² + x2² + . . . + xn²
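The arithmetic above can be checked with a short script that applies the same slope and intercept formulas to the data from the question. The function name is hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y = b + m*x from the sums n, Σx, Σy, Σxy and Σx²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = (sum_y - m * sum_x) / n                                   # intercept
    return m, b

X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]
m, b = fit_line(X, Y)
print(f"y = {b:.2f} + {m:.5f}x")  # y = 154.66 + 0.16701x
```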
Q.3 What do you mean by business forecasting? What are the different methods of
business forecasting? Describe the effectiveness of time-series analysis as a mode of
business forecasting. Describe the method of moving averages.
Business forecasting refers to the analysis of past and present economic conditions with the
object of drawing inferences about probable future business conditions. The process of making
definite estimates of the future course of events is referred to as forecasting, and the figure or
statement obtained from the process is known as a ‘forecast’. The future course of events is rarely
known. In order to be assured of the coming course of events, help is taken of an organized
system of forecasting. There are two aspects of scientific business forecasting.
1. Analysis of past economic conditions: The secular trend will show how the series has been
moving in the past and what its future course is likely to be over a long period. The cyclic
fluctuations would reveal whether the business activity is subjected to boom or depression. The
seasonal fluctuations would indicate the seasonal changes in the business activity.
Forecasts empower people because their use implies that we can modify variables now to alter
(or be prepared for) the future. A prediction is an invitation to introduce change into a system.
1. There is no way to state what the future will be with complete certainty. Regardless of the
methods that we use there will always be an element of uncertainty until the forecast horizon
has come to pass.
2. There will always be blind spots in forecasts. We cannot, for example, forecast completely
new technologies for which there are no existing paradigms.
3. Providing forecasts to policy-makers will help them formulate social policy. The new social
policy, in turn, will affect the future, thus changing the accuracy of the forecast.
Forecasting is a part of human conduct. Businessmen also have to look to the future. Success
in business depends on correct predictions. In fact, when a man enters business, he
automatically takes on the responsibility for attempting to forecast the future, and to a very
large extent his success or failure will depend upon his ability to forecast successfully the
future course of events. Without some element of continuity between past, present and
future, there would be little possibility of successful prediction. But history is not likely to repeat
itself exactly, and we would hardly expect economic conditions next year or over the next ten years to
follow a clear-cut pattern.
A businessman cannot afford to base his decision on guesses. Forecasting helps a
businessman in reducing the areas of uncertainty that surround management decision making
with respect to costs, sales, production, profits, capital investment, pricing, expansion of
production, extension of credit, development of markets, increase of inventories and
curtailment of loans. These decisions cannot be made off-hand. They are to be based on
present indications of future conditions.
While forecasting, we should know that it is impossible to forecast the future precisely; there
must always be some range of error allowed in the forecast. Statistical forecasts are those
in which we can use the mathematical theory of probability to measure the risks of errors in
predictions.
A great amount of confusion seems to have grown up in the use of the words ‘forecast’, ‘prediction’
and ‘projection’. A prediction is an estimate based solely on past data of the series under
investigation. It is purely mechanical extrapolation. A projection is a prediction where the
extrapolated values are subject to certain numerical assumptions. A forecast is an estimate
which relates the series in which we are interested to external factors. Forecasts are made by
estimating future values of the external factors by means of prediction, projection or forecast
and from these values calculating the estimate of the dependent variable.
i. Based on past and present conditions: Business forecasting is based on the past and
present economic condition of the business. To forecast the future, various data, information
and facts concerning the economic condition of the business in the past and present are analyzed.
ii. Based on mathematical and statistical methods: The process of forecasting includes the use
of statistical and mathematical methods. By using these methods the actual trend which may
take place in future can be forecasted.
iii. Period: The forecasting can be made for long term, short term, medium term or any specific
term.
iv. Estimation of future: The business forecasting is to forecast the future regarding probable
economic conditions.
Steps in Forecasting
i. Understanding why changes in the past have occurred: One of the basic principles of
statistical forecasting is that the forecaster should use the data on past performance. The
current rate and changes in the rate constitute the basis of forecasting. Once they are known
various mathematical techniques can develop projections from them. If an attempt is made to
forecast business fluctuations without understanding why past changes have taken place, the
forecast will be purely mechanical based solely upon the application of mathematical formulae
and subject to serious error.
ii. Determining which phases of business activity must be measured: After it is known why
business fluctuations have occurred, it is necessary to measure certain phases of business
activity in order to predict what changes will probably follow the present level of activity.
iii. Selecting and compiling data to be used as measuring devices: There is an interdependent
relationship between the selection of statistical data and determination of why business
fluctuations occur. Statistical data cannot be collected and analyzed in an intelligent manner
unless there is a sufficient understanding of business fluctuations.
It is important that the reasons for business fluctuations be stated in such a manner that it is possible
to secure data that are related to the reasons.
iv. Analyzing the data: Lastly, the data are analyzed in the light of understanding of the reason
why change occurs. For example, if it is reasoned that a certain combination of forces will
result in a given change, the statistical part of the problem is to measure these forces, from the
data available, to draw conclusions on the future course of action. The methods of drawing
conclusions may be called forecasting techniques.
1. Business Barometers
Business indices are indicators of future conditions, so they are also known as “Business
Barometers” or ‘Economic Barometers’, which can help in forecasting and decision making. They
consist of gross national product, wholesale prices, consumer prices, industrial production,
stock prices, bank deposits etc. These quantities may be converted into relatives on a certain
base. The relatives so obtained may be weighted and their average computed. The index
thus arrived at is the business barometer.
2. Time Series Analysis
Forecasting through time series analysis is possible only when business data for a number of
years are available and reflect a definite trend and seasonal variation.
3. Extrapolation
ii. Knowledge about the course of events relating to the problem under consideration.
i.) There are no sudden jumps in figures from one period to another,
ii.) There is regularity in fluctuations and the rise and fall is uniform.
4. Regression Analysis
It is the means by which we select, from among the many possible relationships between
variables in a complex economy, those which will be useful for forecasting. A regression
relationship may involve one predicted or dependent variable and one independent variable (simple
regression), or it may involve relationships between the variable to be forecast and several
independent variables (multiple regression). Statistical techniques to estimate the
regression equations are often fairly complex and time-consuming, but there are many
computer programs now available that estimate simple and multiple regressions quickly.
The term econometrics refers to the application of mathematical economic theory and
statistical procedures to economic data in order to verify economic theorems. Models take the
form of a set of simultaneous equations. The values of the constants in such equations are
supplied by a study of statistical time series, and a large number of equations may be necessary
to produce an adequate model.
Exponential smoothing is regarded as the best method of business forecasting as compared to other
methods. It is a special kind of weighted average and is found extremely
useful in short-term forecasting of inventories and sales.
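A minimal sketch of simple exponential smoothing is given here to make the idea of a "special kind of weighted average" concrete. The recurrence used is the common textbook form, smoothed value = α × latest observation + (1 − α) × previous smoothed value; the smoothing constant α, the starting value and the sales figures are assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

monthly_sales = [10, 20, 30]  # hypothetical figures
smoothed = exponential_smoothing(monthly_sales, alpha=0.5)
print(smoothed)      # [10, 15.0, 22.5]
print(smoothed[-1])  # the last smoothed value serves as the next-period forecast
```

Recent observations carry the largest weights and older ones decay geometrically, which is why the method suits short-term forecasts.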
The selection of an appropriate method depends on many factors – the context of the forecast,
the relevance and availability of historical data, the degree of accuracy desired, the time period
for which forecasts are required, the cost benefit of the forecast to the company, and the time
available for making the analysis.
Merits:
i) It is an easy method of forecasting.
iii) Reliable results of forecasting are obtained as this method is based on mathematical
model.
i. Comparative study of the behavior of the variable over different periods of time can be
done. The variable may be export figures, quantity of industrial production etc:
ii. Forecasting can be done using the time series. By studying the variations and other
behavior of the variables over a sufficiently long period of time, it may be possible to
forecast the future behavior of the variables. However, such a forecast has meaning only if
the period of forecast is a normal period. For example, various five-year plans by the
Government of India are formulated by studying the time series and forecasting.
iii. Study of the time series helps in analyzing the past behavior of the variables. This helps in
identifying the various forces that affect its behavior.
The method of moving averages is used for smoothing the time series. That is, it smooths out the
fluctuations of the data by averaging successive groups of values.
A) When Period of moving average is odd: To determine the trend by this method, we use the
following method:
iii) Compute moving totals according to the length of the period of moving average.
If the length of the period of moving average is 3 i.e., 3-yearly moving average is to be
calculated, compute moving totals as follows:
a + b + c, b + c + d, c + d + e, d + e + f…..
Placing the moving totals at the centre of the time span from which they are computed.
iv.) Compute moving averages by dividing the moving totals in step (iii) by the length of the period of
moving average and place them at the centre of the time span from which the moving totals
are computed. These moving averages are also called the trend values.
By plotting these trend values (if desired) one can obtain the trend curve with the help of
which we can determine the trend whether it is increasing or decreasing.
If needed, one can also compute short-term fluctuations by subtracting the trend values
from the actual values.
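The odd-period procedure can be sketched as follows. The function name and the data are hypothetical; `None` marks the periods at either end for which no trend value exists.

```python
def moving_average_trend(series, period):
    """Odd-period moving averages, each placed at the centre of its time span."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    totals = [sum(series[i:i + period]) for i in range(len(series) - period + 1)]
    averages = [total / period for total in totals]
    half = period // 2  # number of periods without a trend value at each end
    return [None] * half + averages + [None] * half

values = [2, 4, 6, 8, 10]  # hypothetical yearly figures
trend = moving_average_trend(values, 3)
print(trend)  # [None, 4.0, 6.0, 8.0, None]

# Short-term fluctuations: actual value minus trend value, where a trend value exists.
fluctuations = [v - t for v, t in zip(values, trend) if t is not None]
print(fluctuations)  # [0.0, 0.0, 0.0]
```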
B). When the period of moving average is even: when the period of moving average is even (4 years
etc.) we compute the moving averages by using the following steps:
ii.) Obtain the length of the period of moving average. Let the length of the moving averages
period be 4-years.
iii.) Compute 4 yearly moving totals and place them at the centre of time span. The four –
yearly moving totals are computed as follows:
a + b + c + d, b + c + d + e, c + d + e + f,
iv.) Compute 4 – yearly moving average and place them at the centre of the time span. Note
that this placement is inconvenient, because the moving average so placed would not
coincide with original time period.
v.) Take two – period moving average of moving averages and place them at the middle of the
periods. This process is called centring of moving averages.
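Centring for an even period can be sketched in the same way: take the even-period moving averages, then average each adjacent pair so that the result lines up with an original time period. The function name and the data are hypothetical.

```python
def centred_moving_average(series, period):
    """Even-period moving averages, centred by averaging each adjacent pair."""
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    averages = [sum(series[i:i + period]) / period
                for i in range(len(series) - period + 1)]
    # A two-period moving average of the moving averages falls on an original period.
    centred = [(a + b) / 2 for a, b in zip(averages, averages[1:])]
    half = period // 2  # number of periods without a trend value at each end
    return [None] * half + centred + [None] * half

values = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical yearly figures
print(centred_moving_average(values, 4))
# [None, None, 3.0, 4.0, 5.0, 6.0, None, None]
```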
ii.) This method is objective in the sense that anybody working on a problem with this method
will get the same results.
iii.) This method is used for determining seasonal, cyclic and irregular variations besides the
trend values.
iv.) This method is flexible enough to add more figures to the data because the entire
calculations are not changed.
v.) If the period of moving averages coincides with the period of cyclic fluctuations in the data,
such fluctuations are automatically eliminated.
Limitations:
i.) There is no functional relationship between the values and the time. Thus, this method is not
helpful in forecasting and predicting the values on the basis of time.
ii.) There are no trend values for some years at the beginning and some at the end. For
example, for a 5-yearly moving average there will be no trend values for the first two years and
the last two years.
iii.) In the case of a non-linear trend, the values obtained by this method are biased in one or
the other direction.
iv.) The selection of the period of the moving average is a difficult task. Therefore great care
has to be taken in selecting the period, particularly when there is no business cycle during that time.
Q.4 What is the definition of Statistics? What are the different characteristics of statistics?
What are the different functions of Statistics? What are the limitations of Statistics?
Answer: - Definition of Statistics: Different authors provide different definitions of
statistics.
2. Croxton and Cowden, ‘Statistics is the science of collection, presentation, analysis and
interpretation of numerical data.’
3. Prof. Horace Secrist: ‘Statistics deals with aggregates of facts, affected to a marked extent by
multiplicity of causes, numerically expressed, enumerated or estimated according to a
reasonable standard of accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other.’
Characteristics of Statistics
1. Statistics deals with aggregates of facts: A single figure cannot be analysed. Thus,
the fact "Mr Lee is 170 cm tall" cannot be statistically analysed. On the other hand, if we
know the heights of 60 students of a class, we can comment upon the average height,
variations etc.
2. Statistics are affected to a marked extent by multiplicity of causes: The statistics of
yield of paddy is the result of factors such as fertility of soil, amount of rainfall, quality of
seed used, quality and quantity of fertilizer used, etc.
3. Statistics are numerically expressed: Only numerical facts can be statistically
analyzed. Therefore, statements such as ‘price decreases with increasing production’ cannot
be called statistics.
4. Statistics are enumerated or estimated according to reasonable standards of
accuracy: The facts should be enumerated (collected from the field) or estimated
(computed) with the required degree of accuracy. The degree of accuracy differs from purpose
to purpose.
5. Statistics are collected in a systematic manner: The facts should be collected
according to planned and scientific methods. Otherwise, they are likely to be wrong and
misleading.
6. Statistics are collected for a pre-determined purpose: There must be a definite
purpose for collecting facts, e.g. studying the movement of the wholesale price of a commodity.
7. Statistics are placed in relation to each other: The facts must be placed in such a
way that a comparative and analytical study becomes possible. Thus, only related facts
which are arranged in logical order can be called statistics.
Functions of Statistics
Limitations of Statistics
1. Statistics does not deal with qualitative data. It deals only with quantitative data.
2. Statistics does not deal with individual facts: Statistical methods can be applied only to
aggregates of facts.
3. Statistical inferences (conclusions) are not exact: Statistical inferences are true only on
an average. They are probabilistic statements.
4. Statistics can be misused and misinterpreted: Increasing misuse of Statistics has led to
increasing distrust in statistics.
5. Common men cannot handle Statistics properly: Only statisticians can handle statistics
properly.
Q.5 What are the different stages of planning a statistical survey? Describe the various
methods for collecting data in a statistical survey.
2. In direct personal observation, the investigator collects data by having direct contact with
the units of investigation.
3. Indirect oral interview is used when the area to be covered is large. The data are collected
from a third party, a witness or the head of an institution.
4. Through local agencies and correspondents.
5. Through questionnaires. This method is generally adopted by research workers and other
official and non-official agencies.
6. Through schedules filled in by the investigator through personal contact.
7. Secondary data may be collected either by census or sampling methods.
8. Pilot survey: It is a small trial survey undertaken before the main survey. It gives a measure
of the efficiency of the questionnaire.
9. The success of the questionnaire method of collecting data depends mainly on proper
drafting of the questionnaire. The following general principles are considered:
• The number of questions should be small.
• Lengthy questions should be avoided.
• Answers to them should be short.
• Questions regarding personal matters should be avoided.
• Questions should be unambiguous.
• There should not be any scope for misinterpretation.
• Questions should be arranged in a logical sequence.
• A covering letter should accompany the questionnaire.
10. Information can be collected through schedules filled in by the investigator through personal
contact.
11. In order to get reliable information, the investigator should be well trained, tactful,
unbiased and hard-working. This method is suitable for an extensive area of investigation
through the investigator’s personal contact. The problem of non-response is minimized.
12. Information that is used for the investigation of the current problem, but was collected
and used earlier by some other agency or person for their own investigation, is known as
secondary data.
Secondary data are available in published or unpublished form. In published form they are
available in research papers, newspapers, magazines, government publications,
international publications, websites etc. They were collected for a different purpose.
Therefore care should be exercised while making use of them. Their accuracy, reliability,
objectives and scope should be examined thoroughly before use.
13. Primary data are collected by the census method; in other words, information with respect
to each and every individual of the population is observed. Secondary data, on the other hand,
may be collected either by census or sampling methods.
14. Pilot survey: It is a small trial survey undertaken before the main survey. It gives a measure
of the efficiency of the questionnaire. It reduces inconvenience and loss of information. It
helps us to introduce necessary changes.
Q.6 What are the functions of classification? What are the requisites of a good
classification? What is a table? Describe the usefulness of a table as a mode of
presentation of data.
Answer:
Functions of Classification
Parts of a Table:
• Table number
• Title
• Captions
• Stubs
• Body of the table
• Ruling and Spacing
• Head Note
• Source Note
Types of Table
• General purpose table, also known as reference table. These are formed without a
specific objective, but can be used for any specific purpose. They contain a large mass of
data. Example: Census.
• Specific purpose table, also called text table or summary table, deals with specific problems.
These are smaller in size and highlight the relationship between characteristics. Example:
Cost of living indices.
• Primary Table: These contain data in the form in which they were originally collected.
• Derived Table: These present figures like totals, averages, ratios etc. derived from
the original data.
c. Construction: 3 types