
Indira Gandhi National Open University
School of Social Sciences

REC-001: RESEARCH METHODOLOGY

BLOCK 1
Foundations of Research Methodology in Economics

BLOCK 2
Quantitative Methods: Data Collection

BLOCK 3
Quantitative Methods: Data Analysis

BLOCK 4
Qualitative Methods

BLOCK 5
Database of Indian Economy

BLOCK 6
Use of SPSS and EVIEWS Packages for Analysis and Presentation of Data

EXPERT COMMITTEE
Prof. Alakh N. Sharma, Director, Institute of Human Development, New Delhi – 110 002
Prof. Gopinath Pradhan, Professor of Economics, IGNOU, New Delhi
Prof. B. Kamaiah, Professor of Economics, University of Hyderabad, Hyderabad
Prof. Anjila Gupta, Professor of Economics, IGNOU, New Delhi
Prof. D.N. Rao, Retd. Professor of Economics, CESP, School of Social Sciences, JNU, New Delhi – 110 067
Prof. Madhu Bala, Professor of Economics, IGNOU, New Delhi
Prof. Ila Patnaik, Professor of Economics, National Institute of Public Finance & Policy, New Delhi
Dr. K. Barik, Reader in Economics, IGNOU, New Delhi
Prof. Pami Dua, Professor of Economics, Delhi School of Economics (University of Delhi), Delhi
Dr. B.S. Prakash, Reader in Economics, IGNOU, New Delhi
Prof. Romar Correa, Professor of Economics, University of Mumbai, Mumbai
Sh. Saugato Sen, Lecturer (Selection Grade) in Economics, IGNOU, New Delhi
Prof. Tapas Sen, Professor, National Institute of Public Finance & Policy, New Delhi – 110 067
Prof. Narayan Prasad (Convenor), Professor of Economics, IGNOU, New Delhi

Programme Coordinator: Prof. Narayan Prasad


Course Coordinator: Prof. Narayan Prasad
Course Preparation Team

Block 1: Prof. D. Narsimha Reddy, Retd. Professor of Economics, University of Hyderabad, Hyderabad
Blocks 2 & 5: Sh. S.S. Suryanarayanan, Ex-Joint Advisor, Planning Commission, New Delhi
Block 3: Prof. Narayan Prasad, IGNOU, New Delhi
Block 4: Prof. Narayan Prasad, IGNOU, New Delhi
Prof. D.M. Diwakar, Giri Institute of Development Studies, Lucknow
Block 6: Matter related to SPSS adapted from Unit 14 of course MFN-009 (Research Methods in Bio-statistics, part of M.Sc. (DFSM)), SOCE, IGNOU, New Delhi; Dr. Alok Mishra, Manager, Evalueserve.com Pvt. Ltd.

IGNOU Faculty (Content, Format and Language Editing): Prof. Narayan Prasad, Professor of Economics, IGNOU, New Delhi
Secretarial Assistance: Mrs. Seema Bhatia, School of Social Sciences
____________________________________________________________________________________________________________

August 2009.
Indira Gandhi National Open University

All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in
writing from the Indira Gandhi National Open University.

Further information about the Indira Gandhi National Open University courses may be obtained from the University’s Office at
Maidan Garhi, New Delhi-110 068.

INTRODUCTION TO REC-001: RESEARCH METHODOLOGY

In order to pursue a research degree programme, you need to be equipped with the various
constituents of research methodology and the different techniques applied in data
collection and analysis. The present course aims to cater to this need. The theoretical
perspectives that guide research, the tools and techniques of data collection and the methods of
data analysis together constitute research methodology. This course deals with all
these aspects and comprises six blocks.

Block 1 on Foundations of Research Methodology in Economics covers the entire breadth
of the main trends of development in the philosophy of science and the main debates in the
methodology of economics. After introducing three approaches to research methodology –
scientific, historical and institutional – the block is devoted to scientific methodology.
The first part deals with the philosophical foundations, covering positivism and Karl Popper's
critical rationalism, followed by three important models of scientific explanation:
the Hypothetico-Deductive (H-D) Model, the Deductive-Nomological (D-N) Model and the
Inductive-Probabilistic (I-P) Model. The third part is devoted to the main debates in
mainstream economic methodology from the classical to the contemporary period. Each
section in the block, besides giving an outline of the subject matter, also provides a detailed
reading guide, with some references to critical reading material included in boxes.

Block 2: Studies of the behaviour of variables and of relationships amongst them
constitute the essentials of empirical research in economics. This necessitates
measurement of the variables involved. Hence, Block 2 addresses the question of how to
assemble data on a scientific basis. Covering three methods of data collection (i.e. the
census and survey method, the observation method and the experimental method) and the
tools used in data collection, the block deals with the methods of selecting random and
non-random samples from a population in order to make judgements about the population. Eight
methods of random sampling are discussed, along with (i) operational procedures for drawing
samples, (ii) expressions for estimators of parameters and measures of their variation, and
(iii) estimators of such variation where the population variation parameter is not known.
The question of choosing the sampling method appropriate to a given
research context has also been dealt with. Like the first block, each section and sub-section
ends with a box guiding you to relevant portions of one or more publications that give
you more details on the topic(s) covered.

Block 3: Various statistical and econometric techniques are applied in the analysis of
data. Broadly, these techniques are of two types: descriptive and inferential. Since
descriptive techniques like measures of central tendency, measures of dispersion,
skewness, one-way ANOVA, index numbers, time series, simple correlation and simple
regression are covered in master's degree level courses on statistics/quantitative
techniques, they have been skipped here without undermining their significance
and application in research. This block essentially deals with the basic steps involved in
an empirical study, the estimation of parameters in two-variable and n-variable situations
and their interpretation, testing hypotheses by applying parametric and non-parametric
tests, and tackling the problems of autocorrelation, heteroscedasticity and multicollinearity.

Block 4: Empirical evidence in research is captured and analysed by
two approaches, i.e. quantitative and qualitative. In situations such as an in-depth
scientific enquiry into complex events, their dimensions and variables, the cardinal or
quantitative approach is of limited use because of the long time and high cost involved.
Hence, as an alternative, the qualitative approach to research methods is presented
in Block 4. This block broadly deals with the philosophical foundations
and research perspectives guiding qualitative research, the principles governing the
participatory approach, and the process and stages involved in participatory and case study
methods.

Block 5: For undertaking any meaningful research in terms of situational
assessment, testing of models, development of theory, evolving economic policy or
assessing the impact of such policy, the availability of data is crucial and determines the
scope of analysis. Hence, this block deals with the different databases of the Indian
economy. The block covers the data available on major macro variables like national
income, saving and investment relating to the Indian economy. It also throws light
on the agricultural and industrial databases and on data on trade, finance and the social sectors
(employment, unemployment, education, health and quality of life). Particular emphasis has
been laid on the different concepts used in data collection, the data sources and the agencies involved
in the compilation of data.

Block 6: The methodological advances in quantitative and qualitative analysis have been
accompanied by a significant revolution in the computing power of
desktop and laptop PCs. SPSS, EVIEWS, SAS and NUDIST are among the popular
sophisticated packages used for data analysis and data
presentation in the social sciences in general and in economics in particular. Hence, the
fundamentals of SPSS and EVIEWS and the use of their statistical components are
covered in Block 6. The block aims to enable learners to use the SPSS and EVIEWS
software to compute various statistical and econometric results and to analyse and
present time series and cross-section data.

BLOCK 1 FOUNDATIONS OF RESEARCH METHODOLOGY IN


ECONOMICS

Structure

1.0 Objectives
1.1 Introduction
1.2 An Overview of the Block
1.3 Approaches to Research Methodology: Scientific, Historical and Institutional
1.4 Philosophical Foundations
1.4.1 Positivism
1.4.1.1 Central Tenets of Positivist Philosophy
1.4.1.2 Criticism of Positivism
1.4.2 Post-Positivism
1.4.3 Karl Popper and Critical Rationalism
1.4.4 Thomas Kuhn and Growth of Knowledge
1.4.5 I. Lakatos: The Methodology of Scientific Research Programmes
1.4.6 Paul Feyerabend: Methodological Dadaism and Anarchism
1.5 Models of Scientific Explanation
1.5.1 Hypothetico-Deductive (H-D) Model
1.5.2 Deductive Nomological (D-N) Model
1.5.3 Inductive-Probabilistic (I-P) Model
1.6 Debates on Models of Explanation in Economics
1.6.1 Classical Political Economy and Ricardo’s Method
1.6.2 Hutchison and Logical Empiricism
1.6.3 Milton Friedman and Instrumentalism
1.6.4 Paul Samuelson and Operationalism
1.6.5 Amartya Sen: Heterogeneity of Explanation in Economics
1.7 Research Problem and Research Design
1.7.1 Research Problem
1.7.2 Basic Steps in Framing a Research Proposal
1.8 Further Suggested Readings
1.9 Model Questions

1.0 Objectives

The main objectives of this block are to:

• introduce the basic outline of the philosophical foundations of the main strands of
scientific methodology as it evolved in the philosophy of science,
• apprise students of the evolution of the basic structure and method of scientific
explanation,
• guide students through the development of, and debates relating to, the methodological
approaches of economics, and
• explain the limitations and strengths of the methodological foundations of
economics so as to appreciate the process of growth of knowledge and contribute
effectively to it.

1.1 INTRODUCTION

Contemporary developments in research methodology have faced substantial challenges
in social sciences like economics. While over the years there has been growing emphasis
on rigour and methodological precision, there have also been serious reservations about the
scientific basis of research, not only in economics but also in other sciences. There is
growing interest in the history of the philosophy of science, to understand what has been done
and is being done, so that we can think about what ought to be done.
Before the 1970s, the literature on the methodology of economics was meagre and mostly
confined to the classics. But since the 1970s, interest in economic methodology has grown
dramatically and, as Roger Backhouse observes, it is now possible to view economic
methodology as a clearly identifiable sub-discipline within economics.

1.2 An Overview of the Block

The block is ambitiously designed to cover the entire breadth of both the main
trends of development in the philosophy of science and the main debates in the
methodology of economics. The first part, beginning with the historical background of
positivism, covers developments up to the contemporary trends in the philosophy of science. The second
part is devoted to the main debates in mainstream economic methodology from the
classical to the contemporary period. Since the focus in this block is substantially on the
scientific method, the emphasis is primarily on the methodology of mainstream neo-classical
economics.

Each section in the block, besides an outline of the subject matter, provides a detailed
reading guide with some critical reading material being included in boxes, wherever it is
felt necessary to draw your attention to a specific reading.

1.3 Approaches to Research Methodology: Scientific, Historical and Institutional

Unlike the natural sciences, the social sciences have had a long tradition of choice of methodology
depending upon schools of thought, each of which had a distinct way of
conceptualizing social relations and processes of development. For instance, the
Institutional school in economics conceived the economy as the system of related activities
by which the people of any community get their living. This system embraces a body of
knowledge and of skills and a stock of physical equipment. It also embraces a complex
network of personal relations influenced by custom, ritual and dogma. The methodology
focused on personal and institutional relations and processes, and was often descriptive,
without involving any testing or verification. Similarly, the historical school based its
analysis of social and economic developments on historical data, and historical
methods too have often been descriptive. There have been several developments in
the institutional approach with the emergence of New Institutional Economics, and
similarly in the historical method with the emergence of cliometrics. The present block,
however, is entirely devoted to "scientific methodology" and hence does not elaborate on the
alternatives. The alternative methods are discussed in Block 4 of this course.

1.4 PHILOSOPHICAL FOUNDATIONS

1.4.1 Positivism

The basic tenets of positivism had a relatively long history of evolution over the first half
of the 20th century and went through different phases of development. A question often
asked is whether it is 'Positivism' or 'Positivisms'. Positivism in its evolved form is
often called 'Logical Positivism' or 'Logical Empiricism'. Bruce Caldwell (1982), in
the first chapter of his book, attempts to construct the development and basic aspects of
Logical Empiricism. But for learners like you, intending to know the methodology of
science, it would be helpful to become familiar with I. Naletov's first chapter (esp. pp. 23-58),
which distinguishes three phases in the development of Positivism and identifies the third
phase, of the 1940s and 1950s, as Logical Positivism. Along with this, you should read
Kolakowski's 'rules of Positivism', which are reproduced in the first chapter of C.G.A.
Bryant's Positivism in Social Theory and Research (1985).

1.4.1.1 Central Tenets of Positivism


The central tenets of Positivism are summarized in Kolakowski's four rules of
Positivism, given below:

K1 The Rule of Phenomenalism: According to this rule, science is entitled to record
only that which is actually manifested in experience. Science accepts that which
is observable or experienced, i.e. the phenomenon, not the noumenon – that which is
represented by the phenomenon. Science is based on that which exists, not on the
essence of existence.

K2 The Rule of Nominalism: Science involves recording experience and represents
that which is experienced. General terms are merely names for the observed and give no
extra, independent knowledge.

K3 The Rule of Value-free Statements: There is no place for value judgments and
normative statements in science. Science is concerned with ‘what is’ and not
‘what ought to be’.

K4 The Rule of the Unity of Method in Science: There is only one scientific method for all
sciences, irrespective of their subject matter. This is also known as the methodological
monism of Positivism.

These propositions of Positivism emphasize 'verifiability' as the basic requirement for
scientific pursuits and verification or 'testability' as the basic methodological requirement.
These rules ensure a pursuit of scientific knowledge that leads to the unravelling of
regularities in the occurrence of phenomena, which can be explained in terms of universal
laws. Testability and repeatability are important dimensions of the scientific method under
Positivism. We shall return to the positivist structure of scientific explanation later. But
presently, we shall turn to the limitations of Positivism and the resulting criticism.

1.4.1.2 Criticism of Positivism

By the 1960s, Positivism as a methodological approach to science had been subjected to extensive
criticism, and its stature as the methodology of science declined over the years. One
of the earliest critics of Positivism was Karl Popper, and much later the contemporary
philosophers of science built alternative approaches to science. Some of the major
criticisms of Positivism are listed below:

First, the positivist rule of beginning all scientific investigation with facts and facts alone
was questioned by Popper. Knowledge does not start from nothing – a tabula rasa.
"Before we can collect data, our interest in data of a certain kind must be
aroused; the problem always comes first." There are no brute facts; all facts are theory-laden.
Before one collects facts or observes things, one should have relevant questions, which
arise from existing knowledge.

The second major criticism against Positivism relates to the problem of induction and the
related problem associated with the test of verification. Popper gave the famous
example of the colour of swans. If one follows Positivism, then if one repeatedly observes,
in a number of places and without exception, that the colour of swans is white,
one would generalize that 'all swans are white'. But Popper draws attention
to the problem that any number of observations of swans in different locations does not
mean that all swans are white. There could be one, still to be observed, which may
turn out to be other than white. This is the typical problem of induction. One cannot verify
the truth of a universal statement by any number of observations, but one non-white swan can
falsify it. A universal theory can be shown to be false but never proven to be true.
Therefore, verification is not the right test for universalization. Popper insisted on
the falsification test to overcome the problem of induction.

Third, contrary to the Positivist insistence on a universal method, the theoretical idiom of different
sciences varies. Unless there is a specific theory relevant to the subject, facts cannot be
expressed in recognizable form. Otherwise, there will be mere description of
instruments and activities rather than the underlying pursuit of knowledge. Pierre Duhem's
famous example is that, to one who has no theory of measuring electrical resistance, an
oscillating iron bar with a mirror attached will mean only objects and facts.
These criticisms against Positivism were mainly aimed at the empirical extreme of
claiming that observations are theory-independent.

1.4.2 Post-Positivism

As we have seen above, by the 1960s the limitations of Positivism were widely criticized.
Much of the criticism took the form of exploring alternative approaches to scientific
methodology, which emerged as the Post-Positivist philosophy of science. Post-
Positivism consists of several approaches, of which Karl Popper's 'Critical Rationalism',
Thomas Kuhn's 'Growth of Knowledge', Imre Lakatos's 'Scientific Research
Programmes' and Paul Feyerabend's criticism 'Against Method' are the major contributions.
We shall discuss each of these Post-Positivist approaches, their basic contents and the
helpful reading material.

1.4.3 Karl Popper and ‘Critical Rationalism’

Karl Popper, as we have seen above, was one of the earliest critics of Positivism and, over a
period, his writings emerged as an alternative to Positivism. His approach to the
philosophy of science is known as 'Critical Rationalism'. Since his writings evolved over a
period of time, one has to be careful in choosing the reading material. His
contribution may be broadly grouped into (i) criticism of Positivism, (ii) the basis of
knowledge, (iii) the problem of induction and (iv) the methodology of falsificationism.

Growth of Knowledge
Since we are familiar with Popper’s criticism of Positivism, we shall turn to the basics of
his ‘Critical Rationalism’. According to him, the basis for growth of knowledge is the
existence of ‘critical spirit in society’. He conceives of three autonomous worlds. The
first world, he terms as the ‘physical reality’. The second is the ‘subjective knowledge’
referred to consciousness. It is the third world, which he calls as ‘objective knowledge’
that is the domain of pursuit of science through ‘theory, problems and arguments.

Fallibilism
All existing knowledge is full of errors and fallible. Objective truth exists and there are
ways to recognize it. The advance of knowledge consists merely in the modification
of earlier knowledge. Real progress of knowledge involves the elimination of errors.

Induction and Falsificationism


Because of the problem of induction discussed earlier, verification as a method is not
suitable for scientific investigation. Falsification is his critical rationalist alternative.
The falsificationist methodology involves some rules for the behaviour of scientists – not merely
logic. These rules are:
(i) propose and consider only testable or falsifiable theories,
(ii) seek only to falsify scientific theories, and
(iii) accept those that withstand attempts to falsify them as worthy of critical discussion.
He goes on to give reasons for these exhortations towards falsificationism. Popper is
against 'verification' for the additional reason that it makes scientists look for facts
which are likely to help in verifying their theories and is likely to result in commitment to
their theories. For Popper, such commitment is a crime.

Criticism of Popper’s Critical Rationalism

Some of the main criticisms against Popper’s ‘falsificationism’ include the following:
One, in the pursuit of scientific knowledge, logical falsifiability turns out to play only a
minute role in the actual process of theory rejection or revision.

Second, falsification no longer functions as a plausible criterion of demarcation of


science and non-science.

Third, since individual scientific theories need not be falsifiable, there is no logical
asymmetry between verifiability and falsifiability of particular scientific theories. They
are neither verifiable nor falsifiable.

You may begin with Blaug (1980, pp. 10-17) to understand Popper's criticism of
Positivism on the count of verification, and his alternative suggestion of 'falsification' as
the criterion demarcating science from non-science and as an approach to
overcome the problems of induction.

O'Hear's (1980) book would help you to appreciate the context of his writings on the
philosophy of science. Popper's The Logic of Scientific Discovery would help you once
you are familiar with an overview of his contributions. Naletov's (1984) chapter on
Popper is especially useful for understanding his 'third world' of knowledge. The best
source not only for a critical appraisal but also for an in-depth discussion of Popper's methodology is
Hausman (1988).

Read: Chapter 1, pp. 10-17, Mark Blaug (1980) The Methodology of Economics,
Cambridge University Press, Cambridge.

Daniel H. Hausman (1988) "An Appraisal of Popperian Methodology" in Neil De
Marchi (ed), The Popperian Legacy in Economics, Cambridge University Press,
Cambridge, pp. 65-76.

1.4.4 Thomas Kuhn and the Growth of Knowledge Philosophy of Science

"Philosophy of science without history is empty, history of
science without philosophy is blind"
E. Kant

Thomas Kuhn turns to the history of science to explain the methodology of science. He
questions earlier approaches to the methodology of science and offers an alternative to
both Positivism and Popper's Critical Rationalism. Kuhn's approach is variously known
as the 'Growth of Knowledge Philosophy of Science', 'Contemporary Philosophy of
Science' or a 'Trend in Science'. All earlier explanations show the growth of knowledge as
incremental, additive or cumulative. According to Kuhn, science did not develop by the
accumulation of individual discoveries and inventions but by revolutionary breaks.
Scientific progress can be understood by positive description and not by the 'normative'
prescription of rules, as done by Positivism or Popper.

The major contribution of Kuhn is the explanation of scientific progress in terms of a structure of
revolutions. Scientists operate within a 'paradigm'. A paradigm refers to 'knowledge
embedded in shared exemplars'. A paradigm involves the entire constellation of beliefs,
values, techniques, etc. shared by the members of a given scientific community. Working
within a paradigm involves 'normal science'. Normal science involves solving scientific
puzzles. It involves tacit knowledge learned by doing science, not by
acquiring rules for doing it. Under a paradigm, novices acquire the shared practices as
part of their training and as part of a scientific group.

In the course of the practice of 'normal science' within a paradigm, anomalies arise in the form
of unsolved puzzles. As the anomalies increase, a crisis arises which puts the
paradigm on trial. With increasing puzzles, there is a paradigm change which
brings in new ways of analysis, new approaches and new knowledge. The paradigm
change is like a 'gestalt' switch – a total change of vision – and marks a revolutionary break from the
past, something like the shift from the notion of a flat earth to a round earth.

Kuhn's second edition (1970) is a lucid piece of writing and you should take it as a basic
reading (see Box). The first few pages give a detailed overview of the contents and
will be of immense help in following the rest of the text. Blaug (1980, pp. 29-34)
contains a good summary version, and its first part is a good introduction to both Kuhn
and Lakatos. The first two sections of Aidan Foster-Carter (1976) are an excellent
summary of Kuhn's 'structure of scientific revolutions'.

Read: Thomas Kuhn (1970) The Structure of Scientific Revolutions, Reprint,


International Encyclopedia of Unified Science, Vol. 2, No. 2.

Blaug, M (1980) The Methodology of Economics, Cambridge University Press,


Cambridge, pp. 29-34.

Aidan Foster-Carter (1976) "From Rostow to Gunder Frank: Conflicting Paradigms


in the Analysis of Underdevelopment”, World Development, Vol. 4, No. 3.
1.4.5 I. Lakatos: The Methodology of Scientific Research Programmes (MSRP)

Imre Lakatos also belongs to the contemporary philosophy of science and, like Kuhn,
believes in the history of science as a guide to explaining the methodology of science. His
approach is called the 'Methodology of Scientific Research Programmes' (MSRP). He
tries to bridge the contributions of Popper and Kuhn: MSRP is considered a
compromise between the ahistorical, aggressive, rule-bound methodology of Popper on the
one hand and the relativistic, defensive methodology of Kuhn on the other. According to
MSRP, validation in science involves not individual theories but clusters of
interconnected theories, which may be called scientific research programmes (SRPs).
SRPs are not scientific once and for all: an SRP may experience 'progressive' or
'degenerating' phases, and these phases have 'theoretical' and 'empirical' components. If
successive theories in a programme have excess empirical content, some of which is
empirically corroborated, the problem-shift is progressive. An SRP may experience a problem-shift from a 'degenerating' to a
'progressive' phase, as in psychology. If the theory does not lead to much empirical
content, there will be a 'degenerating' problem-shift, as in astrology. There has been
extensive use of Lakatos's MSRP in theory appraisal, both in the sciences and in social sciences
like economics.

Lakatos is a difficult writer, and yet Lakatos and Musgrave (1978) would in parts be
useful reading. Lakatos's paper in the volume, though lengthy and difficult, serves as a good
introduction (see particularly pp. 132-138). Blaug's (1980) summary (pp. 34-41) is
helpful. Caldwell (1982) contains a brief summary on Lakatos. For those interested in
the application of Kuhn and Lakatos to theory appraisal in economics, Latsis (1976) is an
important source.

Read: Imre Lakatos (1970) "Falsification and the Methodology of Scientific Research
Programmes" in Imre Lakatos and A. Musgrave (eds) Criticism and the Growth of
Knowledge, Cambridge University Press, Cambridge, (esp. pp. 132-138), pp. 132-
194.

Blaug, M (1980) The Methodology of Economics, Cambridge University Press,


Cambridge, pp. 35-41.

1.4.6 Paul Feyerabend: Methodological Dadaism or Anarchism

Feyerabend is highly critical of all prescriptive methodologies, particularly Positivism
and Popper. He began as more Popperian than Popper, moved on to become more
Kuhnian than Kuhn, but later turned against both in his philosophical approach. On the
surface he appears to be preaching 'against method', but his observations are highly
insightful and are aimed particularly against formalism, pretence and pose in the name of
science. His major contributions are the 'theory-dependence' thesis, the
'incommensurability' thesis, the 'interactivist' view and the plurality of scientific method.
His anarchist epistemology, or Dadaism in science, emphasizes social purpose, rather
than adherence to method, as the objective of science.

His two important works are Against Method (1975) and Science in a Free Society
(1978). While the former cautions against the pitfalls of a rigid method and pleads for
breaking rules, the latter emphasizes the limitations of all methodologies and highlights
the role of humility, tenacity, interactiveness and plurality. Caldwell (1982) has a very
useful summary of Feyerabend's contribution (pp. 79-85). Naletov (1984) provides a
good summary account of Against Method. But one caution: do not stop with reading
Blaug (1980, pp. 40-44) on Feyerabend. Blaug gives the impression that Feyerabend is a
non-serious and flippant 'methodologist'. This is only a caricature; the truth is that
Feyerabend needs careful attention.

Read: Paul Feyerabend (1975) Against Method: Outline of an Anarchistic Theory of
Knowledge, New Left Books.

Paul Feyerabend (1978) Science in a Free Society, New Left Books.

Caldwell, B (1982) Beyond Positivism: Economic Methodology in the Twentieth
Century, Allen and Unwin, London, pp. 79-85.

1.5 MODELS OF SCIENTIFIC EXPLANATION

Explanation assumes a central place in the pursuit of scientific knowledge. The search for
making the testability criterion concrete has been a major problem in the philosophy of
science. In principle, at least until Popper raised serious questions, complete verification
of observational evidence was considered meaningful. But strict verifiability always posed a
problem, and confirmation of some of the experimental propositions came to be accepted as a
solution. Further developments in this direction resulted in rules of correspondence
between theoretical terms and observation terms. Out of this emerged an explanatory system
called the Hypothetico-Deductive Model (H-D Model). On this view, scientific theories
have three components: an 'abstract calculus', a set of rules that assign empirical content
to the abstract calculus, and a model for interpreting the abstract calculus. The H-D
Model explicitly addresses the problems of a theory's structure. By relaxing the strict
Positivist correspondence between science and observable phenomena, the H-D Model
allowed a substantial role for theories and theoretical terms. But theories in these models
continued to be treated as eliminative fictions, and establishing correlations among phenomena
was considered all that science could and should do. In fact, the early positivists considered that
explanations had no role in science.

This counter-intuitive approach to scientific explanation was eventually replaced by the
contribution of Hempel and Oppenheim, who developed the Deductive-Nomological (D-N)
Model, or what are called the 'Covering Law Models'. However, it was realized that many
explanations in science, because they make use of statistical laws, cannot be adequately
accounted for by the D-N Model. Hempel later developed the Inductive-Probabilistic (I-P)
Model, in which the explanations employ statistical laws. The Covering Law Models
too came in for criticism, specifically for the 'symmetry thesis' – the claimed symmetry
between explanation and prediction – and for the claim that they adequately explain
almost all legitimate phenomena in the natural and social sciences.

1.5.1 Hypothetico-Deductive (H-D) Model

The basic developments leading to the H-D Model and its limitations are
very well summarized in Caldwell (1982, pp. 23-32). This also provides an excellent
summary of Carl Hempel's emphasis on the many positive functions of theories. For a more
detailed discussion you may go through Hempel's collected essays (1965).

Read: Caldwell, B (1982) Beyond Positivism …, George Allen & Unwin, London, pp. 23-32.

1.5.2 Deductive-Nomological (D-N) Model

The D-N Model is perhaps the most tenacious of all models of explanation, having survived
well after the decline of Positivism. There is a brief summary of the D-N Model in
Hausman (1984, pp. 6-10) and also in Caldwell (1982, pp. 28-32 and pp. 54-63). Blaug
(1980, pp. 2-9) provides a summary critique of the Covering Law Models. But there is no
substitute for the Hempel and Oppenheim paper reproduced in Brody (1970).

Carl G. Hempel and Paul Oppenheim (1948) "Studies in the Logic of Explanation"
(pp. 9-20) in Baruch Brody (ed) Readings in the Philosophy of Science,
Prentice Hall, Englewood Cliffs, New Jersey, 1970.
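For quick reference, the general form of a D-N explanation, as set out in the Hempel and Oppenheim paper cited above, can be written schematically (in LaTeX notation) as:

$$
\left.
\begin{array}{ll}
L_1, L_2, \ldots, L_k & \text{(general laws)}\\
C_1, C_2, \ldots, C_r & \text{(statements of antecedent conditions)}
\end{array}
\right\}\ \text{Explanans}
$$
$$
\therefore\ E \qquad \text{(Explanandum: the phenomenon to be explained)}
$$

The explanandum must follow deductively from the explanans, and the explanans must contain at least one general law.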

1.5.3 Inductive-Probabilistic (I-P) Model

This is an extension of the D-N Model by Hempel for application where statistical laws are
involved. The basic reading would involve the sections referred to above in Caldwell
(1982), Blaug (1980) and Hempel's collected papers (1965).
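Schematically, following Hempel's later account, an I-P (inductive-statistical) explanation replaces the universal law by a statistical law and confers only high probability, not deductive certainty, on the explanandum; in LaTeX notation:

$$
\begin{array}{ll}
p(G \mid F) = r & \text{(statistical law, with } r \text{ close to 1)}\\
Fa & \text{(the particular case } a \text{ has property } F\text{)}\\
\hline
\hline
Ga \quad [r] &
\end{array}
$$

The double line indicates that the explanandum Ga is supported only inductively, with probability r, rather than entailed deductively.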

1.6 DEBATES ON MODELS OF EXPLANATION IN ECONOMICS

As Daniel Hausman observes, ever since its inception in the eighteenth century the science
of economics has been methodologically controversial. There has always been the
haunting question of whether economics is a science at all. Since the early 1980s,
there has been a resurgence of interest in philosophical and methodological questions
concerning economics. When serious doubts are expressed about its scientific
credibility, economists appear to turn to methodological reflection in the hope of
finding some flaw in previous economic study or some new methodological directive
that will better guide their work in the future.

We intend to trace the origins of methodological interest in political economy and the
desire to model economics as a science by adopting the methods of the natural sciences. In
the process, it is hoped, you will be in a position to see the methodological concerns from
classical economics down to the present times. It will help you to understand
the consequences of the obsession with adopting the methods of the natural sciences in a complex social
science like economics. In the end, you will be in a position to appreciate the limitations
of the present mainstream methodological approach which, in spite of the decline of
Positivism as a methodology of the natural sciences, remains under overwhelming Positivist
influence.

Let us begin with the methodological position of classical political economy and David
Ricardo's method. Though Ricardo did not himself write explicitly on methodology, his
writings carried the seal of the abstract deductive method that was dealt with at length by his
followers like N.W. Senior, J.S. Mill, J.E. Cairnes and J.N. Keynes. This is followed by the
Neo-Classical School, especially Lionel Robbins and the controversy his essay generated, which
acted as a turning point in economic methodology. We shall thereafter turn to the
methodological contributions of prominent contemporary mainstream economists,
including Milton Friedman and Paul Samuelson. The last sub-section covers Amartya Sen's
contribution on explanation in economics.

1.6.1 Classical Political Economy and Ricardo’s Method

As pointed out earlier, Ricardo's methodological habit is described as the 'abstract deductive
method'. Though Ricardo claimed that the laws of economics (political economy in his
times) were as exact as the laws of gravity, he did not explain the method of
economics. On the contrary, his laws were abstract laws without any appeal to evidence
or verification. It was left to his followers like Senior and Mill to defend him. N.W.
Senior, in his Outline of the Science of Political Economy (1836), differentiated between a
pure and strictly positive science on the one hand and an impure and inherently normative
art on the other, and considered the Ricardian system of explanation a science. Senior
identified a few general propositions of Ricardo's work to lay claim to scientific status.
It was J.S. Mill, in his essay 'On the Definition of Political Economy and on the Method
of Investigation Proper to It' (1836), who took upon himself the task of laying bare the nature of
economics and the method adopted by Ricardo. Mill was the first economist to spell out
the 'economic man' as conceived in classical economics. Mill maintained that
economic science was 'hypothetical' and a science of 'tendencies', the laws of which
were overwhelmed by various disturbances.

Mill's view on the nature of the methodology of economics was influential throughout
the nineteenth and even the early twentieth century. J.N. Keynes and J.E. Cairnes carried
forward Mill's methodological views. All this was a period when economics asserted
scientific status on the basis of abstract deductive explanation, without any appeal to
testing or verification.

Read: Blaug, M (1980) The Methodology of Economics, Cambridge University Press,
Chapter 3, "The Verificationists".

Daniel M. Hausman (1981) "John Stuart Mill's Philosophy of Economics", Philosophy
of Science, Vol. 48, pp. 363-385.

For an introductory discussion of Ricardo's classical methodology, Chapter 3 of
Blaug (1980) is very useful, though you may ignore the title of the chapter, "The
Verificationists": there are no verificationists; only Senior, Mill, J.N. Keynes and
Cairnes are discussed. There is an excellent discussion of J.S. Mill in Hausman (1981).
J.S. Mill's essay is reproduced in Hausman (1984), and it is an insightful essay for
understanding, even at present, why economics turns out to be an 'inexact science'. T. Hutchison (1988)
also provides an overview of the early methodological approaches in economics.

Lionel Robbins' An Essay on the Nature and Significance of Economic Science (1935) is
a path-breaking methodological contribution to economics that held sway for a substantial
part of the first half of the twentieth century and even today serves as the one that defines
the nature of the subject matter of economics. The major objective of the essay was to rid
economics of ethics and normative welfare considerations and to approach it as a 'pure
science'. His contention is that economics is pure theory based on premises known from
prior experience and hence needs no testing. He therefore conceives economics as an
'a priori science', and his approach has been described as 'apriorism'. At the same time,
he claimed the status of positive science for economics because of his insistence on ridding
economic analysis of all normative considerations. His claims for a positive
science of pure theory, without any need for testing, were subjected to extensive criticism,
and Hutchison was foremost among the critics. In fact, the latter's criticism severely
undermined Robbins' scientific claims for economics.
Read: Lionel Robbins (1935) "The Nature and Significance of Economic Science",
reproduced in Daniel Hausman (1984) The Philosophy of Economics: An Anthology,
Cambridge University Press, Chapter 3, pp. 83-110.

B. Caldwell (1982) Beyond Positivism …, George Allen & Unwin, Part II, Chapter 6,
"Robbins Vs Hutchison", pp. 99-128.

Chapter 4 of Blaug (1980) is a useful introduction to more of Hutchison's criticism
of Robbins. The sixth chapter of Caldwell (1982) is very useful for both Robbins and
Hutchison. But there is no substitute for reading Robbins' original essay reproduced in
Hausman (1984). There is a brief but succinct survey of the methodological position of
Robbins in D.P. O'Brien (1988).

1.6.2 Hutchison and Logical Empiricism

Terence Hutchison was instrumental in turning economics towards logical empiricism and
testability. His first book, The Significance and Basic Postulates of Economic Theory
(1938), was the first systematic attempt to apply logical positivism to economics. He
termed the claim that economics is a pure a priori science a bogus claim and insisted that,
if economics is to stake its claim as a science, its propositions should be in testable
form. He insisted that not only the theories but also the assumptions of economics should be
subjected to testing, and this earned him the description 'radical empiricist'. It also led
to a debate on the testability of assumptions in economics. Aided by other developments
in improved sources and methods of data collection, it certainly turned economics
more towards empirical research.

Read: Terence W. Hutchison (1956) "On Verification in Economics" in Daniel
Hausman (1984), The Philosophy of Economics: An Anthology, Cambridge University
Press, Chapter 7, pp. 158-167.

Blaug (1980) discusses the contribution made by Hutchison in his fourth chapter. A more
detailed discussion of Hutchison's contribution and its significance is found in the sixth
chapter of Caldwell (1982), referred to above. But it is essential to read
Hutchison's essay reproduced in Hausman (1984).

1.6.3 Milton Friedman and Instrumentalism

Of all the methodological debates in modern economics, the one that revolves around
Milton Friedman's contribution has the most far-reaching significance, because it not only tries to
establish formal foundations for research in economics but also brings into wide view the
limitations and difficulties involved in putting up a scientific façade for economics. His
methodology is known as 'instrumentalism' since he considers that the function of
assumptions and theory is to yield predictions, nothing more.

Milton Friedman's paper (1953) is known as a remarkable masterpiece of marketing what
he presents as the methodology of positive economics. For him, the ultimate goal of
science is the development of theories or hypotheses that can provide valid and
meaningful predictions about phenomena. The criteria for the acceptability of a theory or a
hypothesis are:
i) logically consistent statements with meaningful counterparts,
ii) testable substantive hypotheses, and
iii) the only test of the validity of a hypothesis is the correspondence between the
prediction and the experience.
Since there are many competing hypotheses, simplicity and fruitfulness are important
requirements. For testing, predictive power is the main criterion and the realism of assumptions
does not matter. He offers effective arguments for why the realism of assumptions does
not matter: for him, if the predictions come true, then one can proceed as if the
assumptions were true. This earned his methodology the name "as if" methodology.
He goes on to say that "truly important and significant hypotheses will be found to
have assumptions that are wildly inaccurate descriptive representations of reality and,
in general", he concluded, "the more significant the theory, the more unrealistic the
assumptions". This last part has become the notorious 'F-Twist' in his methodology.
He went on to argue that mainstream neo-classical economics has an excellent
predictive track record, as if there had been no methodological problem at all in
economics. Such a provocative contribution did invite extensive criticism, though the
comfortable abstraction from reality did bring a large following to Friedman's
methodology. The reading guide that follows is kept deliberately elaborate to
capture the range of theoretical and methodological difficulties that are often glossed
over.

Friedman's paper (1953) is now reproduced in many readings on methodology. Bruce
Caldwell (1984) is very helpful because it also reproduces the debate that followed in the
American Economic Review. To follow the criticism that Friedman used the
expression "assumptions" in a limited sense and does not distinguish between the different
senses in which 'unrealism' is used, one may begin with Nagel in Caldwell (1984). For
an elaboration on this, see Boland (1979). Melitz (1965) provides a good summary of the
reasons advanced by Friedman as to why the search for realistic assumptions is futile. Besides,
Melitz also helps us to locate Friedman in the appropriate historical context in the
evolution of economic methodology. For critical remarks, and specifically for the
characterization of Friedmanian 'instrumentalism' as an 'F-Twist', Samuelson in Caldwell
(1984) is very useful. Mason (1980) is a drastic criticism of Friedman, whose work he
calls "a mythology resulting in methodology". This critique is with particular
reference to monetary theory.

There is a good summary of the whole debate in Blaug (1980). There is a good
discussion of Friedman's method, along with other shades of empiricism, in Eugene
Rotwein (1980). For a short but stimulating contrast of the positivist 'predictive' approach of
Friedman with the anti-positivist 'assumptive' approach of F. Knight, see Abraham
and Eva Hirsh (1980), who trace the Friedmanian approach to Senior and Cairnes. For an
excellent analysis of the Chicago School with Friedman at the centre, tracing the origins
of logical positivism and the insularity of positivism giving rise to a kind of "ideal type", see
C.K. Wilber and Jon D. Wisman (1980).

Bruce Caldwell (1982) provides a brief but succinct summary of Friedman's essay,
Boland's restatement and the philosophical rejection of instrumentalism. Rotwein (1959)
contains a good summary of Friedman's methodology, followed by a critical appraisal.
Melitz (1965) gives an account of the debate on the realism of assumptions and the
significance of testing assumptions. Boland (1979) provides a valiant defence of
Friedman by attempting to answer every point of criticism.

Read: Milton Friedman (1953) "The Methodology of Positive Economics", reproduced
in Bruce Caldwell (1984).

Jack Melitz (1965) "Friedman and Machlup on the Significance of Testing Economic
Assumptions", Journal of Political Economy, February 1965, pp. 37-60.

1.6.4 Paul Samuelson and Operationalism


Paul Samuelson's two major theses on methodology are:
(i) that economists should seek to discover 'operationally meaningful theorems', and (ii)
that there is no explanation in science but only description.

For this reason, his methodological approach is described as 'operationalism' or
'descriptivism'. Samuelson comes close to Popper's 'rational reconstruction', and there
is an affinity with Hutchison's insistence on testability, though he was not as radical in insisting
on the testing of assumptions as well. He was critical of Milton Friedman on assumptions.
Samuelson himself has been criticized for not practising what he proposed as methodological
tenets.

Read: Bruce Caldwell (1982) Beyond Positivism …, George Allen & Unwin,
Chapter 9, pp. 189-200.

Bruce Caldwell (1982) devotes a chapter (No. 9) to Samuelson. Samuelson's writings on
methodology are limited and are reproduced in Caldwell (1984). There is substantial work
on Samuelson's methodology in Stanley Wong (1973) and (1978).

1.6.5 Amartya Sen: Heterogeneity of Explanations in Economics


Perhaps one of the most reflective contributions on the methodology of contemporary
economics is found in Amartya Sen (1989). It is a broad-based critique of contemporary
methodology based on the deep-seated heterogeneity of the subject. He observes a good
deal of discontent with the methods and traditions that are in vogue in economics. He comes to
the conclusion that, given the diversity of the subject matter of economics, there is a need
for plurality in methodological approaches. He observes that the diversity in economics
can be seen in terms of three broad exercises, viz.
(i) predicting the future and causally explaining the past,
(ii) choosing appropriate descriptions of states and events in the past and present,
and
(iii) providing normative evaluations of states, institutions and policies. He feels
the 'methodology of economics' should admit enough diversity to deal with all
three of these exercises. Once that is done, there is a place for all activities, ranging
from verification and testing, static general equilibrium analysis and value-based
welfare evaluations to the application of formalism and mathematics and work
assuming rationality.

Read: Amartya Sen (1989) “Economic Methodology: Heterogeneity and


Relevance”, Social Research, Vol. 56, No. 2, Summer 1989, pp. 299-329.

1.7 RESEARCH PROBLEM AND RESEARCH DESIGN

1.7.1 Research Problem


For any successful and fruitful research work, the basic requirement is clarity and
simplicity in formulating the research problem. A problem may be defined as an issue, existing
in the literature, theory or policy, that leads to a need for the study. A problem
should be stated within a context, and the context should be provided and briefly
explained in terms of the conceptual or theoretical framework in which it is embedded.
There is extensive literature, both in print and in electronic form, on research problems,
proposals and the various dimensions of research design. Much of this literature is designed
for the behavioural sciences and is more often addressed to problems in psychology and
education, since dissertation work is common in these areas. Creswell (1994) and
Kerlinger (1979) are useful starting points.

Read: Creswell, J.W (1994) Research Design: Qualitative and Quantitative


Approaches, Sage Publications.

Kerlinger, F.N. (1979) Behavioural Research: A Conceptual Approach, Holt, Rinehart
and Winston, New York.

1.7.2 Basic Steps in Framing a Research Proposal



Once there is basic clarity on the research problem to be pursued, you may get down to
preparing a research proposal. There are certain basic steps involved in preparing a research
proposal, and this preparation will facilitate smooth sailing in carrying out the research
work. The following are the rudimentary steps in preparing a research design.
1. Introduction

“The introduction is the part of the paper that provides readers with the background
information for the research reported in the paper. Its purpose is to establish a framework
for the research, so that readers can understand how it is related to other research”.

2. Statement of the Problem

Effective problem statements answer the question "Why does this research need to be
conducted?" If a researcher is unable to answer this question clearly and succinctly, and
without resorting to 'hyperspeaking' (i.e. focusing on problems of macro or global
proportions that certainly will not be informed or alleviated by the study), then the
statement of the problem will come off as ambiguous and diffuse.

3. Objectives of the Study

Objectives should be stated clearly and should be kept in view throughout the
investigation and analysis. One of the important characteristics of a good piece of research
work is that the findings are sharply linked to the set objectives. Since the objectives
provide direction to the entire research work, they should be limited and focused; too
many objectives are likely to be a hindrance to analysis and interpretation.

4. Review of Literature

The review of literature is meant to provide insight into the topic and knowledge of the
availability of data and other materials in the proposed area of research. The
literature reviewed may be classified into two types, viz. (i) literature relating to
concepts and theory and (ii) empirical literature consisting of the findings, in quantitative
terms, of studies conducted in the area. This will help in framing the research questions to be
investigated. Academic journals, conference proceedings, government reports, books etc.
are the main sources of literature. With the spread of IT, one can access a large volume
of literature through the internet.

5. Questions or Hypotheses

Questions are relevant to normative or census-type research (How many of them are
there? Is there a relationship between them?). They are most often used in qualitative
inquiry, although their use in quantitative inquiry is becoming more prominent.
Hypotheses are relevant to theoretical research and are typically used only in
quantitative inquiry. When a writer states hypotheses, the reader is entitled to an
exposition of the theory that led to them (and of the assumptions underlying the theory).
Just as conclusions must be grounded in the data, hypotheses must be grounded in the
theoretical framework.

A hypothesis can be formulated as a proposition, or set of propositions, providing the most
probable explanation for the occurrence of some specified phenomenon. Hypotheses, when
empirically tested, may either be accepted or rejected. A hypothesis must, therefore, be
capable of being tested. A hypothesis stated in terms of a relationship between a
dependent and independent variable is suitable for econometric treatment. The manner
in which a hypothesis is formulated is important, as it provides the required focus for the
research. It also helps in identifying the method of analysis to be used.

6. Methods of Analysis / Methodology

The methods or procedures section is really the heart of the research proposal. The
activities should be described in as much detail as possible, and the continuity between
them should be apparent. There is a need to indicate the methodological steps to be taken
to answer every question or to test every hypothesis.
Issues relating to the sources of data, nature of data, sampling design, methods of
data collection, methods of analysis etc. should all be clearly discussed in this section.

7. Limitations of the Study

No research proposal or project is likely to be totally perfect. There will always be
weaknesses and limitations. It is always desirable to spell out these limitations so as to
keep the work within feasible limits and to make them known to the readers.

8. Significance of the Study

There is always a question whether the proposed research leads to ‘value addition’ – in
this case addition to knowledge in the domain of the proposed research. It would be
important to indicate how the proposed research will refine, revise or extend existing
knowledge in the area of investigation.

9. References

Proper documentation is an essential part of any research work. There are a number of
style sheets which would be of help for proper references in the text and in the reference list.

1.8 Further Suggested Readings

Abraham and Eva Hirsh (1980) “The Heterodox Methodology of Two Chicago
Economists” in W.J. Samuels (ed.).

Bruce Caldwell (1982) Beyond Positivism: Economic Methodology in the Twentieth
Century, George Allen & Unwin, London.

Bruce Caldwell (1984) Appraisal and Criticism in Economics, Allen and Unwin, Boston.

Carl Hempel (1965) Aspects of Scientific Explanation and Other Essays in the
Philosophy of Science, Free Press, New York.

C.K. Wilber and Jon D. Wisman, (1980) “The Chicago School: Positivism or Ideal Type”
in W.J. Samuels (ed.).

D.M. Hausman (1988) “Economic Methodology and Philosophy of Science” in


Boundaries of Economics, edited by Gordon C. Winston and Richard F. Teichgraeber II.

D.P. O'Brien (1988) Lionel Robbins, Macmillan, London, pp. 23-40.

Daniel M. Hausman (ed) (1984) The Philosophy of Economics, Cambridge University


Press, Cambridge.

Duncan Hodge (2008) "Economics, Realism and Reality: A Comparison of Maki and
Lawson", Cambridge Journal of Economics, Vol. 32, No. 2, March 2008, Oxford
University Press.

E. Nagel (1963) "Assumptions in Economic Theory" in Caldwell (1984).

Eugene Rotwein (1959) "On the Methodology of Positive Economics", QJE.

Eugene Rotwein (1980) “Empiricism and Economic Method” in Warren J. Samuels


(1980).

Igor Naletov (1984) Alternatives to Positivism, Progress Publishers, Moscow.

IGNOU (2006): Research Methodology: Issues and Perspectives (Block 1) of MEC-009


course on Research Methods in Economics.

L. Boland (1979) “A Critique of Friedman’s Critics” JEL, June 1979, also in Caldwell
(1984).

M. Blaug (1980) The Methodology of Economics, Cambridge University Press.

Neil de Marchi (ed) (1988) The Popperian Legacy in Economics, Cambridge University
Press, Cambridge.

P.A. Samuelson (1963 & 1964) on Friedman in Caldwell (1984).

Spiro Latsis (ed) (1976) Method and Appraisal in Economics, Cambridge University
Press, Cambridge.

Stanley Wong (1978) The Foundations of Paul Samuelson's Revealed Preference
Theory.

Stanley Wong (1973) "The 'F-Twist' and the Methodology of Paul Samuelson", American
Economic Review, June 1973.

W. Salmon (1990) Four Decades of Scientific Explanation, University of Minnesota
Press, Minneapolis.

W.J. Samuels (1980) The Methodology of Economic Thought, Transaction Books, New
Brunswick.

William E. Mason (1980) “Some Negative Thoughts on Friedman’s Positive Economics”,
Journal of Post Keynesian Economics, Vol. III, No. 2, pp. 235-255.

1.9 Model Questions

1. Why does the question Positivism or Positivism arise? How does one come round
this problem?
2. According to Popper, what are the limitations of Positivism? Critically examine
Popper’s philosophy of science.
3. How does Kuhn explain growth of knowledge?
4. Why is Lakatos’ contribution called a bridge between Popper and Kuhn?
5. What is the significance of Feyerabend’s tirade ‘against method’?
6. Discuss the evolution of explanatory structures within Positivism.
7. Discuss the contribution of Hempel and Oppenheim to explanatory models.
8. Critically examine the methodological contention of the Classical School.
9. What is apriorism? How does Robbins defend it?
10. Evaluate the methodological contribution of Hutchison.
11. How does one explain the all pervasive appreciation as well as criticism of Milton
Friedman’s methodology of positive economics?
12. Is there room for methodological heterodoxy in economics? What is the
significance of Sen’s methodological contribution?

Block 2 QUANTITATIVE METHODS: DATA COLLECTION

Structure
2.1 Introduction
2.2 Objectives
2.3 An Overview of the Block
2.4 Method of Data Collection
2.5 Tools of Data Collection
2.6 Sampling Design
2.6.1 Population and Sample Aggregates and Inference
2.6.2 Non-Random Sampling
2.6.3 Random or Probability Sampling
2.6.4 Methods of Random Sampling
2.6.4.1 Simple Random Sampling with Replacement (SRSWR)
2.6.4.2 Simple Random Sampling without Replacement (SRSWOR)
2.6.4.3 Interpenetrating Sub-Samples (I-PSS)
2.6.4.4 Systematic Sampling
2.6.4.5 Sampling with Probability Proportional to Size (PPS)
2.6.4.6 Stratified Sampling
2.6.4.7 Cluster Sampling
2.6.4.8 Multi-Stage Sampling
2.7 The Choice of an Appropriate Sampling Method
2.8 Let Us Sum Up
2.9 Further Suggested Readings
2.10 Some Useful Books
2.11. Model Questions

2.1 INTRODUCTION

Research is the objective and systematic search for knowledge to enhance our
understanding of the complex physical, social and economic phenomena that surround us.
It involves a scientific study of the variety of factors or variables that shape such
phenomena, the interrelationships amongst them and how these impact on our lives. The
results of such studies give rise to more questions for us to answer and spur us on to
further research, resulting in the extension of the frontiers of knowledge.

Studies of the behaviour of variables and of relationships amongst them necessitate
measurement of the variables involved. Variables can be quantitative variables like GNP
or qualitative variables like opinions of individuals on, say, ban on smoking in public
places. The former set assumes quantitative values. The latter set does not admit of easy
quantification, though some of these can be categorised into groups that can then be
assigned quantitative values. Research strategies thus adopt two approaches, quantitative
and qualitative. We shall deal with the quantitative approach in this Block.

The basic ingredient of quantitative research is the measurement, in quantitative terms, of
the variables involved or the collection of the data relevant for the analytical and
interpretative processes that constitute research. The quality of the data utilised in
research is important because the use of faulty data in such endeavour results in
misleading conclusions, however sophisticated may be the analytical tools used for
analysis. Research processes, be it testing of hypotheses and models or providing the

theoretical basis for policy or review of policy, call for objectivity, integrity and
analytical rigour in order to ensure academic and professional acceptability and, above
all, an effective tool to tackle the problem at hand. Data used for research should,
therefore, reflect, as accurately as possible, the phenomena these seek to measure and be
free from errors, bias and subjectivity. Collection of data has thus to be made on a
scientific basis.

2.2 OBJECTIVES

After going through this Block you will be able to

• appreciate different methods of collecting data;


• acquire knowledge of different tools of data collection;
• define the key terms commonly used in quantitative analysis, like parameter,
statistic, estimator, estimate, inference, standard error, confidence intervals, etc.;
• distinguish between random and non-random sampling procedures for data
collection;
• appreciate the advantages of random sampling in the assessment of the
“precision” of the estimates of population parameters;
• acquire knowledge of the procedure for drawing samples by different methods;
• develop the ability to obtain estimates of key parameters like population total,
proportion, mean, etc.; and of the “precision” of such estimates under different
sampling methods; and
• appreciate the feasibility/appropriateness of applying different sampling methods
in different research contexts.

2.3 AN OVERVIEW OF THE BLOCK

How to assemble data on a scientific basis? There are broadly three different methods of
collecting data. These are dealt with in Section 2.4. The tools that one can use for
collecting data – the formats and the devices that modern technology has provided – are
enumerated in section 2.5. There are situations where it is considered desirable to gather
data from only a part of the universe, or a sample selected from the universe of interest to
the study at hand, rather than a complete coverage of the universe, for reasons of cost,
convenience, expediency, speed and effort. Questions then arise as to the manner in
which such a sample should be chosen – the sampling design. This question is examined
in detail in Section 2.6. The discussion is divided into a number of sub-topics. Concepts
relating to population aggregates like mean and variance and similar aggregates from the
sample and the use of the latter as estimates of population aggregates have been
introduced in sub-section 2.6.1. There are two types of sampling: random and non-random.
Non-random sampling methods and the contexts in which these are used are
described in sub-section 2.6.2. A random sample has certain advantages over a non-random
sample - it provides a basis for drawing valid conclusions from the sample about
the parent population. It enables us to state the precision of the estimates of population
parameters in terms of (a) the extent of their variation or (b) an interval within which the
value of the population parameter is likely to lie with a given degree of certainty. Further,
it even helps the researcher to determine the size of the sample to be drawn if his project
is subject to a sanctioned budget and permissible limits of error in the estimate of the

population parameter. These principles are explained in sub-section 2.6.3. Eight methods
of random sampling are then detailed in sub-section 2.6.4. These details relate to (i)
operational procedures for drawing samples, and (ii) expressions for (a) estimators of
parameters and measures of their variation and (b) estimators of such variation where the
population variation parameter is not known. Different sampling procedures are also
compared, as we go along, in terms of the relative precision of the estimates they
generate. Finally, the question of choosing the sampling method that is appropriate to a
given research context is addressed in Section 2.7. A short summing up of the Block is
given in Section 2.8.

Each Section/subsection ends with a box guiding you to relevant portions of one or more
publications that give you more details on the topic(s) handled in it. Fuller details of these
publications are indicated in Section 2.10. Section 2.9 is meant to kindle your interest and
appetite for recent developments in the subject and Section 2.11 for evaluation of your
knowledge of the subject matter covered in this Block.

2.4 METHOD OF DATA COLLECTION

There are three methods of data collection – the Census and Survey Method, the
Observation Method and the Experimental Method. The first is a carefully planned and
organised study or enquiry to collect data on the subject of the study/enquiry. We might
for instance organise a study on the prevalence of the smoking habit among high school
children – those aged 14 to 17 - in a certain city. One approach is to collect data of the
kind we wish to collect on the subject matter of the study from all such children in all the
schools in the city. In other words, we have a complete enumeration or census of the
population or universe relevant to the enquiry, namely, the city’s high school children
(called the respondent units or informants of the Study) to collect the data we desire. The
other is to confine our attention to a suitably selected part of the population of high
school children of the city, or a sample, for gathering the data needed. We are then
conducting a sample survey. A well known example of Census enquiry is the Census of
Population conducted in the year 2001, where data on the demographic, economic,
social and cultural characteristics of all persons residing in India were collected. Among
sample surveys of note are the household surveys conducted by the National Sample
Survey Organisation (NSSO) of the Government of India that collect data on the socio-
economic characteristics of a sample of households spread across the country.

The Observation Method records data as things occur, making use of an appropriate and
accepted method of measurement. An example is to record the body temperature of a
patient every hour or a patient’s blood pressure, pulse rate, blood sugar levels or the lipid
profile at specified intervals. Other examples are the daily recording of a location’s
maximum and minimum temperatures, rainfall during the South West / North East
monsoon every year in an area, etc.

The Experimental Method collects data through well designed and controlled statistical
experiments. Suppose for example, we wish to know the rate at which manure is to be
applied to crops to maximise yield. This calls for an experiment, in which all variables
other than manure that affect yield, like water, quality of soil, quality of seed, use of
insecticides and so on, need to be controlled so as to evaluate the effect of different levels
of manure on the yield. Other methods of conducting the experiment to achieve the same

objective without controlling “all other factors” also exist. Two branches of statistics -
The Design and Analysis of Experiments and Analysis of Variance - deal with these.

Read Sections 1.2 to 1.7, Chapter 1, M.N.Murthy (1967), pp. 3 – 20.

2.5. TOOLS OF DATA COLLECTION

How do we collect data? We translate the data requirements of the proposed Study into
items of information to be collected from the respondent units to be covered by the study
and organise the items into a logical format. Such a format, setting out the items of
information to be collected from the respondent units, is called the questionnaire or
schedule of the study. The questionnaire has a set of pre-specified questions and the
replies to these are recorded either by the respondents themselves or by the investigators.
The questionnaire approach assumes that the respondent is capable of understanding and
answering the questions all by himself/herself, as the investigator is not supposed, in this
approach, to influence the response in any manner by interpreting the terms used in the
questions. Respondent-bias will have to be minimised by keeping the questions simple
and direct. Often the responses are sought in the form of “yes”, “no” or “can’t say” or the
judgment of the respondent with reference to the perceived quality of a service is graded,
like, “good”, “satisfactory” or “unsatisfactory”.

In the schedule approach on the other hand, the questions are detailed. The exact form of
the question to be asked of the respondent is not given to the respondent and the task of
asking and eliciting the information required in the schedule is left to the investigator.
Backed by his training and the instructions given to him, the investigator uses his
ingenuity in explaining the concepts and definitions to respondents to obtain reliable
information. This does not mean that investigator-bias is more in the schedule approach
than in the questionnaire approach. Intensive training of investigators is necessary to
ensure that such a bias does not affect the responses from respondents.

Schedules and questionnaires are used for collecting data in a number of ways. Data may
be collected by personally contacting the respondents of the survey. Interviews can also
be conducted over the telephone and the responses of the respondent recorded by the
investigator. The advent of modern electronic and telecommunications technology
enables interviews being done through e mails or by ‘chatting’ over the internet. The mail
method is one where (usually) questionnaires are mailed to the respondents of the survey
and replies received by mail through (postage pre-paid) business-reply envelopes. The
respondents can also be asked (usually by radio or television channels or even print
media) to send their replies by SMS to a mobile telephone number or to an e-mail
address.

Collection of data can also be done through mechanical, electro-mechanical or electronic
devices. Data on arrival and departure times of workers are obtained through a
mechanical device. The time taken by a product to roll off the assembly line and the time
taken by it to pass through different work stations are recorded by timers. A large number
of instruments are used for collecting data on weather conditions by meteorological
centres across the country that help assessing current and emerging weather conditions.
Electronic Data Transfers (EDT) can also be the means through which source agencies
like ports and customs houses, where export and import data originate, supply data to a

central agency like the Directorate General of Commercial Intelligence and Statistics
(DGCI&S) for consolidation.

The above methods enable us to collect primary data, that is, data being collected afresh
by the agency conducting the enquiry or study. The agency concerned can also make
use of data on the subject already collected by another agency or other agencies –
secondary data. Secondary data are published by several agencies, mostly Government
agencies, at regular intervals. These can be collected from the publications / compact
discs or the websites of the agencies concerned. But such data have to be examined
carefully to see whether these are suitable or not for the study at hand before deciding to
collect new data.

Errors in data constitute an important area of concern to data users. Errors can arise due
to confining data collection to a sample (sampling errors). Errors can also be due to faulty
measurement arising out of a lack of clarity about what is to be measured and how it is
measured. Even when these are clear, errors can creep in due to inaccurate measurement.
Investigator bias also leads to errors in data. Failure to collect data from respondent units
of the population or the sample, due to omission by the investigator or due to non-
response (respondents not furnishing the required information), also results in errors.
These latter sources constitute non-sampling errors. The total survey error, made up of
these two types of errors, needs to be minimised to ensure the quality of data.

Read Chapter 3, p. 69, Kultar Singh (2007).

2.6 SAMPLING DESIGN

We have looked at methods and tools of data collection, chief among which is the sample
survey. How to select a sample for the survey to be conducted? There are a number of
methods of choosing a sample from a universe. These fall into two categories: random
sampling and non-random sampling. Let us turn to these methods and see how well the
results from the sample can be utilised to draw conclusions about the parent universe.

But first let us turn to some notations, concepts and definitions.

2.6.1 Population And Sample Aggregates And Inference

Let us denote population characteristics by upper case (capital) letters in English or
Greek and sample characteristics by lower case (small) letters in English. Let us consider
a (finite) population consisting of N units Ui (i = 1,2,….N). Let Yi (i = 1,2,….N) be the
value of the variable y, the characteristic under study, for the ith unit Ui (i = 1,2,…..N).
For instance, the units may be the students of a university and y may be their weight in
kilograms. Any function of the population values Yi is called a parameter. An example
is the population mean ‘μ’ or ‘M’ given by (1/N)∑iYi, where ∑i stands for summation
over i = 1 to N. Let us now draw a sample of ‘n’ units ui (i = 1,2,…..n)1 from the above

1
The sample units are being referred to as ui (i = 1,2,…..n) and not in terms of Ui as we do not know
which of the population units have got included in the sample. Each ui in the sample is some population
unit

population and let the value of the ith sample unit be yi (i = 1,2,…..n)2. In other words, yi
(i = 1,2,….n) are the sample observations. A function of the sample observations is
referred to as a statistic. The sample mean ‘m’ given by (1/n)∑iyi , ∑i (i =1 to n), is an
example of a statistic.

Let us note the formulae for some important parameters and statistics.

Population total Y = ∑iYi , ∑i stands for summation over i = 1 to N (2.1)

Population mean = ‘μ’ or ‘M’ , = (1/N)∑iYi, ∑i , i = 1 to N (2.2)

Population variance σ2 = (1/N) ∑iYi2 – M2 , ∑i , i = 1 to N (2.3)

Population SD = σ = +√[(1/N) ∑iYi2 – M2] , ∑i , i = 1 to N (2.4)

Sample mean m = (1/n) ∑iyi , ∑i , i = 1 to n (2.5)

Sample variance s2 = (1/n) ∑i yi 2 – m2 , ∑i i = 1 to n (2.6)

= [ss]2 /n , where [ss]2 = ∑i(yi – m)2 = ∑i yi2 – nm2 is the sum of squared
deviations of the sample observations from their mean ‘m’ (2.7)

Sample standard deviation ‘s’ = +√[(1/n)∑i yi 2 – m2 ], ∑i , i = 1 to n (2.8)

Population proportion P = (1/N)∑iYi = N1 / N, (where N1 is the number of units in the
population possessing a specified characteristic, Yi taking the value 1 if unit Ui possesses
the characteristic and 0 otherwise) (2.9)

σ2 = (1/N) ∑i Yi2 – M2 = P – P2 = P(1 – P) = PQ, where Q = [(N – N1)/N] = (1 – P). (2.10)
m = p, (proportion of units in the sample with the specific characteristic) (2.11)

s2 = p(1 – p) = pq, where p is the sample proportion and p + q = 1 (2.12)


[ss]2 = npq (2.13)
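These formulae can be checked with a short computation. The following is a minimal Python sketch (the population values and the sample size are purely hypothetical) that computes the population aggregates (2.1) to (2.4) and the corresponding sample statistics (2.5) to (2.8):

# Illustrative sketch: population and sample aggregates (formulas 2.1 to 2.8).
# The data below are hypothetical; any list of numbers will do.
import math
import random

Y = [52.0, 61.5, 48.0, 70.2, 55.3, 64.1, 58.7, 49.9]     # population values Yi, N = 8
N = len(Y)

pop_total = sum(Y)                                       # (2.1) population total
pop_mean = pop_total / N                                 # (2.2) M
pop_var = sum(y * y for y in Y) / N - pop_mean ** 2      # (2.3) sigma squared
pop_sd = math.sqrt(pop_var)                              # (2.4) sigma

sample = random.sample(Y, 4)                             # an illustrative sample of n = 4
n = len(sample)
m = sum(sample) / n                                      # (2.5) sample mean
s2 = sum(y * y for y in sample) / n - m ** 2             # (2.6) sample variance
ss2 = n * s2                                             # (2.7) [ss]^2 = n * s^2
s = math.sqrt(s2)                                        # (2.8) sample standard deviation

print(pop_total, pop_mean, pop_var, pop_sd)
print(m, s2, ss2, s)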

The purpose of drawing a sample from a population is to arrive at some conclusions


about the parent population from the results of the sample. This process of drawing
conclusions or making inferences about the population from the information contained in
a sample chosen from the population is called inference. Let us see how this process
works and what its components are. The sample mean ‘m’, for example, can serve as an
estimate of the value of the population mean ‘μ’. The statistic ‘m’ is called an estimator
(point estimator) of the population mean ‘μ’. The value of ‘m’ calculated from a specific
sample is called an estimate (point estimate) of the population mean ‘μ’. In general, a
function of sample observations, that is, a statistic, which can be used to estimate the
unknown value of a population parameter, is an estimator of the population parameter.
The value of the estimator calculated from a specific sample is an estimate of the
population parameter.

2
The same reasons apply for referring to the sample values or observations as yi (i = 1,2,…,n) and not in terms
of the population values Yi ; each yi will be some Yi.

The estimate ‘m1’ of the population parameter ‘μ’, computed from a sample, will most
likely be different from ‘μ’. There is thus an error in using ‘m1’ as an estimate of ‘μ’.
This error is the sampling error, assuming that all measurement errors, biases etc., are
absent, that is, there are no non-sampling errors. Let us draw another sample from the
population and compute the estimate ‘m2‘ of ‘μ’. ‘m2‘ may be different from ‘m1’ and
also from ‘μ’. Supposing we generate in this manner a number of estimates mi (i =
1,2,3,…….) of ‘μ’ by drawing repeated samples from the population. All these mi (i =
1,2,3,….) would be different from each other and from ‘μ’. What is the extent of the
variability in the mi (i = 1,2,3,….), or, the variability of the error in the estimate of ‘μ’
computed from different samples? How will these values be spread or scattered around
the value of ‘μ’ or the errors be scattered around zero? What can we say about the
estimate of the parameter obtained from the specific sample that we have drawn from the
population as a means of measuring the parameter, without actually drawing repeated
samples? How well do non-random and random samples answer these questions? The
answers to these questions are important from the point of view of inference.

Let us first look at the different methods of non-random sampling and then move on to
random sampling.

Read Sections 3.1 to 3.3, pp. 66 – 77 and Section 3.9, pp. 97 – 107, Chapter 3, Richard I
Levin and David S Rubin (1991).

2.6.2 Non-Random Sampling

There are several kinds of non-random sampling. A judgment sample is a sample that
has been selected by making use of one’s expert knowledge of the population or the
universe under consideration. It can be useful in some circumstances. An auditor for
example could decide, on the basis of his experience, on what kind of transactions of an
institution he would examine so as to draw conclusions about the quality of financial
management of an institution. Convenience Sampling is used in exploratory research to
get a broad idea of the characteristic under investigation. An example is one that
consists of some of those coming out of a movie theatre; and these persons may be asked
to give their opinion of the movie they had just seen. Another example is one consisting
of those passers by in a shopping mall whom the investigator is able to meet. They may
be asked to give their opinion on a certain television programme. The point here is the
convenience of the researcher in choosing the sample. Purposive Sampling is quite
similar to judgement sampling and is also made use of in preliminary research. Such a
sample is one that is made up of a group of people specially picked for a given
purpose. In Quota Sampling, subgroups or strata of the universe (and their shares in the
universe) are identified. A convenience or a judgement sample is then selected from each
stratum. No effort is made in these types of sampling to contact members of the universe
who are difficult to reach. In Heterogeneity Sampling units are chosen to include all
opinions or views. Snowball Sampling is used when dealing with a rare characteristic. In
such cases, contacting respondent units would be difficult and costly. This method relies
on referrals from initial respondents to generate additional respondents. This technique
enables one to access social groups that are relatively invisible and vulnerable. This
method can lower search costs substantially but this saving in cost is at the expense of the
representative character of the sample. An example of this method of sampling is to find

a rare genetic trait in a person and to start tracing his lineage to understand the origin,
inheritance and etiology of the disease.

It would be evident from the description of the methods given above that the relationship
between the sample and the parent universe is not clear. The selection of specific units for
inclusion in the sample seems to be subjective and discretionary in nature and, therefore,
may well reflect the researcher’s or the investigator’s attitudes and bias with reference to
the subject of the enquiry.

A sample has to be representative of the population from which it has been selected, if it
is to be useful in arriving at conclusions about the parent population. A representative
sample is one that contains the relevant characteristics of the population in the same
proportion as in the population. Seen from this angle, the non-random sampling methods
described above do not yield representative samples. Such samples are, therefore, not
helpful in drawing valid conclusions about the parent population and the way these
conclusions change when another sample is chosen from the population. Non-random
sampling is, however, useful in certain circumstances. For instance, it is an inexpensive
and quick way to get a preliminary idea of the variable under study or a rough
preliminary estimate of the characteristics of the universe that helps us to design a
scientific enquiry into the problem later. It is thus useful in exploratory research.
------------------------------------------------------------------------------------------------------------
Read
Sections “Non-Probability Sampling” and “Other Sampling Designs”, Chapter 5, Royce
A. Singleton (2005) pp.132 – 138.
Section on “Non-Probability Sampling”, Chapter 4, Kultar Singh (2007), pp. 107 – 108.
------------------------------------------------------------------------------------------------------------

2.6.3 Random or Probability Sampling

Random sampling methods, on the other hand, yield samples that are representative of
the parent universe. The selection process in random sampling is free from the bias of the
individuals involved in drawing the sample as the units of the population are selected at
random for inclusion in the sample. Random sampling is a method of sampling in which
each unit in the population has a predetermined chance (probability) of being included in
the sample. A sampling design is a clear specification of all possible samples of a given
type with their corresponding probabilities. This property of random sampling helps us to
answer the questions we raised at the end of sub-section 2.6.1 above. That is, we can
make estimates of the characteristics of the parent population from the results of a sample
and also indicate the extent of error to which such estimates are subject or the precision
of the estimate. This is better than not knowing anything at all about the magnitude of the
error in our statements regarding the parent population. Let us see how random sampling
helps in this regard.

A. Precision of Estimates – Standard Errors and Confidence Intervals

We noted earlier (the last paragraph of subsection 2.6.1) that the sample mean (an
estimate of the population mean ‘μ’) will have different values in repeated samples drawn
from the population and none of these may be equal to ‘μ’. Suppose that the repeated

samples drawn from the population are random samples. The sample mean computed
from a random sample is a random variable. So is the sampling error, that is, the
difference between ‘μ’ and the sample mean. The values of the sample means (and the
corresponding errors in the estimate of ‘μ’) computed from the repeated random samples
drawn from the population are the values assumed by this random variable with
probabilities associated with drawing the corresponding samples. These will trace out a
frequency distribution that will approach a probability distribution when the number of
random samples drawn increases indefinitely. The probability distribution of sample
means computed from all possible random samples from the population is called the
sampling distribution of the sample mean. The sampling distribution of the sample mean
has a mean and a standard deviation. The sample mean is said to be an unbiased
estimator of the population mean if the mean of the sampling distribution of the sample
mean is equal to the mean of the parent population, say, μ. In general, an estimator “t”
of a population parameter “θ” is an unbiased estimator of “θ” if the mean of the
sampling distribution of “t”, or the expected value of the random variable “t”, is equal
to “θ”. In other words, the mean of the estimates of the parameter made from all possible
samples drawn from the population will be equal to the value of the parameter.
Otherwise, it is said to be a biased estimate. Suppose the mean of the sampling
distribution of the sample mean is Kμ or K + μ, where K is a constant. The bias in the
estimate can be easily corrected in such cases by adopting m/K or (m – K), respectively, as the
estimator of the population mean.

The variance of the sampling distribution of the sample mean is called the sampling
variance of the sample mean. The standard deviation of the sampling distribution of
sample means is called the standard error (SE) of the sample mean. It is also called the
standard error of the estimator (of the population mean), as the sample mean is an
estimator of the population mean. The standard error of the sample mean is a measure of
the variability of the sample mean about the population mean or a measure of the
precision of the sample mean as an estimator of the population mean. The ratio of the
standard deviation of the sampling distribution of sample means and the mean of the
sampling distribution is called the coefficient of variation (CV) of the sample mean or
the relative standard error (RSE) of the sample mean. That is,

CV or RSE = C = standard deviation / mean (2.14)

CV (or RSE) is a free number or is dimension-less, while the mean and the standard
deviation are in the same units as the variable ‘y’. (These definitions can easily be
generalised to the sampling distribution of any sample statistic and its SE and RSE.)

We have talked about the unbiasedness and precision of the estimate made from the
sample. What more can we say about the precision of the estimate and other
characteristics of the estimate? This is possible if we know the nature of the sampling
distribution of the estimate.

The nature of the sampling distribution of, say, the sample mean, or for that matter any
statistic, depends on the nature of the population from which the random sample is
drawn. If the parent population has a normal distribution with mean μ and variance σ2 or,
in short notation, N (μ, σ2), the sampling distribution of the sample mean, based on a
random sample drawn from this, is N (μ, σ2/n). In other words, the variability of the

sample mean is much smaller than that of the variable of the population and it also
decreases as the sample size increases. Thus, the precision of the sample mean as an
estimate of the population mean increases as the sample size increases.

As we know, the normal distribution N (μ, σ2) has the following properties:

(i) Approximately 68% of all the values in a normally distributed population lie
within a distance of one standard deviation (plus and minus) from the mean,
(ii) Approximately 95% of all the values in a normally distributed population lie
within a distance of 1.96 standard deviation (plus and minus) of the mean,
(iii) Approximately 99% of all the values in a normally distributed population lie
within a distance of 2.576 standard deviation (plus and minus) of the mean.

The statement at (iii) above, for instance, is equivalent to saying that the population mean
μ will lie between the observed values (y – 2.576 σ) and (y + 2.576 σ) in 99% of the
random samples drawn from the population N(μ, σ2). Applying this to the sampling
distribution of the sample mean, which is N(μ, σ2/n), we can say that

Pr.[(m – 2.576 σ / √n) < μ < (m+ 2.576 σ / √n )] = 0.99 (2.15)

or that the population mean μ will lie between the limits computed from the sample,
namely, (m – 2.576 σ/√n) and (m + 2.576 σ/√n), in 99% of the samples drawn from the
population. This is an interval estimate, or a confidence interval, for the parameter with
a confidence coefficient of 99% derived from the sample.

The general rule for constructing a confidence interval of the population mean with a
confidence coefficient of 99% is: the lower limit of the confidence interval is given by
the “estimate of the population mean minus 2.576 times the standard error of the
estimate” and the upper limit of the interval by the “estimate plus 2.576 times the
standard error of the estimate”. (2.16)
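As a quick illustration of rule 2.16, the short Python sketch below computes a 99% confidence interval for the population mean from a sample mean, a known population standard deviation and a sample size; the numbers used are hypothetical.

# Minimal sketch of rule 2.16: a 99% confidence interval when sigma is known.
import math

m, sigma, n = 152.0, 12.0, 64          # hypothetical sample mean, known sigma and sample size
se = sigma / math.sqrt(n)              # standard error of the sample mean
z99 = 2.576                            # normal multiplier for a 99% confidence coefficient
lower, upper = m - z99 * se, m + z99 * se
print("99% confidence interval for the population mean:", (round(lower, 2), round(upper, 2)))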

B. Assessment of Precision – Unknown Population Variance

If the parent population is distributed as N (M, σ2) and σ2 is not known, we make use of
an estimate of σ2. The statistic ‘s2 ‘ given in formula 2.6 can be one such, but this is not
an unbiased estimate of σ2 as E (s2) = [(n – 1)/ n] σ2. We, therefore, by using (2.6) & (2.7)
have:

v(y)=[ns2 /(n – 1)] as an unbiased estimate of σ2. (2.17)

v(y) = [1/(n – 1)] [ss]2 or, v(y) = [(1/(n - 1)][ ∑i yi 2 – nm2 ] (2.18)

As the sampling variance of the sample mean ‘m’ is σ2/n, an unbiased estimate v(m) of
the sampling variance will be v(y)/n. Let us now consider the statistic defined by the
ratio,

t = (m – M) / [√v(y) / √n] . (2.19)



The numerator is a random variable distributed as N(0, σ2/n) and the denominator is the
square root of the unbiased estimate of its variance. The sampling distribution of the
statistic ‘t’ is the Student’s t-distribution with (n – 1) degrees of freedom. It is a
symmetric distribution. A confidence interval can now be constructed for the population
mean M from the selected random sample, say with a confidence coefficient of (1 – α).
The values of ‘tα’ for different values of α = Pr.[t > tα] + Pr.[t < – tα] = 2 Pr.[t > tα]
and different degrees of freedom have been tabulated in, for instance, Rao, C.R. and
Others (1966). The confidence interval with a confidence coefficient (1 – α) for the
population mean M would be as in 2.20 below – easily computed from the sample
observations.

[m – tα √v(m) < M < m + tα √v(m)] (2.20)

We note that the rule 2.16 applies here also except that we use (i) the square root of the
unbiased estimate of the sampling variance of the estimate of the population mean in
the place of the standard error of the estimate of the population mean, and (ii) the
relevant value of the ‘t’ distribution instead of the normal distribution ……… (2.21)
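The same construction with σ2 unknown (rule 2.21) is sketched below. The sample values are hypothetical, and scipy's t-distribution is used only to look up tα in place of the printed tables referred to above.

# Sketch of rule 2.21: confidence interval for M when sigma squared is unknown.
import math
from scipy import stats

y = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.0]     # hypothetical sample observations
n = len(y)
m = sum(y) / n
ss2 = sum((yi - m) ** 2 for yi in y)                     # [ss]^2, sum of squared deviations
v_y = ss2 / (n - 1)                                      # (2.18) unbiased estimate of sigma squared
v_m = v_y / n                                            # unbiased estimate of the sampling variance of m

alpha = 0.01                                             # confidence coefficient (1 - alpha) = 99%
t_alpha = stats.t.ppf(1 - alpha / 2, df=n - 1)           # tabulated t value with (n - 1) d.f.
lower = m - t_alpha * math.sqrt(v_m)
upper = m + t_alpha * math.sqrt(v_m)
print(round(lower, 3), round(upper, 3))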

We have so far dealt with parent populations that are normally distributed. What will be
the nature of the sampling distribution of the sample mean when the parent population is
not normally distributed? We examine this question in the next subsection C.

C. Assessment of Precision–Parent Population has a Non-Normal Distribution

The Central Limit Theorem ensures that, even if the population distribution is not normal,

♦ the sampling distribution of the sample mean will have a mean equal to the
population mean regardless of the sample size and
♦ as the sample size increases, the sampling distribution of the sample mean
approaches the normal distribution.

Thus for large ‘n’ (sample size), say 30 or more, we can proceed with the steps
mentioned in sub-section A above. Further, the Student’s t-distribution also approaches
the normal distribution as ‘n’ becomes large so that we can use the statistic ‘t’ in sub-
section B as a normally distributed variable with mean 0 and unit variance for samples
of size 30 or more. We may then adopt the procedure outlined in sub-section A.

Read
(1) Sections 2.6 to 2., pp. 38 – 45, Chapter 2 and Sections 3.9a to 3.9d, pp.81 – 84,
Chapter 3, M.N.Murthy (1967) ,
(2) Section 3.10, Chapter 3, pp. 108 –110, Section 5.6, pp. 219 – 230, Chapter 5,
Sections 6.3 and 6.4, pp. 266 - 278, Chapter 6 and Sections 7.1, pp. 300 - 304 and
Sections 7.3 to 7.7, pp. 307 - 325, Chapter 7, Richard Levin & David S. Rubin
(1991) and
(3) Chapters 6 & 7, pp. 113 – 151, P.K. Viswanathan (2007).

D. Determination of Sample Size



Random sampling methods also help in determining the sample size that is required to
attain a desired level of precision. This is possible because the standard error and the
coefficient of variation C.V. of the estimate, say, sample mean ‘m’, are functions of ‘n’,
the sample size. C.V. is usually very stable over the years and its value available from
past data can be used for determining the sample size. We can specify the value of C.V.
of the sample mean that we desire as, say, C(m) and calculate the sample size with the
help of prior knowledge of the population C.V., namely, C. That is,

C(m) = C /√n ; so that √n = C/C(m), or, n = [C/C(m)]2 (2.22)

Or we can define the desired precision in terms of the error that we can tolerate in our
estimate of ‘M’ (permissible error) and link it with the desired value of C(m). Then,

n = [2.576C / e]2 , where the permissible error e = |(m – M)| / M. (2.23)

If the sanctioned budget is F for the survey: Let the cost function be of the form F0 +
F1n, consisting of two components – overhead cost and cost per unit to be surveyed. As
this is fixed as F, F = F0 + F1n, and the sample size becomes n = (F – F0 )/F1. The
coefficient of variation of C(m) is not at our choice in this situation since it gets fixed
once ‘n’ is determined. We can, however, determine the error in the estimate of ‘m’ from
this sample (in terms of the RSE of m), if the population CV, C is known. If further we
suppose that the loss in terms of money is proportional to the value of RSE of m, say, Rs.
‘l’ per 1% of RSE of ‘m’, the total cost of the survey becomes, L(n) = F0 + F1n + l (C/
√n). We can then determine the sample size that minimises this new cost (which
includes the cost arising out of loss). Differentiating L(n) w.r.t n and equating to zero
and simplifying,

n = [(l/2) (C/F1)]2/3 (2.24)
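The three sample-size rules just derived (2.22 to 2.24) can be computed directly, as in the minimal Python sketch below; the values of C, C(m), e, the budget components and the loss rate l are all hypothetical.

# Sketch of sample-size determination (formulas 2.22, 2.23 and 2.24).
C = 0.80           # population coefficient of variation, assumed known from past data
Cm = 0.05          # desired CV (RSE) of the sample mean
n_cv = (C / Cm) ** 2                           # (2.22)

e = 0.10           # permissible relative error, with a 99% confidence coefficient
n_err = (2.576 * C / e) ** 2                   # (2.23)

F, F0, F1 = 50000.0, 5000.0, 90.0              # sanctioned budget, overhead cost, cost per unit
n_budget = (F - F0) / F1                       # sample size a fixed budget permits

l = 4000.0         # hypothetical loss in rupees per unit of the RSE of m
n_loss = ((l / 2) * (C / F1)) ** (2 / 3)       # (2.24) minimises L(n) = F0 + F1*n + l*C/sqrt(n)
print(round(n_cv), round(n_err), int(n_budget), round(n_loss))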

See also the sub-section below on stratified sampling.

Read Section 1.6, Chapter 1, pp. 13 – 16; Section 2.13, Chapter 2, p. 48; and Sections 4.2
to 4.9, Chapter 4, pp. 96 – 123, M.N. Murthy (1967).

2.6.4 Methods of Random Sampling

We have so far dealt with random samples drawn from a population. We did not specify
the size of the population; we had, in effect, assumed that the population is infinite in size. In
practice, a population may have a finite size N, however large. Let us, therefore, consider
drawing random samples of size ‘n’ from a population of size ‘N’. We shall consider the
following methods of random sampling:

A. Simple Random Sampling (With Replacement) [SRSWR],
B. Simple Random Sampling (Without Replacement) [SRSWOR],
C. Interpenetrating Sub-Samples (I-PSS),
D. Systematic Sampling (sys),
E. Sampling with Probability Proportional to Size (pps)
F. Stratified Sampling (sts),
G. Cluster Sampling (cs) and

H. Multi-Stage Sampling (mss)

We shall indicate in the following sections a description of the above methods, the
relevant operational procedure for drawing a sample and the expressions/formulae for (a)
the estimator of the population mean/total/proportion, (b) the sampling variance of the
sample mean/total/population and (c) unbiased estimate of the sampling variance.

2.6.4.1 Simple Random Sampling With Replacement (SRSWR)

The method: This method of drawing samples at random ensures that (i) each item in the
population has an equal chance of being included in the sample and (ii) each possible
sample has an equal probability of getting selected. Let us select a sample of ‘n’ units
from a population of ‘N’ units by simple random sampling with replacement (SRSWR).
We select the first unit at random, note its identity particulars for collection of data and
place it back in the population. We choose at random another unit – this could turn out to
be the same unit selected earlier or a different one, note its identity particulars and place
it back. We repeat this process ‘n’ times to get an SRSWR sample of size ‘n’. In such a
sample one or more units may occur more than once. A sample of ‘n’ distinct units is also
possible. It can be shown that the number of possible samples that can be selected by the
SRSWR method is N^n and that the probability of any one sample being chosen is 1/N^n.

Operational procedure for selection of the sample by SRSWR method: Tables of
Random Numbers are used for drawing random samples. These tables contain a series of
four-digit (or five-digit or ten-digit) random numbers. Suppose a sample of 30 units is
to be selected out of a population of 3000 units. First allot one number from the set of
numbers 0001 to 3000 as the identification number to each one of the population units.
The problem of drawing the sample of size 30 then reduces to that of selecting 30 random
numbers, one after another, from the random number tables. Turn to a page of the Tables
at random and start noting down, from the first left-most column of (four or five or ten-
digit) random numbers, the first four digits of the numbers from the top of the column
downwards. Continue this operation on to the second column till the required sample size
of 30 is selected. If any of the random numbers that comes up is more than 3000, reject it.
If some numbers (< 3000) get repeated in the process, it means that the corresponding
units of the population would be selected more than once, this being sampling with
replacement.

Estimators from SRSWR samples (using notations set down earlier):

msrswr = (1/n)∑i yi (∑i , i = 1 to n) is an unbiased estimator for M (2.25)

Note: If a unit gets selected in the sample more than once, the corresponding value of yi
will also have to be repeated as many times in the summation for calculating msrswr .

Sampling Variance of msrswr : V(msrswr) = σ2/n = [1/n] [E(yi 2 ) – M2 ] (2.26)

Standard Error of msrswr : SE(msrswr) = σ / √n (2.27)

CV or RSE of msrswr : C(msrswr) = (1/ √n)(σ / M) = C(y)/√n. (2.28)



Note that the sampling variance, SE and CV(RSE) of the sample mean in SRSWR is
much less than SE and CV of the variable y and these decrease as the sample size
increases. The precision of the sample mean in SRSWR, as an estimator of M increases
as the sample size increases. However, the extent of decrease in the standard error will
not be commensurate with the size of the increase in the sample size. We would need an
unbiased estimator of σ2, as σ2 may not be known. This is

v(y) = [1/(n – 1)][ss]2 (2.29);

Therefore, v(msrswr) = v(y)/n (2.30)

An unbiased estimate of the population total Y is Y*srswr = Nmsrswr (2.31)

V(Y*srswr) = N2 (σ2 /n) (2.32)

v(Y*srswr) = N2 (1/n)[1/(n – 1)] ∑(yi – msrswr)2 (2.33)


The sample proportion ‘psrswr’ is an unbiased estimate of P (2.34)

V(psrswr) = PQ/n (2.35); & v(psrswr) = pq/(n – 1) (2.36)

C(psrswr) = √[(1/n)(PQ)] / P = (1/√n)√[Q/P]. (2.37)

Confidence intervals for the population mean/proportion and the sample size for a given
level of precision and/or permissible error can now be derived easily.
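A minimal sketch of SRSWR selection and estimation is given below; computer-generated random numbers take the place of random number tables, and the population itself is hypothetical.

# Sketch: SRSWR selection and estimation (formulas 2.25 and 2.29 to 2.31).
import math
import random

Y = [round(random.uniform(40, 90), 1) for _ in range(3000)]   # hypothetical population, N = 3000
N, n = len(Y), 30

sample = [Y[random.randrange(N)] for _ in range(n)]           # with replacement: repeats allowed

m = sum(sample) / n                                           # (2.25) unbiased estimator of M
ss2 = sum((y - m) ** 2 for y in sample)                       # [ss]^2
v_y = ss2 / (n - 1)                                           # (2.29) unbiased estimate of sigma squared
v_m = v_y / n                                                 # (2.30) estimate of V(m_srswr)
Y_total_hat = N * m                                           # (2.31) estimate of the population total

print(m, math.sqrt(v_m), Y_total_hat)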

Read Sections 3.1 to 3.4, pp. 55 – 66 and Sections 3.7 to 3.8a, pp. 76 – 79, Chapter 3,
ibid.

2.6.4.2 Simple Random Sampling Without Replacement (SRSWOR)

The Method: This method of sampling is the same as SRSWR but for one difference. If
a unit is selected, it is not placed back before the next one is selected. This means that no
unit gets repeated in a sample. Operationally, we draw random numbers between 1 and N
and if a random number comes up again, it is rejected and another random number is
selected. This process is repeated till ‘n’ distinct units are selected. It can be shown that
the number of samples of size n that may be selected from a population of ‘N’ units by
this method is NCn = N ! /[ (N – n) ! n ! ] =[N(N –1) (N – 2) …(N – n + 1)] / [n(n – 1) (n
– 2)…….1]. The probability Psrswor(S) of any one of the samples being chosen is, 1/ NCn .

Estimators from SRSWOR samples:

msrswor = (1/n)∑i yi , ∑i , i =1 to n, is an unbiased estimator of M (2.38)

V(msrswor) = [(N – n)/(N – 1)] [σ2/n]
           = [(N – n)/(N – 1)][1/n][(1/N) ∑i(Yi – M)2], ∑i , i = 1 to N (2.39)

V(msrswor) < V(msrswr) since (N – n)/(N – 1) is less than 1 for n > 1. (2.40)

Both msrswor and msrswr are unbiased estimators of M but msrswor is a more efficient
estimator of M than msrswr . The factor [(N – n)/(N – 1)] in (2.40) is called the finite
population correction or finite population multiplier. The finite population correction
need not, however, be used when the sampling fraction (n/N) is less than 0.05.

v(msrswor) = [(N – n)/N][1/n][1/(n – 1)][∑i (yi – m)2] , ∑i , i= 1 to n

=[(N– n)/N][1/n][1/(n – 1)][ss]2, (2.41)


Unbiased estimate of population total Y, that is, Y*srswor = Nmsrswor (2.42)

V(Y*srswor) = N2 V(msrswor ) (2.43)

Unbiased estimate of V(Y*srswor), namely, v(Y*srswor) = N2 v(msrswor) (2.44)

C(Y*srswor) = C(msrswor ) (2.45)

the sample proportion ‘p’ is an unbiased estimate of ‘P’ in SRSWOR also. (2.46)

V(p) = [(N – n)/ (N – 1)] [PQ/n] , where P + Q =1 (2.47)

v(p) = [(N – n)/N] [pq/(n – 1)] , where p + q = 1 (2.48)

C(p) = √[(N – n)/(N – 1)] √(1/n) √[Q/P] (2.49)
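A companion sketch for SRSWOR, again with hypothetical data, shows the finite population correction at work in the variance estimate (2.41); the same sample's variance estimate without the correction is printed alongside for comparison.

# Sketch: SRSWOR estimation with the finite population correction (formula 2.41).
import random

Y = [round(random.uniform(40, 90), 1) for _ in range(500)]    # hypothetical population, N = 500
N, n = len(Y), 50

sample = random.sample(Y, n)                                  # without replacement: n distinct units
m = sum(sample) / n                                           # (2.38) unbiased estimator of M
ss2 = sum((y - m) ** 2 for y in sample)

v_m_srswor = ((N - n) / N) * ss2 / (n * (n - 1))              # (2.41) with the correction (N - n)/N
v_m_no_fpc = ss2 / (n * (n - 1))                              # the same estimate without the correction
print(m, v_m_srswor, v_m_no_fpc)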

Read Sections 3.5 to 3.7, pp. 67 – 78 and Sections 3.8 b & c, pp.80 – 81, Chapter 3, ibid.

2.6.4.3 Interpenetrating Sub-Samples (I-PSS)

Suppose a sample is selected in the form of two or more sub-samples drawn according to
the same sampling method so that each such sub-sample provides a valid estimate of the
population parameter. The sub-samples drawn in this way are called interpenetrating
sub-samples (I-PSS). This is operationally convenient, as the different sub-samples could
be allotted to different investigators. The sub-samples need not be independently
selected. There is, however, an important advantage in selecting independent
interpenetrating sub-samples. It is then possible to easily arrive at an unbiased estimate
of the variance of the estimator even in cases where the sampling method/design is
complex and the formula for the variance of the estimator is complicated.

Let {ti}, i = 1,2,…,h be unbiased estimates of a parameter θ based on ‘h’ independent
interpenetrating sub-samples. Then,

t = (1/h)∑i ti , (∑i , i = 1 to h) is an unbiased estimate of θ (2.50)

v(t) = [1/{h(h – 1)}] [∑i (ti – t)2 ], (∑i , i = 1 to h) is an unbiased estimate of V(t) (2.51)

If the unbiased estimator ‘t’ of the parameter θ is symmetrically distributed (for example,
normally distributed), the probability of the parameter θ lying between the maximum and
the minimum of the ‘h’ estimates of θ obtained from the ‘h’ sub-samples is given by:

Prob.[Min of {t1, t2, ----, th} < θ < Max of {t1, t2, ----, th}] = [1 – (1/2)^(h – 1)] (2.52)

This is a confidence interval for θ from the sample. The probability increases rapidly with
the number of I-P sub-samples – from 0.5 (two sub-samples) to 0.875 (four sub-samples).

The IPSS technique is also useful in assessing non-sampling errors. (see Box below.)
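A small sketch of formulas 2.50 to 2.52, with four hypothetical sub-sample estimates of a parameter θ, is given below.

# Sketch: pooling interpenetrating sub-sample estimates (formulas 2.50 to 2.52).
t_sub = [102.4, 98.7, 101.1, 99.5]         # hypothetical unbiased estimates from h = 4 sub-samples
h = len(t_sub)

t = sum(t_sub) / h                                          # (2.50) pooled unbiased estimate of theta
v_t = sum((ti - t) ** 2 for ti in t_sub) / (h * (h - 1))    # (2.51) unbiased estimate of V(t)
coverage = 1 - 0.5 ** (h - 1)                               # (2.52) prob. theta lies between min and max

print(t, v_t, (min(t_sub), max(t_sub)), coverage)           # coverage works out to 0.875 for h = 4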

Read Section 2.12, Chapter 2, ibid. p 47.

2.6.4.4 Systematic Sampling

The Method: Let {Ui}, i = 1,2,………. N be the units in a population. Let ‘n’ be the size
of the sample to be selected. Let ‘k’ be the integer nearest to N/n - denoted usually as
[N/n] - the reciprocal of the sampling fraction. Let us choose a random number from 1 to
k, say, ‘r’. We then choose the rth unit, that is, Ur . Thereafter, we select every kth unit. In
other words, we select the units Ur, Ur+k , Ur+2k ,………… This method of sampling is
called systematic sampling with a random start. ‘r’ is known as the random start and ‘k’
the sampling interval. There would thus be ‘k’ possible systematic samples, each
corresponding to one random start from 1 to k. The sample corresponding to the random
start ‘r’ will be

{Ur+jk }, j = 0, 1, 2, ……, with r + jk ≤ N.

The sample size of all the ‘k’ systematic samples will be ‘n’ if N = nk. All the ‘k’
systematic samples will not have a sample size ‘n’ if N ≠ nk. For example, if we have a
population of 100 units and we wish to select systematic samples of size 15, the sampling
interval is k = [100/15] or 7. The samples with the random starts 1 and 2 will be of size
15 while the other 5 systematic samples (with random starts 3 to 7) will be of size 14.

In systematic sampling, units of a population could thus be selected at a uniform interval
that is measured in time, order or space. We can for instance choose a sample of nails
produced by a machine for five minutes at the interval of every two hours to test whether
the machine is turning out nails as per the desired specifications. Or, we could arrange the
income tax returns relating to an area in the order of increasing gross income returned
and select every fiftieth tax return for a detailed examination of the income of assessees of
the area. Systematic samples are thus operationally easier to draw than SRSWR or
SRSWOR samples. Only one random number needs to be chosen for selecting a
systematic sample.

Estimators from Systematic Samples:

An unbiased estimator of the population mean M based on a systematic sample is
given by a slight variant of the sample mean, namely,

msys* = (k/N) ∑i yi , ∑i , i = 1 to n*, where n* is the size of the selected sample and k the
sampling interval (2.53)

If N = nk, msys* = m the sample mean. If N ≠ nk, there is a bias in using the sample mean
as the estimator for M, and

the bias in using the sample mean as an estimator of M is likely to be small in the case of
systematic samples selected from a large population. (2.54)

The disadvantages, referred to above, in systematic sampling, namely, N not being a
multiple of the sample size n and the sample mean not being an unbiased estimator of
the population mean, can be overcome by adopting a procedure called Circular
Systematic Sampling (CSS). If ‘r’ is the random start and k the integer nearest to N/n,
we choose the units

{Ur+jk }, if r+jk ≤ N and {Ur+jk - N }, if r+jk > N ; j = 0,1,2,…………… ( n - 1).

Taking the earlier example of selecting a systematic sample of size 15 from a population
of 100 units (N = 100, k = 7 and n = 15), all the samples can be made to have a size of
15 by adopting CSS. A random start of 5 will lead to the selection of a sample of the
15 units 5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96 and 3 (96 + 7 – 100). This procedure
ensures equal probability of selection to every unit in the population.
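The CSS selection rule can be written out directly. The sketch below, in Python, reproduces the worked example (N = 100, n = 15, k = 7) when the random start happens to be 5; the random start is otherwise chosen by the computer.

# Sketch: circular systematic sampling (CSS) with N = 100, n = 15, k = 7.
import random

N, n = 100, 15
k = round(N / n)                       # sampling interval, the integer nearest to N/n
r = random.randint(1, N)               # random start; r = 5 reproduces the example in the text

sample_labels = []
for j in range(n):                     # j = 0, 1, ..., n - 1
    label = r + j * k
    if label > N:
        label -= N                     # go around the circle when r + jk exceeds N
    sample_labels.append(label)

print(sample_labels)                   # with r = 5: 5, 12, 19, ..., 89, 96, 3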

Besides constancy of the sample size from sample to sample, the CSS procedure
ensures that mr, the sample mean, is an unbiased estimate of the population mean.
(2.55)

Let nk = N. Then m* = m. There are k possible samples, each sample with a probability
of 1/k. Let the sample mean of the r-th systematic sample be mr = (1/n)∑i yir, where yir is
the value of the characteristic under study for the i-th unit in the r-th systematic sample,
summation is from i = 1 to n. As already noted mi is an unbiased estimator of M or E(mr)
= M. We thus have k possible unbiased estimates of M. Denoting the sample mean in
systematic sampling as msys, the sampling variance of msys, and related results of interest
are:

V(msys)= σb2 (the between-sample variance). (2.56)

V(msys) = V(y) – σw2 , where σw2 is within-sample variance. (2.57)

Equation 2.57 shows that (i) V(msys) is less than the variance of the variable under study
or the population variance, since σw2 is > 0 and (ii) V(msys) can be reduced by increasing
σw2, or by increasing the within-sample variance. (ii) would happen if the units within

each systematic sample are as heterogeneous as possible. Since we select a sample of
‘n’ units from the population of N units by selecting every k-th element from the random
start ‘r’, the population is divided into ‘n’ groups and we select one unit from each of
these ‘n’ groups of population units. Units within a sample would be heterogeneous if
there is heterogeneity between the ‘n’ groups. This would imply that units within each of
the n groups would have to be as homogeneous as possible. All these suggest that the
sampling variance of the sample mean is related to the arrangement of the units in the
population. This is both an advantage and disadvantage of systematic sampling. An
arrangement that conforms to the conditions mentioned above would lead to a smaller
sampling variance or an efficient estimate of the population mean while a ‘bad’
arrangement would lead estimates that are not as efficient.

Aspects of systematic sampling listed below are important. Find out about these from the
suggested readings (Box below).

(i) When is V(msys) < V(msrswr) or V(msrswor)?


(ii) It is not possible to get v(msys). Why? How can this problem be overcome?
(iii) Systematic sampling is not recommended when there is a periodic or cyclic
variation in the population. What is the solution?

Read
Sections 5.1 and 5.2, pp. 133 – 141, Sections 5.4 to 5.10, pp. 142 – 171, Chapter 5, ibid.

2.6.4.5 Sampling With Probability Proportional To Size (pps)

The Sampling Method: We have so far considered sampling methods in which the
probability of each unit in the population getting selected in the sample was equal. There
are also methods of sampling in which the probability of any unit in the population
getting included in the sample varies from unit to unit. One such method is sampling with
probability proportional to size (pps) in which the probability of selection of a unit is
proportional to a given measure of its size. This measure may be a characteristic related
to the variable under study. One example may be the employment size of a factory in the
past year and the variable under study may be the current year’s output. Does this method
lead to a bias in our results, as units with smaller sizes would be under-represented in the
sample and those with larger sizes would be over-represented? It is true that if the sample
mean ‘m’ were to be used to estimate the population mean M, m would be a biased
estimator of M. However, what is done in this method of sampling is to weight the
sample observations with suitable weights at the estimation stage to obtain unbiased
estimates of population parameters, the weights being the probabilities of selection of
the units.

Estimates from pps sample of size 1: Let the population units be {U1, U2, -------, UN}. Let
the main variable Y and the related size variable X associated with these units be {Y1, X1;
Y2, X2; …………; YN, XN}. The probability of selecting any unit, say, Ui in the sample
will be Pi = (Xi / X), where X = ∑iXi , ∑i , i = 1 to N. Let us select one unit by pps
method. Let the unit selected thus have the values y1 and x1 for the variables y and x. The
variables y and x are random variables assuming values Yi and X i respectively with
probabilities Pi , i = 1,2,…………N. The following results based on the sample of size 1
can be derived easily:

An unbiased estimator of population total Y is Y*(1) pps = y1 /p1 (2.58)

An unbiased estimator of M is m*(1)pps = (1/N) Y*(1)pps = (1/N)(y1 /p1) (2.59)

V[Y*(1) pps] = ∑i (Yi2 /Pi ) – Y2 (2.60)

V[m*(1)pps ] = (1/N2) V[Y*(1)pps] = (1/N2) [ ∑i (Yi2 /Pi ) – Y2] (2.61)

These show that the variance of the estimate will be small if the Pi are proportional to Yi.

Estimators from pps sample of size > 1 [pps with replacement (pps-wr)]

A sample of n ( > 1) units with pps can be drawn with or without replacement. Let us
consider a pps-wr sample. Let {yi , pi} be respectively the sample observation on the
selected unit and the initial probability of selection at the i-th draw, i = 1,2,----,n. Each (yi
/ pi) , i = 1,2,-----,n in the sample is an unbiased estimate [Y*(i)pps-wr] of the population
total Y and V(Y*(i)pps-wr) = ∑r (Yr2 /Pr) – Y2, ∑r , r = 1 to N (see 2.60). Estimates from
pps-wr samples are:

Y*pps-wr = (1/n) ∑i (yi / pi) = (1/n) ∑i Y*(i)pps-wr ; ∑i , i = 1 to n. (2.62)

V(Y*pps-wr ) = (1/n)[ ∑r (Yr 2 /Pr) – Y2] ; ∑r , r = 1 to N. (2.63)

V(mpps-wr ) = (1/N2 )(1/n)[ ∑r (Yr 2 /Pr) – Y2] ; ∑r , r = 1 to N. (2.64)

v(Y*pps-wr) = [1/{n(n – 1)}][∑r (yr2 / pr2) – n(Y*pps-wr)2]; ∑r , r = 1 to n (using 2.51) (2.65)

Operational procedure for drawing a pps-wr sample: The steps are:

(1) Cumulate the sizes of the units to arrive at the cumulative totals of the unit sizes.
Thus

Ti - 1 = X1 + X2 + -----+ Xi –1 ; Ti = X1 + X2 +---+ Xi - 1 + Xi = Ti - 1 + Xi ; i =
1,2,……..., N.

(2) Then choose a random number R between 1 and TN = X1 + X2 +………..+ XN = X.

(3) Choose the unit Ui if R lies between Ti - 1 and Ti, that is, if Ti - 1 < R ≤ Ti . The
probability P(Ui) of selecting the i-th unit will thus be P(Ui) = (Ti - Ti – 1) /TN = Xi
/ X = Pi

(4) Repeat the operation ‘n’ times for selecting a sample of size n with pps-wr.
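The cumulative-total procedure and the estimator 2.62 are sketched below; the size measures X and the study-variable values Y are hypothetical.

# Sketch: pps-wr selection by the cumulative-total method and estimation (formula 2.62).
import random

X = [12, 40, 25, 8, 65, 30, 20]                 # hypothetical size measures Xi
Y = [15, 52, 30, 10, 80, 33, 26]                # hypothetical study-variable values Yi
N, n = len(X), 3

T = []                                          # step (1): cumulative totals T1, T2, ..., TN
running = 0
for x in X:
    running += x
    T.append(running)

estimates = []
for _ in range(n):                              # n independent draws, with replacement
    R = random.randint(1, T[-1])                # step (2): random number between 1 and TN
    i = next(idx for idx, t in enumerate(T) if R <= t)   # step (3): T(i-1) < R <= Ti
    p_i = X[i] / T[-1]                          # selection probability Pi = Xi / X
    estimates.append(Y[i] / p_i)                # each yi/pi is an unbiased estimate of Y

Y_hat = sum(estimates) / n                      # (2.62) pps-wr estimate of the population total
print(Y_hat, "true total:", sum(Y))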

There are other sampling methods under pps like pps without replacement and pps
systematic sampling. (Readings – Box below)

------------------------------------------------------------------------------------------------------------
Read Sections 6.1 to 6.4, pp. 183 – 197 and Section 6.10, pp. 200 – 202; Chapter 6,
Section 6.10a to 6.10c, pp. 201 – 208; Section 6.11 a to c, pp. 209 – 215, ibid.

2.6.4.6 Stratified Sampling

The Method: We might sometimes find it useful to classify the universe into a number
of groups and treat each of these groups as a separate universe for purposes of sampling.
Each of these groups is called a stratum and the process of grouping, stratification.
Estimates obtained from each stratum can then be combined to arrive at estimates for the
entire universe. This method is very useful as (i) it gives estimates not only for the whole
universe but also for the sub-universes and (ii) it affords the choice of different sampling
methods for different strata as appropriate. It is particularly useful when a survey
organisation has regional field offices. This method is called Stratified Sampling.

Let us divide the population (universe) of N units into k strata. Let Ns be the number of units in the s-th stratum and Ysi the value of the i-th unit in the s-th stratum. Let the population mean of the s-th stratum be Ms = (1/Ns)∑i Ysi, ∑i being over the units i = 1, 2, ……, Ns within the s-th stratum; the population mean is then M = (1/N)∑s Ns Ms = ∑s Ws Ms, where Ws = (Ns/N) and ∑s is over the strata s = 1, 2, ……, k. Suppose that we select random samples from each stratum; the sampling methods used in different strata may be different. Let the unbiased estimate of the population mean Ms of the s-th stratum be ms. Denoting 'st' for stratified sampling, an unbiased estimator of M is given by

mst = ∑s Ws ms = (1/N) ∑s Ns ms, ∑s , s = 1 to k. (2.66)

V(mst) = ∑sWs2 V(ms) = (1/N2 )∑s Ns2 V(ms), ∑s , s= 1 to k (2.67)

Cov(ms, mr) = 0 for s ≠ r ; (samples from different strata are independently chosen) (2.68)

Yst* = ∑sYs* ; ∑s , s = 1 to k. (2.69)

V(Yst*) = ∑sV(Ys*), ∑s , s = 1 to k. (2.70)

Thus estimators with smaller variance (efficient estimators) can be obtained in stratified
sampling if we form the strata in such a way as to minimise intra-strata or within-strata
variation, that is, variance within strata. This would mean maximising between-strata or
inter-strata variation, since the total variation is made up of within-strata and between-
strata variation. In other words, units in a stratum should be homogeneous.
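The following is a minimal sketch, with made-up strata and assuming SRSWR within each stratum (so that the estimated V(ms) is the sample variance divided by ns), of how the stratum estimates are combined as in (2.66) and (2.67):

import statistics

# Hypothetical strata: (N_s, sample observations drawn within stratum s).
strata = {
    "stratum 1": (40, [12.0, 15.0, 11.0, 14.0]),
    "stratum 2": (60, [22.0, 25.0, 27.0, 24.0, 26.0]),
}

N = sum(Ns for Ns, _ in strata.values())

m_st = 0.0          # stratified estimate of M, equation (2.66)
v_m_st = 0.0        # estimated variance of m_st, equation (2.67)
for Ns, obs in strata.values():
    Ws = Ns / N
    ms = statistics.mean(obs)
    v_ms = statistics.variance(obs) / len(obs)   # estimated V(m_s) under SRSWR within the stratum
    m_st += Ws * ms
    v_m_st += Ws ** 2 * v_ms

print(m_st, v_m_st)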

Stratified sampling enables us to choose the sample we wish to select by drawing independent samples from each of the different strata into which we have grouped the universe. How do we allocate the total sample size 'n' among the different strata? One way is to allocate the sample size to different strata in proportion to the size of individual strata measured by the number of units in these strata, namely, Ns, [∑s Ns = N, (s = 1 to k)]. This method is especially appropriate in situations where no information is available except the sizes of the strata. The sample size for the sample from, say, the s-th stratum would then be ns = n(Ns/N), and ∑s ns can easily be seen to be equal to 'n'.
There are other methods like allocation of the sample size among strata in proportion to
the stratum totals of the variable under study, that is, Ys , the stratum total of the s-th
stratum, ‘s’ = 1 to k. We shall not go into the details of other methods here except one
situation, namely, when we have a fixed budget F sanctioned for the survey. Let the cost
function F be of the form F0 + ∑nsFs , (∑s , s = 1 to k), where F0 , ns and Fs are
respectively the overhead cost, the sample size in stratum ‘s’ and the per unit cost of
surveying a unit in stratum ‘s’ (s = 1,2,-----, k). We can determine the optimum stratum-
wise-sample-size by minimising the sampling variance of the sample mean (2.67) subject
to the constraint that the cost of the survey is fixed. The stratum-wise optimum sample
size is given by

ns = (F – F0) [Ws√(Vs/Fs)] / [∑s Ws√(Vs Fs)] , s = 1, ……, k. (2.71)

The stratum sample size should, therefore, be proportional to Ws√(Vs/Fs). The minimum variance with the ns so determined is

Min. V(mst) = [∑s Ws√(Vs Fs)]² / (F – F0) (2.72)
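A small illustrative computation of the optimum allocation (2.71), using assumed stratum weights, variances, per-unit costs and budget figures; the closing check verifies that the allocation exhausts the variable budget F – F0.

from math import sqrt

# Hypothetical inputs: stratum weights W_s, stratum variances V_s,
# per-unit survey costs F_s, total budget F and overhead F0.
W  = [0.4, 0.35, 0.25]
V  = [16.0, 25.0, 9.0]
Fs = [2.0, 5.0, 1.0]
F, F0 = 500.0, 50.0

denom = sum(w * sqrt(v * f) for w, v, f in zip(W, V, Fs))
n_s = [(F - F0) * w * sqrt(v / f) / denom for w, v, f in zip(W, V, Fs)]   # equation (2.71)
print([round(x) for x in n_s])

# The allocation should exhaust the variable part of the budget:
print(sum(n * f for n, f in zip(n_s, Fs)))   # equals F - F0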

Read Section 7.1, pp. 232 – 233, Section 7.2, pp. 235 – 236 and Section 7.4, pp. 239 –
243 (especially Section 7.4b, pp. 241 – 243), Chapter 7, ibid.
------------------------------------------------------------------------------------------------------------

2.6.4.7 Cluster Sampling

The method: Suppose we are interested in studying certain characteristics of individuals in an area. We would naturally select a random sample of individuals from all
the individuals residing in the area and collect the required information from the selected
individuals. We might also think of selecting a sample of households out of all the
households in the area and collect the required details from all the individuals in the
selected households. The households in the area are clusters of individuals and what we
have done is to select a sample of such clusters and to collect the information needed
from all the individuals in the selected clusters instead of selecting a random sample of
individuals from all persons in the area. What we have done is cluster sampling. Cluster
Sampling is a process of forming suitable clusters of units and surveying all the units
in a sample of clusters selected according to an appropriate sampling method. The
clusters of units are formed by grouping neighbouring units or units that can be
conveniently surveyed together. Sampling methods like srswr, srswor, systematic
sampling, pps and stratified sampling discussed earlier can be applied to sampling of
clusters by treating clusters themselves as sampling units. The clusters can all be of equal
size or varying sizes, that is, the number of units can be the same, or vary from cluster to
cluster. Clusters can be mutually exclusive, that is, a unit belonging to one cluster will not
belong to any other cluster. They could also be overlapping.

Estimates from cluster sampling: Let us consider a population of NK units divided into N mutually exclusive clusters of K units each – a case of clusters of equal size. The population mean M and the cluster means are given respectively by M = (1/N)∑s ms, ∑s being over the clusters s = 1 to N, and ms = (1/K)∑i Ysi, ∑i being over i = 1 to K within the s-th cluster. Let us draw a sample of one cluster by srs. The cluster mean mc-srs (the
th cluster. Let us draw a sample of one cluster by srs. The cluster mean mc-srs (the
subscript c-srs denotes cluster sampling with srs) is an unbiased estimate of M. The
sampling variance of the sample cluster mean is

V(mc-srs) = (1/N)∑s (ms – M)² = σb² = variance between clusters; ∑s , s = 1 to N. (2.73)

Let us compare V(mc-srs) with the sampling variance of the sample mean when K units are
drawn from NK units by the SRSWR method. How does the “sampling efficiency” of cluster sampling compare with that of SRSWR? The sampling efficiency of cluster sampling
compared to that of SRSWR, Ec/srswr , is defined as the ratio of the reciprocals of the
sampling variances of the unbiased estimators of the population mean obtained from
the two sampling methods. The sampling variances and sampling efficiency are

V(msrswr) = (1/K)[(1/NK)∑s∑i Ysi² – M²] = σ²/K (2.74)

σ² = σw² + σb² = within-cluster variance + between-cluster variance. (2.75)

Ec/srswr > 1 if σw² > (K – 1)σb² (2.76)

Thus, cluster sampling is more efficient than SRSWR if the within-cluster variance is
larger than (K –1) times the between-cluster variance. Is this likely? This is not likely as
the between-cluster variance will usually be larger than the within-cluster variance due to
within-cluster homogeneity. Cluster sampling is in general less efficient than sampling
of individual units from the point of view of sampling variance. Sampling of individuals
could provide a better cross section of the population than a sample of clusters since units
in a cluster tend to be similar.
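The point can be illustrated with a minimal sketch using a hypothetical population of equal-sized clusters; it computes σb², σw² and the efficiency ratio implied by (2.73), (2.74) and (2.76). Because the made-up clusters are internally homogeneous, the efficiency of cluster sampling relative to SRSWR comes out below 1.

import statistics

# Hypothetical population of N = 3 clusters of K = 4 units each.
clusters = [
    [10, 11, 9, 10],
    [20, 22, 19, 21],
    [30, 29, 31, 30],
]
N = len(clusters)
K = len(clusters[0])

all_units = [y for c in clusters for y in c]
M = statistics.mean(all_units)
cluster_means = [statistics.mean(c) for c in clusters]

var_between = sum((m - M) ** 2 for m in cluster_means) / N          # sigma_b^2, as in (2.73)
var_total = sum((y - M) ** 2 for y in all_units) / (N * K)          # sigma^2, as in (2.74)
var_within = var_total - var_between                                 # sigma_w^2, by (2.75)

E = (var_total / K) / var_between    # V(m_srswr) / V(m_c-srs) = efficiency of cluster sampling
print(var_within, var_between, E)
print("cluster sampling more efficient?", var_within > (K - 1) * var_between)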

Read Sections 8.1, 8.2 and 8.2a, Chapter 8, ibid., pp. 293 – 297.
------------------------------------------------------------------------------------------------------------

2.6.4.8 Multi-Stage Sampling

We noted in the sub-section on cluster sampling that random sampling of units directly is
more efficient than random sampling of clusters of units. But cluster sampling is
operationally convenient. How to get over this dilemma? We may first select a random
sample of clusters of units and thereafter select a random sample of individual units from
the selected clusters. We are thus selecting a sample of units, but from selected clusters of
units. What we are attempting is a two-stage sampling. This can thus be a compromise
between the efficiency of direct sampling of units and the relatively less efficient
sampling of clusters of units. This type of sampling would be more efficient than cluster
sampling but less efficient than direct sampling of individual units. In the sampling
procedure now proposed, the clusters of units are the first stage units (fsu) or the
primary stage units (psu). The individual units constitute the second stage units (ssu) or
the ultimate stage units (usu).

This procedure of sampling can also be generalised to multi-stage sampling. Take for
instance a rural household survey. The fsu’s in such a survey may consist of districts, the
ssu’s may be the tehsils or taluks chosen from the districts selected in the first stage, the
third stage units could be the villages selected from the tehsils or taluks selected in the
second stage and the fourth and the ultimate stage units (usu’s) may be the households
selected from the villages selected in the third stage. Such multi-stage sampling
procedures help in utilising such information related to the variable under study as may
be available in choosing the sampling method appropriate at different stages of sampling.
In a multi-stage sampling, estimates of parameters are built up stage by stage. For
instance, in two-stage sampling, estimates of the sample aggregates relating to the fsu’s
are built up from the ssu’s using the sampling method adopted for selecting the ssu’s.
These estimates are then used with the sample probabilities of selection of fsu’s to build
up estimates of the relevant population parameters.
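A minimal two-stage sketch, assuming SRSWOR of clusters at the first stage and SRSWOR of units within each selected cluster; the population figures are invented purely for illustration. The fsu totals are first estimated from the ssu's and then scaled up over the first-stage selection, mirroring the stage-by-stage build-up described above.

import random
import statistics

rng = random.Random(7)

# Hypothetical population: N first-stage units (clusters), each a list of unit values.
population = [[rng.gauss(50 + 10 * i, 5) for _ in range(20 + 5 * i)] for i in range(8)]
N = len(population)

n_fsu, n_ssu = 3, 6                                  # sample sizes at the two stages
fsu_sample = rng.sample(range(N), n_fsu)             # first stage: SRSWOR of clusters

est_total = 0.0
for i in fsu_sample:
    cluster = population[i]
    K_i = len(cluster)
    ssu = rng.sample(cluster, min(n_ssu, K_i))       # second stage: SRSWOR within the cluster
    est_cluster_total = K_i * statistics.mean(ssu)   # estimate of the fsu total from the ssu's
    est_total += est_cluster_total
est_total *= N / n_fsu                               # scale up over the first-stage selection
print(est_total, sum(sum(c) for c in population))    # estimate vs. true population total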

Read
Sections 9.1 and 9.2, Chapter 9, ibid. , pp. 317 – 322.
Chapter 3, NSSO (1997), pp. 12 – 15.

2.7 THE CHOICE OF AN APPROPRIATE SAMPLING METHOD

We have considered a number of random sampling methods in the foregoing sub-sections. A natural question that arises now is – which method is to be adopted in a given situation? Let us consider this question, although the answer to it lies scattered across the foregoing sub-sections. The choice of a sampling design depends on considerations like a priori information available about the population, the precision of the estimates that a sampling design can give, operational convenience and cost considerations.
1. When we do not have any a priori information about the nature of the population variable under study, SRSWR and SRSWOR would be appropriate. Both are operationally simple. However, SRSWOR is to be preferred, since V(msrswor) < V(msrswr). This advantage holds only when the sampling fraction (n/N) is not small.
2. Systematic sampling is operationally even simpler than SRSWR and SRSWOR, but it
should not be used for sampling from populations where periodic or cyclic
trends/variations exist, though this difficulty can be overcome if the period of the
cycle is known. V(msys) can be reduced if the units chosen in the sample are as
heterogeneous as possible. But this will call for a rearrangement of the population
units before sampling.
3. When additional information is available about the variable 'y' under study, say, on a size variable 'x' related to 'y', the pps method should be preferred. The sampling variance of Y* (or m) gets reduced when the probabilities of selection of units, Pi = (Xi/X), are proportional to Yi, that is, when the size Xi is proportional to Yi, or the variables x and y are linearly related with the regression line passing through the origin. In such cases pps is more efficient than SRSWR. Further, this method can be combined with other sampling methods to enhance their relative efficiencies. pps is operationally simple. pps-wor combines the efficiency of SRSWOR and the efficiency-enhancing capacity of pps. However, most of the available procedures of selection, estimators and their variances for pps-wor are complicated and are not commonly used in practice. This is particularly so in large-scale sample surveys with a small sampling fraction, as in such cases sampling without replacement does not result in much gain in efficiency. Hence, unless the sample size is small, we should prefer pps-wr.

4. Stratified sampling comes in handy when we wish to get estimates at the level of sub-
populations or regions or groups. This method also gives us the freedom to choose
different sampling methods/designs in different strata as appropriate to the group
(stratum) of the population and the opportunity to utilise available additional
information relating to the stratum. The sampling variance of estimators can also be
brought down by forming the strata in such a way as to ensure homogeneity of units
within individual strata. In fact, the stratum sizes can be so chosen as to minimise the
variance of estimators, when there is a ceiling on the cost of the survey. Stratified
sampling with SRS, SRSWOR or pps-wr presents a set of efficient sampling
designs.

5. Sometimes, sampling of groups of individual units rather than direct sampling of units might be found to be operationally convenient. Suppose it is easier to get a complete frame of clusters of individual units than one of the units, or only such a frame, and not that of the units, is available (e.g. households are clusters of individuals). In
such circumstances, cluster sampling is adopted. This is in general less efficient than
direct sampling of individual units, as clusters usually consist of homogeneous
units. A compromise between operational convenience and efficiency could be
made by adopting a two-stage sampling design, by selecting a sample of individual
units (second stage units) from sampled clusters (the first stage units). A multi-stage
design would be useful in cases where clusters have to be selected at more than one
stage of sampling.

6. Finally, we can use the technique of independent interpenetrating sub-samples (I-PSS) in conjunction with the chosen sampling design to get (i) an unbiased estimate of V(m) for any sampling design and any estimator m, however complicated, (ii) a confidence interval for 'M' based only on the I-PSS estimates (when the population distribution is symmetrical) and (iii) a tool for monitoring the quality of work of the field staff and agencies.

7. SRSWOR, stratified sampling with SRSWOR and, when available information permits, pps-wr and stratified sampling with pps-wr, turn out to be a set of the more efficient and operationally easy designs to choose from. I-PSS can also be used in these designs where possible and necessary.

Read also
Sections 14.8 and 14.9, Chapter 14, M.N.Murthy (1967) pp.493 – 497.
------------------------------------------------------------------------------------------------------------

2.8 LET US SUM UP

There are broadly three methods of collecting data. The array of tools used for data
collection by such methods has expanded over time with the advent of modern
technology. Confining data collection efforts to a sample from the population of interest
to the study, inevitably leads to questions like the use of random and non-random
samples. Judgment sampling, convenience sampling, purposive sampling, quota sampling
and snowball sampling all belong to the latter group. The absence of a clear relationship
between a non-random sample and the parent universe and the presence of the
researcher’s bias in the selection of the sample render such samples useless for drawing
valid conclusions about the parent population. But these methods are inexpensive and
quick ways of getting a preliminary idea of the universe for use in designing a detailed
enquiry and in exploratory research. Random samples, on the other hand, are free from
such drawbacks and have properties that help in arriving at valid conclusions about the
parent population.

The simplest of the sampling methods – SRSWR - ensures equal chance of selection to
every unit of the population and yields a sample in which one or more units may occur
more than once. ‘msrswr’ is an unbiased estimator of M. Its precision as an estimator of M
increases as the sample size increases. SRSWOR yields a sample of distinct units.
‘msrswor’ is also unbiased for ‘M’. SRSWOR is more efficient than SRSWR as
V(msrswor) < V(msrswr). But this advantage disappears when the sampling fraction is small
(< 0.05). Both provide an unbiased estimator of V(m). An operationally convenient
procedure - interpenetrating sub-samples (I-PSS) – also provides an unbiased estimator
of V(m) for any sampling design and estimator for V(m), however complicated.

Systematic sampling is a simple and operationally convenient method used in large-scale surveys that requires only a random start and the sampling interval k = [N/n] for drawing the sample. A slight variant of ‘m’ is unbiased for ‘M’. Circular systematic sampling takes care of problems that arise when N/n is not an integer. An unbiased
estimate of V(m) is not possible but this problem can be tackled easily. Systematic
sampling is not recommended when there is a periodic or cyclic variation in the
population. This problem too can be overcome if the period of the cycle is known.

An example of methods where the probability of selection varies from unit to unit is pps.
The “size” could be the value of a variable related to the study variable. In pps, each yi /
pi , where yi is the value of the study variable associated with the selected unit and pi the
probability of selection of the unit, is an unbiased estimate (Y*) of the population total Y
and [(1/N) Y*] an unbiased estimator of M. As V(Y*) is small if the probabilities Pi are
roughly proportional to Yi , pps sampling is more efficient than SRS if the size variable
x is proportional to y, that is, x and y are linearly related and the regression line passes
through the origin. pps sampling can be done with SRSWR, SRSWOR or systematic
sampling. In pps-srswr, [(1/n) ∑i(yi / pi)], (∑i i = 1 to n), is an unbiased estimator of Y.
This being the mean of n independent unbiased estimates with the same variance V(Y*),
v(Y*) can be derived using the I-PSS technique.

Stratified Sampling is used when (i) estimates are needed for subgroups of a universe or
(ii) the subgroups could be treated as sub-universes. It gives us the freedom to choose the
sampling method as appropriate to each stratum. Estimates of parameters are available
for the sub-universes (strata) and these can then be combined over the strata to get
estimates for the entire universe. SE of estimates based on stratified sampling can be
small if we form the strata in such a way as to minimise intra-strata variance. Each
stratum should thus consist of homogeneous units, as far as possible. Stratum-wise
sample sizes can also be so chosen as to minimise the variance of estimators.

Another operationally convenient sampling method, cluster sampling, is to sample groups of units or clusters of units at random and collect data from all the units of the selected clusters. For example, the household is a cluster of individuals. SRSWR, SRSWOR, pps or systematic sampling can be used for sampling clusters. Cluster
sampling is, in general, less efficient than direct sampling of units from the point of
view of sampling variance. The question here is one of striking a balance between
operational convenience and cost reduction on the one hand and efficiency of the
sampling design on the other.

We could improve the efficiency of cluster sampling by selecting a random sample of units from each of the selected clusters, that is, by introducing another stage of sampling. This is
two-stage sampling. This would be more efficient than cluster sampling but less
efficient than direct sampling of units. Multi-stage sampling can also be done. Such
designs are commonly used in large-scale surveys as these facilitate the utilisation of
information available and the choice of appropriate sampling designs at different stages.

Thus while non-random sampling methods are useful in exploratory research and
preliminary work on planning of enquiries, random sampling techniques lead to valid
judgments regarding the universe. Among random sampling methods, SRSWOR,
stratified sampling with SRSWOR and, when available information permits, pps-wr
and stratified sampling with pps-wr, turn out to be a set of the more efficient and
practically useful designs to choose from. I-PSS can also be used in these designs
where possible and necessary.

2.9 FURTHER SUGGESTED READINGS

Current developments in sample survey theory and methods touch upon all the sub-areas
of the subject, namely, data collection and processing, survey design and estimation or
inference. Use of telephones for surveys, where tele-density is high, for selection of
samples and data collection (Random Digit Dialing), tackling non-response through
techniques like split-questionnaires, ordering of questions on sensitive information in the
questionnaire, application of artificial neural network for editing data and imputation,
total survey design approaches that tackle total survey error, the Dual Frame
Methodology to enable small area estimation that is so necessary for regional planning,
sampling on more than one occasion and related issues, replication methods to tackle
measurement error, post stratification (stratification after collection of data), use of
auxiliary information at the time of estimation, resampling methods like jackknife and
bootstrap, methods of estimation of complex functions of parameters like distribution
functions, quantiles, poverty proportion and ordinates of the Lorenz Curve, especially in
the presence of measurement error, and their variances and the related computer packages
are receiving increasing attention of researchers and survey practitioners. The following
references for further reading may be useful for appreciation of these developments.

Sankhya : The Indian Journal of Statistics

Rao J.N.K. (1999): Some Current Trends in Sample Survey Theory and Methods,
(Special Issue on Sample Surveys), Vol. 61, Series B, Part 1, pp. 1 – 57.
Fuller W. A. & Jay Breidt F.(1999):Estimation for Supplemented Panels, ibid, pp.58 –
70.
Shao, Jun & Chen, Yinzhong (1999): Approximate Balanced Half Sample and Related
Replication Methods for Imputed Survey Data, ibid, pp. 197 – 201.

Godambe, V.P. (2002): Utilising Survey Framework in Scientific Investigations, (Special Issue on San Antonio Conference – Selected Papers), Vol. 64, Series A, Part 2, pp. 268 – 289.
Rao, J.N.K; Yung, W; Hidiroglou (2002): Estimating Equations for the Analysis of
Survey Data Using Poststratification Information, ibid, pp.364 –378.
Chaudhuri, Arijit & Saha, Amitava (2004): Extending Sitter’s Mirror-Match Bootstrap to
Cover Rao-Hartley-Cochran Sampling in Two Stages with Simulated Illustration, Vol.
66, Part 4, pp.791 – 802.
Zhu, Min & Wang, You-Gan (2005): Quantile Estimation from Ranked Set Sampling
Data, (Special Issue on Quantile Regression and Related Methods), Vol. 67, Part 2, pp.
295 – 304.
Matei, Alina & Tillé, Yves (2005): Maximal and Minimal Sample Coordination, Vol.
67, part 3, pp. 590 – 612.

Journal of the American Statistical Association

Pfeffermann, Danny & Tiller, Richard (2006): Small – Area Estimation with State –
Space Models Subject to Benchmark Constraints, Vol.101, No. 476, pp. 1387 – 1397.
Mach, Lenka; Reiss, Philip T. & Schiopu-Kratina, Ioana (2006): Optimizing the
Expected Overlap of Survey Samples via the North West corner Rule, Vol. 101, No. 476,
pp.1671 – 1679.
Qin, Jing; Shao, Jun & Zhang, Biao (2008): Efficient and Doubly Robust Imputation for
Covariate-Dependent Missing Responses, Vol. 103, No. 482, pp. 797 – 810.

The Canadian Journal of Statistics

Kim J.K. & Park H (2006): Imputation Using Response Probability, Vol. 34, No. 1, pp.
171 – 182.
Kim J.K. & Kim J.J. (2007): Non Response Weighting Adjustment Using Estimated
Response Probability, Vol. 35, No.4, pp. 501 – 514.
Books

Krishniah, P.R. & Rao, C.R. (Eds.) (1988): Handbook of Statistics – Vol. 6: Sampling, North Holland, Amsterdam.
Skinner, C.J.; Holt, D. & Smith, T.M.F. (1989): Analysis of Complex Surveys, Wiley, New York.
Ghosh, Malay & Meeden, Glen (1997): Bayesian Methods for Finite Population Sampling, Chapman and Hall, London.
Mukhopadhyay, Parimal (1998): Theory & Methods of Survey Sampling, Prentice-Hall of India Pvt. Ltd., New Delhi.
Indian Statistical Institute (2003): Report on Audit Sampling, Applied Statistics Unit, Indian Statistical Institute, Kolkata.

2.10 SOME USEFUL BOOKS & REFERENCES

Burgess, R.G. (ed.) (1982): Field Research: A Sourcebook and Field Manual (Contemporary Social Research 4), George Allen and Unwin, London.
Des Raj & Chandok, P. (1998): Sampling Theory, Narosa Publishing House, New Delhi.
Levin, Richard I. & Rubin, David S. (1991): Statistics for Management, Fifth Edition, Prentice-Hall of India (Private) Limited, M-97 Connaught Circus, New Delhi – 110001.
Krishniah, P.R. & Rao, C.R. (Eds.) (1988): Handbook of Statistics – Vol. 6: Sampling, North Holland, Amsterdam.
Mukhopadhyay, Parimal (1998): Theory & Methods of Survey Sampling, Prentice-Hall of India Pvt. Ltd., New Delhi.
Murthy, M.N. (1967): Sampling Theory and Methods, Statistical Publishing Society, 204, Barrackpore Trunk Road, Kolkata – 700108.
NSSO (1997): Employment and Unemployment in India, 1993-94, National Sample Survey Organisation, Ministry of Statistics and Programme Implementation, Sardar Patel Bhavan, New Delhi – 110001.
Rao, C.R., Mitra, S.K. and Matthai, A. (1966): Formulae and Tables for Statistical Work, Statistical Publishing Society, Kolkata – 700108.
Sampath, S. (2005): Sampling Theory & Methods, Second Edition, Narosa Publishing House, New Delhi, Chennai, Mumbai, Kolkata.
Singh, Kultar (2007): Quantitative Social Research Methods, Sage Publications (Pvt.) Limited, New Delhi.
Singleton Jr., Royce A. & Straits, Bruce C. (2005): Approaches to Social Research, 4th Edition, Oxford University Press, New York – Oxford.
Viswanathan, P.K. (2007): Business Statistics – An Applied Orientation, Dorling Kindersley (India) Pvt. Ltd., licensees of Pearson Education in South Asia.

2.11 MODEL QUESTIONS

1. You have been asked to conduct a study to determine the literacy rate in a district.
The choice before you is to adopt a census approach or a random sample survey. How
would you make a choice between the two? What considerations would lead you to a
choice?

2. What tools of data collection would you make use of in the above enquiry and why?

3. What is meant by a ‘parameter’? Define the term ‘statistic’. Give the expressions for
population mean, sample mean and population variance and sample standard
deviation.

4. What is the most important purpose of studying the population on the basis of a
sample? In this context, define the terms ‘estimator’ and ‘estimate’ with a suitable
example.

5. Define the term ‘representative sample’. How is ‘random sampling’ principally different from ‘non-random sampling’? What could be the use of the latter despite its major drawback vis-à-vis the former?

6. Explain what is meant by ‘the sampling distribution of a statistic’ and the ‘sampling variance of a statistic’. What do you mean by an unbiased estimator of a parameter?

7. How does the random sampling procedure help in correcting for the bias of an
estimate? Illustrate this with the help of an example.

8. The sample proportion ‘p’ calculated from a random sample of size ‘n’ may be
considered as normally distributed with mean P and standard deviation √(PQ/n), when
n is sufficiently large. Construct a confidence interval for ‘P’ with a confidence
coefficient 0.99, when n is large.

9. Explain the terms ‘coefficient of variation’ and ‘relative standard error’.

10. A population has 80 units. The relevant variable has a population mean of 8.2 and a
variance of 4.41. Three SRSWR samples of size (i) 16, (ii) 25 and (iii) 49 are drawn
from the population. What is the standard error (SE) of the sample means in the three
samples? Is the extent of reduction in SE commensurate with that of the increase in
sample size?

11. What are the results when the sampling method in drawing the three samples in
problem 10 above is changed to SRSWOR? What is your advice regarding the choice
between increasing the sample size and changing the sampling method from srswr to
srswor?

12. Why is it said that SRSWOR is not a more efficient sampling design from the point of
view of the precision of the sample mean as an estimator of the population mean for
sampling fractions of less than 0.05?

13. Indicate whether the following statements are true (T) or false (F). If false, what is the
correct position?

(i) The standard error of the sample mean decreases in direct proportion to the
sample size.
(ii) SRSWOR method of sampling is more advantageous than srswr for a
sampling fraction of 0.02.
(iii) If Y* = Nm and Variance of m is V(m), the variance of Y* is NV(m).

14. What should be the size of the SRSWR sample to be selected if the coefficient of
variation of the sample mean should be 0.2? The population coefficient of variation is
known to be 0.8. What will be the sample size if we decide to adopt SRSWOR
sampling method?

15. Why would you recommend the use of the technique of interpenetrating sub-samples
in random sampling?

16. Four estimates of the population mean M obtained from four independent I-Pss
samples of equal size are, 20, 18, 23 and 28. Obtain an unbiased estimate of the
sampling variance of the sampling mean. Assuming that the population is normally
distributed, compute a confidence interval for M. What is the confidence coefficient
for this confidence interval? Do these results depend on the sample size?

17. A systematic sample of size 18 has to be selected from a population of 124. What
problems do you face in selecting the sample? Is the sample mean the unbiased
estimator of the population mean M? How do you overcome these problems?

18. Is the sample mean ‘m’ always an unbiased estimator of the population mean ‘M’ in
systematic sampling? If not when? What then is an unbiased estimator of M in cases
where ‘m’ is not an unbiased estimator of ‘M’? What is the bias in using the sample
mean ‘m’ as the estimator of ‘M’ in these situations? Show that this bias is likely to
be small for systematic samples from large populations.

19. What is the sampling variance of msys? what steps can be taken to reduce V(msys)?

20. When will systematic sampling be more efficient than (i) SRSWR ; (ii) SRSWOR ?

22. “It is not possible to get v(msys)” What are the reasons for this situation in the case of
systematic sampling? How is this problem overcome and how then can we get
v(msys)?

23. What are the situations in which systematic sampling should not be adopted? What
information is needed in such situations to use systematic sampling and how would
you use such information?

24. When is pps method adopted?

25. When will the sampling variance of Ypps* be small?

26. Say true (T) or false (F):

(a) pps and stratified sampling can be combined with other sampling methods.

(b) V(mst) is reduced by ensuring that units within individual strata are heterogeneous.

(c) The size of a stratified sample can be allocated among the strata in proportion to the
size of the strata, the size being the number of population units in a stratum.

(d) Systematic sampling is a kind of stratified sampling.

27. We wish to study the wage levels of factory labour. What type of sampling method
would you adopt for the study and why if (a) just a list of factories is available with
the Chief Inspector of Factories of different State Governments, (b) if the list in (a)
above also gives the total number of employees in the individual factories at the end
of last year and (c) the list also indicates both the kind of product manufactured in the
factory along with the information specified in (b) above.

28. Show that cluster sampling is less efficient than direct sampling of individuals. Why
does this happen?

29. How does two-stage sampling improve upon cluster sampling?

30. Is it correct to say that stratified sampling is a kind of multistage sampling? Why?

31. Why are multi-stage designs useful in large scale surveys?



BLOCK 3 QUANTITATIVE METHODS: DATA ANALYSIS

Structure

3.0 Objectives
3.1 Introduction
3.2 An overview of the Block
3.3 Important Steps involved in an Econometric Study
3.4 Two Variable Regression Model
3.4.1 Estimation of Parameters
3.4.2 Goodness of Fit
3.4.3 Functional Forms of Regression Model
3.4.4 Hypothesis Testing
3.5 Multi-Variable Regression Model
3.5.1 Regression Model with two explanatory variables
3.5.1.1 Estimation of Parameter: Ordinary Least Squares Approach
3.5.1.2 Variance and Standard Errors
3.5.1.3 Interpretation of Regression Coefficients
3.5.2 Goodness of Fit: Multiple Coefficient of Determination (R2)
3.5.3 Analysis of Variance
3.5.4 Inclusion and Exclusion of Explanatory Variables
3.5.5 Generalisation to N-Explanatory Variables
3.5.6 Problem of Multicollinearity
3.5.7 Problem of Heteroscedasticity
3.5.8 Problem of Autocorrelation
3.6 Further Suggested Readings
3.7 Model Questions

3.0 OBJECTIVES

The main objectives of this block are to:

• have an overview of the basic steps involved in conducting an empirical study,

• know the issue of linearity in the regression model and appreciate its probabilistic nature,

• estimate the unknown population regression parameters with the help of the sample information, explain the concept of goodness of fit and use the various functional forms in the estimation of the regression model,

• conduct some tests of hypothesis regarding unknown population regression parameters, estimate and interpret the regression model by introducing first one additional explanatory variable and then extending further to n explanatory variables, and

• tackle the problems of autocorrelation, heteroscedasticity and multicollinearity in multiple regression analysis.

3.1 INTRODUCTION

As a researcher, you may be tempted to examine whether economic laws hold good in the real world, as reflected in the pattern displayed by the relevant data. Similarly, as a professional economist in the government or the private sector, you may be interested in estimating the demand or supply of various products and services, or in knowing the effect of various levels of advertisement expenditure on sales and profits. As a macroeconomist, you may like to measure and evaluate the impact of various government policies, say monetary and fiscal policies, on important variables such as employment, unemployment, income, imports and exports, interest rates, inflation rates, etc. As a stock market analyst, you may seek to relate the price of a stock to the characteristics of the company issuing the stock and the overall state of the economy. Such types of issues are investigated by employing various statistical techniques, and regression modelling is one of the primary statistical tools employed for conducting such studies.

3.2 AN OVERVIEW OF THE BLOCK

The empirical research in economics is concerned with the statistical analysis of economic relations. Often, these relations are expressed in the form of regression equations involving a dependent variable and independent variables. The formulation of an economic relation in the form of a regression equation is called a regression model in econometrics. The major purpose of a regression model is the estimation of its parameters in two-variable and multi-variable situations and the testing of hypotheses. In this block you will find an overview of the basic steps involved in conducting an empirical study, the estimation of parameters in two-variable and multi-variable situations by applying the ordinary least squares approach, the various functional forms of regression models, hypothesis testing, the consequences of the violation of the basic assumptions in brief (i.e. multicollinearity, heteroscedasticity and autocorrelation), and how to tackle these problems.

3.3 IMPORTANT STEPS INVOLVED IN AN ECONOMETRIC STUDY

The basic steps followed in conducting a regression-model-based empirical study are: (1) specification of a model based on the knowledge of economic theory, past experience or other studies, (2) gathering the data, (3) estimating the model, (4) subjecting the model to hypothesis testing, and (5) interpreting the results.

Specifying a Model:
In economics, as in the physical sciences, the model (the logical structure of the system) is set up in the form of equations, which precisely describe the behaviour of economic and related variables. The model may consist of a single equation or several equations. In
the specification of single equation, the behavior of a single variable (denoted by Y) is
explained. Placed on the left hand side, Y is referred to by a number of names like
dependent variable, regressand or explained variable. On the right hand side a number of
variables that influence the dependent variable are identified (denoted by Xs). These are
referred to as independent variables, exogenous variables, explanatory variables or
regressors. If a single equation model consists of one independent variable, it is referred
to as a two-variable regression model. In case the investigator identifies more than one independent variable to explain the behaviour of Y, it is referred to as a multi-variable regression model.
In simultaneous equation models, the behavior of more than one dependent variable is
studied and accordingly several equations are specified together.

Gathering the data:

In order to estimate the econometric model, data on dependent and independent variables
are needed. As we have studied in the last block (Block-2) of this course, data can be
collected by way of experimental, sample survey or observational method. If you aim to
explain the variation of the dependent variable over a period of time, you will be required
to obtain observations at different points of time (referred to as time series data). The
periodicity may be annual, quarterly, monthly or weekly depending on the need and
requirement. If we want to analyse the characteristics of a dependent variable at a given
point of time, we need the cross section data i.e. the observations for a variable for
different units at the same point of time. In pooled data, we have time series observations
for various cross sectional units. Here we combine the element of time series with that of
cross section.

Estimating the Model:

After formulating the model and gathering the data, the next step is to estimate the unknown parameters of the model, like the intercept term α and the slope term β. We shall discuss this issue in the next section.

We have already seen that the formulation of the basic model is guided by economic
theory, investigator’s perception of the underlying behavior and past experience or
studies. Consequently, the specified model is subjected to a variety of diagnostic tests for
ascertaining whether the underlying assumptions and estimation methods are appropriate
for the data.

The final stage of the empirical investigation is to interpret the results. If the chosen
model does not refute the hypothesis or theory under consideration, we may use it to
predict the future value of the dependent variable Y on the basis of known or expected
future value of the explanatory variables.

Read Introductory Econometrics with Applications, Fifth Edition (2002) by Ramu Ramanathan, Chapter 1 (pp. 2–15) and Chapter 14 (pp. 568–579), --------- Learning India (P) Ltd., New Delhi.

3.4 TWO VARIABLE REGRESSION MODEL

3.4.1 Estimation of Parameters: The Ordinary Least Squares Approach

(i) The first step is to specify the relationship between X and Y variable. Assuming
that there is a linear relationship between the two variables, we can specify the model as
Y = α + βX + U . By linearity, we often mean a relationship in which, the dependent
variable is a linear function of the independent variable. However, the linearity in the
context of regression analysis can be interpreted in two different ways: linearity in the
variable, and linearity in the parameters. Since the main purpose of regression analysis is
the estimation of its parameters, we shall consider only those models, which are linear in
parameter, no matter whether they are linear in variable or not. In fact, models that are
non-linear in variables but linear in parameters can be easily estimated by extending the
basic procedure that we are discussing here.

By the very nature of social science, the relationship among different variables cannot be expected to be exact or deterministic. Hence the dependent variable Y tends to be probabilistic or stochastic in nature. That is why we specify the regression model by incorporating a random or stochastic variable. The random variable U, also called the disturbance or error term, is a sort of catch-all variable that represents all kinds of indeterminacies of an inexact relationship.

Y= α + β X + U (3.1)

The above equation is called the population regression function. In this formulation, Y is
a stochastic or random, but X is non-stochastic or deterministic in nature. This
asymmetry in the treatment of the dependent and the independent variable can be
removed by making both Y and X stochastic in nature. However, such a model is beyond
the scope of the present discussion. It should be clear that a random variable like U is
introduced in the population regression function to incorporate the element of
randomness of a statistical relationship.

We have an unknown bivariate population and hence, to estimate the population regression function, we require the sample observations. Accordingly, the concept of the sample regression function comes into the picture. The sample regression function is quite similar to that of the population regression function and can be presented as

Ŷ = α̂ + β̂X + Û (3.2)

Here Ŷ, α̂, β̂ and Û are interpreted as the sample estimates of their corresponding unknown population counterparts. Thus, we hypothesize that corresponding to the linear population function Y = α + βX + U, there is a linear sample regression function given by Ŷ = α̂ + β̂X + Û.

Assumptions of the Classical Regression Model:

i) The disturbance term U has a zero mean for all values of X, i.e. E(U) = 0
ii) The disturbance term has a constant variance for all values of X, i.e. V(U) = σ²
iii) The disturbance terms for two different observations are independent, i.e. Cov(Ui, Uj) = 0 for i ≠ j
iv) X is non-stochastic
v) The model is linear in parameters

Estimation of the Sample Regression Function:

The philosophy behind the least squares method is that we should fit a straight line through the scatter plot in such a manner that the vertical differences between the observed values of Y and the corresponding values obtained from the straight line, called errors, are minimum. In other words, we should choose α̂ and β̂ in such a manner that the sum of the squares of the vertical differences between the actual or observed values of Y and those obtained from the straight line is minimum. The straight line that we obtain is called the line of best fit. Mathematically:

Minimize ∑Û² = ∑(Y − Ŷ)² = ∑(Y − α̂ − β̂X)² with respect to α̂ and β̂.

By following the usual minimization procedure, we obtain the so-called normal equations. The two normal equations are then simultaneously solved for

β̂ = ∑xy / ∑x² (3.3)

and

α̂ = Ȳ − β̂X̄ (3.4)

where the lower-case x and y in (3.3) denote deviations from the respective means.

The least squares estimators α̂ and β̂ are taken as the estimators of the unknown population parameters α and β because they satisfy the following desirable properties:

(1) Least squares estimators are linear.
(2) Least squares estimators are unbiased, i.e. E(α̂) = α and E(β̂) = β.
(3) Among all the linear unbiased estimators, least squares estimators have the minimum variance and are therefore termed efficient estimators.

All these properties of the least squares estimators lead to the Gauss-Markov theorem: the least squares estimators are the best linear unbiased estimators, i.e. BLUE.

Standard Error of the Regression Estimate:

The standard errors of the estimates also known as standard deviations of the sampling
distributions of least square estimates are taken as a measure of the precision of these
estimates. These are obtained by taking the positive square root of the variances
of α̂ and β̂. The expressions for both the variance and the standard error of the least squares estimators are given below:

Var(α̂) = [∑X² / {n∑(X − X̄)²}] σ² (3.5)        Se(α̂) = σ √[∑X² / {n∑(X − X̄)²}] (3.6)

Var(β̂) = σ² / ∑(X − X̄)² (3.7)        Se(β̂) = σ / √[∑(X − X̄)²] (3.8)

Here an unbiased estimator of σ² is

σ̂² = ∑Û² / (n − 2) (3.9)

where (n − 2) is the degrees of freedom. Replacing σ by its estimator, we can compute the standard errors of both α̂ and β̂, and can write the estimator of σ as

σ̂ = √[∑Û² / (n − 2)] = √[∑(Y − Ŷ)² / (n − 2)] (3.10)
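A minimal sketch in Python of formulae (3.3) to (3.10); the arrays X and Y are invented for illustration and the computation is a hand-rolled version of what a regression routine would report, not a prescribed procedure.

import numpy as np

# Hypothetical data for the two-variable model Y = alpha + beta*X + U
X = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0, 14.0, 15.0])
Y = np.array([5.1, 8.9, 10.2, 14.3, 17.8, 23.9, 28.1, 29.5])
n = len(X)

x = X - X.mean()                      # deviations from the mean
y = Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()          # equation (3.3)
alpha_hat = Y.mean() - beta_hat * X.mean()         # equation (3.4)

resid = Y - alpha_hat - beta_hat * X               # U-hat
sigma2_hat = (resid ** 2).sum() / (n - 2)          # equation (3.9)

se_beta = np.sqrt(sigma2_hat / (x ** 2).sum())                          # (3.8)
se_alpha = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))  # (3.6)

r2 = 1 - (resid ** 2).sum() / (y ** 2).sum()       # coefficient of determination
print(alpha_hat, beta_hat, se_alpha, se_beta, r2)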

3.4.2 Goodness of Fit

Once the regression line is fitted, we may be interested to know how faithfully the sample regression line describes the unknown population regression line. The regression error term or residual Û plays an important role in this regard. Small residuals imply that a large proportion of the variation in the dependent variable has been explained by the regression equation and hence the fit is good. Similarly, large residuals obviously point to a poor fit. The coefficient of determination (the square of the correlation coefficient, i.e. R²) acts as a measure of goodness of fit.

Example: Given the following estimated regression model, interpret the results:

Ŷ = −14.0217 + 0.965217 X        R² = 0.989345 (3.11)
Se = (7.9382644) (0.0354118)     Degrees of freedom = 8

where Y = average employment and X = level of the labour force.

The slope coefficient β̂ = 0.965217 estimates the rate of change of employment with respect to the labour force: if 100 more persons start searching for jobs, about 97 of them actually get employed. The intercept α̂ = −14.0217 indicates the average combined effect of all those variables that might affect employment but have been omitted from the above regression. The coefficient of determination R² = 0.989345 indicates that about 99 per cent of the variation in employment can be explained by variation in the labour force, which is indeed very high and represents a good fit for the given sample.

3.4.3 Functional Forms of Regression Model

As we have discussed above, regression models that are linear in parameters are the relevant ones for us. Linear-in-parameter regression models have the following functional forms:

S.No.  Model            Equation                  Slope (dY/dX)     Elasticity (dY/dX · X/Y)
1.     Linear           Y = β1 + β2X              β2                β2 (X/Y)
2.     Log-Linear       ln Y = β1 + β2 ln X       β2 (Y/X)          β2
3.     Log-Lin          ln Y = β1 + β2X           β2 Y              β2 X
4.     Lin-Log          Y = β1 + β2 ln X          β2 (1/X)          β2 (1/Y)
5.     Reciprocal       Y = β1 + β2 (1/X)         −β2 (1/X²)        −β2 (1/XY)
6.     Log Reciprocal   ln Y = β1 − β2 (1/X)      β2 (Y/X²)         β2 (1/X)

Choice of Functional Form: A great deal of skill and experience is required in choosing an appropriate model for empirical estimation. However, the following guidelines can be helpful in this regard:
(i) The underlying theory may suggest a particular functional form.
(ii) Knowledge of the above formulae will be helpful in comparing the various models.
(iii) The coefficients of the chosen model should satisfy certain a priori expectations.
(iv) One should not overemphasize the r² measure in the sense that the higher the r², the better the model. The theoretical underpinnings of the chosen model, the signs of the estimated coefficients and their statistical significance are of more importance in this regard.

Read Basic Econometrics (Fourth Edition) by Damodar N. Gujarati and Sangeeta, Tata McGraw-Hill Publishing Company Ltd., Delhi (2007 edition), Chapters 3, 5 and 6 (pp. 60–108, 169–196).
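As an illustration of guideline (ii), the sketch below estimates the log-linear (constant-elasticity) form of row 2 of the table by applying the least squares formula (3.3) to the log-transformed data. The price and quantity figures are hypothetical; β2 is then directly the elasticity.

import numpy as np

# Hypothetical price (X) and quantity (Y) data for a constant-elasticity relation
X = np.array([4.0, 5.0, 6.5, 8.0, 10.0, 12.0])
Y = np.array([120.0, 100.0, 80.0, 66.0, 54.0, 45.0])

lnX, lnY = np.log(X), np.log(Y)
x, y = lnX - lnX.mean(), lnY - lnY.mean()

beta2 = (x * y).sum() / (x ** 2).sum()   # slope of ln Y on ln X = elasticity (row 2 of the table)
beta1 = lnY.mean() - beta2 * lnX.mean()
print(beta1, beta2)                      # beta2 comes out roughly -0.9 for these made-up figures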

3.4.4 Hypothesis Testing

To examine whether the unknown parameter α or β assumes a particular value or not is known as hypothesis testing in statistics. Although we may test some hypothesis about the intercept α, our main concern in a regression model is the slope coefficient β. Hypothesis testing consists of three basic steps:
(i) Formulating two opposing hypothesis:

H0 : β = 0
H1 : β ≠ 0

(ii) Deriving a test statistic and its statistical distribution under the null hypothesis; the statistic is conventionally denoted by t. Thus

t = [β̂ − E(β̂)] / s.e.(β̂)

The t statistic obtained above has n − 2 degrees of freedom because we are estimating two parameters, α and β.

(iii) Deriving a decision rule for rejecting or accepting the null hypothesis. The following steps are involved in this process:
(a) H0 : β = β0 , H1 : β ≠ β0
(b) The test statistic is t = (β̂ − β0) / s.e.(β̂) and can be calculated from the sample information. Under the null hypothesis, it has the t distribution with n − 2 degrees of freedom. If the modulus of t is large, we would suspect that β is probably not equal to β0.
(c) In the t table, trace the critical value of t for n − 2 d.f. at the desired level of significance.
(d) Reject H0 if |tc| > t* (tc = computed t value and t* = critical t value recorded in the t table). If |tc| < t*, accept H0.

Example: Given the GDP at factor cost and the final consumption expenditure (FCE) for the Indian economy during the period 1980-2001 at 1993-94 prices, we ran the regression model of final consumption expenditure on GDP and obtained the following results through the SPSS software:
FCE = α + β GDP

FCE = 108206.4 + 0.719674 GDP

s.e. = (233.203) (0.007865)

t = (17.35968) (91.50314)

R2 = 0.997617 d.f. = 20

Null hypothesis: H0 : β = 0.80 ; H1 : β ≠ 0.80

The t statistic in this case is given by

t = [β̂ − E(β̂)] / s.e.(β̂) = (0.719674 − 0.80) / 0.007865 = −0.080326 / 0.007865 ≈ −10.21, so |t| ≈ 10.21

This computed value of the t statistic (|t| ≈ 10.21) exceeds the critical values of 2.845 and 2.086 for 20 degrees of freedom at the 1% and 5% levels of significance respectively. Thus, on the basis of the sample information, the difference between the estimated value of β and the hypothesized value is so large that it would arise by chance in fewer than 1 out of 100 (or 5 out of 100) cases if the null hypothesis were true. Hence, on the basis of the sample information, we are not in a position to accept the null hypothesis; in all probability, during the sample period 1980-2001, India's marginal propensity to consume has not been as high as 0.80. In this example, we considered a two-tailed test. Similarly, a one-tailed test can also be conducted; it all depends upon the type of enquiry that we intend to conduct.
The t test discussed above is an example of a small sample test. However, if the sample is sufficiently large, then by virtue of the central limit theorem, the distribution of the test statistic discussed above approximately follows the standard normal distribution. Accordingly, the entire test can be conducted by consulting the standard normal table instead of the t table, and in this case one need not bother about the degrees of freedom. For deciding whether a sample is sufficiently large or not, one has to consider the size of the sample (n). A rule of thumb is: if n is 30 or more, the sample can be considered a large sample; otherwise, it is to be taken as a small sample.
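A minimal sketch of this test using the figures of the example above; the critical values are looked up with scipy.stats, which is an assumed (not prescribed) dependency.

from scipy import stats

beta_hat, beta_0, se_beta, df = 0.719674, 0.80, 0.007865, 20

t_calc = (beta_hat - beta_0) / se_beta          # computed t value
t_crit_5 = stats.t.ppf(1 - 0.05 / 2, df)        # two-tailed critical value at 5% (about 2.086)
t_crit_1 = stats.t.ppf(1 - 0.01 / 2, df)        # two-tailed critical value at 1% (about 2.845)

print(round(t_calc, 4), round(t_crit_5, 3), round(t_crit_1, 3))
# Reject H0 when |t_calc| exceeds the critical value
print(abs(t_calc) > t_crit_1)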

3.5 MULTI-VARIABLE REGRESSION MODELS

We shall extend the regression analysis further to make it more realistic and
comprehensive by
(i) introducing one more explanatory variable and re-examine the model,
(ii) interpreting the partial regression coefficients,
(iii) considering how many explanatory variables should be included in the model and what should be the touchstone for arriving at such a decision,
(iv) discussing the conditions or assumptions which make these extensions and
generalizations possible.
(v) examining the possible effects of violations of one or more assumptions,
particularly, multicollinearity, heteroscedasticity and autocorrelation

3.5.1 Regression Model with two explanatory variables:

(The following matter has been adapted from Unit 10, Block 3 of MEC-009 course)

For simplicity and better comprehension, we shall write model (3.1) as Y = β0 + β1X1 + U, a form that makes it easy to add more explanatory variables like X2, X3 with their respective coefficients β2, β3, etc. Thus, a model with two explanatory variables in stochastic form can be written as:

Y = β0 + β1X1 + β2X2 + U (3.12)
  = E(Y) + U (3.13)

Using subscript t with Y, X1, X2 and U to denote the tth observation of these variables, the
above equation can be written as

Yt = β 0 + β 1 X 1t + β 2 X 2t + U t

3.5.1.1 Estimation of Parameter: Ordinary Least Squares Approach

We collect the sample observations on Y, X1 and X2 and write down the sample regression function as

Yt = b0 + b1X1t + b2X2t + et (3.14)

where b0, b1 and b2 replace the corresponding population parameters β0, β1 and β2, and the random population component Ut is replaced by the sample error term et. By applying the principle of ordinary least squares, we work out the values of b0, b1 and b2 such that the residual sum of squares (∑et²) is minimum. Here

et = Yt − b0 − b1X1t − b2X2t (3.15)

∑et² = ∑[Yt − b0 − b1X1t − b2X2t]² (3.16)

Differentiating (3.16) w.r.t. b0, b1, b2 and equating to 0 gives us the three normal equations:

∑Yt = n b0 + b1 ∑X1t + b2 ∑X2t (3.17)
∑Yt X1t = b0 ∑X1t + b1 ∑X1t² + b2 ∑X1t X2t (3.18)
∑Yt X2t = b0 ∑X2t + b1 ∑X1t X2t + b2 ∑X2t² (3.19)

These three equations give us the following expressions for b0, b1 and b2 respectively:

b0 = Ȳ − b1 X̄1 − b2 X̄2 (3.20)

b1 = [(∑yt x1t)(∑x2t²) − (∑yt x2t)(∑x1t x2t)] / [(∑x1t²)(∑x2t²) − (∑x1t x2t)²] (3.21)

b2 = [(∑yt x2t)(∑x1t²) − (∑yt x1t)(∑x1t x2t)] / [(∑x1t²)(∑x2t²) − (∑x1t x2t)²] (3.22)

The lower-case letters denote, as usual, deviations from the respective means:

yt = (Yt − Ȳ), x1t = (X1t − X̄1), and x2t = (X2t − X̄2)
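A minimal sketch computing b0, b1, b2 from (3.20) to (3.22), together with the R² of (3.27) introduced below, using hypothetical data; the variable names are illustrative only.

import numpy as np

# Hypothetical observations on Y and two explanatory variables X1, X2
Y  = np.array([10.0, 12.0, 15.0, 18.0, 21.0, 25.0, 27.0, 30.0])
X1 = np.array([ 2.0,  3.0,  4.0,  5.0,  7.0,  8.0,  9.0, 10.0])
X2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  5.0,  5.0])

y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()   # deviations from means

S_x1x1, S_x2x2, S_x1x2 = (x1**2).sum(), (x2**2).sum(), (x1*x2).sum()
S_yx1, S_yx2 = (y*x1).sum(), (y*x2).sum()
D = S_x1x1 * S_x2x2 - S_x1x2**2        # common denominator of (3.21) and (3.22)

b1 = (S_yx1 * S_x2x2 - S_yx2 * S_x1x2) / D          # (3.21)
b2 = (S_yx2 * S_x1x1 - S_yx1 * S_x1x2) / D          # (3.22)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()     # (3.20)

R2 = (b1 * S_yx1 + b2 * S_yx2) / (y**2).sum()       # (3.27)
print(b0, b1, b2, R2)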

3.5.1.2 Variance and Standard Errors

Var(b0) = [1/n + (X̄1² ∑x2t² + X̄2² ∑x1t² − 2X̄1X̄2 ∑x1t x2t) / {∑x1t² ∑x2t² − (∑x1t x2t)²}] σ² (3.23)
SE(b0) = √Var(b0)

Var(b1) = [∑x2t² / {∑x1t² ∑x2t² − (∑x1t x2t)²}] σ² (3.24)
SE(b1) = √Var(b1)

Var(b2) = [∑x1t² / {∑x1t² ∑x2t² − (∑x1t x2t)²}] σ² (3.25)
SE(b2) = √Var(b2)

where σ² is unknown and its unbiased OLS estimator σ̂² is obtained as

σ̂² = ∑et² / (n − 3), where n − 3 is the degrees of freedom (3.26)

3.5.1.3 Interpretation of Regression Co-efficients

Mathematically, b1 and b2 represent the partial slopes of regression plane with respect to
X1, and X2 respectively. In other words, b1 shows the rate of change in Y as X1 alone
undergoes a unit change, keeping all other things constant. Similarly, b2 represents rate
of change of Y as X2 alone changes by a unit while other things are held constant.

3.5.2 Goodness of Fit: Multiple Coefficient of Determination (R²)

We have seen above that in the case of a single independent variable, r² measures the goodness of fit of the fitted sample regression line. When we have two explanatory variables X1 and X2, we might be interested in the proportion of the total variation in Y (= ∑yt²) explained by X1 and X2 jointly. This information is conveyed by the multiple coefficient of determination, denoted by R². It can be computed by the following formula:

R² = (b1 ∑yt x1t + b2 ∑yt x2t) / ∑yt² (3.27)

R² lies between 0 and 1, and the closer it is to 1, the better the fit, which implies that the estimated regression line is capable of explaining a greater proportion of the variation in Y. The positive square root of R² is called the coefficient of multiple correlation.

3.5.3 Analysis of Variance (ANOVA)

In the context of regression, a study of the components of the total sum of squares (TSS) is called analysis of variance. We know the relationship: TSS = ESS + RSS

where ESS is the explained sum of squares and RSS is the residual sum of squares.

This is equivalent to saying:

∑yt² = (b1 ∑yt x1t + b2 ∑yt x2t) + ∑et² , i.e. TSS = ESS + RSS (3.28)

It should be noted that every sum of squares has some degrees of freedom (df) associated with it. Accordingly, in our 2-explanatory-variable case, the degrees of freedom will be

TSS: n − 1, RSS: n − 3, ESS: 2

One may be interested in testing the null hypothesis

H0 : β1 = β2 = 0

In such a case, we find that (ESS/df) / (RSS/df) is the ratio of the variance explained by X1 and X2 to the unexplained variance, and it follows the F distribution with 2 and n − 3 degrees of freedom. In general, if a regression equation estimates 'k' parameters including the intercept, then F has (k − 1) df in the numerator and (n − k) df in the denominator. The F value can be expressed in terms of R² as under:

F = [R² / (k − 1)] / [(1 − R²) / (n − k)] (3.29)

Interpretation: The larger the variance explained by the fitted regression line, the larger the numerator will be in relation to the denominator. Thus, a large F value is evidence against the truthfulness of H0: β1 = β2 = 0. Hence, when the computed F value exceeds the critical F value from the F table, one cannot accept the hypothesis that the variables X1 and X2, taken together, have no effect on Y.
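A small illustrative computation of (3.29), with an assumed R² and sample size; scipy.stats is used only to look up the critical F value and is an assumed dependency.

from scipy import stats

# Using (3.29) with an assumed R^2 from a fitted 2-explanatory-variable model
R2, n, k = 0.85, 25, 3                       # k parameters including the intercept

F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
F_crit = stats.f.ppf(0.95, k - 1, n - k)     # 5% critical value with (k-1, n-k) df
print(round(F, 2), round(F_crit, 2), F > F_crit)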

Read Basic Econometrics (Fourth Edition) by Damodar N. Gujarati and Sangeeta, Chapter 8, pp. 253–265.

3.5.4 Inclusion and Exclusion of Explanatory Variables

As we add more and more explanatory variables (Xs), the explained sum of squares (ESS) keeps on rising and, consequently, R2 goes on rising. However, each additional variable that is added eats up one degree of freedom, and our definition of R2 makes no allowance for this loss of degrees of freedom. Thus, the practice of improving the goodness of fit by simply increasing the number of explanatory variables may not be justified. We know that TSS always has (n-1) degrees of freedom irrespective of the number of regressors; therefore, comparing two regression models with the same dependent variable but different numbers of independent variables on the basis of R2 alone is also not justified. Hence we must adjust our measure of goodness of fit for degrees of freedom. This measure is called adjusted R2, denoted by \bar{R}^2. It can be derived from R2 in the following manner:
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k}                (3.30)

Therefore, it is recommended that we include a new variable only if (upon its inclusion) \bar{R}^2 increases, and not otherwise. A general guide is provided by the 't' statistic: if the absolute t value of the coefficient of the added variable is greater than one, retain the variable. (Let us remember that the 't' value is calculated under the hypothesis that the population value of that coefficient is zero.) We should note here that, besides R2 and adjusted R2, there are other criteria for judging the goodness of fit, such as Akaike's information criterion and Amemiya's prediction criterion. However, a description of such criteria is beyond the scope of the present discussion.
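A small illustrative calculation with hypothetical numbers: with n = 25 observations, k = 3 estimated parameters and R2 = 0.85, equation (3.30) gives

\bar{R}^2 = 1 - (1 - 0.85)\,\frac{25-1}{25-3} = 1 - 0.15 \times \frac{24}{22} \approx 0.836

so the adjustment lowers the reported goodness of fit slightly; a fourth regressor would be worth adding only if it raised R2 enough to keep \bar{R}^2 from falling.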

3.5.5 Generalisation To N-Explanatory Variables

In general, our regression model may have a large number of independent variables. Each
of those variables can, on a priori grounds, be expected to have some influence over the
‘dependent’ or ‘explained’ variable. Consider a very simple example. What can be
possible determinants of demand for potatoes in a vegetables market? One obvious
choice will be the price of potatoes. What else can affect the quantity demanded? Could it
be availability of vegetables which can be paired off with potatoes? In that case, prices of
a large number of vegetables which are cooked along with potatoes will become ‘relevant
explanatory variables’. You cannot ignore income of the community that patronizes the
particular market. Needless to say, the dietary preferences of members of the households
can also affect the demand, and so on. In the next part, we shall discuss techniques which help us restrict the analysis to a selected few variables, even though theoretical considerations may suggest a huge number of them to be 'useful' and 'powerful' determinants. In fact, in economic theory, we usually append the phrase ceteris paribus to many a statement.
This phrase means keeping all other things constant. That means, we may focus on
impact of only a few selected variables on the dependent variable while assuming that all
other variables remain 'unchanged' during the period of analysis. However, before taking recourse to this assumption, we have to weigh the need to include more and more variables in our model against the 'gains' in explanatory power of the model. We developed, in the previous sub-section (3.5.4), a working touchstone for the inclusion of more variables in terms of improvement in \bar{R}^2, and tried to give it a 'practical' shape in the form of the magnitude of the 't' values of the relevant slope parameters.

With these considerations in mind, we can generalise the linear regression model as
follows:
We hypothesize that in population, the dependent variable Y depends upon k explanatory
variables, X1, X2, ………..Xk. We also assume that the relationship is linear in
parameters. Three more assumptions are made and they have very significant bearing on
the analysis. These are:
a) Absence of multicollinearity;

b) Absence of heteroscedasticity; and

c) Absence of autocorrelation

We will discuss the complications which arise because of violations of these assumptions in sections 3.5.6, 3.5.7 and 3.5.8 respectively. We can present the classical general linear regression model in k explanatory variables in the following fashion:

Y_t = \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + u_t, \qquad t = 1, \ldots, n                (3.31)


In this model, we have omitted the constant intercept term to facilitate the exposition.
From the model it is clear that Y and each X have n values (t = 1, ..., n), forming n (k+1)-tuples like (Y_1, X_{11}, X_{21}, \ldots, X_{k1}) and so on, of one dependent variable and k explanatory variables.
We can write this elaborate system of n equations for the n values of the dependent variable Y in terms of the k explanatory Xs very conveniently in the matrix equation form:

Y = X\beta + U                (3.32)

where

Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad U = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}

and

X = \begin{bmatrix} X_{11} & X_{21} & X_{31} & \cdots & X_{k1} \\ X_{12} & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & & & & \vdots \\ X_{1n} & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix}

We assume that:

(1) The expected values of the error terms are equal to zero; that is, E(u_i) = 0 for all i. In matrix notation:

E(U) = \begin{bmatrix} E(u_1) \\ \vdots \\ E(u_n) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = 0

(2) The error terms are not correlated with one another and they all have the same variance \sigma^2 for all sets of values of the X variables. That is,

E(u_i u_j) = 0 \;\; \forall\, i \neq j, \qquad E(u_i^2) = \sigma^2 \;\; \forall\, i

In matrix notation:

E[UU'] = \begin{bmatrix} E(u_1^2) & E(u_1u_2) & \cdots & E(u_1u_n) \\ E(u_1u_2) & E(u_2^2) & \cdots & E(u_2u_n) \\ \vdots & & \ddots & \vdots \\ E(u_1u_n) & E(u_2u_n) & \cdots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I_n

(3) The explanatory variables X_1, \ldots, X_k are non-random (i.e., non-stochastic) variables.

(4) The matrix X has full column rank equal to k. This means it has k linearly independent columns. It implies that the number of observations exceeds the number of coefficients to be estimated (n > k). It also implies that there is no exact linear relationship among the X variables. This, in fact, is the assumption of the absence of multicollinearity.

Note: The assumption that E(u_i u_j) = 0 means that the error terms are not correlated. The implication of the diagonal terms in the matrix E(UU') all being equal to \sigma^2 is that all error terms have the same variance \sigma^2. This is also called the assumption of homoscedasticity.
We can write the regression relation for the sample as:
e = Y – Xb

where e, Y, X and b are appropriate matrices.

The sum of squared residuals will be

\phi = \sum e_t^2 = \sum_{t=1}^{n} (Y_t - b_1 X_{1t} - \cdots - b_k X_{kt})^2                (3.33)

  = e'e = [Y - Xb]'[Y - Xb]   (in matrix form)
  = Y'Y - 2b'X'Y + b'X'Xb

Note: b'X'Y is a scalar and therefore equal to its transpose Y'Xb.

By equating the first-order partial derivatives of \phi with respect to each b_i (i = 1, ..., k) to zero, we get k normal equations. This set of equations in matrix form is:

\frac{\partial \phi}{\partial b} = -2X'Y + 2X'Xb = 0                (3.34)

X'Xb = X'Y                (3.35)

When X has rank equal to k, the normal equations (3.35) have a unique solution and the least squares estimator b is equal to:

b = (X'X)^{-1} X'Y                (3.36)

To see that b is an unbiased estimator of \beta, substitute Y = X\beta + U into (3.36):

b = (X'X)^{-1} X'(X\beta + U)
  = (X'X)^{-1} X'X\beta + (X'X)^{-1} X'U
  = \beta + (X'X)^{-1} X'U

Therefore, E(b) = \beta + (X'X)^{-1} X' E(U) = \beta, since X is non-stochastic and E(U) = 0.

The variance-covariance matrix of b is Var(b) = \sigma^2 (X'X)^{-1}.
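The matrix formula (3.36) and Var(b) = \sigma^2 (X'X)^{-1} can be illustrated with a minimal NumPy sketch. The simulated data below are hypothetical and the sketch is only a teaching aid, not a substitute for a dedicated econometrics package.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations on k regressors (first column = intercept)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.8, size=n)

# OLS estimator b = (X'X)^{-1} X'Y, equation (3.36)
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y

# Unbiased estimate of sigma^2 and Var(b) = sigma^2 (X'X)^{-1}
e = Y - X @ b
sigma2_hat = (e @ e) / (n - k)
var_b = sigma2_hat * XtX_inv
se_b = np.sqrt(np.diag(var_b))

print(b, se_b)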
Notes

1. In this course our objective is simply to introduce the concepts. You will find the concepts treated at a much more rigorous level in the course on Basic Econometrics (REC-003), included as a compulsory course of the M.Phil/Ph.D. programme in Economics.

2. The other ideas regarding the coefficient of determination R2 and adjusted R2 remain the same as they were developed for the two-explanatory-variable case.

Now we can safely turn to discussions about non-satisfaction or violation of assumptions.

3.5.6 Problem of Multicollinearity

Many a time the X variables may be found to have linear relationships among themselves. This vitiates our classical regression model.

Let us illustrate it with the help of our two-explanatory-variable model:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i

Let us give specific names to the variables: say, X1 is the price of commodity Y and X2 is family income. We expect β1 to be negative and β2 to be positive. Now we go one step further. Let Y be the demand for milk and X1 the price of milk, and suppose the demand for milk is being estimated for a family which also produces and sells milk. Clearly, the larger the value of X1, the higher the magnitude of X2 will be.

In such situations, the estimation of the price and income coefficients will not be possible. Recall that we wanted the X variables in our matrix equations to be linearly independent. If that condition is not satisfied, the matrix X'X becomes singular, that is, its determinant is equal to zero. Thus, there will be no unique solution to the normal equations (3.34) (or 3.35). However, if collinearity is not perfect, we can still get OLS estimates and they remain the best linear unbiased estimates (BLUE), though one or more partial regression coefficients may turn out to be individually insignificant.
Not only this, the OLS estimates still retain the property of minimum variance. Further, it is found that multicollinearity is essentially a sample phenomenon: the X variables may not be linearly related in the population, but the way a particular sample is drawn may create a situation of multiple linear relations in the sample.

The practical consequences of multicollinearity: Gujarati (D.N.) has listed the following consequences of such a multiplicity of linear relationships:

1. Large variances /SEs of OLS estimates


2. Wider confidence intervals
3. Insignificant ‘t’ ratios for β parameters
4. A high R2 despite few significant t values
5. Instability of OLS estimators: the estimators and their standard errors (SEs) become very sensitive to small changes in the data.
6. Sometimes, even the signs of some of the regression coefficients may turn out to be theoretically unacceptable, like a rise in income having a negative impact on the demand for milk.
7. When many regressors have insignificant coefficients, their individual contributions to the explained sum of squares cannot be assessed properly.

Multicollinearity can be detected by:

(1) high R2 but few significant ‘t’ ratios,


(2) high pairwise correlations between explanatory variables. One can also try partial correlations and subsidiary or auxiliary regressions, but each such technique increases the burden of calculations.
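The two detection checks mentioned above can be sketched in Python as follows; the simulated variables x1, x2 and y are hypothetical and chosen only so that near-collinearity is present by construction.

import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.05, size=n)   # almost an exact multiple of x1
y  = x1 + rng.normal(size=n)

# (1) High pairwise correlation between the explanatory variables
print("corr(x1, x2) =", np.corrcoef(x1, x2)[0, 1])

# (2) Auxiliary regression: regress x2 on x1 and inspect its R^2;
#     an R^2 close to 1 signals (near-)multicollinearity
X_aux = np.column_stack([np.ones(n), x1])
coef, *_ = np.linalg.lstsq(X_aux, x2, rcond=None)
resid = x2 - X_aux @ coef
r2_aux = 1 - resid.var() / x2.var()
print("Auxiliary regression R^2 =", r2_aux)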

3.5.7 Problem of Heteroscedasticity

The Classical Linear Regression Model has a significant underlying assumption that all
the error terms are identically distributed with zero mean and the same standard deviation
equal to σ (or variance equal to σ2). The second part of the assumption; that errors have a
constant standard deviation or variance is known as the assumption of homoscedasticity.
What happens when this assumption of homoscedasticity does not hold? In symbolic terms, E(u_i^2) = \sigma_i^2, i = 1, ..., n; that is, the expectation of the squared errors is no longer equal to a common \sigma^2, and each error term has its own variance \sigma_i^2, which varies from observation to observation.

It has been observed that usually time series data do not suffer from this problem of
hetero-scedasticity but in cross-section data, the problem may assume serious
dimensions.
The consequences of heteroscedasticity:

If the assumption of homoscedasticity does not hold, we observe the following impact on
OLS estimators.

1. They are still linear


2. They are still unbiased
3. But they no longer have minimum variance – that is we cannot call them BLUE:
the Best Linear Unbiased Estimators. In fact, this point is relevant both for small
as well as large samples.
4. The reason for the problem hinted at in (3) above is that, generally, the estimators of the error variance have some bias built into their formulae, which we try to rectify by making use of degrees of freedom. For instance, \hat{\sigma}^2 (the estimator of the true population \sigma^2), given by \sum e_i^2 / df, no longer remains unbiased under heteroscedasticity. And this very \hat{\sigma}^2 enters into the calculation of the standard errors of the OLS estimates.
5. Since, estimates of standard errors are themselves no longer reliable, we may end
up drawing wrong conclusions while using conventional reasoning in procedures
for testing hypotheses.

How to detect heteroscedasticity

In applied regression analysis, plotting the residual terms can give us important clues about whether or not one or more assumptions underlying our regression model hold. The pattern exhibited by e_i^2 plotted against the values of the variable concerned can provide an important clue. If no pattern is detected, homoscedasticity holds, i.e. heteroscedasticity is absent. On the other hand, if the squared errors form a pattern with the values of the variable, such as expanding, increasing linearly or changing in some non-linear manner, there is a distinct possibility of the presence of heteroscedasticity.
Some statistical tests have been designed to detect the presence of heteroscedasticity. Some of the prominent ones are the Park test, the Glejser test, White's general test, Spearman's rank correlation test and the Goldfeld-Quandt test. However, the limitation of space does not permit us to go into their details here; learners are again referred to the course on Basic Econometrics for such details.

How to tackle the heteroscedasticity?

Our ability to tackle the problem will depend upon the assumptions that we can make
about the error variance. Thus, the following situations may emerge.

i) When \sigma_i^2 is known:
Here the CLRM Y_i = \beta_0 + \beta_1 X_i + u_i can be transformed by dividing each observation by the corresponding \sigma_i. Thus,

\frac{Y_i}{\sigma_i} = \beta_0\left(\frac{1}{\sigma_i}\right) + \beta_1\left(\frac{X_i}{\sigma_i}\right) + \frac{u_i}{\sigma_i}

This effectively transforms the error term to u_i/\sigma_i, which can be shown to be homoscedastic; therefore, the OLS estimators applied to the transformed model will be free of the disability caused by heteroscedasticity. The estimates of \beta_0 and \beta_1 in this situation are called weighted least squares estimators (WLSEs).

ii) When σ2 is unknown: we make some further assumptions about error variance:

(a) Error variance proportional to X_i: here the square root transformation is enough. We divide both sides by \sqrt{X_i}. Thus, our regression line looks like:

\frac{Y_i}{\sqrt{X_i}} = \beta_0\left(\frac{1}{\sqrt{X_i}}\right) + \beta_1\left(\frac{X_i}{\sqrt{X_i}}\right) + \frac{u_i}{\sqrt{X_i}}
                       = \beta_0\,\frac{1}{\sqrt{X_i}} + \beta_1\sqrt{X_i} + \nu_i

where \nu_i = \frac{u_i}{\sqrt{X_i}}, and this is sufficient to address the problem.
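A minimal Python sketch of this square-root transformation follows; the simulated data are hypothetical, and the point is only to show that OLS on the transformed variables amounts to weighted least squares on the original model.

import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(1.0, 10.0, size=n)
# Simulate heteroscedastic errors whose variance is proportional to X
u = rng.normal(scale=np.sqrt(X))
Y = 2.0 + 0.5 * X + u

# Transform: divide both sides by sqrt(X) so the error becomes homoscedastic
w = np.sqrt(X)
Y_star = Y / w
Z = np.column_stack([1.0 / w, X / w])   # regressors: beta0*(1/sqrt(X)) + beta1*sqrt(X)

# OLS on the transformed model = weighted least squares on the original model
b, *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
print("beta0, beta1 estimates:", b)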

(b) Error variance proportional to X_i^2: here, instead of dividing by \sqrt{X_i}, we divide both sides by X_i and estimate

\frac{Y_i}{X_i} = \beta_0\,\frac{1}{X_i} + \beta_1 + \frac{u_i}{X_i}
               = \beta_0\,\frac{1}{X_i} + \beta_1 + \eta_i

The error term \eta_i = \frac{u_i}{X_i} will be free of heteroscedasticity and thus will facilitate the use of OLS techniques.

iii) Respecification of the model: Assigning a different functional form to the model, instead of speculating about the nature of the error variance, may be found to be expedient. For example, instead of the original model, we can estimate this model:

ln Yi = β 0 + β1 ln X i + u i
This loglinear model is usually adequate to address our concerns.

Read: Introduction to Econometrics by Christopher Dougherty (2002), Chapter 8, pp. 220-230, Oxford University Press.

3.5.8 Problem of Autocorrelation

The classical regression model also assumes that the disturbance terms u_i do not have any serial correlation. But in many situations this assumption may not hold. The consequences of the presence of serial or auto correlation are similar to those of heteroscedasticity: the OLS estimators are no longer BLUE. Symbolically, no autocorrelation means E(u_i u_j) = 0 when i \neq j.
Autocorrelation can arise in economic data on account of many factors:

i) Inertia is a major reason for the presence of autocorrelation. An economic time


series generally displays a cyclical pattern of upswings and downswings for
various reasons and these swings have a tendency to continue. This tendency is
called inertia.

ii) Specification bias is an important source of autocorrelation. Specification bias may be due to under-specification or the use of an incorrect functional form of the model. For example, one might use only a few explanatory variables and thereby exclude rather large systematic components, which then get clubbed with the errors. Similarly, one might use a linear form instead of a non-linear form for the model.

iii) Cobweb Phenomenon is another factor which may also give rise to the problem
of autocorrelation in certain types of economic time series (especially agricultural
output and the like).

iv) Polishing (smoothing) of data also sometimes results in the presence of autocorrelation. We sometimes manipulate monthly data to obtain quarterly data, or similarly manipulate quarterly data to derive a half-yearly series, and the like. Such manipulations usually involve some averaging procedure. This too may be responsible for autocorrelation, because the averaging process dampens the fluctuations of the original data.

The consequences of autocorrelation are not different from those of heteroscedasticity listed in sub-section 3.5.7 above. Here too the OLS estimators, though unbiased, no longer have minimum variance and are consequently not BLUE. In fact, t and F tests cease to be reliable. As a result, one can no longer depend upon the computed value of R2 as a true indicator of goodness of fit.

There are many tests for detecting autocorrelation. Some of them are visual inspection of residual plots, the runs test and the Swed-Eisenhart critical runs test. But the most commonly used test is the Durbin-Watson d test. This is defined as

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
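A minimal Python sketch of the d statistic follows; the residual series below is simulated for illustration, whereas in practice e would be the residuals from a fitted regression.

import numpy as np

def durbin_watson(e):
    """Durbin-Watson d: sum of squared first differences of the residuals
    divided by the sum of squared residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals with strong positive serial correlation
rng = np.random.default_rng(3)
shocks = rng.normal(size=100)
e = np.empty(100)
e[0] = shocks[0]
for t in range(1, 100):
    e[t] = 0.8 * e[t - 1] + shocks[t]

print(durbin_watson(e))   # well below 2, suggesting positive autocorrelation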

However, again, we are holding back details of the practical detection and remedying of the problem of autocorrelation for reasons of limitation of space here.

Read: Introduction to Econometrics by Christopher Dougherty (2002), Chapter 13, pp. 337-358, Oxford University Press.

3.6 FURTHER SUGGESTED READINGS

1. Maddala, G.S. (2002), Introduction to Econometrics, Third Edition, Chapters 3 and 4, John Wiley & Sons Ltd., West Sussex.
2. Pindyck, Robert S. and Rubinfeld, Daniel L. (1991), Econometric Models and Economic Forecasts, Third Edition, Chapter 1, McGraw-Hill, New York, U.S.A.
3. Ramanathan, Ramu (2002), Introductory Econometrics, Fifth Edition, Chapters 8, 9, 10 and 14, Cengage Learning Private Limited, New Delhi.
4. Karmel, P.H. and Polasek, M. (1986), Applied Statistics for Economists, Fourth Edition, Chapter 8, Khasala Publishing House, New Delhi.

3.7 MODEL QUESTIONS

1. State the various forms of regression models. When will you use a log linear
regression model? Give an illustration in support of your answer.
2. How do you interpret the estimated slope coefficient of a log linear regression
model?
3. If you want to estimate India’s rate of growth of per capita income during the
period 1990-2008, what should be the functional form of your regression
model?
4. How do you interpret coefficients of multiple regression model? Give an
example in support of your answer.

5. What is multicollinearity? What are its consequences?


6. What are the consequences of hetero-scedasticity? How will you tackle the
problem of hetero-scedasticity?
7. “Inclusion of more variables always increases R2 the goodness of fit. So to
make a regression model ‘good’ what we need to do is simply increase the
number of explanatory variables”. Do you agree/disagree with this statement?
Give reasons.
8. How do you interpret coefficients of multiple regression model?
9. From a sample of 209 firms, the following regression results are given:
Log (salary) = 4.32+0.280 log (sales) + 0.0174 roe + 0.00024 ros
Se = (0.32) (0.035) (0.0041) (0.00054)
R2 = 0.283
Where salary = salary of CEO
Sales = annual firm sales
roe = return on equity in percent
ros = return on the firm's stock
And where figures in the parentheses are the estimated standard errors.
a. Interpret the preceding regression taking into account any prior
expectations that you may have about the signs of the various coefficients.
b. Which of the coefficients are individually statistically significant at the 5
percent level?
c. What is the overall significance of the regression? Which test do you use?
And why?

BLOCK 04 QUALITATIVE METHODS


Structure
4.0 Objectives
4.1 Introduction
4.2 An Overview of the Block
4.3 Research Approaches
4.3.1 Philosophical Foundation
4.3.2 Frameworks for Qualitative Research
4.3.3 Research Strategies
4.3.4 Methods of Qualitative Research
4.4 Participatory Rural Appraisal (PRA) Approach
4.4.1 Rapid Rural Appraisal and Participatory Rural Appraisal
4.4.2 Other Streams of PRA
4.4.3 Principles of PRA
4.4.4 Organizing PRA
4.4.5 Methods and Techniques of PRA
4.4.6 Sequence of Techniques
4.4.7 Practical Applications
4.4.8 Validity and Reliability
4.4.9 Vulnerability and Risks
4.4.10 Challenges
4.5 Case Study Method
4.5.1 Types of Case Studies
4.5.2 Case Study Design
4.5.3 Component of case studies
4.5.4 Sources of evidence
4.5.5 Principles of case studies
4.5.6 Steps of case studies
4.6 Further Suggested Readings
4.7 Model questions

4.0 OBJECTIVES
The main objectives of this block are to:
• apprise you of the philosophical foundations and research perspectives guiding qualitative research,
• explain the various principles governing the participatory method of the qualitative approach,
• discuss the process and stages involved in the participatory method,
• apply the various tools and techniques of the PRA approach in research,
• appreciate the limitations and challenges faced in the participatory method, and
• explain the principles, research design and steps involved in conducting studies by applying the case study method.

4.1 INTRODUCTION
Research methodology deals with the branch of philosophy that analyses the principles and procedures of scientific inquiry in a particular discipline, with a set of pedagogy to understand complex reality. Principles and procedures of scientific enquiry tend to unfold the causality of factors so as to understand complex phenomena through empirical evidence and its validation. Empirical evidence is captured through quantitative and qualitative approaches and variables. The first three blocks of this course cover the different aspects of the quantitative approach, namely foundations of research methods, data collection, and analysis of data through quantitative methods. The quantitative approach broadly deals with data and sampling errors, but the reliability of data may still suffer on account of non-sampling errors, which it is not well equipped to handle. The qualitative approach, on the other hand, is an in-depth scientific enquiry into complex events, their dimensions and variables, which are difficult to capture through a cardinal or quantitative approach. For example, it is relatively easy to collect data on the income and expenditure of households by canvassing simple structured questions within the Keynesian framework of psychological laws, but much harder to capture perceptions, motivations and processes in this way. Moreover, traditional quantitative research methods are considered time-consuming exercises whose results often arrive too late to be relevant for a particular time-bound policy drive. These methods also involve the high cost of formal surveys. Keeping in view all these limitations of the quantitative approach, this block covers the qualitative approach to research methods, tools and techniques of data collection, formatting, processing and analysis of data, report writing, etc. The block focuses on a few important methods of the qualitative approach: participatory rural appraisal (PRA) and the case study method (CSM).

4.2 AN OVERVIEW OF THE BLOCK


The major difference between the quantitative and qualitative approaches lies in the underlying beliefs and assumptions, the framework guiding research and the methodological prescriptions. Critical theory and interpretivism have emerged as alternative paradigms to positivism and post-positivism in the context of the qualitative approach. The qualitative approach is an in-depth scientific enquiry into complex events, their dimensions and variables, which are difficult to capture through the quantitative approach. Qualitative research methods can be put broadly under two categories: (i) traditional established methods like ethnography (including case studies), interviewing, history and historiography; and (ii) emerging qualitative methods, particularly participatory methods such as Participatory Rural Appraisal (PRA). These methods are used in conducting research in the areas of agriculture, rural development, health, nutrition, agro-forestry, natural resource assessment, emergencies and disasters, etc. Hence, the principles guiding the PRA approach and the methods used for data collection and analysis have been discussed in this block. The case study is another important method for probing issues; the underlying principles and steps involved in conducting case studies have therefore also been taken up in this block.
4.3 RESEARCH APPROACHES: QUANTITATIVE AND QUALITATIVE
The major difference between the quantitative approach and the qualitative approach is not the type of data used or preferred; it is much broader and deeper. It lies in the underlying beliefs and foundational assumptions (i.e. the paradigm) that guide the use of a particular research method and are assumed to be true. A paradigm is a comprehensive belief system, world view or framework that guides research and practice in the field. It consists of:
• at the basic or fundamental level, a philosophy of science that makes a number of assumptions about fundamental issues relating to the nature and characteristics of truth or reality (ontology) and the theory of knowledge dealing with how we can know the things that exist (epistemology);
• a world view or framework that guides research and practice in the field; and
• general methodological prescriptions, including instrumental techniques, about how to conduct work within the paradigm.

Since knowledge of philosophical foundations, frameworks and paradigms enables us to understand when, how and where a particular method will be appropriate, a brief discussion of these three components is desirable.
4.3.1 Philosophical Foundation
Ontology and Epistemology: Ontology and epistemology are the two major aspects of a branch of philosophy called metaphysics. Ontology is concerned with the nature of reality, whereas epistemology refers to a theory of knowledge, that is, how human beings come to have knowledge of the world around them. Two theories of knowledge have predominated in philosophical discourse: rationalism and empiricism. Rationalism is based on the idea that reliable knowledge is derived from the use of pure reason, establishing indisputable axioms and then using formal logic to arrive at conclusions. Empiricism, on the other hand, relies on the use of the human senses to produce reliable knowledge. These philosophical positions can be further elaborated in terms of two dominant epistemological positions and their associated ontological positions, materialism and idealism, thus generating a four-way classification scheme.
(i) Empiricism: A materialist ontology and a nominalist epistemology together constitute empiricism. Under this position reality is viewed as being constituted of material things that can be observed by the human senses.
(ii) Substantialism: Again the view that 'matter constitutes reality' is adopted, but here people in different times and places can interpret reality differently.
(iii) Subjectivism: Since reality here is viewed as socially constructed and interpreted, knowledge of this reality is available from the accounts that social actors provide.
(iv) Rationalism: Under this position, 'reality is made up of ideas'. It is believed that reality exists independently of people and their consciousness. Knowledge can be obtained only by examining thought processes.
These four positions are associated with the major paradigms (philosophies of science) in the following manner:
Empiricism – Positivism and Post-Positivism (falsificationism)
Substantialism – Critical Realism, Critical Theory
Subjectivism – Interpretivism
The exact number of world views and the names associated with a particular paradigm vary from author to author, but three paradigms in the context of the qualitative approach to research are important:
• Positivism and Post-Positivism
• Critical Theory
• Interpretivism

Qualitative research is sometimes described as interpretive, critical or postmodern research, whereas quantitative research is often called empirical, positivist, post-positivist or objectivist. There are important differences between positivism and post-positivism, and between postmodernism and interpretivism; however, these differences are less important than the similarities. Critical theory and interpretivism are the most important paradigms for qualitative research. The distinguishing features of these paradigms are:
• They differ on the question of reality.
• They offer different reasons or purposes for doing research.
• They point us to quite different types of data and methods as being valuable and
worthwhile.
• They have different ways of deriving meaning from the collected data.
• They vary in the relationship between research and practice.

The above three paradigms have been the dominant guiding frameworks in research in
the social sciences.

Differences between Post-Positivism and Critical Theory on the Five Major Issues
________________________________________________________________________
Issue                           Post-Positivism                Critical Theory
________________________________________________________________________
Nature of reality               Material and external to       Material and external to
                                the human mind                 the human mind
Purpose of research             Find universals                Uncover local instances of
                                                               universal power relationships
                                                               and empower the oppressed
Acceptable methods and data     Scientific method;             Subjective inquiry based on
                                objective data                 ideology and values; both
                                                               quantitative and qualitative
                                                               data are acceptable
Meaning of data                 Falsification; used to         Interpreted through ideology;
                                test theory                    used to enlighten and emancipate
Relationship of research        Separate activities;           Integrated activities;
to practice                     research guides practice       research guides practice
________________________________________________________________________
(Source: Foundations of Qualitative Research by Jerry W. Willis (2007), p. 83)

Differences between Post-Positivism and Interpretivism on the Five Major Issues
________________________________________________________________________
Issue                           Post-Positivism                Interpretivism
________________________________________________________________________
Nature of reality               External to the human mind     Socially constructed
Purpose of research             Find universals                Reflect understanding
Acceptable methods and data     Scientific method              Subjective and objective
                                                               research methods are acceptable
Meaning of data                 Falsification; used to         Understanding is contextual;
                                test theory                    universals are deemphasized
Relationship of research        Separate activities;           Integrated activities;
to practice                     research guides practice       both guide and become the other
________________________________________________________________________
(Source: Foundations of Qualitative Research by Jerry W. Willis (2007), p. 95)

4.3.2 Framework for Qualitative Research


Qualitative researchers have the option to choose conceptual frameworks from various alternatives. A framework is a set of broad concepts that guide research. Researchers working within the interpretive and critical theory paradigms have a number of frameworks to choose from. There are many commonalities among these frameworks, but there are many differences too. These differences often lead to the development and use of different research methods.
The important frameworks that appeal to a number of researchers today include:
• Analytic Realism
• Interpretive perspective
• Eisner's connoisseurship model of inquiry
• Semiotics
• Structuralism
• Post structuralism and Post modernism

All these frameworks put forward different options before a qualitative researcher and point towards certain research methods, goals and topics. Under the positivist or post-positivist paradigm, a researcher undertakes the research by stressing the variables, hypotheses and propositions derived from a particular theory that sees the world in terms of cause and effect. On the other hand, under the interpretive paradigm, emphasis is laid on socially constructed realities, inter-subjectivity, local generalisations, practical reasoning and ordinary talk. Critical researchers underline the importance of terms like action, structure, culture and power, fitted into a general model of society. Under the feminist perspective, research focuses on gender, reflectivity, emotion and an action orientation.
All these frameworks have different applications in different disciplines of the social sciences and provide options towards certain research methods, goals and topics. Two characteristics, the search for contextual understanding in place of universal laws and the correspondingly different research design, make the qualitative approach distinct from the post-positivist quantitative approach.
Read: Chapter 5, 'Frameworks for Qualitative Research', in Foundations of Qualitative Research by Jerry W. Willis (2007), Sage Publications, pp. 147-181.

4.3.3 Research Strategies

The four research strategies work with different ontological assumptions:


Positivism - Induction
Falsification - Deduction
Critical Realism - Retroduction
Interpretivism - Abduction

Each strategy has different starting points:


1. The inductive strategy begins with the collection of data, from which a generalisation is made that can be used as an elementary explanation.
2. The deductive strategy starts with a theory that provides a possible answer. The theory is tested in the context of a research problem by the collection of relevant data.
3. The retroductive strategy starts out with a hypothetical model of a mechanism that could explain the occurrence of the phenomenon under investigation.
4. The abductive strategy starts by laying out the concepts and meanings that are contained in social actors' accounts of activities related to a research problem.

Read: 'Philosophy of Social Research' (pp. 816-820) in The Sage Encyclopedia of Social Science Research Methods, edited by Michael S. Lewis-Beck, Alan Bryman and Tim Futing Liao, Sage Publications (2004).

4.3.4 Methods of Qualitative Research


Methods of Qualitative Research

I. Established Qualitative Research Methods
   • Ethnography
   • Interviewing (structured, semi-structured and open)
   • History and Historiography

II. Emerging Qualitative Methods
   A. Participatory Methods
   B. Emancipatory Methods
      (i) Critical emancipatory research
      (ii) Feminist and standpoint research
      (iii) Critical action research
The traditional qualitative research methods under category I have been used mostly in different disciplines of the social sciences like anthropology, sociology, psychology, education and history. The historiography method, for instance, is useful for investigating issues in history. Emancipatory research, developed from the critical theory perspective, is useful in studying, say, the worker-management relationship in a large factory, and is based on the assumption that research should lead to greater freedom and control on the part of participants.
With the emergence of the interpretive and critical theory approaches as alternatives to positivism and post-positivism, the participative approach is being increasingly used to conduct evaluative research studies in economics, involving the people who have been the subjects of research in the research process itself. Rapid Rural Appraisal and Participatory Rural Appraisal are alternatives to the sample survey approach for collecting data in village-level studies, particularly when results are needed in emergent situations like floods and earthquakes.
Hence, in the further discussion of this block we shall take up two important methods: PRA and the case study.

Read: Chapter 7, 'Methods of Qualitative Research', in Foundations of Qualitative Research by Jerry W. Willis (2007), Sage Publications, pp. 260-278.

4.4 PARTICIPATORY RURAL APPRAISAL (PRA) METHOD


Participation is now widely accepted as a philosophy and mode in development research. One practical set of approaches which evolved and spread in the early 1990s is termed Participatory Rural Appraisal (PRA). The term describes a growing family of approaches and methods that enable local people to share, enhance and analyse their knowledge of life and conditions, to plan and to act. PRA flows from and owes much to the traditions and methods of participatory research. PRA has many sources; important among them are:
- Rapid Rural Appraisal
- Activist Participatory Research
- Agroeco System Analysis
- Applied Anthropology
- Field Research on Farming Systems

Read: 'Participatory Rural Appraisal (PRA): Analysis of Experience' by Robert Chambers, World Development, Vol. 22, No. 9, pp. 1253-1268, 1994.
4.4.1 Rapid Rural Appraisal (RRA) and Participatory Rural Appraisal (PRA)
PRA has evolved from RRA. RRA itself began as a response in the late 1970s and early
1980s to the biased perceptions derived from rural development tourism and the many
defects and high costs of large-scale questionnaire surveys.
The basic distinction between PRA and RRA is that in RRA information is elicited and extracted by outsiders, while in PRA it is shared and owned by local people, who conduct their own analysis and often plan and take action on that basis. In this sense, PRA often implies radical personal and institutional change. The comparison between the two approaches is summarised in the following table.
Table 4.1: RRA and PRA compared
________________________________________________________________________
                                 RRA                        PRA
________________________________________________________________________
Period of major development      Late 1970s, 1980s          Late 1980s, 1990s

Major innovators based in        Universities               NGOs

Main users at first              Aid agencies,              NGOs, government
                                 universities               field organisations

Key resource earlier             Local people's             Local people's
undervalued                      knowledge                  analytical capabilities

Main innovations                 Methods,                   Behaviour,
                                 team management            experiential training

Predominant mode                 Elicitive, extractive      Facilitating, participatory

Ideal objectives                 Learning by outsiders      Empowerment of local people

Longer term outcomes             Plans, projects,           Sustainable local action
                                 publications               and institutions
________________________________________________________________________
Source: (Chambers 1994a: 958)

In short, RRA methods are more verbal, with outsiders more active, while PRA methods are more visual, with local people more active. The methods of the two approaches are broadly shared. Thus:
(i) the RRA approach is extractive-elicitive in nature, wherein data are collected by outsiders;
(ii) PRA is a sharing-empowering approach where the main objectives are, variously, investigation, analysis, learning, planning, action, monitoring and evaluation by insiders.

In practice, there is a continuum between RRA and PRA in the following manner:
Table 4.2: The RRA-PRA continuum
________________________________________________________________________
Nature of process                RRA                        PRA
________________________________________________________________________
Mode                             Extractive, elicitive      Sharing, empowering

Outsider's role                  Investigator               Facilitator

Information owned, analysed      Outsiders                  Local people
and used by

Methods used                     Mainly RRA,                Mainly PRA,
                                 sometimes PRA              sometimes RRA
________________________________________________________________________
Source: (Chambers 1994a: 959)
RRA has its own advantages for macro policy decisions, but PRA not only takes local variations into account; it also ensures ownership of the findings and analysis by the local people as end users. In RRA, outsiders remain the moving force, and therefore ownership of the findings does not rest with local people in the same way. However, there is a continuum between RRA and PRA, as shown in Table 4.2.

4.4.2 Other Streams of PRA


(a) Activist participatory research
The term “activist participatory research” is used to refer to a family of approaches and
methods which use dialogue and participatory research to enhance people’s awareness
and confidence, and to empower their action.
The contributions of the activist participatory research stream to PRA have been more
through concepts than methods. They have in common three prescriptive ideas:
- that poor people are creative and capable, and can and should do much of their own
investigation, analysis and planning;
-that outsiders have roles as conveners, catalysts and facilitators;
-that the weak and marginalized can and should be empowered.
(b) Agroecosystem analysis

It was developed in Thailand from 1978 onwards and has combined analysis of ecology and system properties with pattern analysis of space, time, flows and relationships, relative values and decisions. It has contributed significantly to RRA and PRA, particularly transects and informal mapping, diagramming, scoring and ranking.
Some of the major contributions of agroecosystem analysis to current RRA and PRA
have been:
- transects (systematic walks and observation);
- informal mapping (sketch maps drawn on site);
- diagramming (seasonal calendars, flow and causal diagrams, bar charts, Venn or chapati
diagrams);
- innovation assessment (scoring and ranking different actions).
(c) Applied anthropology
Applied social anthropology came to the fore in the 1980s. Rapid assessment procedures (RAP) and rapid ethnographic assessment (REA) were adopted in the fields of health and nutrition. In these exercises conversation, observation, informal interviews, focus groups, etc., were used for data collection. The ideas of field learning and participant observation, the importance of attitudes, behaviour and rapport, and the validity of local knowledge are its major contributions to RRA and PRA. Some of the many insights and contributions coming from and shared with social anthropology have been:
- the idea of field learning as flexible art rather than rigid science;
- the value of field residence, unhurried participant- observation, and conversations;
- the importance of attitudes, behavior and rapport;
- the emic-etic distinction;
- the validity of indigenous technical knowledge.
(d) Field research on farming systems
It is a multi-disciplinary approach to complex and diversified problems, with systematised methods for investigating, understanding and prescribing for farming system complexity. In this method, farmers' capabilities to experiment are recognised. Field research on farming systems has contributed to the appreciation and understanding of:
-the complexity, diversity and risk-proneness of many farming systems;
-the knowledge, professionalism and rationality of small and poor farmers;
-their experimental mindset and behavior;
-their ability to conduct their own analyses.
4.4.3 Principles of PRA
The principles of PRA evolved in the course of experimentation and from lessons drawn from the failings of development tourism. The list of principles therefore varies from practitioner to practitioner and over time as it evolved. However, there are certain commonalities shared by most of them.
(i) Principles shared by RRA and PRA:
(a) Reversal of Learning: It is a departure from dominant paradigm of
learning from formal institutions and from consolidated published
information. In this approach, face to face learning from the people takes
place at site from their local, physical, technical and social knowledge and
analysis.
(b) Learning rapidly and progressively: It has inbuilt adaptable rapid learning
process with conscious exploration, flexible use of methods,
improvisation, iteration and cross checking and not following strictly a
blue print programme.
84

(c) Offsetting biases: It is a paradigm of offsetting the biases of development tourism by being relaxed and not rushing, listening instead of lecturing, probing intensively, not being arrogant, and listening to poor and common people instead of being influenced by the powerful and the rich.
(d) Optimising trade-offs: This approach is based on trade-offs between the cost of learning and the usefulness of information, with its quantity, relevance, accuracy and timeliness. It also includes the principles of optimal ignorance and appropriate imprecision, i.e., it is better to be approximately right than precisely wrong.
(e) Triangulation: PRA is based on a triangulated framework of cross-checking, progressive learning and approximation through plural investigation, using more than one method, set of conditions, or type of individual or group of informants.
(f) Seeking diversity: This approach is based on maximising the variability captured rather than averages. It applies purposive sampling, not in a strictly statistical sense, and looks for contradictions, anomalies and differences.

(ii) Additionally evolved and stressed principles in PRA:


In addition to above principles PRA has some special emphasis on four following
principles:
(a) Subject self-analysis: The investigation, its results and the analysis are carried out by the local people, with their ownership. It assumes and reposes confidence in their capability to do these exercises. Facilitators simply initiate the process and then sit back passively or walk away from the scene, ensuring the least interruption.
(b) Self-critical awareness: Facilitators remain self-critical, examining and correcting their failures and any dominant behaviour.
(c) Personal responsibility: PRA is not a manual-based approach but is based on personal responsibility and best judgement at all times; and
(d) Sharing: It is based on the sharing of information and ideas between local people and facilitators, and also between practitioners in different regions and countries.

4.4.4 Organizing PRA

A PRA activity involves a team of people working for two to three weeks on workshop
discussions, analyses, and fieldwork. Several organizational aspects should be
considered:
• Logistical arrangements should consider nearby accommodations, arrangements
for lunch for fieldwork days, sufficient vehicles, portable computers, funds to
purchase refreshments for community meetings during the PRA, and supplies
such as flip chart paper and markers.
• Training of team members may be required, particularly if the PRA has the
second objective of training in addition to data collection.
• PRA results are influenced by the length of time allowed to conduct the exercise,
scheduling and assignment of report writing, and critical analysis of all data,
conclusions, and recommendations.
• A PRA covering relatively few topics in a small area (perhaps two to four communities) should take ten days to four weeks, but one with a wider scope over a larger area can take several months. Allow five days for an introductory workshop if training is involved.
• Reports are best written immediately after the fieldwork period, based on notes from the PRA team members. A preliminary field report should be available within a week or so. The final report should be made available to all the participants and the local institutions involved.

4.4.5 Methods and Techniques of PRA


The more developed and tested methods of PRA include participatory mapping and modelling, transect walks, matrix scoring, well-being grouping and ranking, seasonal calendars, institutional diagramming, trend and change analysis, and analytical diagramming, all undertaken by local people.
Hundreds of participatory techniques and tools have been developed on a variety of occasions and taught in training courses around the world. Broadly, these techniques are divided into four categories:

• Group dynamics, e.g. learning contracts, role reversals, feedback sessions


• Sampling, e.g. transect walks, wealth ranking, social mapping
• Interviewing, e.g. focus group discussions, semi-structured interviews,
triangulation
• Visualization e.g. venn diagrams, matrix scoring, time lines

In order to ensure that people are not excluded from participation, generally these
techniques avoid writing wherever possible, relying on the tools of oral communication
like pictures, symbols, physical objects and group memory. Efforts are made in many
projects, however, to build a bridge to formal literacy; for example by teaching people
how to sign their names or recognize their signatures. Tools of data collection for RRA
and PRA are often overlapping and of supplementary nature but the basic difference lies
in the forms of ownership and end user.
Team contracts and interactions:
It is always better for the team to decide collectively and unanimously, with permissible variation, on certain norms and behaviour before it proceeds for PRA. The team may agree on how to interact on issues, the degree of distance or closeness, the mode of discussion and its consolidation, the division of labour, etc.
Role reversal:
It is very important that, unlike in development tourism, in PRA the catalysts are always in a learning mode. They are merely facilitators. Therefore, dominant behaviour and the arrogance of being the custodian of solutions and knowledge are not acceptable. A deliberate effort to change behaviour and attitude is essential for a successful PRA. Facilitators may initiate a discussion and leave the site, or sit back passively but observe carefully. No manual is strictly followed; the group may decide collectively whatever it finds effective.
Feedback session:
In order to review and consolidate progress, the feedback session is an essential element of PRA.
Transect walks:
In this exercise, facilitators create an environment in which the surroundings are discussed and learnt about with local people while walking through the village with people of the area. This gives empowerment and an opportunity for local people to get involved in sharing their observations in discussions about the features of the village, the quality of its resources, infrastructure, technological levels in production, patterns of use, production and productivity, life styles, customs and festivals, public sharing events, leadership qualities, problems of the locality, solutions, hurdles, etc. Facilitators have to be patient listeners during transect walks, providing opportunities for local people to present information through various forms: mapping, modelling, using symbols, etc. Thus, this exercise empowers local people to consolidate information with collective wisdom and to initiate data collection by themselves.
Identification of Key informants:
Identification of articulate experts of the area is an important task on which the success and quality of the PRA depend. Key informants are identified through participatory social mapping of a village.
Social Mapping:
It provides a basis for household listings, and for indicating population, social group, health and other household characteristics. This can lead to the identification of key informants and discussions with them. A village social map provides an updated household listing to be used for well-being or wealth ranking of households. Based on these lists, focus groups consisting of different categories of people are formed. These groups express their different preferences, leading to discussion, negotiation and reconciliation of priorities. Resource maps help in understanding the natural and environmental setting of a particular village.

Well-being wealth grouping and ranking:


In this exercise people participate in identifying households with different levels of well-being or wealth. Local wisdom is used to identify whether a household is landless or operates a farm of a particular size, is poor or among the poorest of the poor, etc. This task may also identify the basis of, and indicators for, well-being. The value of a particular activity or item is ranked according to a range of criteria. For example, a range of different land care group activities could be assessed against a set of criteria such as attendance rate, cost and value to members.
Group participation:
Participation of a heterogeneous cross-section of people from the community, with a mix of different levels of seriousness and variety, is essential for participatory collective feedback. This method has been used by many streams of research.
Do it yourself:
In order to make the exercise participatory, the catalyst needs to ask to be taught, to be taught, and to perform village tasks: agricultural operations, hut thatching and dressing, fetching water, collecting fuel, stitching and washing clothes, etc.
Local people do it:
Villagers perform as investigators and researchers. They transect, interview, observe, analyse data and present results.
Semi structured interviews:
This is one of the most important techniques, generally based on a visual and oral framework of participation. A tentative checklist, with categorical but open-ended options, is the core of participation, allowing space and flexibility for narratives and symbols.
Participatory analysis of secondary sources:
Files, reports, maps, aerial photographs, satellite imagery, articles and books are used for analysis. Mostly data in pictorial form are analysed by the community, in which the illiterate can participate equally.
Participatory mapping and modelling:
In this exercise local people use the ground, the floor or paper to make social, demographic, health, education, natural resource, services and opportunities maps and models, using locally available symbols, units, etc.

Oral history and ethno biography:


Narratives by local people about an event or a change in the village can be one of the best sources for recording collective wisdom. This may include any dimension of local opportunities and resources.
Livelihood analysis:
This is one of the core exercises for understanding local realities. Generally local people have a rhythm of life with opportunities for livelihood. They need to reflect on resources, income, expenditure, credit, difficulties, potential opportunities, possibilities of overcoming crises, their own imagined framework of solutions, etc., in their own convenient and conversational language and units, which can later be standardised if needed for planning and policy.
Participatory linkage diagramming:
It is always necessary to coordinate the discussion, within the participatory framework, with its causal connections. Facilitators need to streamline discussions to achieve the goal, with permissible variations, but not at the cost of participation.
Venn diagramming:
It is an exercise to identify individuals and institutions which are considered important in carrying out tasks, resolving problems, creating hurdles, etc., and their relations with other individuals and institutions.

Matrix scoring and ranking:
Preparing matrices with convenient local symbols and units does not require any formal education or literacy. Information can be symbolised by drawing trees, people, huts, crops, livestock, water, forest, quality of soils, etc. Matrix scoring or ranking elicits villagers' criteria of the value of a class of items (trees, vegetables, fodder grasses, varieties of a crop or animal, sources of credit, market outlets, fuel types), which leads into a discussion of preferences and actions by the implementers and the local community.
Time lines, trend and change analysis:
It is an exercise in which local people give an account of past events, listing the changes they remember, with examples such as changes in the use of technology, production, cropping pattern, infrastructure, demography, values of life, customs, capabilities, employment, consumption pattern, livestock, living conditions, institutions, causes of change, etc. This exercise can be as intensive as the objectives and data requirements of the study permit. Hence a tentative checklist with open-ended options is always helpful for consolidating participation effectively.
Seasonal calendars:
Information on practices and changes collected through local calendars is usually better in terms of reliability. It may later be converted into standard units, with the precaution that the basic features of variation are not lost.

Daily time use analysis:


Even an account of village practices, in terms of people's daily routines, may help in understanding their potential and how their time is used.
Analysis of difference:
While conducting well-being ranking or any other PRA exercise, analysis of difference in terms of social group, gender, farm size, and poor and non-poor provides insights for understanding problems and for deciding priorities, plans and actions. Contrast and comparison along various dimensions empower local people to understand their problems from various angles.
Estimates and quantifications:
Collecting data in local units by the local people is easier. Even illiterate participants, using small pebbles, bits of brick, seeds or sticks, or drawing lines and tallies on the ground or on walls, can serve the purpose of counting for data. They manage their accounts with their phenomenal memories of events. Combined with mapping, modelling and matrices, excellent quantification can be achieved by the local people.
Key probes:
Direct questions relating to the set objectives of the investigation lead to focused discussion on non-controversial issues. Indirect narratives help in initiating discussions on controversial issues.
Stories, portraits and case studies:
Narratives provide excellent insights, and for these local people are an inexhaustible treasure. They can go down memory lane and recall events, what was or was not resolved, and the outcomes. Case studies help in unfolding and correcting general perceptions.
Presentations and analysis:
It is always better to have presentations made and cross-checked by the local people themselves. If they need a little direction, the facilitator can help, but the participatory spirit of the presentation must not be distorted.
After the analysis, participatory action planning, budgeting, implementation and monitoring need to be decided with a time frame.
Report writing:
Report writing is to be done without delay, in the field itself, by collectively dividing assignments among the designated people involved in the process of learning through the PRA sequences. Feedback from the groups and local people is always the right approach to validate the understanding of the problems and solutions.
4.4.6 Sequence of Techniques
PRA techniques can be combined in a number of different orders and ways depending
upon the topic focus, goal and objectives under investigation. Some general rules of
thumb, however, may be useful. Rapport building is the core of the success of any PRA.
Mapping and modelling are good techniques to start with because they involve several
people, stimulate much discussion and enthusiasm, provide the PRA team with an
overview of the area, and deal with noncontroversial information. Maps and models may
lead to identification of key informants; transect walks, perhaps accompanied by some of
the people who have constructed the map; listing of the households. Wealth ranking is
best done later in a PRA, once a degree of rapport has been established, given the relative
sensitivity of this information, followed by focus group discussion, matrix scoring and preference ranking. However, the sequence of techniques should be decided by the groups through brainstorming. This exercise of group discussion may take place more than once, depending upon the felt need. The group may decide at which stages to include or exclude outsiders while conducting group discussion. The current situation can be shown using maps and
models, but subsequent seasonal and historical diagramming exercises can reveal changes
and trends, throughout a single year or over several years. Preference ranking is a good
ice breaker at the beginning of a group interview and helps focus the discussion. Later,
individual interviews can follow up on the different preferences among the group
members and the reasons for these differences.
4.4.7 Practical Applications

This approach has been popular in the fields of natural resource management, agriculture, implementation of rural development programmes, poverty eradication and social development (health, education, food security, etc.). It has now spread to the creation and management of self-help groups and to marketing and the commercial sector as well.

Read: Chambers, Robert (1994), 'The Origins and Practice of Participatory Rural Appraisal', World Development, Vol. 22, No. 7, pp. 953–963.

4.4.8 Validity and Reliability


Since RRA and PRA are considered closer to reality and based on direct responses from the local people, the validity and reliability of data gathered through these approaches are expected to be better. Robert Chambers has reviewed the findings of practitioners and reached the conclusion that the findings of PRA do not differ significantly from those of farm and household surveys conducted through long questionnaires. Even for ranking, participatory village censuses and rainfall data, the results are not significantly different. It is argued that measurement data apply rigorous statistical tools and techniques and can therefore claim better precision than the comparison of preferences through these approaches. However, it is also argued that discrepancies, with inherent biases, have been recognised at various levels in large-scale data collection by outsiders. Practitioners in these areas have been experimenting and evolving better ways of learning by doing what works, moving ever closer to reality. Moreover, these approaches represent reversals of mainstream professional practices towards their opposites. There are four clusters of reversals, intertwined and mutually reinforcing: (a) reversal of frames: from etic to emic, i.e., from the outsiders' categories to the local people's; (b) reversal of modes: from individual to group, from verbal to visual, from measuring to comparing; (c) reversal of relations: from reserve to rapport, from frustration to fun; and (d) reversal of power: from extracting to empowering.

4.4.9 Vulnerability and Risks


PRA and RRA spread very fast as bottom-up approaches after a brief hesitation in acceptance by academic professionals. However, rapid and rigid adoption carries vulnerabilities and risks. These approaches have become an instant fashion, which makes them vulnerable to being discredited if they are not applied properly. The word 'rapid' injects an element of rushing towards misleading conclusions. Standardisation carries every risk of formalising codes, methods, checklists and manuals, which would defeat the purpose of letting the field situation be the best judge.

4.4.10 Challenges
The real challenge is to make the exercise participatory in a true sense. Rural society is so complex and bound up in various layers of identity, such as caste, religion, power groups and class, that it is not easy to make a group participatory in a real sense. Rapport building and allowing space for the poor to participate with their experience and wisdom still have a long way to go. However, Robert Chambers has considered and listed the seven challenges given below:
(a) Beyond farming system research
(b) Participatory alternatives to questionnaire surveys
(c) Issues of empowerment and equity
(d) Local people as facilitators and trainers
(e) Policy research and change
(f) Personal behaviour, attitudes and learning
(g) PRA in organisations

These approaches may have many shortcomings, but the elements of creative learning, empowerment and ownership of results have made them distinctly different from other approaches.
Read: Chambers, Robert (1994), 'Participatory Rural Appraisal (PRA): Challenges, Potentials and Paradigm', World Development, Vol. 22, No. 10, pp. 1437–1454.

4.5 CASE STUDY METHOD


The term 'case study' refers to research that studies a small number of cases, possibly even just one, in considerable depth. Frequently, but not always, case study implies the collection of unstructured data and qualitative analysis of such data. Generally, the case study aims to capture cases in their uniqueness, rather than to use them as a basis for wider empirical or theoretical conclusions.
The case study is distinct from the other two types of research design, i.e., surveys and experiments:
- In a case study the number of cases is small, but a large amount of information is collected about the one or few cases, across a wide range of features. In surveys, on the other hand, a large number of cases are studied but relatively little data is gathered about each.
- In experimental research, a small number of cases are investigated compared with survey work, but it involves direct control of variables. Here the researcher creates the case(s) studied, whereas in a case study the researcher identifies cases out of naturally occurring social phenomena.
The term case study is used to refer to a variety of different approaches, and it raises some fundamental methodological issues.
- Does the case study aim to produce an account of each case from an external or research point of view, which may contradict the views of the people involved? Or is it solely to portray the character of each case 'in its own terms'?
- Is the case study a method, with advantages and disadvantages, to be used depending on the problem under investigation, or a paradigmatic approach that one simply chooses or rejects on philosophical or political grounds?
Viewed as a method, there can be the following variations in the specific form of a case study, depending on the purpose it is intended to serve:
• In the number of cases studied;
• In whether there is comparison and, if there is, in the role it plays;
• In how detailed the case studies are;
• In the size of the case(s) dealt with;
• In what researchers treat as the context of the case, how they identify it, and how much
they seek to document it;
• In the extent to which case study researchers restrict themselves to description,
explanation, and/or theory or engage in evaluation and/or prescription.

When it is designed to test or illustrate a theoretical point, it will deal with the case as an
instance of a type, describing it in terms of a particular theoretical framework (implicit or
explicit). When it is exploratory or concerned with developing theoretical ideas, it is
likely to be more detailed and open-ended in character. The same is true when the
concern is with describing and/or explaining what is going on in a particular situation for
its own sake. When the interest is in some problem in the situation investigated, the
discussion will be geared to diagnosing that problem, identifying its sources, and perhaps
outlining what can be done about it. Variation in purpose may also inform the selection
of cases for investigation.

Read: Philosophy of Social Research, Vol. 1 (pp. 92–94), in The Sage Encyclopedia of Social Science Research Methods, Michael S. Lewis-Beck, Alan Bryman and Tim Futing Liao (eds.), Sage Publications (2004).
4.5.1 Types of Case Studies
Broadly, there are three types of case studies: exploratory, explanatory and descriptive. Each of these three approaches can be designed as a single-case or a multiple-case study, where multiple-case studies involve replication, not sampled cases.
(a) Exploratory: In this type of case study, fieldwork and data collection may be undertaken prior to defining the research questions and hypotheses. In view of time constraints, a willing and accessible case needs to be identified.
(b) Explanatory: This type of case study is suitable for doing causal studies.
(c) Descriptive: In this type of case study, the investigator begins with a descriptive theory, or faces the possibility that problems will occur during the project.
Case studies have been widely used in education, law and medicine. Schools of business have been the most aggressive in implementing case-based learning. The method has also been used in the IT sector. Recently, cases of farmers' suicides have been studied to understand the agrarian crisis.

4.5.2 Case study design


In CSM there is no rigid number of cases to be undertaken. Case studies can follow single or multiple-case designs, where a multiple design must follow a replication rather than a sampling logic. When no other cases are available for replication, the researcher is limited to a single-case design. Unlike in sample-based research, generalisation of results in CSM, from either single or multiple designs, is made to theory and not to populations.

4.5.3 Component of case studies


R. Yin identified five components of research design that are important for case
studies:
• A study's questions
• Its propositions, if any
• Its unit(s) of analysis
• The logic linking the data to the propositions
• The criteria for interpreting the findings

4.5.4 Sources of evidence


R. Stake and R. Yin identified the following six sources of evidence in case studies:
• Documents
• Archival records
• Interviews
• Direct observation
• Participant-observation
• Physical artefacts

4.5.5 Principles of case studies


R. Yin emphasised three principles for the case study researcher:

• Show that the analysis relied on all the relevant evidence


• Include all major rival interpretations in the analysis
• Address the most significant aspect of the case study

4.5.6 Steps of case studies


Case study researchers have proposed six steps for case study:
(a) Determine and define the research questions
In this step the researcher identifies the phenomenon and object of study, establishes the focus, and formulates the purposes and questions in terms of the historical, social, economic and political contexts, linkages and interrelations, in the light of the literature review.

(b) Select the cases and determine data gathering and analysis techniques
The researcher must determine whether to study cases which are unique in some
way or cases which are considered typical and may also select cases to represent a
variety of geographic regions, a variety of size parameters, or other parameters.
The specific case is identified and, where there are multiple cases, each case is treated as a single unit. The researcher must use the designated data gathering tools systematically and properly in collecting the evidence. Throughout the design phase, researchers must ensure that the study is well constructed so as to ensure construct validity, internal validity, external validity and reliability.

(c) Prepare to collect the data


Case study research generates a large amount of data from multiple sources; systematic organization of the data is therefore important to prevent the researcher from
becoming overwhelmed by the amount of data and to prevent the researcher from
losing sight of the original research purpose and questions. Advance preparation
assists in handling large amounts of data in a documented and systematic fashion.
Researchers prepare databases to assist with categorizing, sorting, storing, and
retrieving data for analysis.

(d) Collect data in the field


Researchers carefully observe the object of the case study and identify causal
factors associated with the observed phenomenon. Renegotiation of arrangements
with the objects of the study or addition of questions to interviews may be
necessary as the study progresses. Case study research is flexible, but when
changes are made, they are documented systematically.

(e) Evaluate and analyze the data


The case study method, with its use of multiple data collection methods and
analysis techniques, provides researchers with opportunities to triangulate data in
order to strengthen the research findings and conclusions. Researchers categorize,
tabulate, and recombine data to address the initial propositions or purpose of the
study, and conduct cross-checks of facts and discrepancies in accounts. Focused,
short, repeat interviews may be necessary to gather additional data to verify key
observations or check a fact. Specific techniques include placing information into
arrays, creating matrices of categories, creating flow charts or other displays, and
tabulating frequency of events. Researchers use the quantitative data that has been
collected to corroborate and support the qualitative data which is most useful for
understanding the rationale or theory underlying relationships.
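As one small illustration of 'tabulating the frequency of events', the Python sketch below uses purely hypothetical category codes and sources (they are not drawn from any actual study); it shows how coded field notes from a case study could be counted into a simple category-by-frequency table.

from collections import Counter

# Hypothetical coded observations from interviews/field notes of a case study.
coded_notes = [
    ("interview_1", "credit constraint"),
    ("interview_1", "crop failure"),
    ("interview_2", "credit constraint"),
    ("observation_3", "distress sale"),
    ("interview_4", "crop failure"),
    ("interview_4", "credit constraint"),
]

# Frequency of each analytical category across all sources.
frequency = Counter(category for _, category in coded_notes)
for category, count in frequency.most_common():
    print(category, count)

Such a tabulation is only an aid to pattern-finding; the interpretation of the categories remains a qualitative task.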

(f) Prepare the report


Techniques for composing the report can include handling each case as a separate
chapter or treating the case as a chronological recounting. Some researchers report
the case study as a story. During the report preparation process, researchers
critically examine the document looking for ways the report is incomplete.

Thus, CSM is a useful qualitative method for handling complex phenomena and objects in depth, with their specific variations and multiple dimensions of data, close to real life.

4.6 FURTHER SUGGESTED READINGS

• Chambers, Robert (1995): 'Rural Appraisal: Rapid, Relaxed and Participatory', in Mukherjee, Amitava (ed.), Participatory Rural Appraisal, Vikas Publishing House Pvt. Ltd., New Delhi.
• Crawford, I.M. (1997): Marketing Research and Information Systems (Marketing and Agribusiness Texts – 4), FAO, Rome, Chapter 8: Rapid Rural Appraisal, http://www.fao.org/docrep/W3241E/w3241e08.htm#TopOfPage accessed on 12.02.2009.
• Gibbs, Graham (2007): Analysing Qualitative Data, Sage Publications.
• Bergman, Manfred Max (ed.) (2008): Advances in Mixed Methods Research, Sage Publications.
• Mukherjee, Amitava (ed.) (1995): Participatory Rural Appraisal, Vikas Publishing House Pvt. Ltd., New Delhi.
• Piore, Michael J.: Qualitative Research: Does it Fit in Economics?, Massachusetts Institute of Technology, http://econ-www.mit.edu/files/1125 accessed on 18.02.2009.
• Soy, Susan K. (1997): The Case Study as a Research Method, unpublished paper, University of Texas at Austin, ssoy@ischool.utexas.edu, last updated 02/12/2006.
• Tellis, W. (1997, July): 'Introduction to Case Study' [68 paragraphs], The Qualitative Report [on-line serial], 3(2), available at http://www.nova.edu/ssss/QR/QR3-2/tellis1.html
• Flick, Uwe (1998): An Introduction to Qualitative Research, SAGE Publications.
• Flick, Uwe (2008): Designing Qualitative Research, SAGE Publications.
• Flick, Uwe (2008): Managing Quality in Qualitative Research, SAGE Publications.
• Wignaraja, Ponna, Akmal Hussain, Harsh Sethi and Ganeshan Wignaraja (1991): Participatory Development: Learning from South Asia, United Nations University Press.

4.7 MODEL QUESTIONS

1. “Ontological assumptions guide the research strategy to be followed in


conducting the research study”. Explain.
2. Discuss the different frameworks of research. Which one seems most problematic
to you? Give reasons in support of your answer.

3. What is the distinction between PRA and RRA approach of qualitative research?
Discuss the various methods and techniques of PRA with illustrations.
4. Develop a one or two page plan for a research study on topic of your choice
involving semi-structured interview as a major source of data.
5. Do you think that a researcher would make more progress using different
frameworks for different studies in the field? Give reasons.
6. How do the strategies of qualitative enquiry affect the method of data/material
collection?
7. Explain how the interpretivist philosophy of science marks a significant departure from
post-positivism.
8. Frame a research proposal of your own choice specifying the purpose of research
to conduct the study from a critical theory perspective.
9. Frame a research proposal of your own choice specifying the purpose of research
to conduct the study from an interpretive perspective.
10. Make a distinction between case study method and experimental method. Explain
the different steps involved in case study method.
11. What are the main sources of participatory rural appraisal?

Block 5 DATABASE OF INDIAN ECONOMY

Structure
5.1 Introduction
5.2 Objectives
5.3 An Overview of the Theme
5.4 Macro Variable Data
5.4.1 The Indian Statistical System
5.4.2 National Income & Related Macroeconomic Aggregates
5.4.3 National Income & Levels of Living
5.4.4 Saving
5.4.5 Investment
5.5 Agricultural Data
5.5.1 Introduction
5.5.2 Agricultural Census
5.5.3 Studies on cost of Cultivation
5.5.4 Annual Estimates of Crop Production
5.5.5 Livestock Census
5.5.6 Data on Production of Major Livestock Products
5.5.7 Agricultural Statistics at a Glance (ASG)
5.5.8 Another Source of Data on Irrigation
5.5.9 Other Data on the Agricultural Sector
5.6 Industrial Data
5.6.1 Introduction
5.6.2 Data Sources Covering the Entire Industrial Sector
5.6.3 Factory (Registered) Sector – Annual Survey of Industries (ASI)
5.6.4 Monthly Production of Selected Industries and Index of Industrial Production (IIP)
5.6.5 Industrial Credit and Finance
5.6.6 Contribution to GDP
5.7 Trade
5.7.1 Introduction
5.7.2 Merchandise Trade
5.7.3 Services Trade
5.7.4 E-Commerce
5.8 Finance
5.8.1 Introduction
5.8.2 Public Finances
5.8.3 Currency, Coinage, Money and Banking
5.8.4 Financial Markets
5.9 Social Sectors
5.9.1 Introduction
5.9.2 Employment, Unemployment & Labour Force
5.9.3 Education
5.9.4 Health
5.9.5 Environment
5.9.6 Quality of Life

5.10 Let Us Sum Up


5.11 Further Suggested Reading
5.12 References
5.13 Model Questions

5.1 INTRODUCTION

We noted in Block 2 that statistical data constitutes an essential input to the research
process and talked about the methods and tools of data collection. One of the tools of data
collection, we noted, is to assemble secondary data or data already collected, compiled
and published by other agencies and make use of the same if these met the requirements
of the proposed research endeavour. A large number of Government agencies and several
non- Government agencies collect, compile, analyse and publish data on various aspects
of the Indian economy and society. Such data cover the performance of the economy in
different directions, socio-cultural trends and the impact of such performance on the
levels of living of different sections of society. Let us look, in this Block, at the kind of data available and at their quality, reliability and timeliness for the purposes for which they are collected.

5.2 OBJECTIVES

After going through this Block, you will be able to:

• know the manner in which the Indian statistical system is organized;


• describe the data on national income and related macro aggregates, saving and
investment that are useful for analyzing various aspects of Indian economy;
• state the use of the input-output transaction table compiled by CSO in economic
and econometric analysis;
• appreciate the limitations of estimates of corresponding state and district level
aggregates for assessing regional progress;
• know the different sources of agricultural data and limitations of their data;
• explain the multivariable industrial data and their reliability;
• describe the kind of data available on trade;
• explain the divergence between RBI’s BOP data and DGCI&S data;
• discuss the agencies involved in the compilation of data on finance;
• know the sources of data on various aspects of employment and unemployment,
labour welfare, education, health, levels of consumption and environment that
determine the quality of life of people; and
• describe the concepts used for collecting data on different variables of social
sector.

5.3 AN OVERVIEW OF THE THEME

We shall look at the database of the Indian economy in this Block. Section 5.4 starts with
a short description of the Indian statistical system and then moves on to discuss the kind
of data compiled and disseminated on the overall performance of the economy - macro
variables depicting it like the national income and state income, the national and regional
accounts, the input output transaction table depicting inter-relationships between
economic activities, instruments facilitating growth like saving and investment and
finally, the real test of economic performance, namely, standards of living of the people.
The databases of important individual sectors of the economy are dealt with in the
subsequent sections. Section 5.5 discusses available data on the production of agricultural
crops, cost of cultivation of crops, agricultural holdings and inputs to agriculture


including irrigation, livestock and livestock products and agricultural credit. Section 5.6
deals with the kind of data available on industrial production and employment and related
technical ratios in the organised and unorganised sectors, the index of industrial
production and industrial credit. Section 5.7 looks at data on trade – merchandise and
services – and quantum and unit value indices on merchandise trade as also different
measures of terms of trade. Section 5.8 dwells upon data on the lifeline of the economy -
finance. It discusses availability of data on Central and State Government finances,
transactions with the rest of the world, currency, coinage, money, banking and financial
markets. The discussion then focuses attention on the database of several sub sectors of
the social sector. (Section 5.9). First is the means of participation in the development
process and benefit from it – employment. Sub-section 5.9.2 discusses data available on
employment, unemployment and the quality and adequacy of employment. Sub-section
5.9.3 looks at data on efforts to develop human capabilities – the educational
infrastructure, the extent of utilisation of the infrastructure and the impact of this on the
educational profile of the population. Sub-section 5.9.4 deals with data on health
infrastructure and its impact on the health status of the population. Sub-section 5.9.5
looks at the database of an area that is crucial for the future of human existence itself –
environment – and the steps taken to monitor and control pollution. Finally, information
available for an assessment of the progress towards the ultimate objective of development
planning – quality of life of the country’s people - is discussed in Section 5.9.6. Section
5.10 is a short summing up of the Block.

Each Section/subsection ends with a box guiding the reader to relevant portions of one or
more publications that contain more details on the subject handled in it. Full details of
these publications are indicated in Section 5.12. Section 5.11 is to enable the reader to be
in touch with emerging developments relating to the review, refinement and expansion of
the database in different aspects/sectors of the economy. Section 5.13 is for evaluation of
the reader’s knowledge of the subject matter covered in this Block.

5.4 MACRO VARIABLE DATA

5.4.1 The Indian Statistical System

The Indian Statistical System generates data generally through large scale enquiries like
the Census or sample surveys of the kind conducted by the National Sample Survey
Organisation (NSSO), periodic statutory returns received by Government
Departments/organisations and as a by-product of administration at different levels.
Someone has to take the lead, in such a situation, to ensure adoption of appropriate
standards, concepts and definitions for the phenomena on which statistical data are
collected. The necessary institutional structures were created in India in the early Fifties
and strengthened over the years. Most recently, the National Statistical Commission
(NSC) made detailed recommendations to revamp the Indian statistical system to ensure
the quality, reliability and timeliness of data generated by the system.

At the apex of the Indian Statistical System is the permanent National Commission on


Statistics (NCS) to serve as a nodal body for all core statistical activities of the country,
evolve, monitor and enforce statistical priorities and standards and to ensure statistical
co-ordination among the different agencies involved. The Chief Statistician of India
functions as the Secretary to NCS and also as the ex-officio Secretary, Ministry of
Statistics & Programme Implementation (MOSPI). The Statistics Wing of MOSPI,
functioning under the guidance of NCS, is the nodal Ministry in the Government of India
for the integrated development of the statistical system in the country, coordination of the work of statistical directorates/divisions in the Central Ministries and the State Governments
and for all policy matters relating to the Indian Statistical Institute (ISI). It has under it
three organisations, the Central Statistical Organisation (CSO), the National Sample
Survey Organisation (NSSO) and the Computer Centre (CC). The State Directorate of
Economics & Statistics (SDES) is at the apex of the system at the State level, responsible
for coordination of statistical activities carried on by statistical cells/divisions/directorates
in different departments. SDESs have statistical offices in the districts and, in some cases,
also in the regions. CSO has revived the Conference of Central and State Statistical
Organisations (COCSSO). It is being held annually to deliberate matters relating to the
development of statistical data on aspects of socio-economic life of the country. Agencies
concerned disseminate the data they collect, process and analyse to data users in print or
in electronic formats. CSO disseminates not only its own data but also those relating to
different sectors and aspects of the economy and society published by other Government
agencies. So do the Reserve Bank of India (RBI) and several non-Government sources.
SDESs provide a similar service in the States.

It would be instructive to judge the Indian situation in an international setting.


Publications of the United Nations (UN)/its agencies, the International Monetary Fund
(IMF), the World Bank and regional agencies publish data on the economies of member
countries. The IMF has formulated a “Special Data Dissemination Standards” (SDDS)
covering real sector (national accounts, production index, price indices, etc.,), Fiscal
Sector, Financial Sector, External Sector and Socio-demographic data, to facilitate
transparency in the compilation/dissemination/cross-country comparison of data on
important aspects of the economy. Countries under SDDS provide to IMF a National
Summary Data Page for each area/sub-area listed in SDDS and the relevant metadata as
per a Dissemination Format and disseminate an advance release calendar on the internet
of the IMF’s Data Dissemination Bulletin Board (DSBB). CSO, the Registrar General
& Census Commissioner (RGI&CC), and RBI furnish the data required and an advance
release calendar for such data. The information provided by any SDDS country can be
accessed on the internet. (Search parameter Special Data Dissemination Standards IMF).

5.4.2 National Income and Related Macroeconomic Aggregates


(1) System of National Accounts (SNA)

How to assess the performance of an economy? Trends in the production of a specific


product or trends in the overall index of industrial production will only assess the
performance in the output of the specific product or the industrial sector. But we should
like to go beyond levels of output or production and look at performance in terms of
incomes flowing from output in the form of rent, wages, interest and profit to those
participating in the creation of the output, namely, the factors of production – land,
labour, capital and entrepreneurship. Alternatively, we would like to base our judgement of performance on the value addition made by the production system, namely, the value of output net of the (intermediate) costs incurred in creating the output. It is (i) this overall value addition computed for all sectors/activities of the economy that is referred to as the National Product, (ii) the macro-aggregates related to it and (iii) the trends in (i) and (ii) that can help us in analysing the performance of an economy.

National Income (NI) is the Net National Product (NNP). It is also used to refer to the
group of macroeconomic aggregates like Gross National Product (GNP), Gross Domestic
Product (GDP) and Net Domestic Product (NDP). All these of course refer to the total
value (in the sense mentioned above) of the goods and services produced during a period
of time, the only differences between these aggregates being depreciation and /or net
factor income from abroad. There are other macroeconomic aggregates related to these
that are of importance in relation to an economy. What data would you, as a researcher or
an analyst, like to have about the health of an economy? Besides a measure of the
National Product every year or at smaller intervals of time, you would like to know how
fast it is growing over time. What are the shares of the national product that flow to
labour and other factors of production? How much of the national income goes to current
consumption, how much to saving and how much to building up the capital needed to
facilitate future economic growth? What is the role of the different sectors and economic
activities – in the public and private sectors or in the organised and unorganised activities
or the households in the processes that lead to economic growth? How does the level and
pattern of economic growth affect or benefit different sections of society? How much
money remains in the hands of the households for consumption and saving after they
have paid their taxes (Personal Disposable Income) – an important indicator of the
economic health of households? What is the contribution of different institutions to
saving? How is capital formation financed? Such a list of requirements of data for
analysing trends in the magnitude and quality of, and also the prospects of, efforts for
economic expansion being mounted by a nation can be very long. Such data, that is,
estimates of national income and related macroeconomic aggregates form part of a
system of National Accounts that gives a comprehensive view of the internal and external
transactions of an economy over a period, say, a financial year and the interrelationships
among the macroeconomic aggregates. National Accounts thus constitute an important
tool of analysis for judging the performance of an economy vis-à-vis the aims of
economic and development policy.
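To fix ideas, the relationships among these aggregates can be written out with stylised numbers. The figures in the short Python sketch below are invented for illustration only (they are not CSO estimates); the identities themselves follow from the definitions above, i.e., the aggregates differ only by depreciation and/or net factor income from abroad.

# Illustrative (invented) figures, say in Rs. crore.
gdp = 5000.0              # Gross Domestic Product
depreciation = 500.0      # consumption of fixed capital
nfia = 100.0              # net factor income from abroad

gnp = gdp + nfia          # Gross National Product
ndp = gdp - depreciation  # Net Domestic Product
nnp = gnp - depreciation  # Net National Product (National Income in the sense used above)

print(gnp, ndp, nnp)      # 5100.0 4500.0 4600.0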

The UN has recommended a System of National Accounts (SNA) to promote


international standards for compiling national accounts, to serve as an analytical tool and to enable international reporting of comparable national accounting data. The 1968 SNA and the 1993 SNA
are the second and third (latest) versions. The 15th International Conference of Labour
Statisticians (ICLS) (January, 1993) adopted a resolution on statistics on the informal
sector, a sector that makes a sizeable contribution to national income, to help member
countries of the International Labour Organisation (ILO) in reporting comparable
statistics of employment in the informal sector. The UN Statistical Commission (UNSC) endorsed it and the 1993 SNA incorporated it. India uses a mix of the 1968 SNA and the 1993 SNA in compiling
national accounts and is moving towards the full implementation of SNA methodology.

Read Section 13.8, NSC Report (2001), pp.535 – 543



(2) Estimates of National Income and Related Macroeconomic Aggregates

(a) Estimates Prepared by CSO

CSO of MOSPI compiles and publishes the National Accounts, which include estimates of National Income and related macroeconomic aggregates like NNP, GNP, GDP & NDP, consumption expenditure, saving, capital formation and so on for the country and for the public sector for every financial year. Quarterly Estimates (Qtly.Es) of GDP are also made. Estimates are prepared for any year at the prices prevailing in that year, that is, estimates at current prices, and also at constant prices, that is, at the prices of a selected
year (called the base year). CSO changes the base year from time to time to take into
account the structural changes in the economy and depict a true picture of the economy.
The base year from January, 2006 is 1999-2000. Estimates of national accounts
aggregates are published in considerable detail in CSO’s Annual publication National
Accounts Statistics (NAS), the latest being NAS 2008. CSO releases through Press
Notes every January (on the 30th this year), Quick Estimates (QE) of GDP, National
Income, per capita National Income and Consumption Expenditure by broad economic
sectors for the financial year that ended in March of the preceding year (time lag - ten
months) and Revised Estimates (RE) of national accounts aggregates for earlier financial
years. Further, Advance Estimates (AEs) of GDP, GNP, NNP and per capita NNP at
factor cost for the current financial year are also released in February - two months
before the close of the financial year. (AEs for 2008-09 released on 9/2/2009.) These AEs
are revised thereafter and the updated AEs are released by the end of June, three months
after the close of the financial year. Meanwhile, by the end of March, Qly.Es of GDP for
the quarter ending December of the preceding year are also released. Thus by the end of
every financial year (31st March), AEs for that financial year, QEs for the preceding
financial year and the Qtly.Es up to the quarter ending December of the financial year)
become available. In fact, CSO sets before itself an advance release calendar for the
release of national accounts statistics over a period of two years, in line with SDDS
requirements.

NAS (NAS 2008) presents QEs of macroeconomic aggregates for 2006-07, AEs for
2007-08 and Qtly.Es of GDP for 1999-00 to 2007-08, summary statements of GNP,
NNP, GDP and NDP at factor cost at constant (1999-00) prices and market prices and
estimates of the components of GDP, aggregates like Government Final Consumption
Expenditure (GFCE), Private Final Consumption Expenditure (PFCE) in the domestic
market, Exports, Imports, the share of the public sector in GDP, industry wise GDP &
NNP, GDP at crop/item/category level and the consolidated accounts of the nation.

CSO’s estimates of NDP for rural and urban areas by economic activity at current prices
for 1970-71, 1980-81 and 1993-94 are published in NAS 2000. The list of publications of
the National Accounts Division (NAD) of CSO can be seen in the MOSPI website.

See Tables in Parts I, II & V, NAS 2008.

(b) Other Publications Giving CSO’s Estimates:



CSO's Monthly Abstract of Statistics (MAS) and the annual Statistical Abstract of India (SAI), RBI's Monthly Bulletin and the Handbook of Statistics on the Indian Economy (2008) (RBI website: http://www.rbi.org.in), the Centre for Monitoring Indian Economy (CMIE) (Economic Intelligence Unit – EIU), Mumbai, publication National Income Statistics, and the publication of the Economic and Political Weekly Research Foundation – EPWRF (EPWRF, December, 2004) and www.epwrf.res.in also give time series estimates of national income and related macro aggregates.

(c) Limitations of the Estimates:

The concepts and methodology used and the data sources utilised for making these
estimates are set out in two publications, namely, (CSO 2007) and (CSO, 1999a). The
methodology for (i) the New series of National Accounts Statistics with base year 1999-
2000 is given in the Brochure on New Series on NAS (Base Year 1999-2000), (ii) AEs
in: NAS 1994, (iii) estimates of factor incomes in NAS – Factor Incomes (March, 1994)
and (iv) Qtly.Es of GDP in a Note in NAS 1999. Besides, the publication NAS of every
year has a chapter “Notes on Methodology and Revision in the Estimates”. Sections 13.2
& 13.3, Chapter 13, of the NSC Report, (pp. 436 to 492) also contain methodological
and conceptual details and data sources utilised in estimating National Income and related
macroeconomic aggregates, data gaps and measures to overcome these. Changes in and
adoption of improved methodology, expansion of the coverage of the estimates, change
in the base year, improvements in the quality of data and the use of new data sources over
the last 50 years all have their beneficial impact on the quality of national income
estimates, etc., that is, in estimating the “true values” of the aggregates as correctly as
possible. These efforts can also affect the comparability of estimates over time although
CSO does make all efforts to minimise the level of non-comparability.

Read the chapter “Notes on Methodology and Revisions in the Estimates” in CSO
(2005), pp.220 – 228; and also the same chapter in CSO (2008).

(3) The Input – Output Table:

Any economic activity is dependent on inputs from other economic activities for
generating its output and the output from this economic activity serves as inputs for
producing the output from other activities. Data relating to such interrelationships among
different sectors of the economy and among different economic activities are thus
important for analysing the behaviour of the economy and, therefore, for formulation of
development plans and setting targets of macro variables like output, investment and
employment. Such an input-output table will also be useful for analysing the impact of
changes in a sector of the economy or economic activity on other sectors of the economy
and indeed the entire economy. CSO has published an Input-Output Transaction Table (I-OTT) every five years since 1968. The latest is the one relating to 2003-04. It gives,
besides the complete table, the methodology adopted, the database used, analysis of the
results and the supplementary tables derived from the I-OTT giving the input structure
and the commodity composition of the output. The Planning Commission updates and
recalibrates the I-OTT and prepares Input-Output Tables (I-OT) for the base and the
terminal years of a Five Year Plan and publishes the results of such an exercise as the
Technical Note to the Five Year Plan. (The latest is the one for the Tenth Plan.) It
contains the relevant I-OT, the methodology adopted and related material. The two I-OTs
are useful in economic and econometric analysis.
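The kind of analysis an I-OTT supports can be sketched with the standard open Leontief model: if A is the matrix of input coefficients derived from the transaction table and f is a vector of final demand, gross output x satisfies x = Ax + f, i.e., x = (I - A)^(-1) f. The two-sector coefficients and final demands below are invented purely for illustration; they are not taken from any CSO table.

import numpy as np

# Hypothetical 2-sector input coefficient matrix A
# (column j gives inputs required per unit of output of sector j).
A = np.array([[0.2, 0.3],
              [0.4, 0.1]])

# Hypothetical final demand vector f.
f = np.array([100.0, 200.0])

# Gross output required to meet f: solve (I - A) x = f.
x = np.linalg.solve(np.eye(2) - A, f)
print(x)

The same calculation underlies impact analysis: changing one element of f and recomputing x shows how a change in one sector's final demand works through the inter-industry linkages.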

(4) Regional Accounts - Estimates of State Income and related Aggregates

(a) Estimates of State Domestic Product (SDP) Prepared and Released by State
Governments and Union Territory Administrations

State Accounts Statistics (SAS) consist of various accounts showing the flows of all
transactions between the economic agents constituting the State economy and their
stocks. The most important aggregate of SAS is the State Domestic Product (SDP) (State
Income). Estimates of GSDP and NSDP at constant and current prices are being prepared and published by all SDESs except those of Dadra & Nagar Haveli, Daman & Diu and Lakshadweep. These estimates are also available on the CSO website, in the publications mentioned in the preceding section and in EPWRF (June, 2003) and its CD-ROM.

[Read Section 13.7, Chapter 13, NSC Report, pp. 528 – 535 and Annexures 13.8 to 10]

(b) Limitations of Estimates of SDP

The preparation of estimates of SDP calls for more detailed data than the preparation of national level estimates, especially on flows of goods and services and incomes across the geographical boundaries of States/Union Territories. Conceptually, estimates of SDP can be prepared by two approaches – the income originating approach and the income
accruing approach. In the former case, the measurement relates to the income originating
to the factors of production physically located within the area of a State. In other words it
is the net value of goods and services produced within a State. In the latter case, the
measurement relates to the income accruing to the normal residents of a State. The
income accruing approach provides a better measure of the welfare of the residents of the
State and also for preparing Human Development Indices (HDI), but it calls for data on
inter-State flow of goods and services and incomes, which are not available. Thus only
the income originating approach is used in preparing estimates of SDP. This has to be
kept in mind while using estimates of SDP. Although efforts have been made by the CSO
over the years to bring about a good degree of uniformity across States and Union
Territories in SDP concepts and methodology, SDP estimates of different States are not
comparable. The successive Finance Commissions got comparable estimates of NSDP
and per capita NSDP made by CSO for their work (available in the Reports of the
successive Finance Commissions). EPWRF (June, 2003) also provides comparable
estimates of SDP and compares these with those made for Finance Commissions. The
question of comparability of estimates of SDPs is important for econometric work
involving inter-State or regional comparisons.

[Read 1. Section 13.7.1 & 13.7.2, Chapter 13, pp. 528 – 535 NSC Report; 2. Preface
and Chapters 5, 7,8 & 10 in EPWRF (2003); 3.CSO (1974); 4. CSO(1976); 5. CSO
(1979); 6. CSO (1980)]

(5) Regional Accounts - Estimates of District Income

The need for preparing estimates of district income has become urgent in the context of
decentralisation of governance and the importance of, and the emphasis on, decentralised
planning. Estimates of District Domestic Product (DDP) are being prepared by ten
SDESs using income originating approach and published in State Statistical
Handbooks/Abstracts/Economic Surveys and also posted on their websites. (Another
State is preparing estimates only for commodity producing sectors.) It is necessary to
make adjustments in these estimates for flow of incomes across territories of districts (or
States) that are rich in resources like minerals and forest resources and where there is a
daily flow of commuters.

[Read 1. Subsection 13.7.7, Chapter 13, NSC Report, p. 532; 2. Paper on Methodology
for DDP in 1996 by SDESs Uttar Pradesh & Karnataka, CSO website; 3. Katyal, R.P.,
Sardana, M.G., Satyanarayana, J. (2001). ]

5.4.3 National Income and Levels of Living

What do trends in macroeconomic aggregates say about the welfare of different sections
of society? Precious little, perhaps, especially when these are considered without
information on the distribution of these aggregates among these sections of society. Per-
capita national income or even per-capita personal disposable income can only indicate
overall (national) averages. Distribution of population by levels of income can be a big
step forward in understanding how well the performance in the growth of GDP has
translated into, or has not translated into, improvements in levels of living for sections of
society below levels considered the minimum desirable level. It would also help us
analyse trends in levels of inequalities in living standards, unemployment and
employment, quality of employment, the health status of people and the status of women.
Or, to consider all these together, what are the levels of human development and gender
discrimination? Such lines of analysis and the data required for the purpose are important
from the point of view of planning for a strategy of growth with equity.

The quinquennial Consumer Expenditure Surveys of the NSSO, the latest being the
61st Round (2004-05), provide the distribution of households by monthly per capita
consumption expenditure (MPCE) classes. Data on trends in the growth rate of
employment are available from the quinquennial employment and unemployment surveys
of the NSSO (the latest being the 61st Round). These and the GDP data enable us to look
at trends in employment elasticity. Comprehensive indicators like HDI and the Gender
Discrimination Index (GDI) have been prepared for the country and the States by the
Planning Commission (Human Development Report – HDR - 2001) and for individual
States and districts by several State Governments. These contain detailed data on
different facets of levels of living. All the reports are available in print and in electronic form on the respective websites.
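Employment elasticity, as used in such analyses, is simply the ratio of the growth rate of employment to the growth rate of GDP over the same period. The figures in the short sketch below are invented for illustration only; they are not NSSO or CSO estimates.

# Illustrative (invented) growth rates over the same period, in per cent per annum.
employment_growth = 1.5
gdp_growth = 6.0

# Employment elasticity of growth: employment growth per unit of GDP growth.
employment_elasticity = employment_growth / gdp_growth
print(employment_elasticity)  # 0.25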

Read 1. Chapter 1 (pp. 1 – 6) and Technical Appendix (pp.132 – 133), HDR 2001 of the
Planning Commission; 2. Sub-sections 9.8.6 to 9.8.21, Chapter 9, NSC Report, pp. 333
– 336.

5.4.4 Saving
As you are aware, broadly speaking, GNP is made up of consumption, saving, exports net
of imports, besides net factor income from abroad. Saving is important in as much as it
goes to finance investment, which in turn brings about growth of GNP. What is the
volume of Savings relative to GNP? How much of it is consumed by the needs of
depreciation? Who all contribute, and how much, to the total volume of Savings? Let us
see what kind of data is available on such questions.

Estimates of Gross Domestic Saving (GDS) and Net Domestic Saving (NDS) in current
prices and the Rate of Saving are made by CSO and published in the National Accounts
Statistics (NAS) and the Press Note of January of every year releasing the Quick Estimates. These are first made for any year along with the QEs of GDP, etc., and revised and finalised subsequently along with the revision of the QEs of GDP, etc.
Saving, that is, the distribution of GDS and NDS by type of institution – household
sector, private corporate sector and public sector are also available in NAS. Part III of
NAS 2008 also presents the time series of estimates of GDS and NDS in current prices
from 1950-51. Statistics on Saving are also published in the publications mentioned
under national accounts. Estimates of Gross and Net Domestic Saving at the State and
Union Territory levels are not being made at present by SDESs (as per the NSC Report).

Limitations that estimates of Saving suffer from are indicated in the NSC Report. A high-level Committee on Savings set up under the Chairmanship of Dr. Rangarajan (2007) has been making a critical review of the estimates of saving and investment in the economy.

[Read Sub-sections 13.6.1 to 13.6.6 (pp. 508 – 509) & 13.6.10 to 13.6.16(pp. 511 – 528),
NSC Report.]

5.4.5 Investment

Investment is Capital Formation (CF). Investment of money in the shares of a company is not investment in this sense, but buying a house or machinery is. In other words, investment is the creation of physical assets like machinery, equipment, buildings and so on, which adds to the capital stock (of such assets) in the economy and enhances its productive capacity. Investment or CF is another important component of GNP and
the rate of investment – expressed as a proportion of GNP – largely determines the rate of
growth of the economy. How is capital formation financed by the economy? What is the
contribution of different sectors to capital formation or, how much is used up by different
sectors? What is the capital stock available in the economy? These are all questions that arise in one's mind when considering strategies for economic growth. What kind of
data is available?

The annual documents NAS of CSO present such data. NAS 2008 presents estimates of
Gross Domestic Capital Formation (GDCF), Gross Domestic Fixed Capital Formation
(GDFCF), Change in Stocks, Consumption of Fixed Capital (cfc), Net Domestic Fixed
Capital Formation (NDFCF) and Net Domestic Capital Formation (NDCF) in current
prices and at constant (1999-00) prices. These estimates are made along with QEs of
National Income every January (as in 2009) and the revision of these estimates proceeds
along with that of the estimates of national income aggregates. Thus NAS 2008 and the
MOSPI Press Note of 30/1/09 also present estimates of the distribution of these
aggregates at current and constant prices by type of institutions, by economic activity, the
manner in which CF is financed, external (current and capital) transactions and so on.
The publications referred to in the sub-section on GDP, etc., also present time series of such data, while the EPWRF publication in addition provides capital-output ratios and average net fixed capital stock (NFCS) to output (ACOR) ratios. CSO publications and the NSC Report contain the relevant methodological details. Estimates of GFCF at the State level are being prepared in 14 States. See also the EPWRF publication on NAS.
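The relationships among these investment aggregates, and the ratios just mentioned, can be illustrated with stylised numbers. The figures in the sketch below are invented (they are not CSO or EPWRF estimates), and the incremental capital-output ratio shown is the simple textbook ratio of investment to the resulting increment in output.

# Illustrative (invented) figures, say in Rs. crore.
gdfcf = 1200.0            # Gross Domestic Fixed Capital Formation
change_in_stocks = 100.0
cfc = 300.0               # consumption of fixed capital (depreciation)
gdp = 5000.0
increase_in_gdp = 400.0   # increment in output over the period

gdcf = gdfcf + change_in_stocks   # Gross Domestic Capital Formation
ndcf = gdcf - cfc                 # Net Domestic Capital Formation
rate_of_investment = gdcf / gdp   # investment rate as a share of GDP
icor = gdcf / increase_in_gdp     # incremental capital-output ratio

print(gdcf, ndcf, round(rate_of_investment, 3), round(icor, 2))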

Gaps in data for the estimation of capital formation exist in a number of relevant areas, as
indicated in NSC Report.

See Parts III & V, NAS 2008.


[Read 1. Section 13.6.7 to 13.6.16, Chapter 13, NSC Report, pp. 509 – 528; 2. Chapters
6, 12, Exhibit E - C.4, Statistical Annexures VI & VII, EPWRF ( June, 2003).]

5.5 AGRICULTURAL DATA

5.5.1 Introduction

You are aware of the importance of agriculture to the Indian economy and indeed to the
Indian way of life. You would, therefore, like to examine several aspects of agriculture
like the level of production of different crops and commodities, the availability and
utilisation of important inputs for agricultural production, incentives, availability of post-
harvest services and the role of agriculture in development. Similarly, you would like to
know about livestock and their products, fisheries and forestry, people engaged in these
activities and so on. All these analyses require enormous amount of data over time and
space. Let us have a look at what kind of data are available and where.

The Directorate of Economics and Statistics, Ministry of Agriculture and


Cooperation (DESMOA) and the Animal Husbandry Statistics Division (AHSD) of
the Department of Animal Husbandry, Dairy and Fisheries (DAHDF) of the same
ministry are the major sources of data on agriculture and allied activities. Some of the
major efforts at collection of data mounted at regular intervals are (i) the quinquennial
agricultural census and input survey, (ii) cost of cultivation studies, (iii) annual estimates
of crop production, (iv) the quinquennial livestock census and (v) integrated sample
survey to estimate the production of major livestock products. The major publications
containing statistics flowing from these activities are DESMOA’s Agricultural
Statistics at a Glance (ASG) (annual, also accessible at www.dacnet.nic.in ), Cost of
Cultivation in India and the monthly bulletin “Agricultural Situation In India” and
ASHD’s biennial publication Basic Animal Husbandry Statistics (BAHS).

5.5.2 Agricultural Census

Started from 1970-71, the seventh census related to 2001. The census collects data on
holdings like its area, the gender and social group of the holder, irrigation status, tenancy
particulars, the cropping pattern and the number of crops cultivated. The Input Survey,
conducted in the following year, gathers data on the pattern of input-use across crops,
regions and size-groups of holdings, covering infrastructural facilities, chemical
fertilizers, organic manures, pesticides, agricultural implements and machinery, livestock,
agricultural credit and seeds. The results of the 2001 Agricultural Census and those of the Input Survey, 1996-97 are on the census website at the national and State levels and also in ASG 2008. The results of the next census (2005-06) and Input Survey (2006-07) are awaited. Those of the Input Survey 2001-02 are being finalized.

[Read 1. Sections 4.9 (pp.136 – 139), 4.14 (pp. 146 – 147) & 4.22 (159 – 161), NSC
Report; 2 http://www.agcensus.nic.in ; 3. Table Set 16, ASG (2008).]

5.5.3 Studies on Cost of Cultivation

DESMOA implements a comprehensive scheme for studying the cost of cultivation of


principal crops in India and this results in the collection and compilation of field data on
the cost of cultivation and production in respect of 29 principal crops leading to estimates
of crop-wise and State-wise costs of cultivation and also computation of the index of the
terms of trade between agriculture and non-agricultural sectors (ITT). The scheme
covers 16 States and foodgrain crops, oil seeds and commercial crops and selected
vegetables. These are published in Cost of Cultivation in India and in ASG. The
Commission for Agricultural Costs and Prices (CACP) makes use of these estimates and of data on a number of variables to make recommendations to the Government on Minimum Support Prices (MSP). MSPs for different commodities and the ITT are also
published in ASG.
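The ITT is conventionally computed as the ratio of an index of prices received by agriculture to an index of prices paid by it, expressed as a percentage. The index numbers in the sketch below are invented for illustration only; this is not DESMOA's exact weighting procedure.

# Illustrative (invented) index numbers for a given year, base year = 100.
index_prices_received = 115.0   # prices received by agriculture for its products
index_prices_paid = 110.0       # prices paid by agriculture for inputs and purchases

# Index of terms of trade of agriculture vis-a-vis the non-agricultural sector.
itt = 100.0 * index_prices_received / index_prices_paid
print(round(itt, 1))  # a value above 100 indicates terms of trade favourable to agriculture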

[Read 1. http://dacnet.nic.in/cacp 2. Table Set 8, ASG (2008), 3. Sections 4.12 (pp.142


– 144) and Sub-sections 4.20.3 to 4.20.6 and 4.20.8 (pp. 156 – 157), Chapter 4, NSC
Report.]

5.5.4 Annual Estimates of Crop Production

DESMOA makes annual estimates of area, production and yield of principal crops of
foodgrains, oil seeds, sugar cane, fibres and important commercial and horticulture crops.
These crops account for about 87% of the total agricultural output. Estimates of area and
yield form the basis of these estimates. While estimates of area are based on a reporting
system that is a mix of complete coverage and coverage by a sample, those of yield are
based on a system of crop cutting experiments and General Crop Estimation
Surveys. Advance estimates of crop production are also required even before the crops
are harvested for policy purposes. The first such assessment of the kharif crop is made in
the middle of September, the second – a second assessment of the kharif crop and the
first assessment of the rabi crop – in January, the third at the end of March or early April
and the fourth in June. Time series of final estimates of annual production3, gross area
under different crops and yield per hectare of these crops and Index Numbers on these
variables [base year the triennium ending (TE) 1993-94 = 100] are published in ASG. So
are estimates of production of crops by States.
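Production estimates for a crop are essentially the product of estimated area and estimated yield per hectare, the yield rate coming from crop cutting experiments. The figures below are invented for a single crop purely for illustration; the published index numbers of agricultural production are, of course, weighted across crops.

# Illustrative (invented) figures for one crop in one State.
area_hectares = 250000.0         # estimated area under the crop
yield_kg_per_hectare = 2200.0    # average yield from crop cutting experiments

production_tonnes = area_hectares * yield_kg_per_hectare / 1000.0
print(production_tonnes)         # 550000.0 tonnes

# Index of production relative to a base-period level (e.g. TE 1993-94 = 100).
base_production_tonnes = 500000.0
index_of_production = 100.0 * production_tonnes / base_production_tonnes
print(index_of_production)       # 110.0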

[Read 1. pp.1 – 4 and Table Set 4, ASG 2008; 2. Sections 4.2 to 4.4, Chapter 4, NSC
Report, pp. 118 – 128.]

3. Crop area forecasts and final area estimates are now sample-based, as suggested by the NSC.

5.5.5 Livestock Census

The latest (17th ) quinquennial livestock census for which results are available, conducted
in October, 2003, collected information, district-wise, on livestock, poultry, fishery and also agricultural implements. Livestock covers cattle, buffaloes, sheep, goats, pigs, horses and ponies, mules, donkeys, camels, yaks, mithuns and also dogs and rabbits. These
are classified by age, sex, breed, function. Poultry covers cock, hen, duck, and drake,
which are classified as desi and ‘improved’ varieties. Fishery covers fishing activity
(inland capture, inland culture, marine capture and marine culture), persons engaged in
fishing, craft/gear by type, size and horsepower, agricultural implements/equipment,
equipment for livestock and poultry and horticulture tools. The results are available on
the DAHDF website up to district level. ASG and BAHS also present some census data.
The fieldwork for the 18th Livestock Census has been completed in October, 2007.
Quick results of the Census based on village/ward are expected by March, 2009.

[Read Sections 4.13 & 4.14, Chapter 4, NSC Report, pp. 144 – 147; & www.dahd.nic.in]

5.5.6 Data On Production Of Major Livestock Products

AHSD is responsible for collection of statistics on animal husbandry, dairy and fisheries.
These are published in BAHS. The latest relates to 2006 and it presents data on production of milk, eggs, meat and wool, per capita availability of milk and eggs, contribution of
cows, buffaloes and goats to milk production and of fowls and ducks to egg production,
imports/exports of livestock/livestock products, area under fodder crops, pastures and
grazing, dry and green fodder production, artificial inseminations performed,
achievements in key components of dairy development, livestock and poultry. State-wise
and time series data are presented in most cases.

[Read 1. Section 4.15, Chapter 4, NSC Report, pp. 147 – 148; 2. www.dahd.nic.in; 3. Table Sets 19 & …]

5.5.7 Agricultural Statistics At a Glance (ASG)

The total geographical area of the country is made up of land and water bodies like rivers
and lakes. Land in turn consists of forests, barren and uncultivable land, land used for
non-agricultural purposes, pastures, fallows, cultivable land and so on. What is the
pattern of utilisation of land and how has this pattern been changing over time? How
much is used for agriculture? Land utilisation statistics are available in ASG. ASG also
provides information on size distribution of operational holdings, cropping intensity,
irrigation status, irrigation source, consumption of fertilizer and farmyard manure by size
classes of operational holdings and crops, soil conservation, utilisation of inputs and so
on.

Subsidies are a much-debated subject nowadays, and agricultural subsidies in developing countries and in the developed countries of Europe and the USA are also much in the news. ASG provides a time series of the amount of subsidy given to agriculture with its break-up into subsidy for (i) fertilizers, (ii) electricity, (iii) irrigation (the excess of operating costs of the Government Irrigation System over gross revenue is treated as the imputed irrigation subsidy) and (iv) other subsidies given to marginal farmers and Farmers’ Cooperative Societies in the form of seeds, development of oil seeds, pulses, etc. ASG also presents the share of agricultural subsidies in selected OECD countries – and in particular a measure that shows the amount of support to farmers, irrespective of the sectoral structure of a given country.

Other kinds of data on the agricultural sector presented in ASG are procurement of food
and non-food grains, Marketed Surplus Ratios of important agricultural commodities, per
capita availability of important articles of consumption, stocks of cereals, imports and
exports of agricultural commodities and so on.

[Read Table Sets 9 to 16, ASG (2008)]

5.5.8 Another Source of Data on Irrigation

Besides DESMOA, data on irrigation are collected by the Central Water Commission
(CWC) under the Ministry of Water Resources (MOWR). CWC collects
hydrological data on all the important river systems in the country through 877
hydrological observation sites. The Ministry conducts periodic Censuses of Minor
Irrigation Works along with a sample check to correct the Census data. The latest
Census relates to 2000-01. The report, which can be seen at www.wrmin.nic.in, provides information on minor irrigation works like the type of works, crop-wise utilisation of the potential created and the manner of distribution. NSC has stressed the need for statistical analysis of the data available with CWC and the MOWR, for making users aware of the reasons for variation between MOWR data and DESMOA data, and for reducing the time lag of both sets of data.

[Read Section 4.8, Chapter 4, NSC Report, pp. 134 – 136, & Annexure 4.7]

5.5.9 Other Data on the Agricultural Sector

Data on forest cover are part of land-use statistics presented on the basis of a nine-fold land-use classification in ASG. The Forest Survey of India (FSI) has also been collecting data on forest cover since 1987 through a biennial survey using Remote Sensing (RS) technology.
Digital interpretation has reduced the time lag in the availability of such data obtained
earlier through periodic reports from field formations. There are discrepancies between
ASG & FSI data on forest area due to differences in concepts and definitions. Data on
production of industrial wood, minor forest produce and fuel wood are available with the
Principal Chief Conservator of Forests in the Ministry of Environment & Forests.

The annual reports of National Bank for Agriculture and Rural Development
(NABARD) and its other publications like Statistical Statements Relating to the
Cooperative Movement in India and Key Statistics on Cooperative Banks, besides its
website and the RBI Handbook are useful sources of information on agricultural credit.
NAS provides data on the contribution of agriculture and its sub-sectors to GDP and
other measures of national/domestic product, value of output of various agricultural
crops, livestock products, forestry products, inland fish and marine fish and on capital
formation in agriculture and animal husbandry, forestry and logging and fishing.

[Read Sections 4.5 (pp. 129 – 130) and 4.17 (pp. 150 – 152), Chapter 4, NSC Report.]

5.6 INDUSTRIAL DATA

5.6.1 Introduction

The industrial sector can be divided into a number of subgroups on the basis of
framework factors like coverage of certain laws, employment size of establishments or
criteria for promotional support by Government. Such groupings are the organised and
unorganised sectors, the factory sector (covered by the Factories Act, 1948), small-scale
industries, cottage industries, handicrafts, khadi and village industries (KVI), directory
establishments (DE) (those employing six or more persons), non-directory establishments (NDE) (employing at least one person) and own account enterprises (OAE) (self-employed). Attempts have been made to get a detailed look at the characteristics of some of these sub-sectors of the industrial sector, as the data sources covering the whole sector often do
not provide information in such detail. Let us turn to the kind of data available for
individual subgroups and those that cover the entire industrial sector.

5.6.2 Data Sources Covering the Entire Industrial Sector

These sources provide data on levels of industrial employment. The first is the decennial
Population Census (2001 is the latest) providing data, up to the district level, on levels
of employment (i) by economic activities and broad occupational divisions and (ii) by
economic sectors, age groups and education. The time lag in availability is large. The
second is the quinquennial sample surveys relating to employment and unemployment conducted by the National Sample Survey Organisation (NSSO), the latest being for 2004-05. These also provide a similar type of data on industrial employment up to the State level within a year or two. Estimates for the 72 NSS regions are also possible with the unit record data available on floppies from NSSO. The third is the Employment Market
Information Programme (EMIP) of the Directorate General of Employment &
Training (DGE&T), Ministry of Labour & Employment and the State Directorates
of Employment (SDEs), based on statutory quarterly employment returns from non-agricultural establishments in the private sector employing 10 or more persons and all public sector establishments (the organised sector). It provides data on employment in the organised sector at quarterly intervals down to district levels (Quarterly Reviews) in about a year’s time. Detailed data by economic activity are available in the Annual Employment Reviews of the DGE&T and SDEs after a large time lag.

Economic Census (EC): Conducted by CSO since 1977, the latest (the fifth) EC was in
2005. It covers all economic enterprises in the country except those engaged in crop
production and plantation and provides data on employment in these enterprises, besides
providing a frame for the conduct of more detailed follow up (enterprise) surveys (FuS)
covering different segments of the unorganised non-agricultural sector. EC gathers basic information on the number of enterprises and their employment by location, type of activity and nature of operation. The all-India Report for EC 2005 (accessible on the MOSPI website) and most of the State reports have been published.

5.6.3 Factory (Registered) Sector – Annual Survey of Industries (ASI)



The Annual Survey of Industries (ASI), launched in 1960, collects detailed industrial statistics relating to industrial units in the country, like capital, output, input, value added, employment and factor shares; the survey has been conducted every year since 1960 except in 1972. The frame for the survey since 1998-99 consists of (i) all factories registered under Sections 2m(i) and 2m(ii) of the Factories Act, 1948, i.e., those employing 10 or more workers using power as well as those employing 20 or more workers without using power, and (ii) biri and cigar manufacturing establishments registered under the Biri and Cigar Workers (Conditions of Employment) Act, 1966 with coverage of units as in (i) above.
The reference period for the survey is the accounting year April to March preceding the
date of the survey. The sampling design and the schedules for the survey were revised in
1997-98, keeping in view the need to reduce the time lag in the availability of the results
of the survey. The survey does not attempt estimates at the district level. NIC 04 is used
for classifying economic activities from ASI 2004-05. Final results of ASI 2004-05 have
been released results relating to selected characteristics at various levels of aggregation
available in the ASI section of MOSPI website are: (i) all industries by States, (ii) all
India by 2-digit level of NIC 04 with rural-urban break-up, (iii) all India by 2/3/4- digit
level of NIC 04, (iv) States by 2/3/4- digit level of NIC 04 and (v) Unit level data with
suppressed identification, etc. Data for the past surveys are also available on the website.

CSO has also released time series data on ASI in 5 parts, each volume covering parts
of the period 1959 to 1997-98, which present data on important characteristics for
all-India at two-digit and three-digit NIC code levels and for the States at two-digit
NIC code levels. These publications are also available in electronic media on payment.
EPWRF (April, 2002) also provides time series ASI data on the principal
characteristics of the factory sector along with concepts and definitions used. These
are also on the website of EPWRF and on interactive CD ROMS.

The data available from ASI can be used to derive estimates of important technical ratios
like capital-output ratio, labour-output ratio, capital–labour ratio, labour cost per unit of
output, factor shares in net value added and productivity measures for different industries
as also trends in these parameters. The most important use of the detailed results arises
from the fact that these enable derivation of estimates of (i) the input structure per unit of output at the individual industry level and (ii) the proportions of the output of each industry that are used as inputs in other industries, enabling us to use the technique of input-output analysis to evaluate the impact of a change effected in (say) the output of an industry on the rest of the economy. The construction of the I-OTT for the Indian economy is largely based on ASI data.
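
To make the idea concrete, the sketch below (in Python, with entirely hypothetical figures – the industry aggregates and the coefficient matrix are assumptions, not ASI or I-OTT values) shows how such technical ratios are simple quotients of ASI aggregates, and how input-output analysis uses the input-coefficient matrix A to find the gross output x needed to meet a final demand f by solving x = (I - A)^(-1) f.

    import numpy as np

    # Hypothetical ASI-style aggregates for one industry (Rs. crore; workers in '000)
    fixed_capital, output, net_value_added, wages, workers = 500.0, 900.0, 300.0, 120.0, 4.0

    capital_output_ratio = fixed_capital / output
    labour_output_ratio = workers / output           # workers ('000) per Rs. crore of output
    capital_labour_ratio = fixed_capital / workers   # Rs. crore per '000 workers
    wage_share_in_nva = wages / net_value_added      # a factor share in net value added

    # Input-output analysis: A holds inputs required per unit of output of each industry.
    # Gross output x that meets final demand f solves x = A x + f, i.e. x = (I - A)^(-1) f.
    A = np.array([[0.10, 0.30],
                  [0.20, 0.15]])            # hypothetical 2-industry coefficient matrix
    f = np.array([100.0, 200.0])            # hypothetical final demand
    x = np.linalg.solve(np.eye(2) - A, f)   # gross output required from each industry
    print(capital_output_ratio, wage_share_in_nva, x)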

[Read 1. ASI section of MOSPI website; 2. EPWRF (April, 2002); 3. Section 5.1,
Chapter 5, NSC Report, pp.162 – 173.]

5.6.4 Monthly Production of Selected Industries and Index of Industrial Production (IIP)

CSO prepares and releases monthly indices of industrial production (IIP) and the
monthly use-based index of industrial production (base year 1993-94). The present
IIP with base year 1993-94 is a quantitative index based on production data received from
14 source agencies covering 543 items clubbed into 285 groups in the basket of items of
the index. The SDESs had been preparing IIPs for their respective areas but these were
not comparable with each other or with CSO’s national IIP because of differences in the
base year, basket of items, data and methodology used for constructing the indices. The
work of preparing State-wise IIPs comparable with the national IIP is at different stages
in different States and Union Territories. CSO releases Quick Estimates of IIP within
six weeks of the close of the reference month, in line with SDDS requirements. CSO has
released the IIP for December, 2007, the first revision of IIP for November 2007 and
the final IIP for September, 2007 through the Press Release of 12/2/09 (see MOSPI
website). NSC has made a number of recommendations to improve the quality of IIP.
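
In essence, IIP is a weighted index of production relatives on the base year 1993-94. The small sketch below (in Python) illustrates the general form of such an index as a weighted arithmetic mean of item relatives; the item names, weights and quantities are hypothetical and are not the actual IIP weighting diagram.

    # Illustrative weighted quantity index of the kind underlying IIP (hypothetical data).
    # Each item group has a fixed weight and a production relative 100 * q_t / q_0,
    # where q_0 is base-year (1993-94) output; the index is the weighted mean of relatives.
    items = {                       # (weight, base-year output, current output), all hypothetical
        "cotton yarn":    (5.0, 1500.0, 1800.0),
        "cement":         (4.0,  580.0,  950.0),
        "passenger cars": (2.0,  230.0,  610.0),
    }

    weighted_sum = sum(w * 100 * qt / q0 for w, q0, qt in items.values())
    total_weight = sum(w for w, _, _ in items.values())
    iip = weighted_sum / total_weight
    print(f"Index (base 1993-94 = 100): {iip:.1f}")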

The CSO monthly publication Monthly Production of Selected Industries in India provides monthly data regarding production in individual industries covered in IIP along
with monthly IIP at 2-digit level and monthly use-based IIP. These are also published in
the Monthly Abstract of Statistics and RBI Handbook. The latter also gives index
numbers of Infrastructure Industries. The Indian Bureau of Mines (IBM) publishes the Monthly Statistics of Mineral Production. The CSO publication Energy Statistics
(the latest is 2008) provides at one place data on different sources of energy - time series
on production, availability, consumption and price indices of major sources of
conventional energy, which is also available on floppies. The Ministries of Petroleum
and Gas, Power and Non-Conventional Energy provide information in their respective
spheres of activity. EIS of CMIE publishes volumes on Energy and Infrastructure that
present detailed data on the trends in these sectors and the EPWRF website gives time
series data on industrial production.

[Read MOSPI & EPWRF websites; Section 5.4, Chapter 5, NSC Report, pp. 187 – 200.]

5.6.5 Data on the Unorganised Industrial Sector

The Development Commissioner for Small Scale Industries (DCSSI) in the Central
Ministry of Small Scale Industries and the State Directorates of Industries provide data
on small-scale industrial units registered with the latter set of agencies. The DCSSI has
conducted a census of small scale industrial units thrice – the latest in November, 2002
(reference year 2001-02). The results of the third census are in the publication Final
Results: Third all India Census of SSI – 2001-02 of the Ministry of Small Scale
Industries. Broad details of the performance of small-scale industries are available in the
Annual Reports of the Ministry of Small Scale Industries. Time series data on
employment, production, labour productivity in small-scale industries (SSI) and value of
exports of the products of small-scale industry are also available in the RBI Handbook.
Data on some part of the Khadi and Village Industries Commission (KVIC), handlooms and handicrafts do get included in ASI, but data relating exclusively to these sub-sectors are available in the Annual Reports of these organisations or in the Annual Reports of the Ministries under which these Boards/Commissions function.

FuSs as a follow up of EC have covered unorganised manufacturing at quinquennial intervals from 1978-79 to 2005-06 (and during 1999-00 and 2000-01). The results of the 2005-06 survey (62nd Round) have been published in NSSO reports numbered 524 to 526 on the unorganised manufacturing sector: (i) operational characteristics; (ii) employment, assets and borrowings; and (iii) input, output and value added. NSC has
made a number of recommendations for enhancing the quality of data on this sector.

How is the capital required by industry financed? Let us look at some of the sources that throw light on this in the next sub-section.

[Read Sections 5.2 & 5.3, Chapter 5, NSC Report, pp. 173 – 187.]

5.6.6 Industrial Credit and Finance

The RBI Handbook provides time series data on the sectoral deployment of non-food
gross bank credit provided by Scheduled Commercial Banks to different sectors of the
economy and also on the health of SSI and non-SSI units. The latter set of data covers sick and weak units in the SSI and non-SSI sectors and the loan amounts outstanding against each of these categories of units.

The ASI provides some data on financial aspects of industries – fixed capital, working
capital, invested capital, loans outstanding and also the interest burden of industrial units
(up to the 4-digit NIC code level). From where and how have the industries raised capital
needed by them? We have looked at once source of capital or working capital, namely,
bank credit. Time series data on new capital issues and the kinds of shares/instruments
issued (ordinary, preference or rights shares or debentures, etc.,) and the composition of
those contributing to capital (like promoters, financial institutions, insurance companies,
Government, underwriters and the public) are also presented in RBI Handbook. Also
available sare data on assistance sanctioned and disbursed by financial institutions like
Industrial Development Bank of India (IDBI) etc., and financig of project costs of
companies..The publication of the Securities Exchange Board of India (SEBI) Handbook
of Statistics on the Indian Securities Market - 2008 provides annual and monthly time
series data on industry-wise classification of capital raised through the securities market.
A reference to the two volumes of CMIE (EIS), Industry: Financial Aggregates and
Industry: Market Size and Shares would be rewarding. Sub-section 5.8.2 deals with data on foreign direct investment (FDI), another source of capital finance.

5.6.7 Contribution to GDP

The National Accounts Statistics (NAS) presents a short time series of estimates of (i)
value of output and GDP of each two-digit NIC code level industry in the registered and
the unregistered sub-sector of the manufacturing sector, (ii) value of output of major and
minor minerals and GDP and NDP of the mining & quarrying sector, and (iii) GDP and
NDP of the sub-sectors electricity, gas and water supply.

5.7 TRADE

5.7.1 Introduction

Trade is the means of building up an enduring relationship between countries and the
means available to any country for accessing goods and services not available locally for
various reasons like the lack of technical know-how. It is also the means of earning
foreign exchange through exports so that such foreign exchange could be utilised to
finance essential imports and to seek the much-needed technical know-how from outside
the country for the development of industrial and technical infrastructure to strengthen its
production capabilities. Trade pacts or agreements between countries or groups of
countries constitute one way of developing and expanding trade, as these provide easier
and tariff-free access to goods from member countries. While efforts towards such an
objective will be of help in expanding our trade, globalisation and the emergence of
World Trade Organisation (WTO) have only sharpened the need to ensure efficiency in
the production of goods and services to compete in international markets to improve our
share of world merchandise trade and trade in services. Trade is also closely tied up with
our development objectives since trade deficit or surplus, made up of deficit/surplus in
merchandise trade and trade in services, contributes to current account deficit or surplus.
Data on trade in merchandise and services would enable us to appreciate the trends and structure of trade and identify areas of strength as well as those that show promise but need sustained attention.

5.7.2 Merchandise Trade

The Directorate General of Commercial Intelligence and Statistics (DGCI&S) collects and compiles statistics on imports and exports. It releases these data at regular intervals through its publications and through CDs. It prepares “Quick Estimates” of aggregate data on exports and imports and principal commodities within two weeks of the
reference month and releases these in the monthly press release. It publishes a monthly
brochure (i) Foreign Trade Statistics of India (Principal Commodities and Countries)
containing provisional data issued to meet the urgent needs of the Ministry of Commerce,
other government organisations, Commodity Boards (CBs), Export Promotion Councils
(EPCs) and research organisations. It contains commodity-wise, country-wise and port-
wise foreign trade information, (ii) Monthly Statistics of Foreign Trade of India,
Volume I (Exports) & Volume II (Imports) containing detailed data on foreign trade at
the 8-digit level codes of the ITC (HS), (iii) Foreign Trade Statistics of India
(Principal Commodities and Countries), and (iv) Inland and Coastal Trade Statistics
and Shipping & Air Cargo Statistics. The DGCI&S website (www.dgciskol.nic.in)
provides summary data on principal commodities and countries, access on free and
payment basis to final foreign trade data at 8-digit commodity and principal commodity
level, a Priced Information Service System (PISS) to private parties, EPCs, CBs,
Foreign Embassies etc., and aggregate and detailed data to Centre for Monitoring
Indian Economy (CMIE), Mumbai for an efficient trade intelligence service. The
DGCI&S data are also presented in the RBI Handbook 2008, CSO’s Monthly Abstract
of Statistics, CMIE’s volume on Foreign Trade and BoP and EPWRF website.

What would we like to know about foreign trade? The volume of trade, that is, the volume of exports and imports; the size of export earnings; the expenditure on imports; the size of exports relative to imports; and earnings from exports compared to expenses incurred on imports, since exports earn foreign exchange while imports imply an outflow of foreign exchange. We would also like to know about the trends in these variables. Besides
looking at the trends in the quantum and value of imports and exports, it is important to
analyse the growth in foreign trade both in terms of value and volume, since both are
subject to changes over time. Exports and imports are made up of a large number of
commodities and fluctuations in the export and imports of individual commodities
contribute to overall fluctuations in the volume and value of exports and imports. We,
therefore, need a composite indicator of the trends in trade. The index number of
foreign trade of a country is a useful indicator of the temporal fluctuations in exports and
imports of the country in terms of value, quantum and unit price and so on. Similarly,
measures of the terms of trade could be derived from such indices relating to imports and
exports. The existing index numbers have the base year 1978-79. RBI Handbook 2008
publishes time series data (DGCI&S data) on value (in US $ and Indian Rs.) of exports
and imports and trade balance, value of exports of selected commodities to principal
countries, Direction of Foreign Trade (in US $ and Indian Rs.) by trade areas, groups of
countries and countries, year-wise Unit Value Indices (UVI) and Quantum Indices (QI)
for imports and exports and for each product and the three terms of trade measures, Gross
Terms of Trade (GTT), Net Terms of Trade (NTT) and Income Terms of Trade (ITT).
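
The three measures can be worked out directly from the unit value and quantum indices using the usual textbook definitions, as the sketch below shows (in Python; the index values are hypothetical, not actual DGCI&S figures).

    # Terms of trade measures from unit value (UVI) and quantum (QI) indices.
    uvi_exports, uvi_imports = 250.0, 310.0   # unit value indices (base 1978-79 = 100), hypothetical
    qi_exports, qi_imports = 420.0, 380.0     # quantum indices (base 1978-79 = 100), hypothetical

    ntt = 100 * uvi_exports / uvi_imports     # Net Terms of Trade: export prices relative to import prices
    gtt = 100 * qi_imports / qi_exports       # Gross Terms of Trade: import volume relative to export volume
    itt = ntt * qi_exports / 100              # Income Terms of Trade: import purchasing power of exports
    print(f"NTT = {ntt:.1f}, GTT = {gtt:.1f}, ITT = {itt:.1f}")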

RBI also generates data on merchandise imports and exports, that is, trade data. The Balance of Payments (BoP) data reported by RBI (published in the RBI Handbook) show the value of merchandise imports on the debit side and that of exports on the credit side, and also the trade balance, all in the balance of payments format as part of the current account, which also shows another entry, ‘invisibles’. However, there is a divergence between the trade deficit/surplus in merchandise trade shown by DGCI&S data and that shown by RBI’s BoP data. This discrepancy also affects data on the current account deficit (CAD) or surplus, since the CAD or surplus is the total of the trade deficit/surplus and net invisibles (see later for ‘invisibles’). There are three reasons for the divergence between the two sources. First, DGCI&S tracks physical imports and exports while BoP data track payment transactions relating to merchandise trade. Second, DGCI&S data fail to capture Government imports, which are exempted from customs duty (e.g. Defence imports). Finally, DGCI&S data do not capture imports that do not cross the customs boundary (e.g. oil rigs and some aircraft) even though these are paid for and get captured in BoP data.
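
The accounting relationship mentioned above can be verified with a small computation; the figures below are hypothetical (in US$ billion) and are meant only to show how the trade balance and net invisibles add up to the current account balance.

    # Current account balance = trade balance (exports - imports) + net invisibles.
    merchandise_exports, merchandise_imports = 160.0, 250.0   # hypothetical, US$ billion
    net_invisibles = 75.0                                     # net services, transfers and factor income (hypothetical)

    trade_balance = merchandise_exports - merchandise_imports     # -90.0, i.e. a trade deficit
    current_account_balance = trade_balance + net_invisibles      # -15.0, i.e. a current account deficit
    print(f"Trade balance: {trade_balance}, Current account balance: {current_account_balance}")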

5.7.3 Services Trade

Besides export and import of merchandise, a number of services, like transportation services, travel services, software, Information Technology-Enabled Services (ITES),
business services and professional services are exported and imported. These are captured
by “non-factor services” included in the entry “Invisibles” in the Tables on India’s
Overall BoP and on Key Components of India’s BoP in the RBI Handbook 2008.
Table 146 of the Handbook gives the distribution of ‘net invisibles’ by four types of
transactions – ‘net non-factor services’, etc. ‘Net non-factor services’ are further divided
into five classes, ‘travel – net’, ‘miscellaneous – net’, etc. and under each of these classes
are shown ‘Receipts’ and ‘Payments’, which respectively correspond to export and
import of these services. But Table 42 on India’s Overall BoP in the RBI Monthly Bulletin, Feb. 2009 gives more detailed data. Naming the item shown as “non-factor services” in the Handbook’s BoP table simply as “services”, it gives (i) figures for ‘credit’, ‘debit’ and ‘net’ against each of the items in the BoP table and (ii) separate figures for “software services” under the category “miscellaneous”.

5.7.4 E - Commerce

E-Commerce, defined as the production, distribution, marketing, sale or delivery of goods and services by electronic means (telephone, fax, internet), is a newly emerging way of conducting business and is growing fast. It is necessary to organise collection of
data relating to this area. The NSC has made some recommendations in this regard.

[Read 1. Notes on/Footnotes to Tables on Foreign Trade and BoP, RBI Handbook (2008) and RBI Monthly Bulletin, Feb. 2009; 2. Sections 10.9 (pp. 383 – 391) & 10.12 (pp. 396 – 398), Chapter 10, NSC Report.]

5.8 FINANCE

5.8.1 Introduction

While trade and finance have been closely bound up with each other ever since the time
money replaced barter as the means of exchange, finance is the lifeline of all activities. It
flows from the public as taxes to Government, as savings to banking and financial
institutions and as share capital or bonds or debentures to the entrepreneur. It then gets
used for a variety of development and non-development activities through Government
and other agencies and flows back to members of the public as incomes in various ways,
as factor incomes. It would, therefore, be of interest to know how funds get mobilised for
various purposes and get used. This section looks at the kind of data available that could
enable us to analyse this mobilisation process and the flows of funds to different areas of
activity.

The finance sector consists of public finances, the central bank (the RBI), the scheduled
banks, urban and rural cooperative banks and related institutions. The financial market
consists of the stock exchanges dealing with scrips like shares, bonds and other debt
instruments, the primary and secondary markets, the foreign exchange market, the
treasury bills market and the securities market, where financial institutions, mutual funds, foreign institutional investors, market intermediaries, the market regulator (the Securities and Exchange Board of India, SEBI), the banking sector and the RBI all play important roles.
There is also the unorganised sector made up of financial operators like money-lenders
and pawn brokers. Insurance is another area of finance.

5.8.2 Public Finances

What would we like to know about public finances? We would like to know how they are
managed. What are the sources of such finances and how and on what are they spent?
Does the Government restrict its expenditure within its means or does it spend beyond the
resources available to it? Does it, in the process, borrow heavily to finance its
expenditure? The Budget documents of the Central and State Governments, the pre-Budget Economic Survey, the publication Indian Public Finance Statistics of the Ministry of Finance, the Planning Commission’s Five Year Plan documents, the RBI Handbook 2008, the RBI Monthly Bulletins and the EPWRF website provide a variety of data on public finances.

The Economic Survey, for instance, gives an overall summary of the budgetary
transactions of the Central and State governments and Union Territory Administrations.
This includes the internal and extra-budgetary resources of the public sector undertakings
for their plans and indicates the total outlay, the current revenues, the gap between the
two, the manner in which the gap is financed by net internal and external capital receipts
and finally, the overall budgetary deficit. It gives the break-up of the outlay into
developmental and non-developmental outlays and the components of these and those of
current revenues. The RBI Handbook 2008 presents time series data in respect of public finances in four groups – (i) Central Govt. Finances, (ii) Finances of the State Govts., (iii) Combined Finances of Central and State Govts., and (iv) Transactions with the Rest of the World. Besides covering data areas like Govt. receipts and expenditure, the
first two groups cover key deficit indicators, the financing of Gross Fiscal Deficit (GFD)
and outstanding liabilities. The third group covers in addition the range and weighted
averages of Central and State Govt. dated securities and the shares of categories of
holders of Central and State Govt. Securities.

We have looked at one area of the fourth group, namely, trade in merchandise and
services in the section on Trade. There are other areas in which India interacts with the
rest of the world. Foreign exchange flows into the country as a result of exports from
India, external assistance/aid/loans/borrowings, returns from Indian investments abroad,
remittances and deposits from NRIs, and foreign investment (FDI and portfolio investment)
in India. Foreign exchange reserves are used up for purposes like financing imports,
retiring foreign debts and investment abroad. What is the net result of these transactions
on the foreign exchange reserves? What are the trends in these flows and their
components? What is the size of the current account imbalance relative to GDP and its
composition? If it is a deficit, is it investment-driven? What is the size of foreign
exchange reserves relative to macro-aggregates like GDP, the size of imports and the size
of the short term external debt? While the Weekly Statistical Supplement and the RBI
Monthly Bulletin give data on forex reserves and related data, the RBI Handbook gives
time series data (in US$ and in Indian Rupees) on these parameters. As for FDI, the coverage of data compiled by RBI and the Department of Industrial Policy & Promotion (DIPP) in the Ministry of Commerce & Industry has been in accordance with the best international practices since 2000-01. The RBI Handbook 2008, the SEBI Handbook 2008, the RBI Monthly Bulletin, Feb. 2009 and the websites www.rbi.org.in and www.dipp.nic.in provide time series data on FDI.

5.8.3 Currency, Coinage, Money and Banking

Economic transactions need a medium of exchange. We have come a long way from the
days of barter and come to the use of money and equivalent financial instruments as the
medium of exchange. Banks function as important financial intermediaries not only in
this process but also in matters of resource mobilisation and the deployment of such
resources. The central bank of the country (RBI in India) regulates the functioning of the
banking system. In addition, it issues currency notes and takes steps to regulate the
money supply in the economy to achieve simultaneously the objectives of ensuring adequate credit to development activities and maintaining stability in prices. We should,
therefore, be interested in data on money supply or the stock of money and its structure

and the factors that bring about changes in these, the kind of aggregates that need
monitoring, the transactions in the banking system in pursuance of the nation’s
development objectives, the flow of credit to different activities, indicators of the health
and efficiency of banks which are the custodians of the savings of the public. We should
also be interested in data on prices, as price level affects the purchasing power of money
and indices of prices appropriate for the purpose/group in question – prices for producers
and consumer prices for different groups of consumers.

Most of these data are compiled by the RBI on the basis of its records and those of
NABARD and returns that it receives from banks and can be found in RBI Bulletins and
RBI Handbook. These are also published in the Monthly Abstract of Statistics of
CSO. The Wholesale Price Index (WPI) (base year 1993-94) is compiled by the
Economic Adviser’s Office in the Ministry of Industry, the Consumer Price Index for
Industrial Workers (CPI - IW) (base year 2000) and CPI for Agricultural Labour (CPI - AL) (base year 1986-87) by the Labour Bureau, Shimla, and CPI for Urban Non-Manual
Employees (CPI - UNME) (base year 1984-85) by CSO and published by the agencies
concerned (also posted on their websites) and are available in the RBI and CSO
publications mentioned above. The Monthly Abstract of Statistics also gives monthly
data on average rural retail prices of (i) selected commodities/services and (ii)
controlled/rationed items collected by NSSO. Two other reports of the Reserve Bank of
India published every year – the Report on Currency and Finance and the Report on
Trends in Banking provide a wealth of information of use to analysts. EPWRF website
provides data on Banking, Money and Finance.

5.8.4 Financial Markets

What would we like to know about financial markets and their functioning? We would
like to know about the ways in which financial resources can be accessed and at what
cost. What are the prevailing interest rates payable for funds to meet short-term or long-
term requirements? How do new ventures access the large amount of resources that are
needed for the new ventures? How do term lending institutions access funds required for
their operations? What are the sources of funds?

RBI, which regulates banking operations and the operations of NBFCs and FIs, SEBI,
which regulates the capital market, and the Department of Company Affairs, which
administers the Companies Act, are the major sources of data on financial markets. The
RBI Handbook 2008 (also RBI website) and SEBI’s Handbook of Statistics on the
Indian Securities Market 2008 (also, www.sebi.gov.in) contain comprehensive data on
financial markets. The two together provide time series data on several aspects of
financial markets. Examples are the structure of interest rates, resource mobilisation in
the Private Placement Market, net resources mobilised by mutual funds (MFs), new
capital issues by non-govt. public ltd. companies, absorption of private capital issues –
the no. of issuing companies, the number of shares and amount subscribed by various
categories of subscribers, annual averages of share price indices, resources raised by the
corporate sector through equity/debt issues, the share of private placement in total debt
and total resource mobilisation, the share of debt in total resource mobilisation; pattern of
funding for non-govt. non-financial public limited companies, capital raised by economic
activity/size of capital raised/region, trends on trading on stock exchanges, indicators of
liquidity - market capitalisation-GDP ratio (BSE & NSE), turnover ratio (BSE) and

traded value ratio (BSE & NSE) and comparative evaluation of indices (BSE SENSEX
etc.) through Price to Earnings Ratio and Price to Book Ratio.
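
Several of the liquidity and valuation indicators named above are simple ratios and can be computed as in the sketch below (in Python); all the figures are hypothetical and are used only to show how the indicators are defined.

    market_capitalisation = 6_000_000.0   # Rs. crore (hypothetical)
    gdp = 5_000_000.0                     # Rs. crore (hypothetical)
    turnover = 3_500_000.0                # value of shares traded during the year, Rs. crore (hypothetical)
    price, earnings_per_share, book_value_per_share = 500.0, 25.0, 160.0   # per share (hypothetical)

    market_cap_to_gdp = market_capitalisation / gdp    # market capitalisation-GDP ratio
    turnover_ratio = turnover / market_capitalisation  # how actively the listed stock is traded
    traded_value_ratio = turnover / gdp                # traded value relative to the economy
    pe_ratio = price / earnings_per_share              # Price to Earnings Ratio
    pb_ratio = price / book_value_per_share            # Price to Book Ratio
    print(market_cap_to_gdp, turnover_ratio, traded_value_ratio, pe_ratio, pb_ratio)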

[Read Sections 10.1 to 10.11, Chapter 10, NSC Report, pp.337 – 396; Article “New
Monetary and Liquidity Aggregates”, RBI Bulletin of November, 2000.]

5.9 SOCIAL SECTORS

5.9.1 Introduction

The social sector consists of education, health, employment, environment and levels of living
or quality of life in general. Investments in this sector pay rich dividends in terms of
rising productivity, distributed growth, reduction in social and economic inequalities and
levels of poverty, though after a relatively longer time span than in the case of investment
in physical sectors. Let us look at the kind of data available in this sector.

5.9.2 Employment, Unemployment and Labour Force

Employment is the means to participate in the development process and also benefit from
it. Creation of employment opportunities is an important instrument for tackling poverty
and to empower people, especially women. We should, therefore, know how many are
employed and how many are ready to work but are unable to gain access to employment
opportunities. How do women fare in these matters? Or, for that matter, what is the
experience of men and women belonging to different social/religious/disadvantaged
groups? Are children employed in any economic activity that is not only hazardous to
their health but which also adversely impacts on our dream of a golden future for them
through efforts to ensure their mental and physical well being. What is the quality of
employment opportunities available to the work force? What are the conditions in which
people work?

(a) Magnitude of Employment & Unemployment

Data on employment in selected sectors are available from several sources. We have
already looked at EMIP of DGE&T. This is the only source of employment data on the
organised sector of the economy that is available at quarterly intervals but is subject to
the limitations arising out of non-response in the submission of returns and
incompleteness of the employers’ register (the frame). EMIP also produces biennial reports on the occupational and educational pattern of employment in the public and private sectors, based on another return, but these have lost their utility over the years due to a high level of non-response.

The DGE&T and the SDEs also provide data on the number of jobseekers by age, sex
and educational qualifications and type of social/physical handicap on the live register of
employment exchanges. Not all registered jobseekers are unemployed, and not all the unemployed register with the employment exchanges, registration being voluntary. Some register at more than one exchange. The size of the live register cannot, therefore, be an accurate estimate of the level of unemployment. It does, however, represent the extent of pressure in
the job market, especially for Govt. and public sector jobs.

The second and third sources, EC and ASI, have also been discussed earlier. The quality
of ASI data is tied to the completeness of the frame of factories, which, in turn depends
on the quality of the enforcement of the two relevant Acts. Data on employment in the
Railways and the Banking sector are available from Railway Board and RBI
respectively. The Indian Labour Statistics (ILS) (the latest is for 2006) published by
the Labour Bureau (LB), Ministry of Labour & Employment, Shimla presents data on
employment in a number of sectors that are covered by different labour legislations, but
all these suffer from inadequacy of response in the submission of statutory returns and
partial coverage of the relevant Act.

Comprehensive data on employment and unemployment covering the entire country at regular intervals are available from two sources, namely, the decennial Population
Census and the quinquennial sample surveys on employment and unemployment
(EUS) of the National Sample Survey Organisation (NSSO). Workers in the Census
are enumerated as main workers and marginal workers. The Census 2001 presents data
up to the district level on the male and female workforce (main workers and marginal
workers) by detailed economic activity categories and the employed and unemployed by
age and education and social groups. (B & C Series Tables). These are available on CDs
and floppies for users on payment and on the Census website http://www.census.nic.in.

The latest NSSO quinquennial EUS relates to 2004-05 (61st Round). Its results are
published in NSSO reports numbered 515 to 521. Report no. 515 Parts I & II
“Employment & Unemployment Situation in India” is the overall report. These
provide data on employment and unemployment in greater detail than the Census up to
the State level. Analysis at the level of the 72 NSS regions is possible with the unit record
data obtainable from NSSO. EUSs provide data on the distribution of the
employed/unemployed by a number of characteristics like sex, rural/urban residence, age,
education, employment status, economic activity (of the employed) and the monthly per
capita expenditure class (MPCE) of the household concerned and also data on the
incidence of underemployment, average daily wage levels of workers, etc.

(b) Quality/Adequacy of Employment

EUS data throw light on aspects of quality and adequacy of employment, like underemployment of the employed, the share of casual employment in total employment, male-female and regular-casual worker wage differentials and the employment status of workers from poor households. The ILS of LB provides data on wages/earnings in the
organised sector based on statutory returns and the distribution of workers in different
occupations by level of earnings in selected industries through Occupational Wage
Surveys (OWS). (Some of the sixth round reports have come out in 2008). ASI gives
data on wages/ emoluments in different industries in the factory sector. DE, NDE and
OAE and other Establishment Surveys provide data on average annual earnings for
men, women and children in the unorganised sector. Wage Rates in Rural India for
2005-2006 and the 2005 Report on the Working of the Minimum Wages Act, 1948
give rural/unorganised sector wage levels vis-à-vis statutory minimum wages. Data on
child labour and bonded labour from the Census and NSSO do not fully reflect the
ground level realities.

ILS also publishes data on several aspects of labour welfare like industrial injuries,
compensation to workers for injuries and death, industrial disputes and access to health
insurance and provident fund. These are incomplete, being based on statutory returns.
ILS also gives statistics relating to welfare funds set up in different industries. LB’s
reports on its ongoing programme of surveys throw light on the working and living
conditions of Scheduled Caste/Tribe workers, unorganised workers and contract labour.

[Read 1. http://www.census.nic.in (especially the write-ups on metadata); 2. Sub-Paper 4.1 of Paper on Agenda Item 4, 15th COCSSO, CSO; 3. Section 9.4, NSC Report, pp. 292 – 302; 4. Chapter 2 of NSSO (1997), pp. 3 – 7; 5. Sections on Labour Bureau and DGE&T of http://labour.nic.in; 6. Sections on EC and FuS of www.mospi.gov.in]

5.9.3 Education

(a) Introduction

Education is an important instrument for empowering people, especially women. It nurtures and develops their innate talents and capabilities and enables them to
contribute effectively to the development process and reap its benefits. It is also an
effective instrument for reducing social and economic inequalities. We have built up over
the years a vast educational system in an attempt to provide education for all to ensure
that the skill and expertise needs of a developing economy are met effectively and at the
same time, to monitor the functioning of the educational system as an effective
instrument for tackling inequalities. We would, therefore, like to look at data on different
aspects of the educational and training system such as its size and structure, the physical
inputs available to it for its effective functioning, its geographical spread, the type of
skills and expertise it seeks to generate, the access of different sections of society and
areas of the country to it and the progress made towards the goals like ‘total literacy’,
‘universalisation of secondary education’, ‘education for all’ and ‘removal of
inequalities’.

The Department of School Education and the Department of Higher Education of the
Ministry of Human Resources Development (MHRD), the National Council for
Educational Research and Training (NCERT) and the University Grants Commission
(UGC) collect and publish educational statistics and conduct research studies and surveys
in the area of education. Selected Educational Statistics (2004-05), Education in India
-Vols. I & IV (1998-99) and Annual Financial Statistics of Educational Sector (2005-
06) published annually by MHRD, the All India Educational Survey (Seventh AIES) –
(2002-03) of the NCERT, Annual Report of the DGE&T, the National Technical
Manpower Information System (NTMIS) and annual Manpower Profile of India of
the Institute of Applied Manpower Research (IAMR), the 52nd, 53rd, 55th and 61st Round surveys of NSSO (1995-96, 1998-99, 1999-00 and 2004-05), the B and C series tables of Census 2001, and the Planning Commission’s National Human Development Report (NHDR), 2001 and the State HDR reports are the major sources of data on education.
(b) Educational Infrastructure: These volumes together give data on the number of
institutions established at various levels – schools, colleges, polytechnics, industrial
training institutes and facilities for apprenticeship training, their intake capacity, teaching
positions created and filled, training facilities for the physically disadvantaged, special

campaigns like Sarva Siksha Abhiyan and Total Literacy Campaign, adult education,
availability and adequacy of physical facilities in schools – type of buildings, number of
rooms for teaching vis-a-vis the number of pupils, access to drinking water and toilets and urinals (and separately for girls), distance of the school from the pupil’s residence
and direct and indirect expenditure on education.
(c) Infrastructure Utilisation and Access to Educational Opportunities: These
volumes provide data on literacy rates – overall and for different groups, enrolment,
enrolment ratios and drop out rates at different levels of education and courses, teacher-
pupil ratios, output from various professional and non-professional courses, utilisation
patterns of professional manpower, stocks of different categories of manpower, the
educational profile of the population, incidence of disability in the population, output of
training facilities created for the vocational rehabilitation of the physically challenged
and their vocational rehabilitation.

[Read 1. Sub-Paper 4.2 of Paper on Agenda Item 4, 15th COCSSO, pp. 18 – 35; 2. Section 9.5, Chapter 9, NSC Report, pp. 306 – 323.]

5.9.4 Health

(a) Introduction: One of the important dimensions of quality of life is health. A healthy
individual can contribute effectively to production of goods and services. Investment in
health is, therefore, an essential instrument of raising the quality of life of people and the
productivity of the labour force. What is the health status of the population? What are the
challenges to the health of the population and how are these being tackled? What kind of
data is available about these aspects of the population, the health infrastructure and the
efforts being made to deal with problems of health? What is the impact of these on the
health situation, especially of women and children? The annual publication National
Health Profile of India (NHPI) (2007 is the latest) of the Central Bureau of Health
Intelligence (CBHI) of the Ministry of Health & Family Welfare (MHFW), the
publications Sample Registration System (SRS): Statistical Reports, SRS
Compendium of India’s Fertility & Mortality Indicators, 1971-1997, Mortality
Statistics and Cause of Death and SRS Bulletin (half yearly) and the Social &
Cultural Tables (C Series Tables) of Census 2001 of the Registrar General India &
Census Commissioner (RGI&CC), the Planning Commission’s NHDR 2001 and the
HDRs of the State Governments and the Report on the National Family Health
Survey (NFHS – 3), 2005-06, of the International Institute for Population Sciences
contain a large amount of information on these aspects of health.

(b) Health Infrastructure: NHPI provides data on the number of public and private
hospitals, dispensaries and the number of beds in these in rural/urban areas, similar data
on various health insurance schemes of the Government in different sectors/for different
sections of population, facilities for Indian Systems of Medicine, facilities for training
medical and health manpower, manning of medical and health positions in the health
system, stocks of medical and health manpower, programmes for controlling specific
communicable diseases and expenditure on health and family welfare.

(c) Public Health, Morbidity and Mortality Indicators: NHPI presents data on programmes for vaccination of children and pregnant women, incidence of communicable and other diseases and mortality due to these, incidence of leprosy and tuberculosis, the
National Aids Control Programme and other National Control/Eradication programmes,
levels of utilisation of different health insurance schemes, infant mortality, maternal
mortality, birth and death rates, fertility rates, incidence of disability and expectation of
life.

(d) National Family Health Survey (NFHS)-3: NFHS – 1, 2 & 3 conducted in 1992-93,
1998-99 and 2005-06 succeeded in building up an important demographic and health
database in India. These provide State-level estimates of demographic and health
parameters and also data on various socio-economic and programmatic factors that are
crucial for bringing about desired changes in India’s demographic and health situation.
NFHS – 3 covers all states. Some of the types of data provided (see http://www.nfhsindia.org) are: age at first marriage of women, current fertility, median age of women at the first and last birth of a child, knowledge and practice of
contraception, estimates of age-specific death rates, crude death rates, infant/child
mortality rates, morbidity of selected diseases, immunization of children, vitamin A
supplementation of children, nutritional status of children, anaemia among them and
indicators of acute and chronic malnutrition among children – weight for age index,
height for age index and weight for height index, health status of women (Body Mass
Index and prevalence of anaemia), health problems of pregnancy and so on.
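
The Body Mass Index and the anthropometric indices mentioned above are computed in standard ways: BMI is weight (kg) divided by the square of height (m), and indices such as weight-for-age are usually expressed as z-scores, i.e. the number of reference standard deviations by which a child's measurement departs from the reference median. The sketch below (in Python) illustrates this; the reference median and standard deviation used are hypothetical placeholders, not values from the actual NFHS/WHO reference tables.

    def bmi(weight_kg: float, height_m: float) -> float:
        # Body Mass Index = weight (kg) / height (m) squared
        return weight_kg / height_m ** 2

    def z_score(observed: float, reference_median: float, reference_sd: float) -> float:
        # e.g. weight-for-age: distance from the reference median in reference SD units
        return (observed - reference_median) / reference_sd

    print(bmi(48.0, 1.55))          # BMI of a woman weighing 48 kg and 1.55 m tall
    print(z_score(9.2, 11.0, 1.2))  # hypothetical weight-for-age z-score (below -2 indicates underweight)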

[Read 1. Section 9.3, Chapter 9, NSC Report, pp.275 - 292.]

5.9.5 Environment

The process of development adversely affects the environment and through it, the quality
of life of society. For instance, the excessive use of fertilizers and pesticides robs the soil of its nutrients. Letting sewage, drainage and industrial effluents into rivers and water bodies without prior treatment pollutes these, causing destruction of aquatic life and endangering the health of people using such polluted water. The recent outcry in Tirupur, near Coimbatore, a place well known for garment exports, against untreated effluents from garment factories being let into the river used for drinking purposes is a case in point. The exhaust fumes containing Carbon Monoxide (CO) and lead (Pb) particles let into the air we breathe by vehicles using petrol or diesel are an example of air pollution. The best example of industrial pollution through insufficient safety measures is
the Bhopal gas disaster where lethal gases leaking from a factory’s storage cylinder killed
many people immediately and maimed many others for life. The forest cover of the
country is continuously getting reduced due to indiscriminate felling of trees leading to
reduction in rainfall and changes in rainfall pattern, besides climatic changes. The
destruction of mangroves along seacoasts for housing/tourism development often leads to
soil erosion along the coast by the sea. The adverse effects of current models of
development on environment and the realisation of the need to take note of the cost to
development represented by such effects have now led to the development of
environmental economics as a new discipline in economics.

The Central and State Pollution Control Boards and the Ministry of Environment and
Forests (MOEF) evolve and monitor implementation of policies to protect the
environment. Statistics on environment are collected through this process by these
agencies and the CSO. The annual reports of the MOEF and the Compendium on

Environment Statistics, India 2007 published by the CSO from time to time are
excellent sources of data on environment. The latter especially is very comprehensive and
includes a very informative write up. The Compendium (and the annual report of MOEF)
can be accessed in the websites of the two organisations. Illustrative types of data on
environment available from these publications are Ambient Air Quality Status
[concentration of Sulphur di-oxide, Nitrogen di-oxide and Suspended Particulate Matter (SPM) in air] in major cities of India, percentage of petrol-driven two-wheelers, three-wheelers and four-wheelers meeting CO emission standards, and water quality of the Yamuna river (in the Delhi stretch) in respect of selected physico-chemical parameters during a year – dissolved oxygen (mg/litre), Biological Oxygen Demand (BOD) (mg/l), faecal
coliforms (number/100ml), total coliforms (number/100ml) and ammonical nitrogen
(mg/l). (SPM consists of metallic oxide of silicon, calcium and other deleterious metals.
The most common contamination in water is from disease-bearing human wastes that are
usually detected by measuring faecal coliform levels.)

[Read 1. Section 9.7, Chapter 9, NSC Report, pp. 327 – 331.]

5.9.6 Quality of Life

We have already looked at several of the factors determining the quality of life of the
people – education, health, employment and environment. Shelter and amenities are another. Ironically, development projects also displace people from their normal way of life. One other factor, an important one, is the level of income or consumption. The relevant data are available from quinquennial surveys of the NSSO on consumer expenditure - those on levels of consumption for different MPCE classes. These and
others considered in the earlier sections lead to measures of the dimensions of poverty
and inequalities in income (consumption) and non-income aspects of life, HDIs, Gender
Development Indices (GDI) measuring gender discrimination, BMIs evaluating the health
status of women and the measures, Weight for Age Index, Height for Age Index and
Weight for Height Index gauging the nutritional status of children. All these measures are
also available from NSSO, the NHDR 2001 and the State HDR Reports for
judging the quality of life of the population and of the Scheduled Castes and Tribes. The
Social Statistics Division of CSO has a number of regular publications presenting data on
the elderly, gender differentials in different areas, progress towards millennium
development goals and a report on home based workers. (see MOSPI website.) The
Sachar Commission Report on Minorities and the reports of the Commissions for (i)
Scheduled Castes and Tribes, (ii) the Backward Classes, (iii) the Minorities and
(iv) the Women at the national and state levels review improvements in the quality of life
of these sections of society and their mainstreaming, empowerment and physical and
emotional safety and in so doing assemble an enormous amount of data from various
sources. Likewise, the Commissions for the Aged and for Children are sources of data on
these groups gathered at one place from various primary sources. The Annual Reports of
the Ministry of Social Justice and Empowerment, the nodal agency for the
welfare/development/empowerment of all these groups and the physically and mentally
challenged, are another source of data on the status of these vulnerable groups in society.
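
At its simplest, the poverty measure derived from an MPCE distribution is a head-count ratio: the percentage of persons whose MPCE falls below a poverty line. The sketch below (in Python) shows the computation; the MPCE values and the poverty line are hypothetical and are not NSSO data or an official poverty line.

    mpce = [310, 450, 520, 275, 690, 840, 360, 295, 1200, 410]   # Rs. per person per month (hypothetical)
    poverty_line = 400.0                                          # hypothetical poverty line

    poor = sum(1 for x in mpce if x < poverty_line)
    head_count_ratio = 100 * poor / len(mpce)
    print(f"Head-count ratio: {head_count_ratio:.1f}% of persons below the poverty line")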

[Read 1. Sections 9.6 (pp. 323 – 327) & 9.8 (pp. 331 – 336), Chapter 9, NSC Report. ]

5.10 LET US SUM UP

The Indian Statistical System with the National Commission on Statistics at the apex
collects, compiles and disseminates an enormous amount of data on diverse aspects of the
Indian economy. Sections 5.4 to 5.9 have highlighted the characteristics of the database
on different aspects of the Indian economy, looking specifically at certain major ones.

National Accounts Statistics, consisting of estimates of national income, saving and investment and related macroeconomic aggregates, are compiled by CSO and published annually in its publication NAS; CSO also releases quick estimates and advance estimates of national income through press notes. Estimates of SDP are similarly made by
SDESs. These are also available in a number of other publications like the RBI
Handbook and the websites of MOSPI, RBI and the States. CSO also compiles an I-OTT,
which is useful in analysing the impact of changes in a sector of the economy on the
others and on the entire economy. Estimates of agricultural crop production and inputs
and cost of cultivation, livestock and livestock products and data on land utilisation,
irrigation, size distribution of operational holdings and their contribution to production
and access to agricultural inputs are compiled by the Ministry of Agriculture and mainly
disseminated through the publication ASG of DESMOA. Data on industrial production
and related data flow through ASI, the IIP compiled by CSO and SDESs, the CSO
publication Monthly Production of Selected Industries, EC and FuS, the Census of Small
Scale Industries and RBI and SEBI Handbooks, besides publications of EPWRF and
CMIE (EIS). The DGCI&S and the RBI Handbook and Bulletins are the repositories of data on
trade in merchandise and services. Statistical data on government finances and on India's
transactions with the rest of the world are available in the RBI Handbook and Bulletins,
the budget documents and the Economic Survey. The SEBI and RBI Handbooks and RBI
Bulletins are fairly comprehensive sources of information on financial markets.
Substantial data on different aspects of the employment and unemployment situations
flow from the decennial population census and the NSSO’s quinquennial EU surveys.
ASI, EC and EMIP also generate data on employment, though only for parts of the
economy. The educational administrative system generates a large amount of data on the
educational infrastructure built up over the years and the extent of its utilisation, while
the census, surveys and other regular data collection mechanisms make sizable additions
to it. The health system likewise generates a lot of data that is supplemented by the
census and other regular efforts and in particular the NFHS. Data on environment are
compiled and disseminated by the MOEF and CSO. Data on the quality of life comprise
levels of consumption by MPCE classes, estimates of poverty and inequality, indices like
the HDI and GDI, and information on the status of the socially and physically challenged,
minorities and the elderly.

The observations of NSC on these data areas and their suggestions can be seen in the
reading material cited under each Section/Sub-section. The entire NSC Report should in
fact serve as a guide to anyone wishing to know about data in any area and their quality.

5.11 FURTHER SUGGESTED READING

Improvement of the database is a continuous process. The National Statistical
Commission (NSC) is the most recent effort to take stock of the Indian database, to
identify deficiencies in it, to suggest steps to overcome these, and to develop new
initiatives to build a data system that delivers reliable and timely data. The Commission
and the Indian statistical system working under its guidance are already taking steps in
this direction. As a researcher and analyst requiring statistical data for your work, it
would be useful to keep in touch with developments relating to the review, refinement
and expansion of the database in different aspects/sectors of the economy.

• The Journal of Income and Wealth of the Indian Association for Research in
National Income and Wealth is useful for those interested in methodological
developments in the field of National Accounts and examination of questions of
adequacy or suitability of available data for use in National Accounts work. For
instance, the Journal's recent issues (Issue No. 24, 1 & 2) carry a paper, "A Case
Study on Estimation of Green GDP of the Manufacturing Sector in India" by S.K. Nath
and Samiram Mallick, that would be relevant in the context of the emerging
emphasis on environment-friendly industrial development. Another paper,
"Services Sector in the Indian Growth Process: Myths and Reality" by Sanjay
Kumar Hansda, would be useful in the current context of the perceived dominance
of the services sector in GDP.

• The discrepancy between estimates of PFCE made by CSO and those made from
NSS household surveys has been the subject of discussion for a long time. The
Report on the Cross Validation Study on Estimates of Private Consumption
Expenditure Available from Household Surveys and National Accounts prepared
by CSO and NSSO for the Study Group on Non-Sampling Errors is published in
Sarvekshana (Issue 88, Vol. XXV, No. 4 & Vol. XXVI, No. 1, pp. 1–69). Also see in this
connection Section 13.4 (pp. 492 to 506) and in particular sub-section 13.4.7
(about the above Study) (pp. 503–506), NSC Report.

• Report of the Committee on Capital Formation in Agriculture.

• Environmental Accounting/ Natural Resource Accounting are areas of interest to
economists. A look at Natural Resource Accounts of Air & Water Pollution –
Case Studies of Andhra Pradesh and Himachal Pradesh and Environmental
Accounting of Land and Forestry Sector – Madhya Pradesh and Himachal
Pradesh would be rewarding. These are on the MOSPI website.

• A clear understanding of the concepts used in the surveys of NSSO would be very
useful for an analyst/researcher. Explanations of technical terms, their definitions
and the underlying concepts in NSS socioeconomic surveys up to the 55th Round
(excluding the terms used in ASI, price collection work and crop surveys) are
given in Concepts and Definitions Used in NSS (May, 2001). Modifications made
in definitions, etc., in recent Rounds (60th onwards) are available Round-wise. See the
NSSO/SDRD section of the MOSPI website.

• National Seminar on NSS 61st Round Results (October, 2007): the Report on the
Seminar and the Papers presented are available on the MOSPI website.

• Official Statistics Seminar Series organised by MOSPI to promote new ideas.
Papers of the Second Seminar (Nov., 2004) contain suggestions to overcome
deficiencies in the data in various areas and suggestions for improvement. For
instance, a paper "Reengineering ASI to Improve the Database of the Organised
Sector" by G.C. Manna makes a number of suggestions to address NSC's
observations and suggestions. See the MOSPI website for the seminar papers.

• The National Statistical Commission Workshop on Conceptual Issues relating to
Measurement of Employment and Unemployment (Dec., 2008). Papers and the
proceedings can be seen on the Commission's section of the MOSPI website.

5.12 REFERENCES

Chandrasekhar, C.P. (Ed.) (2001): India's Socio-economic Data Base, Tulika, New Delhi.
CSO(1979): The article “Comparable Estimates of SDP – 1970-71 to 1975-76” in NAS,
(January, 1979), CSO, MOSPI, Govt. of India, New Delhi.
------(1980): Article "The Status of State Income Estimates" appearing in the Monthly
Abstract of Statistics, (October, 1980), CSO.

------(March, 1994): National Accounts Statistics (NAS) – Factor Incomes, CSO,
MOSPI, Govt. of India, New Delhi.
------(1994): NAS, 1994, CSO, MOSPI, Govt. of India.
------(1999 a): New Series on NAS (Base Year 1993-94), CSO, MOSPI, Govt. of India,
New Delhi.
------(1999 b): NAS 1999, CSO, MOSPI, Govt. of India.
------(2005): NAS 2005, CSO, MOSPI, Govt. of India.
------(Feb. 2006): New Series of NAS (1999-00 prices), CSO.
------(2007): National Accounts Statistics – Sources and Methods, CSO, MOSPI , Govt.
of India, New Delhi.
------(2008): NAS, 2008, CSO, MOSPI, Govt. of India.

Department of Agriculture (several years): Cost of Cultivation in India, DESMOA,
Ministry of Agriculture, Govt. of India, New Delhi.
EPWRF (April, 2002): Annual Survey of Industries, 1973-74 to 1997-98 – A Database
on the Industrial Sector in India, EPWRF, Mumbai.
------------(June, 2003): Domestic Products of States of India: 1960-61 to 2000-01,
EPWRF, Mumbai.
------------(Dec., 2004): National Accounts Statistics of India, 1950-51 to 2002-03 – Linked
Series with 1993-94 as the Base Year, EPWRF, Mumbai.

International Labour Organisation (1993): Resolution concerning Statistics of
Employment in the Informal Sector, 15th International Conference of Labour Statisticians
(ICLS), Geneva.
Indian Association for Research in National Income and Wealth & Institute of
Economic Growth (1998): Golden Jubilee Seminar on Data Base of the Indian
Economy, Delhi.
Journal of Income & Wealth (1976): Article “Mahabaleshwar Accounts of States” in
the October, 1976 issue.
Katyal, R.P., Sardana, M.G., and Satyanarayana, J. (2001): Estimates of DDP,
Discussion Paper 2, National Workshop on State HDRs and the Estimation of District
Income Poverty under the State HDR Project executed by the Planning Commission
(GOI) with UNDP Support, Bangalore, July, 2001, UNDP, New Delhi.
MOSPI (2001): Report of the National Statistical Commission, MOSPI, Govt. of India, New Delhi.
NSSO (1997): Report No. 409 – Employment and Unemployment in India, 1993-94 (50th
Round).
System of National Accounts 1993 (SNA 1993), Commission of the European
Communities, International Monetary Fund, Organisation for Economic Cooperation
and Development, United Nations, World Bank, Brussels/Luxembourg, New York, Paris,
Washington, D.C.
World Trade Organisation (2007): International Trade Statistics 2007, WTO, Geneva.

5.13 MODEL QUESTIONS

1. Describe the organisational structure of the Indian Statistical System.


2. What do you understand by the term Special Data Dissemination Standards?
What data are supplied by India under SDDS? What are the release calendars for
our data?
3. Why do we need to make estimates of National Income and related macro
aggregates? List the various components of the system of National Accounts.
4. What is the need to make estimates of macro aggregates at current as well as at
constant prices? Why is the base year changed from time to time? Which have
been the base years so far for national income and related aggregates?
5. Are estimates of SDP prepared by SDESs comparable and why? What problems
arise as a result? How have these been tackled?
6. What are the approaches available for preparing SDP and district domestic
product? Which approach is being adopted and with what consequences?
7. What are the relationships depicted in the Input-Output Tables? How are these
tables useful in research work and economic policy decisions?
8. What kind of saving data are published by CSO in National Accounts Statistics?
9. With the help of NAS 2008, explain the structure of savings in India.
10. What are the gaps/deficiencies in data that come in the way of better estimates of
Saving? What steps can and should be taken to fill these gaps/deficiencies?
11. What do you mean by the term capital formation?
12. What estimates regarding capital formation are presented in the national
accounts? What are their limitations? What data gaps come in the way of
compiling better estimates? What should be done to overcome these problems?
13. What are the data sources on contribution of different sectors to capital
formation?
14. Explain, using national accounts, how capital formation is financed.
15. What type of data will you need to assess the performance of Indian economy?
Explain the various sources of such data and their quality.
16. How are annual estimates of (i) crop production and (ii) crop forecast made by
DESMOA? What are NSC’s views on the procedures/methodology adopted for
these?
17. Give two major sources of data on Agriculture and allied activities like dairy and
fisheries and comment on their quality and time lag in availability.
18. From where and how can you get data on per-capita availability of milk, eggs and
wool, and on the population of cows and buffaloes? What do you think of its
timeliness and reliability?
19. What kind of data on inequalities is available from the Agricultural Census?
Comment on the uses to which such data can be put.
20. What do you mean by the term 'cropping intensity'?
21. What data is available on subsidy given to agriculture? Comment on its accuracy.
Attempt to work out measures like those relevant to subsidy worked out in OECD
countries and presented in ASG 2004. (Producer Support Estimate – PSE and
%PSE).
22. What are the major efforts made to collect data on different aspects of the agricultural sector?
23. Discuss the characteristics of data flowing from the agencies involved in
compilation of agricultural data.
24. Indicate the major sources of data on levels of industrial employment. Comment
on their scope, coverage, reliability and timeliness.
25. Discuss the role of the Economic Census in the industrial database.
26. Discuss the kinds of data that ASI provides for an understanding of different
aspects of the factory sector. What are its contributions to economic analysis?
27. Discuss the adequacy, quality and the representative character of the Index of
Industrial Production (IIP).
28. Discuss the limitations of the time series data on small scale industries available
from DCSSI.
29. Enumerate the data sources on flow of credit to different sub sectors of industry.
30. "The detailed data available in the industrial sphere relates to the factor sector"
explain.
31. Do you think that data available on the unorganised sector is inadequate? What
suggestion would you like to make in this regard?
32. What are the two major sources of data on merchandise trade? What kind of
trade data is compiled by the two sources?
33. How are the measures Gross Terms of Trade, Net Terms of Trade and Income
Terms of Trade obtained?
34. What are the reasons for divergence between the two sources of trade data?
35. Explain the terms 'net invisibles' and 'non-factor services'. To what detail is data
on trade in services available and where?
36. Indicate the documents that provide different kinds of data on public finances.
37. Indicate the various measures of deficit and the relationship between them.
38. Identify the transactions, other than trade in merchandise, that India has with the
rest of the world.
39. Which document contains the methodology for compiling liquidity
aggregates?
40. List the kind of time series data available in RBI Handbook 2008.
41. Identify the major sources of data on financial markets.
42. Name the sources that provide data on employment and unemployment.
43. Discuss the scope, reliability and utility of EMIP and live register data as sources
of data on levels of employment and levels of unemployment respectively.
44. Explain the kind of data on employment and unemployment available from the
Population Census 2001.
45. Explain the different measures of employment/unemployment/labour force used
by NSSO in different quinquennial surveys and their use in judging the quality
and adequacy of employment.
46. Discuss the availability of data on adequacy and quality of employment and the
utility of those that are available.
47. Identify the different aspects of labour welfare on which data are compiled by the
Labour Bureau.
48. Name the organisations/institutions that collect and publish data on different
aspects of education, and comment on their timeliness and quality.
49. What is the contribution of AIES of NCERT to the education database?
50. How has the National Technical Manpower Information System (NTMIS)
improved data availability on development and utilisation of technical manpower?
51. What kind of data are published in NHPI? Comment on their quality and
timeliness.
52. What is the Sample Registration System? What information does it supply on a
regular basis? Comment on the quality and timeliness of the data generated.
53. How have the series of NFHSs helped in developing the health information
system, especially on the health status of women and children?
54. Examine the availability of data on air, water and soil pollution in India.
55. What measures are available for measuring quality of life? Examine the adequacy
of the Indian database for developing reliable estimates of these measures.
56. Inclusive growth is very much in the news. Examine the adequacy of the database
for measuring the flow of the benefits of growth to (a) women and (b) the bottom
half of the population (in the distribution by MPCE classes).

BLOCK 6 USE OF SPSS AND EVIEWS PACKAGES FOR ANALYSIS AND PRESENTATION OF DATA

Structure

6.0 Objectives
6.1 Introduction
6.2 An overview of the block
6.3 SPSS Package
6.3.1 Features of SPSS for Windows
6.3.2 Getting Acquainted with SPSS
6.3.3 Menu commands and Sub-commands
6.3.4 Basic steps in Data analysis
6.3.5 Defining, Editing and Entering Data
6.3.6 Data file management functions
6.3.7 Running a Preliminary Analysis
6.3.8 Understanding Relationships between Variables: Data Analysis
6.3.9 SPSS Production Facility
6.4 Statistical Analysis System (SAS)
6.5 NUDIST
6.6 EVIEWS Package
6.6.1 EVIEWS Files and Data
6.6.1.1 Creating a Work file
6.6.1.2 Importing Time Series Data from Excel
6.6.1.3 Transforming the Data
6.6.1.4 Copying Output
6.6.1.5 Examining the Data
6.6.1.6 Displaying Correlation and Covariance Matrices
6.6.1.7 Seasonality of the series
6.6.1.8 Estimating Equations
6.6.1.9 Testing for Unit Roots
6.6.1.10 ARIMA/ARMA identification and estimation
6.6.1.11 Granger Causality Test
6.6.2 Vector Auto Regression (VAR)

6.0 OBJECTIVES

After studying this block, you will be able to:
• describe the main features of SPSS,
• develop skill in the use of SPSS for basic statistical analysis with a special focus
on the measures of central tendency, dispersion, correlation, and regression
analysis,
• present the data and SPSS results graphically,
• explain the basic features of EVIEWS package, and
• acquire skills to analyze time series data through application of various
sophisticated time series methods in EVIEWS.

6.1 INTRODUCTION

We have learned in Blocks 2 to 5 that statistical analysis and interpretation of data
constitute an integral part of research in economics. The methodological advances in
quantitative analysis have been accompanied by a significant revolution in the computing
power of desktops/laptops, often called PCs. Software that earlier could only be run on
large mainframe computers can now be run with considerable ease on PCs. One such
package is MS-Excel. You would certainly be familiar with this software and would know
that it provides statistical, analytical and scientific functions. It offers features such as fast
calculations, what-if analysis, charts (graphs), automatic re-calculation and many more.
Besides this, the software packages used for handling large data sets, data analysis and
presentation of results for policy decisions and research include RATS, SPSS, NCSS,
MATLAB, LIMDEP, OX, STATA and EVIEWS. Among these sophisticated econometric
packages, SPSS and EVIEWS are popular packages used for data analysis and data
presentation in the social sciences in general and economics in particular. Hence, in this
block, we shall focus on the SPSS and EVIEWS fundamentals and the use of their
statistical components. We shall also look at several statistical techniques (for quantitative
and qualitative data analysis) and discuss situations in which you would use each of these
techniques. Light will also be thrown on the assumptions made by each method, how to
set up an analysis using SPSS/EVIEWS, and how to interpret the results.
The Statistical Analysis System (SAS) software for quantitative data analysis will also be
discussed. For qualitative data analysis we will introduce software called NUDIST.
6.2 AN OVERVIEW OF THE BLOCK

SPSS (Statistical Package for Social Sciences) is one of the packages often preferred by
researchers and analysts for data management and detailed statistical analysis. It
provides techniques for data processing and procedures for logistic regression, log-linear
analysis, multivariate analysis and analysis of variance. It also equips us with procedures
for constrained non-linear regression, probit, Cox and actuarial survival analysis.
Further, it creates high-quality presentations of data. SPSS also performs
comprehensive forecasting and time series analysis with multiple curve-fitting models,
smoothing models and methods for estimation of autoregressive functions. In the first
part of this block, efforts have been made to introduce the various operational procedures
relating to data processing, statistical analysis and presentation of data.

6.3 STATISTICAL PACKAGE FOR SOCIAL SCIENCES (SPSS)

Once the data has been collected, the first step is to look at it in a variety of ways. While
there are many specialized software application packages for different types of data
analysis (relating to scientific, commercial and financial problems), a researcher is often
faced with a situation where the general treatment and standard statistical analysis of the
quantitative data is required. SPSS (Statistical Package for Social Sciences) is one such
package that is often used by researchers and analysts for data management and exploring
it before attempting a detailed statistical analysis. It is a preferred choice for research
analysis due to its easy-to-use interface and comprehensive range of data manipulation
and analytical tools.

Suppose, you are interested in knowing the attitude of students towards distance
education and for that you have administered a data collection instrument (commonly
known as questionnaire) to some students. Now you want to process and analyze the data.
Till recently, data were processed manually and it was indeed a cumbersome process.
Fortunately, now we live in an age when high-speed computers can do the job of
processing and analysis of data in a very short period of time and, if used properly, without
errors. What you have to do is to learn some fundamental concepts used in this
programme. Now you can sit at the computer and process and analyze the data that you
have collected by administering a questionnaire. In fact, you will find it helpful and
interesting to keep the SPSS Application guide nearby while you process and analyze
your data.
Here, to help you work with SPSS, some of its general features are highlighted next.
6.3.1 Features of SPSS for Windows

SPSS is one of the leading desktop statistical packages. It is an ideal companion to the
database and spreadsheet, combining many of their features as well as adding its own
specialized functions. SPSS for Windows is available as a base module, and a number of
optional add-on enhancements are also available. Some versions present SPSS as an
integrated package including the base and some important add-on modules.
SPSS Professional Statistics provides techniques to examine similarities and
dissimilarities in data, to classify data and to identify underlying dimensions in a data set.
It includes procedures for cluster, k-means cluster, discriminant, factor, multidimensional
scaling, proximity and reliability analysis.
SPSS Advanced Statistics includes procedures for logistic regression, log-linear analysis,
multivariate analysis and analysis of variance. This module also includes procedures for
constrained non-linear regression, probit, Cox and actuarial survival analysis.
SPSS Tables creates presentation-quality tabular reports, including stub-and-banner
tables and displays of multiple response data sets. The new features include pivot tables,
a valuable tool for presentation of selected analytical output tables.
SPSS Trends performs comprehensive forecasting and time series analysis with multiple
curve-fitting models, smoothing models and methods for estimation of autoregressive
functions.
SPSS Categories performs conjoint analysis and optimal scaling procedures, including
correspondence analysis.
SPSS also provides simplified tabular analysis of categorical data, develops predictive
models, screens out extraneous predictor variables, and produces easy-to-read tree
diagrams that segment a population into sub-groups that share similar characteristics.
Recently, the SPSS Corporation announced the release of SPSS version 15.0. Many new
add-on products have also been launched in the recent months. You can consult the SPSS
World Wide Web site for the latest developments and additions to the computing power
of SPSS. Technical support is also available to the registered users at the SPSS site. The
SPSS Web site is http://www.spss.com. Select white papers on SPSS applications in
major disciplines are also available on this site.
The present unit discusses some of the commonly used data management techniques and
statistical procedures using SPSS version 11.5. Since new features are added almost
daily, you are advised to check for these details on the version of SPSS currently installed
on your computer and also to consult the user manuals before undertaking complex types
of data analysis. On-line help is also available. There may be some procedure- and
syntax-related changes from one version to another. In case these are not available on
your version of SPSS, please consult the relevant SPSS authorized representative or the
WWW site of the SPSS corporation. The most recent version of SPSS is now called
PASW.
With this basic knowledge let us get acquainted with the SPSS.
6.3.2 Getting Acquainted with SPSS
SPSS for Windows can be run on Windows 3.x, Windows 95/98 or later operating
systems; UNIX, Mac and mainframe versions of the SPSS software are also available.
The illustrations in this unit are based on the SPSS version for the Windows 95/98/NT
operating systems. We are assuming that SPSS is installed on your machine.
Starting SPSS
The SPSS for Windows uses graphical environment, descriptive menus and simple dialog
boxes to do most of the work. It produces three type of files, namely data files, chart files
and text files.
To start SPSS, click the start button on your computer. On the start menu that appears,
click Program. Another menu appears on the right of the start menu. If there is an entry
marked SPSS, that’s the one you want to click. If there isn’t, click the program group
where SPSS was installed and an entry marked SPSS will appear. Click the SPSS 11.4
(or which ever version entry). You will know when the SPSS has started and an SPSS
Data Editor window appears. To begin with, the SPSS data editor window will be empty
and a number of menus will appear on the top of the window. We can start the operations
by loading a data set or by creating a new file for which data is to be entered from the
data editor window. The data can also be imported from other programs like Dbase,
ASCII, Excel and Lotus, we will learn about this in a little while from now.
Exiting SPSS
Make sure that all SPSS and other files are saved before quitting the program. You
should exit the software by selecting the Exit SPSS command from the File menu of the
SPSS Data Editor window. In the case of unsaved files, SPSS will prompt you to save or
discard the changes in the file.
Saving data and other files
Many types of file can be saved using the 'Save' or 'Save As' command. The various types
of file used in SPSS are: Data, Syntax, Chart and Output. Files from spreadsheets or other
databases can also be imported by following the appropriate procedure. Similarly, an
SPSS file can be saved as a spreadsheet or in Dbase format. Select the appropriate save-
type command and save the file. SPSS data files are saved with .sav as the extension.
Though SPSS files can be given any name, the use of reserved words and symbols is to be
avoided in all types of file names.
Printing of data and output files
The contents of SPSS data files, Output Navigator files and Syntax files can be printed
using the standard 'Print' command. SPSS uses the default printer for printing. In
the case of network printers, an appropriate printer should be selected for printing the
output. It is suggested that ink-jet or laser printers be used for printing graphs
and charts. Tabular data can be easily printed using a dot matrix printer.
Operating Windows in SPSS
There are seven types of windows in SPSS which are frequently referred to during the
data management and analysis stages. These are:
Data Editor
As mentioned earlier, the Data Editor window opens automatically as soon as SPSS gets loaded.
To begin with, the data editor does not contain any data. The file containing the data for analysis
has to be loaded with the help of ‘file’ menu sub-commands by using various options available
for this purpose. The contents of the active data file are displayed in the data editor window. Only
one data editor window will be active at a time. No statistical operations can be performed until
some data is loaded into data editor.
Output Navigator
All SPSS messages, statistical results, tables and charts are displayed in the output navigator. The
output navigator can be opened/closed using the File Open/New Command. The output in the
navigator window can be edited and saved for future reference. The Output Navigator opens
automatically, the first time some output is generated. The user can customize the presentation of
reports and tables displayed in the Output Navigator. The output can be directly imported into
reports prepared under word processing packages, and the output files are saved with an
extension xxxx.spo.
Pivot Tables
The output shown in the Output Navigator can be modified in many ways using the Edit and
Pivot Table Option, which can be used to edit text, swap rows and columns, add colour, prepare
custom made reports/output, create and display selectively multi-dimensional tables. The results
can be selectively hidden and shown using features available in Pivot Tables.
Graphics
The Chart Editor helps in switching between various types of charts, in swapping of X - Y axis,
changing colour and providing facilities for presenting data and results through various type of
graphical presentations. It is useful for customizing the charts to highlight specific features of the
charts and maps.
Text Editor
The text output not displayed in the Pivot Tables can be modified with the help of Text Editor. It
works like an ordinary Text Editor. The output can be saved for future reference or sharing
purposes.
Syntax Editor
The Syntax Editor can be opened and closed like any other file using the File Open/New
command. The use of Syntax File is recommended when the same type of analysis is to be
performed at frequent intervals of time or on a large number of data files. Using Syntax File for
such purposes automates complex analysis and also avoids errors due to frequent typing of the
same command. Commands can be pasted into the Syntax file using the Paste button available
with a particular command's dialog box. Experienced users can directly type the commands in the
Syntax window. To run the syntax, select the commands to be executed and click on the run
button at the top of the syntax window. All or some selected commands from the Syntax File will
be executed. The Syntax File is saved as xx.sps.

Script Editor
This facility is normally used by the advanced users. It offers fully featured programming
environment that uses the Sax BASIC language and includes a Script Editor, Object Browser,
Debugging features and context sensitive help. Scripting allows you to automate tasks in SPSS
including:
• Automatically customizing output
• Opening and saving data files
• Displaying and manipulating SPSS dialog boxes
• Running data transformations and statistical procedures using SPSS command syntax
• Exporting charts as graphic files in a variety of formats.
The present module will not go into the details of the advanced features of SPSS, including scripting.

6.3.3 MENU COMMANDS AND SUB-COMMANDS


Most of the commands can be executed by making appropriate selections from the menu bar.
Some additional commands and procedures are available only through the Syntax Window. The
SPSS user manuals provide a comprehensive list of commands, which are not available through
menu-driven options. If you want a comprehensive overview of the basics of SPSS, there is an
on-line tutorial, and extensive help on SPSS is available through the 'Help' menu command. The
CD version of the software contains an additional demo module.
Since SPSS is menu-driven, each window has its own menu bar. While some of the menu bars
are common, others are specific to a particular type of window. Table 6.1 below presents the
menus and sub-menus of the Data Editor window (refer to Figure 6.1, which shows the SPSS
data editor).

Figure 6.1: SPSS data editor shows the data editor menus. Each command in the main menu has a number of
sub-commands.
Table 6.1 : Components of data editor menu
Menu Function/sub-commands
File Open and Save data file, to import data created in other formats like Lotus,
Excel, Dbf etc. Print control functions like page setup, printer setup and
associated functions. ASCII data can also be read into SPSS.
Edit These functions are similar to those available in general packages. These
include undo, redo, cut, copy, paste, paste variable, find, find and replace.
Option setting for the SPSS is controlled through Edit menu.
View Customize tool bars, Fonts, grid and display of data, displays option for
showing value labels.
Data This is a very important menu as far as management of the data is concerned.
Variable definition, inserting new variables, transposing templates, aggregating
and merging of data files, splitting data files for specific analysis are some
important commands in Data Menu.
Transform Compute new variables, recode, random number generation, ranking, time
series data transformation, count and missing value analysis are undertaken
using Transform Command.
Analyze As the name implies, the Analyze menu incorporates the statistical procedures.
Frequency distributions, cross-tabulations, comparison of means, correlation,
simple and multiple regression, ANOVA, log-linear regression, discriminant
analysis, factor analysis, non-parametric tests and time series analysis are
undertaken using the Analyze menu.
Graphs Includes options for generating various type of custom made graphics like bar,
pie, area, X- Y and high-low charts, pareto, control charts, box-plots,
histograms, P-P and Q-Q charts and time series representation of data.
Utilities Information about variables, information on the working data file, running scripts and
defining sets are some of the important functions carried out through the Utilities
command.
Window The Window menu is used to switch between SPSS windows.
Help Context specific help through dialog boxes, demo of the software, and
information about the software are some of the important options under Help
command. It provides a connection for the SPSS home page. The statistical
coach included in the help module is very useful in understanding various
stages of executing a procedure.
Setting The Options
The SPSS provides a facility for setting up of the user defined options. Use the Edit menu and
then select Options. The following types of optional setting are allowed in SPSS as illustrated in
Figure 6.2. Make the appropriate changes to set the options according to your choice.

Figure 6.2: SPSS options



With this basic knowledge about commands and sub-command now let us learn about the basic
steps in data analysis.

6.3.4 Basic Steps in Data Analysis


There are five basic steps involved in data analysis using SPSS. These are shown in the Figure
6.3.
Let us review these steps.
Bring your data into SPSS: You can bring your data into SPSS in the following ways:
• Enter data directly into SPSS Data Editor
• Open previously saved SPSS data file
• Read a spreadsheet data into SPSS data editor
• Import data from DBF files
• Import data from RDBMS packages like Access, Oracle, Power Builder, etc.
Basic steps in data analysis:
Step 1: Define your data.
Step 2: Enter your data into the SPSS Data Editor.
Step 3: Select the variables for the analysis.
Step 4: Select a procedure from the menus to calculate statistics.
Step 5: Run the procedure and look at the results.

Figure 6.3: Steps in data analysis


Select the Variables: The variables in the active file are listed each time a dialog box is opened.
Select the appropriate variables for the selected procedure. Selection of at least one variable is
necessary to run a statistical procedure. The variables may be numeric, string, date or logical.
You should be aware that string variables cannot be manipulated to the same extent as the
numeric variables.
Select a Procedure from Menus: Before embarking on a statistical analysis, it is advised that
you are clear as to what analysis is to be performed. Select the corresponding procedure to work
on the data or create charts or tables using the selected procedure.
The command could either be directly executed or pasted on a Syntax Window. As mentioned
earlier, pasting the command on the Syntax Window will be useful for undertaking batch
processing or for subsequent use, especially where the same type of repetitive analysis is required.
Pasting the command will not lead to its execution. The command has to be selected and executed
using the run command.

Figure 6.4 : Variables in the active file


Run the Procedure and Examine the Output: After completing the selection process for the
procedure and the variables, execute the SPSS command. Most of the commands are executed by
clicking OK on the dialog box. The processor will execute the procedures and produce a report in
the Output Navigator.
So then the basic steps involved in data analysis are clear. Before analysis we need to define, edit
and enter data. Let us get to know about this next.

6.3.5 DEFINING, EDITING AND ENTERING DATA


As mentioned earlier, there are many options for creating SPSS data files. The data can either be
directly entered through the Data Editor or imported from spreadsheets, ASCII files and other
RDBMS packages like Oracle and Access. Let us understand how to start, define, edit and enter
data in the SPSS.
Starting the SPSS Session
Click the Start button and select SPSS 11.4 from program menu or double click the icon of SPSS
11.4. When you start an SPSS session, the Data Editor opens automatically. The Data Editor
provides a spreadsheet for entering and editing data and creating data files. (See the Figure 6.5).

Figure 6.5: Data editor



Important features of the Data Editor include:


• The data is arranged in the form of rows and columns in the Data Editor.
• Rows represent cases. For example, each respondent (student/subject) is a case.
• The first column represents case numbers.
• Columns represent variables. For example, each question is a variable (sometimes it may
represent more than one variable).
• Cells represent values. Each cell is defined as the intersection of a row and a column and
refers to the value of a particular variable for a specified case.
Coding of Data
Before we enter data, we assign codes to the values of variables to make data entry easier. For
example, in the “attitude study” gender is a variable that can take on two values. These have been
coded so that “1” represents “Male” and “2” represents “Female”.
A Sample Code Book is illustrated here with:
Variable Name : V1
Variable Label : Gender
Value Labels : Male = 1
Female = 2
Variable Name : V2
Variable Label : Level of Education
Value Labels : Literate = 1
Primary = 2
Secondary = 3
Graduate = 4
Define Variable
Once you prepare your Code Book, you need to include it in the programme for further action to
be taken. The process of including the code book is known as "Define Variable". For each variable, you specify:
• A name for the variable (up to 8 characters only)
• A description (label)
• A series of labels which explain the values entered (value labels)
• A declaration as to which values are non-valid and should be excluded from the statistical
analysis and other operation (missing values). This information is important to understand
the no-response pattern and also to specify the observation which should be excluded from
the analysis.
Table 6.2: Variable definition table – an example of the above description

Variable Name | Variable Label | Value Labels | Missing Values | Variable Type
STID | Student identification number | None | None | Number, 6 digits, no decimal place
Name | None | None | None | String, 24 characters long
Gender | Sex of respondent | M = Male, F = Female, X = Unknown | X | String, 1 character long
MTL | Marital status | 1 = Married, 2 = Widowed, 3 = Divorced, 4 = Separated, 5 = Never married, 9 = Missing | 9 | Number, 1 character
DOB | Date of birth | None | None | Date, dd mm yy

To define a variable, click on Variable View (see Figure 6.6). In the bottom left corner of the
Data Editor there are two tabs, namely Data View and Variable View.

Figure 6.6: Variable view


Next, click in the first column and type the variable name, label and values.
Enter the name you wish to use for the variable. In the example we have chosen the name 'V1' to
stand for Gender. Next, click on the label cell and type "Gender" as the variable label. Then
click on the values cell; a dialog box will appear (refer to Figure 6.7) in which you type the value
and value label and then click on the Add button.
For example, type "1" in the value box, then click the value label box and type "Male", and finally
click the "Add" button. Repeat for the other value of the variable. Once you have finished
assigning value labels, click on the Continue button.
• In case you need to change the labels you can always return to this dialog box. The change
button can be used to change a value label.
• The remove button can be used to remove a value label.
• The cancel button can be used to cancel your labeling work and help button can be used to
access the SPSS on line help.
Now go ahead and define the remaining variables.

Figure 6.7: Value labels dialog box


So you have seen that defining variables is easy in SPSS. The variable names can be changed and
altered with ease even during analysis. Any change made to the working files will be permanently
changed only when the data file is saved using ‘save’ or ‘save as’ command.
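The same variable definitions can also be written as commands in the Syntax Editor instead of through Variable View. The sketch below is illustrative only; it uses the variable names of the sample code book and of Table 6.2 above.

   * Attach descriptive labels to the variables of the attitude study.
   VARIABLE LABELS v1 'Gender'
     /v2 'Level of Education'.
   VALUE LABELS v1 1 'Male' 2 'Female'
     /v2 1 'Literate' 2 'Primary' 3 'Secondary' 4 'Graduate'.
   * Declare 9 as the missing-value code for marital status (MTL in Table 6.2).
   MISSING VALUES mtl (9).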
Data can be entered directly using SPSS Data Editor window. However, if the data is large, you
are advised to use a data entry package. The data can also be edited / changed in the data editor
window. To change the value in any cell, bring the cursor to the particular cell, enter the new
value and press enter. New variables can also be added and the existing variables can be deleted
in the Data Editor Window. Let us learn how to enter data using SPSS data editor.
Entering Data
1. Select a cell in the Data Editor
2. Enter the data value. The value is displayed at the top of the Data Editor
3. Press Enter or select another cell to store the value.
Example
To enter the data for the “Attitude Study”, simply move the cursor to the upper-left-hand corner
and enter 1 for the first respondent’s gender (male), then move the cursor one cell to the right and
enter 1 for the level of education (literate), and so on. On the screen you will see like Figure 6.8.

Figure 6.8: Data file


Go ahead and enter the first 10 cases. Now save the data.
Saving the data

To save data
• From the menus choose:
File
Save
(Click on save)
Because these data have not been saved previously you will see a dialogue box prompting you to
enter a file name. Type in the name "attitude" and click the OK button. SPSS will then save the
data to this file. (SPSS will automatically attach the ‘.sav’ extension if you do not type it in.)
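For reference, the equivalent operations in command syntax are a single line each; the file name is simply the illustrative one used in the example.

   * Save the active data set under the name used in the example.
   SAVE OUTFILE='attitude.sav'.
   * Re-open it in a later session.
   GET FILE='attitude.sav'.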

6.3.6 DATA FILE MANAGEMENT FUNCTIONS


SPSS is very flexible as far as management of data files is concerned. While only one file can be
open for analysis at a time, SPSS provides flexibility in merging multiple data files with the
same structure into one single data file, merging files to add new variables, partially selecting
cases for analysis, making groups of data based on certain characteristics, and using different
weights for different variables. Some of these functions are discussed below. Groups of data can
also be defined to facilitate the analysis of the most commonly referred variables (see utilities and
data commands).
Merging Data Files
Researchers are often faced with a situation where data from different files are to be merged or a
limited number of variables from large and complex data files are required. The following types
of facility are available for merging files using SPSS.
Adding variables: Refer to Figure 6.9. Adding variables is useful when two data files contain the
information about same case but on different variables. For example, the teachers’ database may
contain two files, one having the educational qualifications and the other having the names of the
courses taught. Both the files could be combined to analyze the variables available in them. The
data on a key and unique variable from both the files can be combined easily. The key variables
must have the same name in both the data files. Both the data files should be sorted on the
common key variable.
Adding cases: Refer to Figure 6.9. This option is used when the data from two files having the
same variables are to be combined. For example, you may record the same information for
students in different study centers in India and abroad. The data can be merged to create a
centralized database by using Add cases command.

Figure 6.9 : Merge files
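The two merge operations can also be expressed in command syntax. A hedged sketch follows; the file names and the key variable teacher_id are assumptions made purely for illustration, and both files must already be sorted on the key before variables are added.

   * Add variables: combine the qualifications and courses files on a common key.
   MATCH FILES /FILE='qualifications.sav'
     /FILE='courses.sav'
     /BY teacher_id.
   EXECUTE.
   * Add cases: append the records of one study centre file to another.
   ADD FILES /FILE='centres_india.sav'
     /FILE='centres_abroad.sav'.
   EXECUTE.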



Aggregate Data
Aggregate Data command combines groups of cases into a single summary case and creates a
new aggregated data file. Refer to Figure 6.10. Cases are aggregated, based on the value of one or
more grouping variables. The new (aggregated) file contains one record for each group. The
aggregate file could be saved with a specific name provided by you, the user. Otherwise, the
default name is 'aggregate'. For example, the data on learners' achievement could be
aggregated by sex, state and region.
A number of aggregate functions are available in the SPSS. These include sum, mean, number of
cases, maximum value, minimum value, standard deviation, and first and the last value. Other
Summary functions include percentage and fractions below and above a particular cut-off user-
defined value.

Figure 6.10: Aggregate data file
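A minimal syntax sketch of the learners' achievement example; the variable names sex, state and score and the output file name are assumptions made for illustration.

   * One summary record per sex-state group, with the group mean and group size.
   AGGREGATE
     /OUTFILE='achievement_agg.sav'
     /BREAK=sex state
     /mean_score=MEAN(score)
     /n_cases=N.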

Split File
The researcher is often interested in comparing summary and other statistics across certain
groups. For example, in a study of learning achievement, the researcher may be interested in
comparing the mean scores of students belonging to different sex groups. Sex is then taken as
the grouping variable. Multiple grouping variables can also be selected; a maximum of
eight grouping variables can be defined. Cases need to be sorted by the grouping variables. Two
options are available for comparative analysis. These are: compare groups and organize output by
groups. The split file is available under Data menu for making such comparisons. Refer to Figure
6.10 above.
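In syntax, the group-wise comparison described above might look like the following sketch (sex and score are assumed variable names); LAYERED corresponds to the "compare groups" option, while SEPARATE BY would organise output by groups.

   * Sort by the grouping variable, then request group-wise output.
   SORT CASES BY sex.
   SPLIT FILE LAYERED BY sex.
   DESCRIPTIVES VARIABLES=score.
   * Switch the split off when the comparison is done.
   SPLIT FILE OFF.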
Select Cases
The Select Cases command can be used for selecting a random sub-sample or a sub-group of cases
based on specified criteria that include variables and complex expressions. The following criteria
can be used with the Select Cases command: select if a condition is satisfied (based on variable
values and their ranges, date and time ranges, arithmetic expressions, logical expressions or
functions), a random sample of cases, or a range of row numbers.
Following the Select Cases command, the unselected cases can either be deleted or temporarily
filtered. Deleted cases are removed from the active file and cannot be recovered; you should be
careful while selecting the Delete option. Filtered cases are excluded only temporarily. When the Select
Case option is on, it is indicated in the Data Editor window.
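The two behaviours, permanent deletion versus temporary filtering, look roughly like this in syntax; v1 (gender, coded 1 = male as in the attitude example) is used purely for illustration.

   * Permanent: unselected cases are removed from the active file.
   SELECT IF (v1 = 1).
   EXECUTE.

   * Temporary: unselected cases are only filtered out, not deleted.
   USE ALL.
   COMPUTE filter_male = (v1 = 1).
   FILTER BY filter_male.
   EXECUTE.
   * Switch the filter off again when it is no longer needed.
   FILTER OFF.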
Next, let us review the aspects linked with running a preliminary analysis.

6.3.7 RUNNING A PRELIMINARY ANALYSIS


Before running advanced statistical analysis, it is important that you understand the salient
features of your data. Use of statistical applications on a data set, the behavior of which is not
known, can give misleading conclusions. The following section explains the six characteristics
which must be examined for a given data set before attempting an advanced analysis.
Six Characteristics of a Dataset
One strong argument for using computers and graphical presentation of the data is the advantage
of viewing the data in a variety of ways. Preliminary exploration of data and its graphical
presentation helps attain these objectives. The following characteristics will help you in deciding
on the best plan for data management, analysis and presentation. SPSS includes commands for
analyzing data along the following lines.
Shape: The shape of the data will be the main factor in determining what set of summary statistics
best explains the data. Shape is commonly categorized as symmetric, left-skewed or right-
skewed, and as uni-modal, bi-modal or multi-modal. Frequency distribution, plots and graphical
presentation of data, histogram, P-P, Q-Q, scatter, Box-Plot are illustrative of the techniques that
can be used for determining the shape of a data set. It is important that the user should have
enough knowledge of the properties of various statistical distributions, their graphical
presentations, characteristics and limitations.
Location: Location describes where the data are centered. Common measures of location
(central tendency) are the mean and the median. Measures of central tendency can also be
calculated for various sub-groups of a data set.
Spread: This measure describes the amount of variation in the data. Again, an approximate value
is initially sufficient, with the choice of the measure of spread being informed by the shape of the
data and its intended use. Common measures of spread are the variance, standard deviation and
inter-quartile range. The percentile range is another measure used for the measurement of dispersion.
Outliers: Outliers are data values that lie away from the general cluster of values. Each outlier
needs to be examined to determine if it represents a possible value from the population being
studied, in which case it should be retained, or if it is non-representative (or an error) in which
case it should be excluded. You should properly weigh and carefully examine the behavior of
outliers before accepting or rejecting of an observation/case. The best choice to display when
looking for outliers is Box-plot. Range, i.e., maximum and minimum values can also be used to
examine the behavior of outliers.
Clustering: Clustering implies that data tend to bunch around certain values. Clustering shows
most clearly on a dot-plot. Histogram, stem and leaf analysis are also important procedures to
examine the clustering pattern of a data set.
Association and relationship: Researchers often look for associative characteristics or
similarities and dissimilarities in the behavior of some variables. For example, achievement
scores and hours of study may be positively correlated whereas the teacher motivation and drop-
out rate may be negatively associated with each other. Correlation coefficient is the most
commonly used measure for understanding the nature and magnitude of association between two
variables.
You should be clear that association does not imply relationship. A relationship is defined by the
cause and effect type of link. Normally, there is one dependent variable and one or more
independent variables in a cause and effect relationship. Cause and effect relationships are
captured through regression analysis.
The analysis of data along the above lines provides considerable insight into the nature of data
and also helps researchers in understanding key relationships between variables. It is assumed
that the relationships are of linear type. Non-linear relationships can also be examined using non-
linear techniques of analysis and also by using data transformation techniques as described next.

Data Transformation
Data transformation is a very useful aspect of SPSS. Using data transformation, you can collapse
categories, recode the data and create new variables based on complex equations and conditional
statements. Some of the functions are detailed below; an illustrative syntax sketch follows the Compute list.
Compute variable:
• Compute values for numeric or string variables
• Create new variables or replace the value of existing variables. For the new variables, you
can specify the variable type and label.
• Compute values selectively for sub-sets of data based on logical conditions.
• Use built-in functions, statistical functions, distribution functions and string functions.
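As indicated above, here is a small illustrative sketch of the Compute facility in syntax; total, lang, math, attempted and bonus are assumed variable names, not part of any supplied data set.

   * Create a new variable as the sum of two existing ones.
   COMPUTE total = lang + math.
   * Compute a value only for cases satisfying a logical condition.
   IF (attempted = 1) bonus = total * 0.05.
   EXECUTE.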
Recode variables
Recoding of variables is an important characteristic of data management using SPSS. Many
continuous and discrete variables need to be recoded for meaningful analysis. Recoding can be
done either within the same variable or a new variable can be generated. Recoding in the same
variable will replace the original values for this purpose. Recoding in a new variable will replace
the old values with new values. The following example illustrates the need and use of recoding
variables.
A survey of the primary schools was conducted in Delhi. Along with other variables, information
on the type of management was also collected. The management code was designed as follows:
1) Government
2) Local bodies
3) Private aided
4) Private unaided
5) Others
Let us assume that a comparative analysis of the government and the private management schools
is to be undertaken. This will be done by combining categories 1 and 2 into one group and
categories 3 and 4 into another. This can be achieved by recoding the management code into a
new variable, coded 1 (for categories 1 and 2) and 2 (for categories 3 and 4).
Assuming that a database on primary schools in Delhi is available, the enrolment analysis could
be attempted by making suitable categories, i.e. schools with less than 50 students, 51 - 150, 151 -
250 and more than 250 students. This could be achieved by recoding the enrolment variable into a
new variable ‘category’. The analysis could be attempted by changing the class range for
category. If at a later stage in the analysis, it is found that a new category is to be introduced, it
can again be achieved by recoding the enrolment data.
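The two recodes just described can be sketched in syntax as follows; mgmt and enrol are assumed names for the management-code and enrolment variables, and mapping category 5 (Others) to system-missing is an illustrative choice rather than something fixed by the example.

   * Collapse the five management codes into government (1) and private (2).
   RECODE mgmt (1,2=1) (3,4=2) (ELSE=SYSMIS) INTO mgmt_grp.
   * Group enrolment into the four size categories mentioned above.
   RECODE enrol (LO THRU 50=1) (51 THRU 150=2) (151 THRU 250=3)
                (251 THRU HI=4) INTO enrol_cat.
   EXECUTE.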
Count
Count is an important command available in SPSS and is used for counting occurrences of the
same value(s) in a list of variables within the same case. For example, a survey might contain
a list of books purchased (yes/no) by the students. You could count the number of ‘yes’
responses, or a new variable can be generated which gives the value of count indicating the
number of books bought.
Procedure to run the Count command:
Choose Transform from the main menu.
Choose Count.
Enter the name of a target variable (the variable where the count value will be stored).
Select two or more variables of the same type (numeric or string).
Click Define Values and specify which value(s) are to be counted.
Click OK after the selection has been made.
In survey on learners’ achievement, the answer code to each question in language and
mathematics could be recorded for each student. The codes could be ‘1’ for the correct answer ‘2’
for the wrong answer and ‘3’ for no reply. Count command can then be used to count the number
of correct answers.
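A sketch of this achievement example in syntax; q1 to q10 are assumed question variables, taken to be consecutive in the data file and coded 1/2/3 as described.

   * Count the number of correct answers (code 1) across the ten questions.
   COUNT ncorrect = q1 TO q10 (1).
   * A second count of non-responses (code 3).
   COUNT noreply = q1 TO q10 (3).
   EXECUTE.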
Rank Cases
Rank Cases command can be used to rank observations in ascending or descending order. Other
options available for ranking cases are shown in the right hand panel of the Figure 6.11.

Figure 6.11: Rank case file
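A one-command sketch of ranking in syntax, with score as an assumed variable: cases are ranked in descending order and tied values share the mean rank.

   RANK VARIABLES=score (D)
     /RANK
     /TIES=MEAN.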


Next, we shall review the graphical presentation of data.
Graphical Presentation of Data
SPSS offers extensive facilities for viewing the data and its key features in high resolution charts
and plots. From the main menu, select Graphs and the screen shown in Figure 6.12 appears.
Various types of Graph that can be drawn using SPSS are indicated in the sub-commands.

Figure 6.12: Graphics command


Select a chart type from the Graphics menu. This opens a chart dialog box as shown in Figure
6.13.

Figure 6.13: Bar chart windows



After the appropriate selections have been made, the output is displayed in the output Navigator
window. The chart can be modified by a double click on any part of the chart. Some typical
modifications include the following:
• Edit axis titles and labels and footnotes
• Change scale (X - Y)
• Edit the legend
• Add or modify a title
• Add annotation
• Add an outer frame
Another important category of charts is High-Low which is often used to represent variables like
maximum and minimum temperature in a day, Stock market behavior or other similar variables.
Box-plot and Error Bar charts help you to visualize distribution and dispersion. Box plot displays
the median and quartiles and special symbols are used to identify outliers, if any. Error Bar chart
displays the mean and confidence intervals or standard errors. To obtain a box-plot, choose Box
plot from the Graphs menu. The simple box plot for mean scores obtained in English and Hindi is
shown in Figure 6.14.

Figure 6.14: Box plot


The above figure shows that there were a large number of outliers in the case of Hindi scores as
compared to English. The outliers were along the higher side. This shows that many students
were scoring very high marks. The sizes (numbers) of cases are shown along the X-axis. The
boxes show the median and the quartile values for both the tests.
Scatter plots highlight the relationship between two quantitative variables by plotting the actual
values along the X and Y axes. Scatter plots are useful to examine the actual nature of the relationship
between these variables. This could be either linear or non-linear in form. To help visualize the
relationship, you can add a simple linear or a quadratic regression line. A 3-D scatter plot adds a
third variable in the relationship. You can rotate the two dimensional projection of the three
dimensions to delineate the underlying patterns. In order to obtain a scatter plot, select Scatter
from the Graphs option.
A histogram can be obtained by selecting the Histogram option from the Graphs menu. The variable for which a histogram is to be obtained should be selected from the dialog box. The normal curve can also be displayed along with the histogram to see visually the extent of similarity between the actual distribution of values and the normal curve.
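As a rough illustration of this display, the following Python sketch draws a histogram with a normal curve overlaid. It is not an SPSS feature; the simulated scores are hypothetical and merely stand in for the survey data discussed here.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical test scores, simulated for demonstration only.
scores = np.random.default_rng(0).normal(loc=60, scale=10, size=200)

plt.hist(scores, bins=20, density=True, alpha=0.6, label="Observed distribution")

# Overlay a normal curve with the same mean and standard deviation as the data.
x = np.linspace(scores.min(), scores.max(), 200)
plt.plot(x, stats.norm.pdf(x, loc=scores.mean(), scale=scores.std()), label="Normal curve")
plt.xlabel("Score")
plt.ylabel("Density")
plt.legend()
plt.show()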
Pareto and Control charts are used to analyze and improve the quality of an ongoing process. You
may refer to the SPSS manuals for use of these techniques.
6.3.8 UNDERSTANDING RELATIONSHIPS BETWEEN VARIABLES: DATA ANALYSIS
The foregoing details focused on the techniques of analysis describing the behavior of individual
variables. However, most of the research studies require relationships between two or more
variables to be examined. For example, one may be interested in questions like, “do the
achievement scores of boys and girls differ in the same class?”
Now, how do we analyze this? Next, we shall review data analysis features specific to parametric and non-parametric tests.
Parametric Tests
Under this sub-section we shall review frequency tables, cross-tabulations, correlations, the t-test, ANOVA and simple regression. We begin with frequency distributions.
A. Frequency Tables
To analyze the data, from the menu choose
Analyze → Descriptive Statistics → Frequencies
(Refer to Figure 6.15)

Figure 6.15: Frequencies window
Click the button with the picture of the right arrow. This will move the selected variables to the list on the right (see Figure 6.16).

Figure 6.16: Frequencies dialog box
Select one or more variables: to do this, click the variable "V1" to select it for analysis. Then, optionally, you can:
• Click Statistics for descriptive statistics for quantitative variables.
• Click Charts for bar charts, pie charts and histograms.
If you click Statistics you will get a dialog box as shown in Figure 6.17.

Figure 6.17: Frequencies statistics dialog box
Now click the boxes for the statistics you wish to apply to your data, such as mean, standard deviation, etc. Then click the Continue button.
Now click the OK button; it automatically opens an Output – SPSS Viewer window showing the frequency tables, as illustrated in Figure 6.18.

Figure 6.18: Showing frequency table
In case you wish to have charts, select Graphs from the menu; there you will see the types of charts available. Select the chart you wish to have and then the variable.
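For comparison, a frequency table and the basic descriptive statistics produced by this menu can be sketched in Python with pandas, as below. The variable name v1 and its values are assumed for demonstration and do not come from any actual data file.

import pandas as pd

# A hypothetical variable, analogous to "V1" above.
v1 = pd.Series([2, 3, 3, 1, 2, 3, 4, 2, 3, 1], name="v1")

print(v1.value_counts().sort_index())   # frequency of each distinct value
print(v1.describe())                    # count, mean, std, min, quartiles, max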
Next, let us focus on the cross-tabulation feature.
B. Cross-tabulations
Cross-tabulation is the simplest procedure for describing a relationship between two or more categorical variables.
Suppose you are interested in knowing whether there is an association between two categorical variables such as "gender" and "attitude towards infant feeding"; you have to cross-tabulate the two variables and use appropriate statistical tests.
To cross-tabulate:
From the menus (refer to Figure 6.15) choose:
Analyze
Descriptive Statistics
Crosstabs...
You will get a dialog box as shown in Figure 6.19.
Select “Gender” for the row variable and “Attitude” for the column variable.
Select one or more row variables and one or more column variables. Optionally you can:

Figure 6.19: Crosstabs dialog box
• Click Statistics for statistical tests
• Click Cells for percentages
If you click Statistics you will get the Statistics dialog box as shown in Figure 6.20.

Figure 6.20: Statistics dialog box
In this dialog box click the box next to the statistical tests you wish to apply. For example, if you wish to apply the chi-square test, click the box next to Chi-square. Then click the Continue button.
Next, click the Cells button. You will get the Crosstabs: Cell Display dialog box. Click on the row or column (or both) percentage box. Click the Continue button, then click the OK button. The table shown in Figure 6.21 will appear in the output navigator window:

Figure 6.21: Showing crosstab table
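A minimal equivalent of this cross-tabulation with a chi-square test of independence can be sketched in Python; it is only an illustration, and the gender and attitude values below are invented.

import pandas as pd
from scipy.stats import chi2_contingency

data = pd.DataFrame({
    "gender":   ["M", "F", "F", "M", "F", "M", "F", "M"],
    "attitude": ["favourable", "favourable", "unfavourable", "unfavourable",
                 "favourable", "favourable", "unfavourable", "favourable"],
})

# Build the contingency table and test the association between the two variables.
table = pd.crosstab(data["gender"], data["attitude"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print("chi-square =", round(chi2, 3), "p-value =", round(p_value, 3))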


Next, let us review bivariate correlation.
C. Bivariate Correlations
To obtain Bivariate Correlations
From the menus (refer to Figure 6.5), choose:
Analyze
Correlate
Bivariate

Figure 6.22: Bivariate Correlations
Select two or more numeric variables. The following options are also available:
• Correlation coefficients
• Test of significance
• Flag significant correlation
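A bivariate Pearson correlation with its two-tailed significance level can also be sketched in Python, as below; the two score lists are hypothetical and only illustrate the calculation.

from scipy.stats import pearsonr

english = [45, 60, 72, 55, 80, 65, 50, 70]   # assumed English scores
hindi   = [50, 58, 75, 52, 82, 68, 48, 74]   # assumed Hindi scores

# Pearson correlation coefficient and two-tailed test of significance.
r, p_value = pearsonr(english, hindi)
print("Pearson r =", round(r, 3), "two-tailed p =", round(p_value, 3))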
Next, we shall review how to calculate the independent sample T-test.
D. Independent Samples T-test
To obtain an Independent samples T-test from the menus (refer to Figure 6.23)
Analyze
Compare Means
Independent-Samples T-test

Figure 6.23: Independent samples T-test dialog box
First select a quantitative test variable. Then select a single grouping variable and click Define
Groups to specify two codes for the groups you want to compare.
For example, you wish to test the hypothesis: "Do males and females have similar mean attitude scores?" To test the hypothesis of equality of means for two groups, we can use the t-test statistic.
Figure 6.24 displays the Define Groups dialog box used with the Independent-Samples T-test:
• Select “Attitude” as the test variable
• Select “Gender” as the grouping variable
• Click the button define groups
• Enter “1” for group 1 and “2” for group 2 (as shown in Figure 6.24)
• Click the continue button to return to the previous dialog box.
• Click ‘OK’ button to run the procedure.

Figure 6.24: Define groups dialog box
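The same comparison of two group means can be sketched in Python as an independent-samples t-test; the attitude scores and the male/female split below are hypothetical.

from scipy.stats import ttest_ind

attitude_males   = [32, 28, 35, 30, 27, 33]
attitude_females = [36, 31, 38, 34, 30, 37]

# equal_var=False gives Welch's t-test, which does not assume equal group variances.
t_stat, p_value = ttest_ind(attitude_males, attitude_females, equal_var=False)
print("t =", round(t_stat, 3), "two-tailed p =", round(p_value, 3))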


Comparing several means (ANOVA)
When we are interested in an independent variable that has more than two groups, we need to use the analysis of variance (ANOVA).
Suppose you are interested in testing the hypothesis: "Do students in each of the three groups of religious affiliation have similar mean attitude scores?"
From the menu (refer to Figure 6.25) choose
Analyze → Compare Means → One-way ANOVA
You will see the dialog box, as shown in Figure 6.25.

Figure 6.25: One-way ANOVA dialog box
• Select attitude data as the dependent variable and religious affiliation as the factor (i.e.,
independent variable)
• Click the ‘OK’ button to run the procedure.
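A one-way ANOVA of this kind can be sketched in Python as follows; the three groups of attitude scores are invented stand-ins for the three religious-affiliation groups.

from scipy.stats import f_oneway

group_1 = [30, 33, 29, 35, 31]
group_2 = [28, 27, 30, 26, 29]
group_3 = [34, 36, 33, 37, 35]

# F-test of the null hypothesis that all three group means are equal.
f_stat, p_value = f_oneway(group_1, group_2, group_3)
print("F =", round(f_stat, 3), "p =", round(p_value, 3))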
Next, we shall review the simple regression process.
E. Linear Regression
Suppose you have a question: "How well can we predict the 'attitude' of students if we know something about their levels of education?" We need to conduct a simple regression analysis to answer the question.
To conduct this analysis, from the Analyze pull-down menu, select Regression, then choose Linear... The dialog box shown in Figure 6.26 will appear.

Figure 6.26: Linear regression dialog box
Select 'attitude' as the dependent variable and students' level of education as the independent variable. Note that SPSS provides many important options that are useful in conducting regression analysis. These are available via the Statistics..., Plots..., Save..., and Options... buttons. Readers interested in learning more about regression analysis are encouraged to review Schroeder, Sjoquist, and Stephan (1986), as well as the chapter on regression in the SPSS manuals (which details these analysis options). Click the OK button to run the procedure. The results of the regression analysis will appear in the output navigator window.
Linear regression is the most commonly used procedure for the analysis of a cause-and-effect relationship between one dependent variable and a number of independent variables. The dependent and independent variables should be quantitative. Categorical variables like sex and religion should be recoded into dummy (binary) variables or other types of contrast variables. An important assumption of regression analysis is that the distribution of the dependent variable is normal. Moreover, the relationship between the dependent and all the independent variables should be linear and all observations should be independent of each other.
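A simple regression of this type can be sketched in Python with statsmodels, as below. The years-of-education and attitude figures are hypothetical and only illustrate the mechanics of ordinary least squares.

import statsmodels.api as sm

education = [8, 10, 12, 12, 14, 16, 16, 18]   # assumed years of schooling
attitude  = [25, 27, 30, 29, 33, 35, 34, 38]  # assumed attitude scores

X = sm.add_constant(education)        # adds the intercept term
model = sm.OLS(attitude, X).fit()     # ordinary least squares estimation
print(model.summary())                # coefficients, R-square, F-statistic, etc.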
SPSS provides extensive scope for regression analysis using various types of selection processes.
The method of selecting independent variables for linear regression analysis is an important
choice which the researcher should consider before running the analysis. You can construct a
variety of regression models from the same set of variables by using different methods.
You can enter all the variables in a single step or enter the independent variables selectively.
Variable selection method is shown in Figure 6.27.

Figure 6.27: Variable selection method


It allows you to specify how independent variables are entered into the regression analysis. The
following options are available:
• Enter: To enter all the variables in a single step, select Enter option.
• Remove: To remove the variables in a block in a single step.
• Forward: It enters one variable at a time based on the selected criterion.
• Backward: All variables are entered in the first instance and then one variable is removed at a
time on the selected criterion.
• Stepwise: Stepwise variable entry and removal examines the variables in the block at each
step for entry and removal. This is a forward step procedure.
All the variables must pass the tolerance criterion to be entered in the equation, regardless of the
entry method specified. The default tolerance limit is 0.0001. A new variable will not be entered
if it causes the tolerance of another variable already entered to be dropped below the tolerance
limit.
Linear Regression Statistics
The following statistics (refer to Figure 6.28) are available for linear regression models. Estimates and Model Fit are the two options selected by default.

Figure 6.28: Linear regression statistics
Regression coefficients: The Estimates option displays the regression coefficient β, its standard error, the standardized coefficient beta, the t-value and the two-tailed significance level of t. Covariance matrix displays a variance-covariance matrix of the regression coefficients, with the covariances off the diagonal and the variances on the diagonal. A correlation matrix will also be displayed.
Model fit: The variables entered into and removed from the model are displayed. Goodness-of-fit statistics (R-square, multiple R, adjusted R-square and the standard error of the estimate) and an analysis of variance table are displayed.
If other options are ticked, the statistics corresponding to each of the options are also displayed in
the Output Navigator. If the data do not show a linear relationship and the transformation procedure does not help, try using the Curve Estimation procedure.
Non-Parametric Tests
The non-parametric test procedure provides several tests that do not require assumptions about
the shape of the underlying distribution. These include the following most commonly used tests:
• Chi-square test
• Binomial test
• Runs test
• One-sample Kolmogorov-Smirnov test
• Two independent Sample tests
• Tests for several independent samples
• Two related sample tests
• Tests for several related samples.
Here, we shall discuss the procedure for the Chi-square test only. You are advised to consult the SPSS user's manual and other statistical books for a detailed discussion of the other tests.
Chi-Square
Chi-square test (refer to Figure 6.29) is the most commonly used test in social science research.
The goodness of fit test compares the observed and the expected frequencies in each cell/category
to test either that all categories contain the same proportion of values or that each category
contains a user specified proportion of values.
Figure 6.29: Chi-square test
Consider that a bag contains red, white and yellow balls. You want to test the hypothesis that the bag contains all types of balls in equal proportion. To obtain the Chi-square test, choose Chi-square from Non-parametric Tests in the Statistics command. Select one or more variables; each variable produces a separate output.
By default, all categories have equal expected values, as shown in the figure above. Categories can also have user-specified proportions. In order to provide user-specified expected values, select the Values option and add the expected values. The sequence in which the values are entered is very important in this case: it corresponds to the ascending order of the category values of the test variable.
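The goodness-of-fit version of the chi-square test described above can be sketched in Python; the observed counts of balls are invented, and with no expected frequencies supplied the test assumes equal proportions, as in the SPSS default.

from scipy.stats import chisquare

observed = [18, 25, 17]   # hypothetical counts of red, white and yellow balls

# With no f_exp argument, chisquare() uses equal expected frequencies for all categories.
chi2, p_value = chisquare(observed)
print("chi-square =", round(chi2, 3), "p =", round(p_value, 3))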

6.3.9 SPSS PRODUCTION FACILITY


The SPSS Production facility provides the ability to run SPSS in an automated mode. SPSS runs
unattended and uninterrupted and terminates after executing the last command. Production mode
is useful if you run the same set of time-consuming analyses periodically.
The SPSS Production facility uses command syntax file to tell SPSS about the commands to be
executed. We have already discussed the important features of the command syntax. The
command syntax file can be edited in a standard text editor.
To run the SPSS Production facility, quit SPSS if it is already running; the Production facility cannot run while SPSS is running. Start the SPSS Production program from the Start menu of Windows 95/98 or a later version. Specify the syntax file that you want to use in the production job (click Browse to select the syntax file). Save the production job; you can then run it at any time.
Next, we shall review the SAS and NUDIST packages, which are other software packages available for the analysis of quantitative and qualitative data, respectively.

6.4 STATISTICAL ANALYSIS SYSTEM (SAS)


Like SPSS, the Statistical Analysis System (SAS) package calculates descriptive statistics of your choice, e.g., mean, standard deviation, etc. SAS is available for both mainframe and personal computers. It is strong in its treatment of data, in the clarity of its graphics and in certain business applications. The various statistical procedures carried out by SAS are always preceded by the word PROC, which stands for procedure. The most commonly used SAS statistical procedures are as follows (Sprinthall et al., 1991):
• PROC MEANS: Descriptive statistics (mean, standard deviation, maximum and minimum
values and so on).
• PROC CORR: Pearson correlation between two or more variables.
• PROC t-TEST: t-test for significant difference between the means of two groups.
• PROC ANOVA: Analysis of variance for all types of designs (one way, two-way and
others).
• PROC FREQ: Frequency distribution for one or more variables.
As pointed out by Klieger (1984), the SAS package is comparatively more difficult to use due to its procedural complexities. For greater detail on the SAS package, you are advised to consult the books by Klieger and Sprinthall.

6.5 NUDIST
Computer programmes help in the analysis of qualitative data, especially in understanding a large (say 500 or more pages) text database. In studies using large databases, such as ethnographies with extensive interviews, computer programmes provide an invaluable aid in research.
NUDIST (Non-numerical Unstructured Data Indexing, Searching and Theorizing) is a programme developed in Australia in 1991. This package is used for qualitative analysis of data. Here we present briefly the main features of this package. The software requires 4 megabytes of RAM and at least 2 megabytes of space for data files on your PC or Mac. On a PC it operates under Windows (Creswell, 1998).
As a researcher, this software will help you in the following ways:
1) Storing and organizing files: First establish document files and store the information with the NUDIST programme. Document files may consist of transcripts from interviews, notes of observations or articles scanned from a newspaper.
2) Searching for themes: Tag segments of text from all the documents that relate to a single idea or theme. For example, in a study on the effectiveness of distance education, distance learners may talk about the role of academic counselors. The researcher can create a node in NUDIST called 'Role of Academic Counselors', select the text in the transcripts where learners have talked about this role and merge it into that node. The information retained in this node can then be printed out to show the different ways in which learners talk about the role of academic counselors.
3) Crossing themes: Taking the same example of the role of counselors, the researcher can relate this node to other nodes. Suppose the other node is the qualification of counselors, with two categories such as Graduate and Postgraduate. The researcher can ask NUDIST to cross the two nodes, role of counselors and qualification of counselors, to see, for example, whether graduate counselors describe their role differently from postgraduate counselors. NUDIST generates a matrix, with the information in the cells reflecting the different perspectives.
4) Diagramming: In this package, once the information is categorized, the categories are identified and developed into a visual picture that displays their interconnectedness. This is called a tree diagram in the NUDIST software. A tree diagram is a hierarchical tree of categories, with the root node at the top and parent and sibling nodes below it. The tree diagram is a useful device for discussing the data analysis of qualitative research at conferences.
5) Creating a template: In qualitative research, at the beginning of data analysis, the researcher can create a template, which is an a priori code book for organizing information.
For further details on NUDIST software you may like to consult the following:
Kelle, E.(ed.), Computer aided qualitative data analysis, Thousand Oaks, CA: Sage, 1995.
Tesch, R., Qualitative research: Analysis types and software tools, Bristol, PA: Falmer, 1990.
6.6 EVIEWS PACKAGE

EVIEWS stands for Econometric Views. It is a new version of a statistical package for
manipulating time series data. It was originally the Time Series Processor (TSP) software
for large mainframe computers. As an econometric package, EVIEWS provides data
analysis, regression and forecasting tools. EVIEWS can be useful for multipurpose
analytics, but this introduction will focus on financial time series econometric analysis.
Once you get familiar with EVIEWS, the program is very user friendly.

6.6.1 EVIEWS files and Data:

In this section, we will describe how to create a new workfile and import data into EVIEWS. The various ways of handling data in the workfile are as follows:

6.6.1.1 Creating a workfile:

Before working on any analysis, one must first create a so-called workfile, which must be of exactly the same size and type as the data you would like to work with. After the workfile is created, EVIEWS will let you import data into it from Excel, Lotus, ASCII (text) files, etc. Data from other software packages such as SAS, SPSS, M-FIT, RATS, etc. cannot be directly imported into EVIEWS.

To create a workfile click File →New → Workfile and the following dialog box will
appear.

If one is working with time series data, then he/she needs to know the frequency of the data (daily, weekly, monthly, annually, etc.) as well as the start and end dates of the data. In the case of cross-sectional data, one needs the total number of observations; one should choose undated or irregular and enter the start observation and the end observation in the appropriate textboxes. Let us take an example where time series data are imported from an Excel file using the import function.
6.6.1.2 Importing Time Series Data from Excel

We have created a data file in Excel. The Excel file has been saved in the path ….. The screenshot of the Excel file is as follows:

Now the following five-step procedure should be used for importing time series data into the EVIEWS software.

1. Examine the contents of the Excel file and note:


o the start date and end date of the observations
o total number of observations
o the cell where the data starts (usually B2)
o the name of the variables and the order in which they appear
o the sheet name and the path of the sheet where it has been saved.

The example has daily (5-day week) data with a start date of April 1991 and an end date of Dec 2008. The data start at cell B2 in the Excel sheet called Sheet1. There are 16 variables. Some of them have very long names, and it is good to make them shorter, which makes it easier to import the data into the EVIEWS workfile and easier to work with later.

2. Create a new work file as per the above instructions.


One should choose daily (5-day week) and enter the start date and the end date. In the case of daily frequency data, the dates are entered in MM/DD/YYYY format. For quarterly data, one would enter, for example, 1999:3 for the third quarter of 1999. Monthly data follow the same pattern, i.e., 1999:3 means March 1999. In the case of an irregular frequency, one would enter the total number of observations. After clicking OK you should end up with the workfile as shown below.
As noted in the above workfile, both the range and the sample cover the period between 1st Jan 1998 and 7th July 2009. There are always two default series, C and RESID. C is the series that will contain the coefficients from the last regression equation that one has estimated, and RESID is the series that will contain the residuals from the last estimated model.

3. Click Procs → Import → Read Text-Lotus-Excel. In the Open dialog box, choose the Excel format and browse for the file. Select the file and open it. It is important to note here that one should close the Excel file before trying to import it into EVIEWS; otherwise there will be an error message.
4. A dialog box now appears in which it is very crucial to enter the correct information. Any mistake could result in an incomplete or even wrong dataset. This is where our earlier check of the Excel file becomes very important. In this example the dialog box should be filled out as follows:
The order of the data is by observation, with series in columns. The upper-left data cell is B2 and the sheet name is Sheet1. If there is only a single sheet (as in this example) it is not necessary to name it.

The names of the series/variables have been changed (notice that no spaces are allowed in the names) in order to make them easier to work with. However, if you would like to import the names that are given in the Excel file, you simply enter the number of variables (in this case 8). These names can then be changed in EVIEWS using the Rename function. However, using this method can cause problems if, for example, the names start with a number and are very similar (e.g., names such as 30 day return, 5 day price change, etc.).

The sample to import is taken from the workfile. Here it is possible to exclude periods,
which can be useful in case you would like to get rid of any outliers.

The workfile that you should have by now is as follows:
It contains a list of the 9 imported variables in alphabetical order as well as the two series for the estimated coefficients and the residuals. It is always better to check whether all the variables have been correctly imported, because common import errors include rows with "NA" and numbers that are far too high or too low.

Another useful approach is to open a set of variables, or all the variables, as a group. One can do this with the following steps:
o clicking the first variable (of your choice)
o holding down the [Ctrl] key and clicking the other variables (in any order)
o clicking View → Open as one window → Open group, or simply right clicking or double clicking on any of the selected series and then clicking Open group
5. If you are certain that you have imported the data correctly into the EVIEWS
workfile, you can now save this workfile by clicking File→Save As. The workfile
will be saved in EVIEWS' own Wf1 format. A saved workfile can be opened later
by selecting File →Open File→Workfile from the main menu.

6.6.1.3 Transforming the Data:

While working with time series data, it is often very useful to transform the existing variables to take care of scale and size and for other purposes. This can be done in EVIEWS using the [Genr] button in the top right-hand corner of your workfile. For example, suppose one would like to work with stock returns and has imported the stock price data into the EVIEWS workfile. The stock return is defined as follows:

RSt = ln (St) – ln (St-1)


where RSt represents the stock price return and St and St−1 are the stock prices in time periods t and t−1. This continuously compounded return can quickly be calculated by means of the DLOG function. Simply click the [Genr] button and enter an expression of the form RET_STOCKA = DLOG(STOCK_A) (assuming the price series is named STOCK_A), followed by OK.
This will create the variable RET_STOCKA and include it in the workfile. You can view the returns by double clicking the variable.
Apart from DLOG, there are naturally a number of other mathematical functions, as well as simple addition, subtraction, division and multiplication. Frequently, one also needs the price differential and the lagged value of a series, for example for unit root testing. These can be generated by entering the following expressions via the [Genr] button:

Price Differential = Stock_A – Stock_A(-1)

Lag = Stock_A(-1)
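The same three transformations (continuously compounded return, first difference and lag) can be sketched outside EVIEWS in Python/pandas, as below; the price figures are made up and the series name stock_a is an assumption.

import numpy as np
import pandas as pd

stock_a = pd.Series([100.0, 102.5, 101.0, 104.0, 107.5], name="stock_a")

ret_stocka = np.log(stock_a).diff()   # continuously compounded return, like DLOG
price_diff = stock_a.diff()           # Stock_A - Stock_A(-1)
lag        = stock_a.shift(1)         # Stock_A(-1)

print(pd.DataFrame({"return": ret_stocka, "diff": price_diff, "lag": lag}))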

6.6.1.4 Copying Output

Any graph or equation output can easily be copied into a Word document. To copy a table, simply select the area you want to copy and click Edit → Copy. A dialog box should appear, where you would usually select the first option, Formatted (copy numbers as they appear in the table); then go to Word/Excel, paste the selected area and change the size of the output until it suits your document.
To copy a graph, click on it and a blue border should appear; then click Edit → Copy. In the appearing dialog box, click Copy to clipboard and then paste into Word/Excel. Again, the size can be adjusted as needed.

6.6.1.5 Examining the Data

EVIEWS can be used for examining the data in a variety of ways. This is demonstrated
as follows:

Displaying Line Graphs:

If you want to select a few variables and display a line graph of each of the series, you can follow the example given below, based on the previously mentioned EVIEWS workfile.
In this example, we want to view four time series in the workfile: BSE 100, REER, NEER and SENSEX. The procedure is to highlight the four variables (using the mouse and the [Ctrl] key) followed by a double or right click. Then you click Open Group and click the [View] button in the appearing spreadsheet. From this menu, you can click Multiple Graphs → Line, and the four line graphs are displayed as shown below. As one can see, there are other choices of graphs as well. In general, clicking the [View] button mentioned above offers you many options for viewing your selected data.

If you would like to save the output in your workfile for later use, you first click the [Freeze] button. In the new window which appears, click the [Name] button. In the dialog box, enter a name for the output and click OK. Now the output appears with a graph icon in your workfile.

Drawing a Scatter Plot:

The procedure is similar to the one just mentioned above, but with scatter plots one can use other options as well, for example scatter plots in connection with a regression model.
We can show an example from our previously mentioned workfile, as follows:
Obtaining Descriptive Statistics and Histogram:

One can obtain the descriptive statistics and histogram of a series by double clicking the
series in the workfile. In the appearing spreadsheet you click the [View] button and
choose Descriptive Statistics→Histogram and Stats.

If you want to obtain descriptive statistics for several series at a time instead, you
highlight the relevant series (using the mouse and the [Ctrl] key), double or right click
and choose Open Group. In the appearing spreadsheet, click the [View] button and
choose Descriptive Statistics →Individual Samples. This procedure will not give you
the histograms, however. The descriptive statistics and histogram of one time series
variable (SENSEX) are as follows:
6.6.1.6 Displaying Correlation and Covariance Matrices:

The easiest way to display correlation and covariance matrices is to highlight the relevant series (using the mouse and the [Ctrl] key) and then click Quick → Group Statistics → Correlations (or Covariances if you want the covariance matrix). This creates a new group and produces a common-sample correlation/covariance matrix. If a pairwise correlation/covariance matrix is more suitable, it is produced by clicking the [View] button and choosing Correlations (or Covariances) → Pairwise Samples. An example of this is given below.

6.6.1.7 Seasonality of the Series:

Before undertaking any time series econometric analysis of the data, it is of utmost importance to deseasonalise the data, that is, to remove the seasonal fluctuations, if the frequency of the time series is quarterly, monthly, etc. This is one of the important properties of time series econometrics. Several methods are available to remove the seasonal fluctuations or to deseasonalise the data, as given below (a simple moving-average illustration follows the list):
• Census X12.
• X11 (Historical) Method.
• Moving Average Method.
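The moving-average approach can be sketched in Python with statsmodels, as below. The monthly series is simulated purely for illustration and is not actual Indian data.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

index = pd.date_range("2000-01", periods=48, freq="MS")
rng = np.random.default_rng(1)
series = pd.Series(100 + 10 * np.sin(2 * np.pi * index.month / 12) + rng.normal(0, 2, 48),
                   index=index)

# Moving-average (classical) decomposition into trend, seasonal and residual components.
result = seasonal_decompose(series, model="additive", period=12)
deseasonalised = series - result.seasonal   # remove the estimated seasonal component
print(deseasonalised.head())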
To perform the seasonal adjustment, you can select any variable from the workfile and open its data window by double clicking on it. Then, by clicking on [Procs], you can find all the above-mentioned methods for adjusting the seasonality of the series. An example of the seasonal adjustment is as follows:

6.6.1.8 Estimating Equations:

In the following illustration, we will demonstrate how to estimate a regression model in
EVIEWS. When you have opened your workfile, click on the [Objects] button. Select
New Object → Equation and the following dialog box will appear.

Alternatively, you could have clicked Quick →Estimate Equation.

Say we want to estimate a regression equation of stock prices on exchange rates. In the example below, SENSEX is the dependent variable and NEER and REER are the independent variables. You can enter the model in two ways; the simpler way is to list the dependent variable, followed by C for the intercept term and then the independent variable(s). There must be a single space between variables, so we will enter the following regression into the equation specification window:
SENSEX C NEER REER

After you have entered the equation, select the estimation method and your sample period. Then just click OK to get the following output.
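For readers who want to reproduce this kind of estimation outside EVIEWS, a regression of SENSEX on a constant, NEER and REER can be sketched in Python with statsmodels; the numbers below are invented placeholders, not actual index or exchange-rate data.

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "SENSEX": [9500, 9700, 9650, 9900, 10100, 10050],
    "NEER":   [90.1, 90.5, 89.8, 91.0, 91.4, 91.2],
    "REER":   [98.3, 98.9, 98.1, 99.5, 100.0, 99.7],
})

X = sm.add_constant(df[["NEER", "REER"]])   # the constant plays the role of C
results = sm.OLS(df["SENSEX"], X).fit()
print(results.summary())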

6.6.1.9 Testing for Unit Roots:

The non-stationary nature of most time series data and the need to avoid the problem of spurious or nonsense regression call for an examination of their stationarity property. In brief, variables whose mean, variance and autocovariances (at various lags) change over time are said to be nonstationary time series or unit root4 variables. Alternatively, a time series is stationary if its mean, variance and autocovariances (at various lags) are time independent.

Dickey and Fuller (1979) consider three different regression equations that can be used
to test the presence of a unit root:

ΔYt = γYt−1 + εt … (1) (NONE)
ΔYt = a0 + γYt−1 + εt … (2) (INTERCEPT)
ΔYt = a0 + γYt−1 + a2t + εt … (3) (TREND & INTERCEPT)

In the above specifications, the difference among the three regressions concerns the presence of the deterministic elements a0 and a2t. The first is a pure random walk model, in the second
an intercept or drift term has been added, and the third equation includes both a drift and
linear time trend. The parameter of interest in all the regression equations is γ; if γ = 0,
the {Yt} sequence contains a unit root. The test involves estimating one or more of the
equations above using OLS in order to obtain the estimated value of γ and associated
standard error. Comparing the resulting t-statistic with the appropriate value reported in
the Dickey-Fuller tables allows one to determine whether to accept or reject the null hypothesis γ = 0.

In conducting the Dickey-Fuller test as in equations 1, 2 and 3, it was assumed that the error term εt was uncorrelated. In case the error term εt is autocorrelated, Dickey and Fuller developed a test known as the Augmented Dickey-Fuller (ADF) test.

The ADF test may be specified as follows:

4 The term unit root refers to the root of the polynomial in the lag operator.
ΔYt = a0 + a1t + γYt−1 + Σ(i=1 to k) βi ΔYt−i + εt … (3.1)

where εt is a pure white noise error term, Δ is the difference operator, and γ and the βi are parameters. In the ADF test, we still test whether γ = 0, and the ADF statistic follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.

This test is quickly done in EVIEWS by double clicking the relevant time series to go to the spreadsheet view. Here, click the [View] button and select Unit Root Test. Alternatively, you can click the [Quick] button, select the series and then Unit Root Test. The following dialog box should appear.
In the above dialog box, you have to first select the test type. Next, select whether you want to test for a unit root at the level, first difference or second difference. Finally, you can choose whether you would like to perform the unit root test with none, with intercept, or with trend and intercept (the equations are explained above).

Now we will examine whether a time series is stationary or not. In the following example we have examined the unit root of the stock index (SENSEX) both at the level and at the first difference by using the Augmented Dickey-Fuller test. The results are as follows:
From the above result, the ADF test statistic (-4.083999) is more negative than (i.e., larger in absolute value than) the critical values at all the conventional significance levels. Hence the null hypothesis of a unit root is rejected and SENSEX is stationary at its first difference.
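An ADF test of the same kind can be sketched in Python with statsmodels, as below. The series is a simulated random walk standing in for SENSEX, and the test is applied to its first difference with an intercept, i.e., the specification of equation (2) above.

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
sensex = pd.Series(np.cumsum(rng.normal(0, 1, 250)) + 10000)   # random-walk stand-in

# regression="c" corresponds to the intercept specification.
adf_stat, p_value, used_lags, n_obs, crit_values, _ = adfuller(sensex.diff().dropna(),
                                                               regression="c")
print("ADF statistic:", round(adf_stat, 3), "p-value:", round(p_value, 3))
print("Critical values:", crit_values)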

6.6.1.10 ARIMA / ARMA Identification and Estimation:

The identification of an ARIMA model is done by examining a correlogram. In EVIEWS you obtain a correlogram for a variable by double clicking the variable to open the spreadsheet view. Here, click the [View] button and choose Correlogram. In the appearing dialog box, choose between level, first difference or second difference and then enter the desired number of lags to include. An example of the correlogram for the variable SENSEX is shown below.

An ARIMA estimation, or the so-called Box-Jenkins methodology for ARIMA models (Madsen, 1992; Maddala, 1992), consists of four steps: identification, estimation, diagnosis and forecasting. For more details, the reader may refer to any standard textbook.
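As an outline of those four steps, the correlogram-based identification and a small ARIMA fit can be sketched in Python with statsmodels; the series is simulated and the order (1, 1, 1) is an arbitrary illustration, not a recommendation.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
series = pd.Series(np.cumsum(rng.normal(0, 1, 300)) + 10000)   # simulated index level

plot_acf(series.diff().dropna(), lags=24)    # identification: autocorrelations
plot_pacf(series.diff().dropna(), lags=24)   # identification: partial autocorrelations
plt.show()

model = ARIMA(series, order=(1, 1, 1)).fit() # estimation
print(model.summary())                       # diagnosis: coefficients, AIC, residual stats
print(model.forecast(steps=5))               # forecast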

6.6.1.11 Granger Causality Test:

Granger causality may be defined as the forecasting relationship between two variables, proposed by Granger (1969) and popularised by Sims (1972). In brief, the Granger causality test states that if S and E are two time series variables and if past values of the variable S significantly contribute to forecasting the value of the other variable E, then S is said to Granger-cause E, and vice versa. The test involves the following two regression equations:
St = γ0 + Σ(i=1 to n) αi Et−i + Σ(j=1 to n) βj St−j + u1t … (4)
Et = γ1 + Σ(i=1 to m) λi Et−i + Σ(j=1 to m) δj St−j + u2t … (5)

where St and Et are the stock price and exchange rate to be tested, u1t and u2t are mutually uncorrelated white noise errors, and t denotes the time period. Equation (4) postulates that current S is related to past values of S as well as of past E. Similarly, equation (5) postulates that E is related to past values of E and S. The null hypothesis for equation (4) is that there is no causation from E to S, i.e., that the coefficients on the lagged E terms are jointly zero: Σ(i=1 to n) αi = 0. Similarly, the null hypothesis for equation (5) is that there is no causation from S to E, i.e., that the coefficients on the lagged S terms are jointly zero: Σ(j=1 to m) δj = 0. Three possible conclusions can be drawn from such an analysis: unidirectional causality, bi-directional causality, or independence of the two variables.

The Granger causality test can easily be performed in EVIEWS. In the workfile, you can select and open the group of variables of your choice, click the [View] menu and select Granger Causality. The result of the Granger causality test should appear as follows:
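A Granger causality test of this kind can be sketched in Python with statsmodels, as below. The two series are simulated stand-ins for the exchange rate (E) and the stock price (S); statsmodels tests whether the series in the second column Granger-causes the series in the first column.

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(4)
e = rng.normal(0, 1, 200)
s = 0.5 * np.roll(e, 1) + rng.normal(0, 1, 200)   # s partly driven by lagged e

# Test whether e Granger-causes s, at every lag length from 1 up to maxlag.
data = pd.DataFrame({"s": s, "e": e})
grangercausalitytests(data[["s", "e"]], maxlag=2)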

6.6.2 Vector Auto Regression (VAR):

By its very construction, a VAR system consists of a set of variables, each of which is related to lags of itself and of all the other variables in the system. In other words, a VAR system consists of a set of regression equations, each of which has an adjustment mechanism such that even small changes in one variable in the system may be accounted for automatically by possible adjustments in the rest of the variables in the system. Thus, a VAR provides a fairly unrestrictive approximation to a reduced-form structural model without assuming beforehand that any of the variables is exogenous. By avoiding the imposition of a priori restrictions on the model, the VAR adds significantly to the flexibility of the model.
A VAR in the standard form is represented as:
St = a10 + a11St−1 + a12Et−1 + e1t
Et = a20 + a21St−1 + a22Et−1 + e2t
where St is the stock price at time period t, Et is the exchange rate at time period t, ai0 is element i of the vector A0, aij is the element in row i and column j of the matrix A1, and eit is element i of the vector et (written above as e1t and e2t). The error terms e1t and e2t are white noise, have zero mean and constant variances, and are individually serially uncorrelated.

Steps of VAR:
Now we discuss the various steps involved in VAR estimation.
• To start with, the VAR estimation procedure requires the selection of the variables to be included in the system. The variables included in the VAR are selected according to the relevant economic model.
• The next step is to verify the stationarity of the variables.
• The last step is to select the appropriate lag length. The lag length of each of the variables in the system is to be fixed; for this we can use the Likelihood Ratio (LR) test.

After setting the lag length, we are in a position to estimate the model. It may be noted, however, that the coefficients obtained from the estimation of a VAR model cannot be interpreted directly. To overcome this problem, Litterman (1979) suggested the use of innovation accounting techniques, which consist of impulse response functions (IRFs) and variance decompositions (VDs).

Impulse Response Function:

The impulse response function is used to trace out the dynamic interaction among the variables. It shows the dynamic response of all the variables in the system to a shock or innovation. For computing the IRFs, it is essential that the variables in the system are ordered.

Variance Decomposition:

Variance decomposition is used to detect the causal relations among the variables. It explains the extent to which a variable is explained by the shocks to all the variables in the system. The forecast error variance decomposition explains the proportion of the movements in a sequence due to its own shocks versus shocks to the other variables.
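The whole sequence (lag selection, VAR estimation, impulse responses and the variance decomposition) can be sketched in Python with statsmodels, as below. The two return series are simulated and only illustrate the workflow.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(5)
data = pd.DataFrame({
    "stock_return": rng.normal(0, 1, 300),
    "fx_return":    rng.normal(0, 1, 300),
})

model = VAR(data)
print(model.select_order(maxlags=8).summary())   # AIC, BIC, HQ and FPE lag-length criteria

results = model.fit(maxlags=8, ic="aic")         # estimate with the AIC-chosen lag length
results.irf(10).plot()                           # impulse response functions
results.fevd(10).summary()                       # forecast error variance decomposition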

Now we can take an example of VAR modelling between stock prices and exchange rates. Let us say we have considered SENSEX and BSE 100 to represent the stock market and NEER and REER to represent the effective exchange rates. In the EVIEWS workfile, you can select and open the group of variables. Then click [Quick] and select Estimate VAR. The following dialog box should appear. Enter all the variables under endogenous variables, choose the optimum lag length as per the lag length criteria (see below) and click OK.

After clicking OK, the following output will be generated. As mentioned above, after generating this output, click [View], select Lag Structure and then Lag Length Criteria. This is given as follows:

Then the following output on lag length will be generated for various statistical lag selection criteria. One can choose the optimum lag length as per any given criterion.
For the impulse response function, you can click [View] and select Impulse Response. Then the following dialog box should appear.

Now you can select the display format of the output, either Table or Graph. Then select the impulse and response variables along with the number of periods ahead for forecasting. While doing so, the following dialog box should appear.
In the above dialog box, we have selected the output in multiple graphs format. By clicking OK, the following output will be generated.
In a similar way, you can generate the variance decomposition output, which is shown as follows.
