See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/282286137

Data Analysis and Interpretation

Conference Paper · September 2015

1 author:

Vijayamohanan Pillai N
Centre for Development Studies

All content following this page was uploaded by Vijayamohanan Pillai N on 29 September 2015.


Data Analysis and
Interpretation
Vijayamohanan Pillai N
CDS

Presented to the participants of an Induction Training Programme
organized by the Institute of Management in Government
in collaboration with DoPT, Government of India
on 25 September 2015.

9/29/15 CDS                       Vijayamohan 1
The purpose of analyzing data is
to obtain usable and useful information.

The analysis, irrespective of whether the data is
qualitative or quantitative, may:

describe and summarize the data

identify relationships between variables

compare variables

identify the difference between variables

forecast outcomes

Data analysis:
concerned with the analysis of data
of any kind, and by any means.

Statistics:
the art of collecting and interpreting data,
ranging from planning the collection to
presenting the conclusions;
it covers all of data analysis (and some more).

The two terms are practically coextensive.

John Wilder Tukey (1962) “The Future of Data
Analysis”. Ann. Math. Statist., 33, 1-67.

Tukey preferred "data analysis" over "statistics":

the latter term is used by many in an overly
narrow sense, covering only those
aspects of the field that can be
captured through mathematics and
probability.

He articulated the important distinction between
exploratory data analysis and
confirmatory data analysis,
believing that much statistical methodology placed too
great an emphasis on the latter.

Tukey opened his paper with the words:

“For a long time I have thought that I was a
statistician, interested in inferences from the
particular to the general. But as I have watched
mathematical statistics evolve, I have had cause to
wonder and to doubt. ……… All in all, I have come to
feel that my central interest is in data analysis, which
I take to include, among other things: procedures for
analyzing data, techniques for interpreting the
results of such procedures, ways of planning the
gathering of data to make analysis easier, more
precise or more accurate, and all the machinery and
results of (mathematical) statistics which apply to
analyzing data.”

Later on he emphasized:
“Data analysis, and the parts of statistics which adhere
to it, must then take on the characteristics of a science
rather than those of mathematics, specifically:

(1) Data analysis must seek for scope and usefulness
rather than security.

(2) Data analysis must be willing to err moderately
often in order that inadequate
evidence shall more often suggest the right answer.

(3) Data analysis must use mathematical argument and
mathematical results as
bases for judgment rather than as bases for proofs or
stamps of validity.”

He remarked:
"Large parts of data analysis are
inferential in the
sample-to-population sense,
but these are only parts,
not the whole."

He argued:
"In data analysis we must look to a very
heavy emphasis on judgment."

Six blind men observing an elephant:

One feels the side and thinks:
elephant is like a wall.
Another feels the tusk and thinks:
elephant is like a spear.
One feels the trunk and thinks:
elephant is like a snake.
Another feels the knee and thinks:
elephant is like a tree.
One touches the ear and thinks:
elephant is like a fan.
Another grasps the tail and thinks:
elephant is like a rope.

Tukey (1962) remarked:
“Large parts of data analysis are inferential in the
sample-to-population sense,
but these are only parts, not the whole.

“Large parts of data analysis are incisive,
laying bare indications which we could not perceive by
simple and direct examination of the raw data,
but these too are parts, not the whole.

“Some parts of data analysis … are allocation, in the
sense that they guide us in the distribution
of effort ….

“Data analysis is a larger and more varied field than
inference, or incisive procedures, or allocation.”

Traditionally, data analysis is divided into
descriptive statistics, and
confirmatory data analysis (CDA).

Descriptive statistics: quantitative description of the
main features of a set of data or sample
(simple summaries, i.e. summary statistics, about the
sample),
or visual description, i.e. simple graphs.

Summary statistics of a data set:
measures of central tendency,
measures of variability or dispersion,
and
measures of the shape of the
distribution of the data.

Univariate analysis
and
Bivariate analysis

Univariate analysis
describes the distribution of a single variable,
in terms of its

measures of central tendency
(mean, median, and mode)

measures of dispersion or spread
(range, quantiles, variance (or standard deviation))

measures of the shape of the distribution
(skewness and kurtosis).
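
As an illustration, the measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch and not part of the original presentation: the function name `describe` and the sample values are made up, and skewness and kurtosis are taken here as the standardized third and fourth moments in their population form (other conventions exist).

```python
import statistics as st

def describe(values):
    """Return univariate summary statistics for a list of numbers."""
    n = len(values)
    mean = st.mean(values)
    sd = st.pstdev(values)  # population standard deviation (assumed convention)
    # Shape: standardized third and fourth moments (population form).
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * sd ** 4)
    return {
        # central tendency
        "mean": mean,
        "median": st.median(values),
        "mode": st.mode(values),
        # dispersion
        "range": max(values) - min(values),
        "quartiles": st.quantiles(values, n=4),
        "variance": st.pvariance(values),
        "std_dev": sd,
        # shape
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under this definition a positive skewness points to a longer right tail, and a normal distribution has a kurtosis of 3.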
Univariate analysis
is also possible through graphs
(histograms, boxplots)
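
The numbers behind these two graphs can be sketched without a plotting library. The example below is only an illustration under stated assumptions: `histogram_counts` and `five_number_summary` are hypothetical helper names, the data are made up, the bins are fixed-width and left-closed, and the quartiles use the "inclusive" method of the standard `statistics.quantiles` function (plotting packages may use other rules).

```python
import statistics as st

def histogram_counts(values, bin_width):
    """Count observations in left-closed bins [k*w, (k+1)*w)."""
    counts = {}
    for x in values:
        left_edge = (x // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum (the box in a boxplot)."""
    q1, median, q3 = st.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# A crude text histogram: one star per observation in each bin.
for left_edge, count in histogram_counts(sample, 2).items():
    print(f"[{left_edge}, {left_edge + 2}): {'*' * count}")

print(five_number_summary(sample))
```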

Bivariate analysis

When a sample consists of two variables,
we need not only simple descriptive analysis, but also
a description of the relationship between the two
variables:

Cross-tabulations and contingency tables
Graphical representation via scatterplots
Quantitative measures of association
(correlation and regression)
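
The tools listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names (`crosstab`, `pearson_r`, `fit_line`) and all data are invented for the example, the correlation is Pearson's coefficient, and the regression is an ordinary least-squares line with one explanatory variable.

```python
from collections import Counter

def crosstab(a, b):
    """Contingency table as a Counter keyed by (a_value, b_value)."""
    return Counter(zip(a, b))

def _centered_sums(x, y):
    """Sums of squares and cross-products around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / (sxx * syy) ** 0.5

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical categorical data for the cross-tabulation.
region = ["north", "north", "south", "south", "south"]
uses_service = ["yes", "no", "yes", "yes", "no"]
table = crosstab(region, uses_service)

# Hypothetical numeric data for correlation and regression.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
print(table, r, slope, intercept)
```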

Traditionally, data analysis is divided into
descriptive statistics and confirmatory data
analysis (CDA).

CDA: confirming or falsifying existing
hypotheses, i.e. statistical hypothesis testing.

Models are to be tested in isolation
against data specifically sampled for that
purpose.
Consequently, theory
reigns supreme in the business of model
specification.
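
To make the idea of a statistical hypothesis test concrete, here is a minimal sketch. The slides do not name a particular test; an exact two-sided permutation test for a difference in means is used here only because it needs nothing beyond the standard library. The function name `permutation_test` and both samples are hypothetical.

```python
from itertools import combinations

def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    Null hypothesis: group labels are exchangeable, i.e. both samples
    come from the same distribution. Returns the p-value.
    """
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group a.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / splits

group_a = [12, 14, 15, 16]  # hypothetical measurements
group_b = [8, 9, 10, 11]    # hypothetical measurements
p_value = permutation_test(group_a, group_b)
print(p_value)
```

A small p-value is read as evidence against the null hypothesis; the hypothesis itself is fixed before the data are examined, which is the confirmatory stance described above. Full enumeration is feasible only for small samples.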
Data are used only to reject or validate a model
and estimate its coefficients,
but are never allowed to suggest new or better
models.

Data analysis, therefore, was confined to what
the philosopher R. W. Miller (1987: 173) called
“a lonely encounter of hypothesis with
evidence”.

Miller, R.W. (1987) Fact and Method: Explanation,
Confirmation and Reality in the Natural and the Social
Sciences, Princeton, NJ: Princeton University Press.

Exploratory data analysis (EDA)

The set of techniques initiated by Tukey (1977)
and Mosteller and Tukey (1977)
rapidly evolved into a novel approach to data
analysis.

EDA puts the emphasis squarely on learning
from data so as to arrive at an
explanation which appears plausible in the
light of the evidence.

John W. Tukey, (1977) Exploratory Data
Analysis, Reading, MA: Addison-Wesley.

Frederick Mosteller and John W. Tukey,
(1977) Data Analysis and Regression: A
Second Course in Statistics, Reading, MA:
Addison-Wesley.

EDA,
unlike traditional modelling
approaches,
makes extensive use of analytical
graphics
along with numerical summaries.
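
One display associated with Tukey's EDA is the stem-and-leaf plot, which shows the shape of a distribution while keeping the actual values visible. The sketch below is an illustration only: the function name `stem_and_leaf` and the data are made up, it assumes non-negative integers with the last digit as the leaf, and it skips stems that have no observations.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a simple stem-and-leaf display.

    Assumes non-negative integers; the leaf is the last digit
    and the stem is everything before it.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return [
        f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

ages = [12, 15, 21, 23, 23, 34, 38, 47]  # hypothetical data
for line in stem_and_leaf(ages):
    print(line)
```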

“EDA, exploratory data analysis,
is an approach to statistics which emphasizes
that a researcher should begin his or her
analysis by looking at the data,
on grounds that the more familiar one is with
one's data, the more effective they can be used
to develop, test, and refine theory.

Econometricians are often accused of never
actually looking at their data.
Exploratory data analysts believe in the
inter-ocular trauma test:
keep looking at the data until the answer
hits you between the eyes!”
- Kennedy (1992: 284)

Peter Kennedy (1992) A Guide to
Econometrics, Oxford: Blackwell.

Data mining

the analysis step of the "Knowledge Discovery
in Databases" process

an interdisciplinary subfield of computer
science,
the computational process of discovering
patterns in large data sets ("big data")

involving methods of
artificial intelligence,
machine learning,
statistics, and
database systems.

Artificial intelligence

the intelligence exhibited by machines or
software.

also the name of the subject that studies
how to create computers and computer
software
that are capable of intelligent behavior.

Machine learning

a subfield of computer science that evolved
from the study of pattern recognition and
computational learning theory in artificial
intelligence.

explores the study and construction of
algorithms that can learn from and make
predictions on data.
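
A very small example of an algorithm that "learns from" data and makes predictions is the nearest-neighbour classifier: it stores labelled examples and predicts the label of the closest one. This is a sketch for illustration, not something taken from the slides; the function name `nearest_neighbour_predict`, the feature values, and the labels are all hypothetical.

```python
def nearest_neighbour_predict(training_data, query):
    """Predict the label of `query` from the closest training example.

    training_data: list of (features, label) pairs, where features is a
    tuple of numbers with the same length as `query`.
    """
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    _, label = min(training_data, key=lambda item: squared_distance(item[0], query))
    return label

# Hypothetical labelled examples: (feature_1, feature_2) -> class label.
training_data = [
    ((1, 1), "low"),
    ((1, 2), "low"),
    ((8, 9), "high"),
    ((9, 8), "high"),
]

print(nearest_neighbour_predict(training_data, (2, 1)))
print(nearest_neighbour_predict(training_data, (8, 8)))
```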

The related terms
data dredging, data fishing, and data snooping
refer to the use of
data mining methods to sample parts of a
larger population data set
that are too small for reliable statistical
inferences.
These methods can be useful in creating new hypotheses to test
against the larger data populations.

The terms also cover analyzing data without an a priori
hypothesis.

Data mining

The overall goal:
to extract information from a data set and
transform it into an understandable structure
for further use
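
As a toy illustration of extracting a pattern from raw records and turning it into a readable structure, the sketch below counts which pairs of items occur together in a set of transactions and keeps the pairs that appear often enough. It is an assumption-laden example, not a method named in the slides: `frequent_pairs`, the `min_support` threshold, and the basket data are all invented.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least `min_support` transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair a single canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical transaction records.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

patterns = frequent_pairs(baskets, min_support=2)
print(patterns)
```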

Data Mining

Other terms:

Data Archaeology,
Information Harvesting,
Information Discovery,
Knowledge Extraction, etc.

Data interpretation:
Explaining the patterns and trends uncovered
through data analysis,
bringing all the background knowledge,
experience, and skills to bear on the question
and relating the data to existing scientific
ideas/theories.

Given the personal nature of the knowledge
we draw upon, this step can be subjective, but
that subjectivity is scrutinized through the
peer review process.

As in scientific research,
disagreement may be common and
may generally lead to
more data collection,
new research methods and
new findings.

Data interpretation
is not a free-for-all,
nor are all interpretations equally valid.

Interpretation involves constructing a logical
argument that explains the data.

Interpretations are neither absolute truth nor
personal opinion:
they are inferences, suggestions, or
hypotheses about what the data mean,
based on scientific research
and individual expertise.