Abstract
This paper addresses what matters most in research, namely the choice of an
appropriate statistic and the interpretation of data. It also explains some
confusing terminologies, including those whose use in some contexts is a
subject of debate.
Key words: Choice of statistic; data interpretation; scales; Likert scale; Chi
square; difference between achievement and performance; influence, impact,
effect and perceptual studies.
Introduction
In recent times, many people the world over have raised eyebrows at some of the
choices of statistics made, the patterns of data analysis and interpretation, and even
the way tables are constructed (not in line with the 6th edition APA style). Many claim that
what is in use is far outdated. Among experts, sometimes there are
Mean & standard deviation | Applicable to all designs | Achievement test, Likert scales & other similar types | Possible with continuous data and Likert scales, but not head counts or data from inventory instruments, etc.
Chi square: two basic types | Surveys, ex post facto, demographic data in any design | Inventory ratings (categorical data), counts (numerical data). Generally deals with discrete data. | This is used where the assumptions of parametric statistics about the distribution are not satisfied. We have Chi square test of goodness of fit/contingency and chi square test of difference/independence. It must be noted that any given data, no matter what, can fit into one form of chi square or the other and sometimes some have no interpretation. Watch! Chi square test generally is about the weakest statistic for testing hypotheses and should be used with caution.
t-test: 2 main types (independent t-test and dependent or related t-test) | Surveys, ex post facto, true experiment, etc | Achievement test, measurements. Data from Likert scales transformed. Note that it must be the means that will be used. | A parametric statistic. It tests difference between means & must reflect in the way you state your hypotheses (one or two tailed). The source of data (different groups or same group of people) determines if you should use independent or dependent t-tests (that is, unrelated or related t-tests). Watch!
ANOVA (Analysis of Variance) | Surveys, ex post facto, true experiment, etc. | Achievement test, measurements. Data from Likert scales (transformed) converted or means used can fit in. | A parametric statistic, has same conditions with independent t-test and used when you have more than two groups to compare. When such happens posthoc analysis using Scheffe or LSD test, etc is necessary to know direction of significance…by pairing them in groups of two for comparison
ANCOVA (Analysis of Covariance) | Quasi experimental study, mainly pretest posttest type. Assumes that the study groups are not equivalent. | Achievement test, measurements. Data from Likert scales transformed or means used can fit in. | Can be used for 2 or more groups. If above 2 groups, pairwise analysis will be necessary to know the direction of significance. However, if the data do not satisfy the assumptions of ANCOVA, ANOVA of the post test scores using the pretest measure as a blocking variable will be appropriate.
Reliabilities
Cronbach Alpha | Surveys, ex post facto, experiments, etc | Polytomously scored instrument eg Likert scales & its kinds generally | Good for Likert scales & its kinds where there is no one right or wrong answer generally.
Spearman Rank Order correlation | Any design | Essay tests, etc. Can be used for two judges or scorers only | If more than two judges or scorers, use other statistics; some use ANOVA but the result is not giving you “r”. Simply if the differences among the scorers (3 & above) are significant then it is reliable
Spearman Brown Prophecy | Any design | Any instrument split into two eg odd even or first and second half. | Split half type of reliability. Best with achievement test and total item number should be EVEN.
Pearson Product Moment | Experimental designs mainly with retention test. | Any instrument used but preferably achievement test | Can be used for two judges or scorers only
For other statistics please read before making choices | | | If more than two scorers are used in any case above then Kendall Coefficient of Concordance W would be applicable
Table 2 should be considered as a guide only, and each case should be considered on
its merits.
(a) If data are censored.
(b) The Kruskal-Wallis test is used for comparing ordinal or non-Normal
variables for more than two groups (it is used when the distribution violates the
assumptions of one-way ANOVA), and is a generalisation of the Mann-Whitney U
test (the non-parametric counterpart of the independent t-test) to more than
two groups. The technical details are described in more advanced books and the
test is available in common software (Epi-Info, Minitab, SPSS).
(c) If the outcome variable is the dependent variable, then provided the
residuals (see) are plausibly Normal, then the distribution of the independent
variable is not important.
(d) There are a number of more advanced techniques, such as Poisson
regression, for dealing with these situations. However, they require certain
assumptions and it is often easier to either dichotomise the outcome variable
or treat it as continuous.
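As a rough illustration of note (b), the sketch below computes the Kruskal-Wallis H statistic in plain Python. The data are hypothetical, the helper names (`average_ranks`, `kruskal_wallis_h`) are invented for this example, and no correction for ties is applied to H. The chi-square approximation used for the P-value is also known to be rough for samples this small, so a package such as SPSS should be preferred for real analyses.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical ratings from three groups (not data from this paper).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)

# With k = 3 groups, df = k - 1 = 2. For df = 2 only, the chi-square
# upper-tail probability has the closed form exp(-H / 2).
p_approx = math.exp(-h / 2)
print(round(h, 2), round(p_approx, 4))
```

For these made-up groups H is about 7.2 and the approximate P-value is about 0.027, so at the 0.05 level the null hypothesis of no difference among the groups would be rejected.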
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   1.276            2     .638          3.615   .028
Within Groups    52.440           297   .177
Total            53.717           299
(I) Class   (J) Class   Mean Difference (I-J)   Std. Error   Sig.
JS 1        JS 3        -.07691                 .05930       .432
            SS 3        -.15901*                .05914       .028
JS 3        SS 3        -.08210                 .05988       .392
For the ANOVA, F(2, 297) = 3.62, P = .028 (approximately 0.03) < 0.05. This is
significant and we therefore reject the null hypothesis. You either reject or
do not reject a null hypothesis. When P is less than or equal to 0.05 the
result is significant, but it is not significant if P is greater than 0.05.
This is the reverse of the old style in terms of t-cal and t-crit; eg if
t-cal ≥ t-crit, it is significant but if t-cal < t-crit, it is not
significant. The old style of t-calculated and t-critical is out of use
though correct. (This is because the output from electronic analysis in SPSS
and other packages gives both the value of the test statistic and the
associated level of significance, unlike manual analysis, where the critical
value has to be read from a book of tables.) An alpha value of 0.05 means
that the researcher accepts a 5 in 100 chance of rejecting a null hypothesis
that is in fact true. Put loosely, if a new drug under test were tried on 100
patients, a wrong decision could put about 5 of them at risk. This informs
the choice of an alpha value of 0.01 in the health sciences, engineering,
etc., where at most one such error in 100 is tolerated. In education and
social sciences we need to note that when we reduce the probability of making
a Type 1 (α) error (rejecting a null hypothesis that is true) by setting
alpha very small, we increase the probability of making a Type 2 (β) error
(failing to reject a null hypothesis that is false). It is therefore
desirable to strike a balance on an alpha level that is neither too small nor
too large, since we are dealing with objects, human opinions and events.
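The arithmetic behind the ANOVA table and the decision rule above can be sketched in Python as follows. The function names are illustrative only; the sums of squares and the P-values (.028 and 0.896) are those reported in the tables of this section. Because the inputs are already rounded, the recomputed F (about 3.61) differs slightly from the 3.615 printed in the SPSS output.

```python
def one_way_anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def decision(p_value, alpha=0.05):
    """Apply the rule: significant when P is less than or equal to alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Values copied from the ANOVA table (rounded as printed there).
ms_b, ms_w, f_ratio = one_way_anova_f(1.276, 2, 52.440, 297)
print(round(ms_b, 3), round(ms_w, 3), round(f_ratio, 2))

print(decision(0.028))  # P reported for the ANOVA
print(decision(0.896))  # P reported for the t-test in Table 5
```

Note that the function never returns "accept the null hypothesis", in keeping with the point that a null hypothesis is either rejected or not rejected.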
Table 5: t-Test on mean difference between Male and Female Creativity rating
                                       Gender   N     Mean     Std. Deviation   df    t       Sig. (2-tailed)
Students' Mean Rating on Creativity    Female   157   3.0916   .4576            296   0.131   0.896
                                       Male     143   3.1565   .3822
t(296) = 0.13, P = 0.896 > 0.05; it is not significant. Do not reject the null hypothesis.
Table 7: Correlations
                                            Test1      Test2
Test1   Pearson Correlation                 1          .969**
        Sig. (2-tailed)                                .000
        Sum of Squares and Cross-products   3095.800   2840.800
        Covariance                          162.937    149.516
        N                                   20         20
**. Correlation is significant at the 0.01 level (2-tailed).
The Scales
The construction of the scales involves generating a list of statements
(questions and items) about what is being measured and providing a set of
graduated response options. Using these graduated options, a respondent is expected to
indicate a degree of agreement or disagreement with each statement.
The advantage of Likert scales is that they are the most widely used
method for survey data collection and are therefore easily understood. The
responses are easily quantifiable and amenable to mathematical analysis.
Since the scale does not require the participant to provide a
simple and concrete yes or no answer, it does not force the participant to
take a stand on a particular topic, but allows them to respond with a degree of
agreement; this makes answering questions easier for the respondent. Also,
the response options accommodate neutral or undecided feelings of
participants. The responses are very easy to code when collating data since
a single number represents the participant’s response. Likert surveys are also
quick, efficient and inexpensive methods of data collection. They are highly
versatile and can be sent out by mail, over the internet, or given in
person.
Attitudes of the population towards one particular item in reality exist on a vast,
multi-dimensional continuum. However, the Likert scale is uni-dimensional
and only gives 5-7 response options, and the space between each option
cannot possibly be equidistant. Therefore, it fails to measure the true
attitudes of respondents. Also, it is not unlikely that people’s answers will be
influenced by previous questions, or will heavily concentrate on one
response side (agree/disagree). Frequently, people avoid choosing the
“extreme” options on the scale, because of the negative implications
associated with “extremists”, even if an extreme choice would be the most
accurate. While these remain strong criticisms of the use of the Likert scale to
measure attitude, they only call for its use with caution, as the scale is still
relevant.
My parents have provided support for my prayer and fasting programme.   SD  D  N  A  SA
Likert Items
I eat healthy foods on a regular basis.                                 SD  D  N  A  SA
When I purchase food at the grocery store, I ignore "junk" food.        SD  D  N  A  SA
The difference between Likert-type items and Likert scale items is that
Likert-type items are single questions that use some aspect of the original Likert
response alternatives, eg the prayer and fasting item in Table 10. Here, multiple
questions may be used in a research instrument, but there is no attempt by
the researcher to combine the responses from the items into a composite
scale. Each item stands for one idea (eg experience and provision of support
by parents). A Likert scale, on the other hand, is composed of a series of
four or more items that are combined into a single composite score/variable
during the data analysis process. Combined, the items are used to provide a
quantitative measure of a character or personality trait (in Table 10 it is
healthy eating, eg ignoring junk food still means eating healthy foods).
The statements are framed such that half are positively cued whereas the
other half are negatively cued. To avoid response set, the positive and
negative statements are placed in alternate positions. The response options or
categories are weighted or scored in such a way that a higher value indicates
a more positive/intense response or attitude. Thus for the positively cued
statement, the options are weighted or scored as follows:
SA = 5; A = 4; U = 3; D = 2; SD = 1
And for the negatively cued item, the weighting/scoring is reversed thus:
SA = 1; A = 2; U = 3; D = 4; SD = 5
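The scoring scheme above, together with the idea of combining several items into one composite Likert scale score, can be sketched in Python. This is only an illustration: the respondent's answers are hypothetical, the function names are invented for the example, and the label SA (strongly agree) is assumed for the top response category.

```python
# Weights for a positively cued statement, as listed in the text.
POSITIVE_WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def score_item(response, negatively_cued=False):
    """Score one response; reverse the weight for a negatively cued item."""
    weight = POSITIVE_WEIGHTS[response]
    # On a 5-point scale, reversing maps 5->1, 4->2, 3->3, 2->4, 1->5.
    return 6 - weight if negatively_cued else weight


def composite_score(responses, negative_items):
    """Sum the item scores of one respondent into a single scale score."""
    return sum(
        score_item(response, index in negative_items)
        for index, response in enumerate(responses)
    )


# Hypothetical respondent answering four items; the items at positions
# 1 and 3 (counting from 0) are assumed to be negatively cued.
responses = ["SA", "D", "A", "SD"]
negative_items = {1, 3}

total = composite_score(responses, negative_items)
mean_rating = total / len(responses)
print(total, mean_rating)
```

Here the four item scores are 5, 4, 4 and 5, giving a composite of 18 and a mean rating of 4.5. For Likert-type items, which are not combined, only `score_item` would be needed.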
Several variations of the Likert scale, with varying number of points and
descriptions (ie adjectives) can be found in the literature.
The response options are not always expressed in terms of degree of
agreement or disagreement. Other appropriate terms may be used in place of
agreement. For instance, other expressions (adjectival labels) or descriptions
such as those indicating degree of importance or adequacy could be used.
The use and interpretation of the ‘undecided’ or ‘neutral’ response category
in the Likert-type scale has become quite controversial. The main issues here,
which border on the weighting and position of the ‘undecided’ response category,
are examined below with proposals on how these could be resolved.
Table 11. Suggested Data Analysis Procedures for Likert-Type and Likert Scale
Data
Likert Type Data Likert Scale Data
Example
A public opinion poll surveyed a simple random sample of
1000 voters. Respondents were classified by gender (male
or female) and by voting preference (Republican,
Democrat, or Independent). Results are shown in
the contingency table below.
Voting Preferences
Row total
Republican Democrat Independent
When to Use the Chi-Square Goodness of Fit Test
The chi-square goodness of fit test is appropriate when the
following conditions are met:
The sampling method is simple random sampling.
The variable under study is categorical.
The expected value of the number of sample observations in
each level of the variable is at least 5.
For a chi-square goodness of fit test, the hypotheses take the
following form:
sample size times the hypothesized proportion from the null
hypothesis
$E_i = n p_i$
where $E_i$ is the expected frequency count for the $i$th level of
the categorical variable, $n$ is the total sample size, and $p_i$ is
the hypothesized proportion of observations in level $i$.
Test statistic. The test statistic is a chi-square random
variable ($\chi^2$) defined by the following equation.
$\chi^2 = \sum_i \left[ (O_i - E_i)^2 / E_i \right]$
where $O_i$ is the observed frequency count for the $i$th level of
the categorical variable, and $E_i$ is the expected frequency
count for the $i$th level of the categorical variable.
P-value. The P-value is the probability of observing a sample
statistic at least as extreme as the test statistic, assuming the null
hypothesis is true. Since the test statistic
is a chi-square, use the Chi-Square Distribution Calculator to
assess the probability associated with the test statistic. Use
the degrees of freedom computed above.
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the
researcher rejects the null hypothesis. Typically, this involves
comparing the P-value to the significance level, and rejecting the null
hypothesis when the P-value is less than the significance level.
Notice, if p>0.05 level of significance we do not reject the null hypothesis
but if p<0.05 we reject the null hypothesis.
Test Your Understanding
Example
Heinemann Company distributes books. The company policy is
that 30% of the books are English texts, 60% science, and 10% are
social sciences.
Suppose a random sample of 100 books has 50 English texts, 45
science texts, and 5 social science texts. Is this consistent with Heinemann’s
policy? Use a 0.05 level of significance.
The solution to this problem takes four steps: (1) state the
hypotheses, (2) formulate an analysis plan, (3) analyze sample
data, and (4) interpret results. We work through those steps below:
$DF = k - 1 = 3 - 1 = 2$
$E_i = n p_i$
$E_1 = 100 \times 0.30 = 30$
$E_2 = 100 \times 0.60 = 60$
$E_3 = 100 \times 0.10 = 10$
$\chi^2 = \sum_i \left[ (O_i - E_i)^2 / E_i \right]$
$\chi^2 = \left[ (50 - 30)^2 / 30 \right] + \left[ (45 - 60)^2 / 60 \right] + \left[ (5 - 10)^2 / 10 \right]$
$\chi^2 = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58$
where $DF$ is the degrees of freedom, $k$ is the number of levels of the categorical
variable, $n$ is the number of observations in the sample, $E_i$ is
the expected frequency count for level $i$, $O_i$ is the observed
frequency count for level $i$, and $\chi^2$ is the chi-square test
statistic.
The P-value is the probability that a chi-square statistic
having 2 degrees of freedom is more extreme than 19.58.
We use the Chi-Square Distribution Calculator to find
$P(\chi^2 > 19.58) = 0.0001$.
Interpret results. Since the P-value (0.0001) is less than the significance
level (0.05), we reject the null hypothesis. Notice, if p>0.05 level of
significance we do not reject the null hypothesis but if p<0.05 we reject the null
hypothesis.
This rigour may not be necessary if you use SPSS for the analysis. However, the
interpretation is the same.
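For readers without SPSS, the hand calculation in the Heinemann example can be checked with a short Python sketch. The function name `goodness_of_fit` is invented for this illustration. The P-value line relies on the fact that, for 2 degrees of freedom only, the chi-square upper-tail probability equals exp(-x/2); other degrees of freedom would need a statistics package or tables.

```python
import math


def goodness_of_fit(observed, proportions):
    """Return expected counts, the chi-square statistic and its df."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # E_i = n * p_i
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # DF = k - 1
    return expected, chi_square, df


# Sample of 100 books: English, science, social sciences.
observed = [50, 45, 5]
policy = [0.30, 0.60, 0.10]

expected, chi_square, df = goodness_of_fit(observed, policy)

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(expected, round(chi_square, 2), df, p_value)
print("reject" if p_value < 0.05 else "do not reject", "the null hypothesis")
```

The statistic comes out at about 19.58 with 2 degrees of freedom, and the P-value at roughly 0.00006, which is the 0.0001 quoted above when rounded to four decimal places. The conclusion is the same: the null hypothesis is rejected.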
a. Discussion of findings
In discussion of findings it is necessary to note the following:
1. It is not necessary to present the results afresh. Simply state
the finding and move on to the discussion. Readers are not
comfortable seeing figures while you discuss your results.
Figures stop at the level of result presentation.
2. It is appropriate that only empirical studies are used to
discuss empirical findings.
3. If you want people to cite your work and ideas, it is
appropriate to adduce reason/s for your findings while
discussing. Maturity is displayed when convincing reason/s
are given for a finding. It is necessary.
4. Until you have linked your finding/s to previous studies and/or
suggested and explained reasons for the finding,
discussion has not taken place, but mere presentation of
results.
b. Conclusion
Conclusions are categorical statements on your findings.
Simply put, the conclusion is your position in the paper, especially if it is a
non-empirical paper. By implication it is based on both the
finding and the convincing discussion done to enable you to take a
stand. For instance a finding in a study is “there is no
significant difference in mean achievement between male and
female students taught basic science using activity method”.
The conclusion from this finding could be “gender is not a
factor in students’ achievement in basic science when activity
method is used in teaching”. Put differently, “activity method
is a good strategy that could be used to eliminate gender
differences in students’ achievement in basic science”. The
conclusion from here is not “male and female students do not
differ significantly in basic science achievement” as often
presented by many.
c. Recommendations
Recommendations must be an offshoot of a finding in a study.
Do not recommend on what you did not find and do not extend
the coverage beyond your geographic and content scope
especially for non experimental studies. Even for experimental
studies (quasi inclusive), a study on methods using concept
mapping cannot be generalized to include discussion method or
a study on primary school pupils cannot be generalized to
include university students. From a finding on attitude of
students towards science in Gwer West LGA, you cannot freely
recommend what is to be done in the entire Benue State.
Though debatable, there is a school of thought that says there is
an exception to this in experimental studies that are not
culture biased, that is, one can generalize the recommendation
beyond the immediate scope. This is also linked to why we
accept small sample sizes for such studies. For instance, what was
the sample size, and in how many countries/continents did Pavlov
conduct his experiment on salivating dogs, before the
generalized conclusion was given as a theory of classical
conditioning? Consider this finding “there is no significant
difference in mean achievement between male and female
students taught basic science using activity method” as
example. A recommendation could be “science teachers in
Gwer West LGA are encouraged to use activity method in
teaching in coeducational secondary schools since it is gender
friendly”.
d. Limitation of study
Limitation is different from delimitation or scope. Limitations are
not the limits or setbacks experienced in your study due to finance,
time, distance, interest, inconvenience, etc. which you had
control over, and which could have been avoided or which you were not
compelled to accept. Rather, limitations are those things which were not
envisaged at the start of the study but came up suddenly and were beyond
your control. These may include attrition rate of subjects in the
study due to sudden Fulani invasion in Gwer West LGA; Strike
action embarked upon by teachers which affected your original
plan and time lines; use of small sample size in a survey which
of course is the entire population and you cannot get more as
described in the required sample characteristics; use of quasi
experimental design in a study that true experimental design
could have been better but because it is a school setting you
cannot use it, etc. In the last two examples the researcher was
aware of the limitations from the start of the study but went for
a better option, and in such cases steps are taken to forestall their
effects, like use of ANCOVA for data analysis, use of the entire
population as sample, etc.
e. Contribution to knowledge
It is better to first identify where contribution to knowledge could
be derived from in a study. Usually it is from the statement
of problem and the summary of literature review. Though the
significance of the study may imply it, significance is more about utility
than novelty and is therefore not often used to determine
contribution to knowledge. Note that if, for any study, you cannot
locate those to whom it will be significant, the essence of that study
is questionable, and the same applies to contribution to
knowledge. The following could guide in deciding whether a study has
contribution to knowledge:
1. What is new in it? What is the need gap that your findings
will close? Is the study the first to address such an issue, such
that your finding is entirely new and novel? Such studies
may lack relevant empirical literature for review but it is
allowed for novelty's sake.
2. Is it the first in the study area/location though studied in other
locations? This kind of study could be a replication or
modified form but not significantly different from previous
studies.
3. Are you doing it because of time lag, generation gap or
noticeable changes after many years (say 10 years and
above)? This of course must be stressed in the background and
statement of problem.
4. Are you introducing new variables into a previous study?
This could be a dependent, independent or moderator variable.
This gives a fairly new direction to the study in purpose,
scope and findings.
5. Is it an action research? In general, a research targeted at
finding way of solving an identified/existing
problem/challenge on ground within the shortest possible
time is action research. Here it does not matter if such study
was carried out previously in other areas or same area but it
is determined by what issue is currently on ground which
has not been addressed. Remember that if previous studies
have adequately addressed it there may be no need for
another. Any research that is directed by the teacher in the
classroom to address his/her challenge is also called action
research. Notice that there may be ethical issues to consider
with action research, such as, "Is it fair for one group to get a
type of instruction that may be more effective than another?"
This aspect is seriously abused by the way people use lecture
method in control group even when they know it is not a
comparable method to that used in experimental class and
will place the control group at a disadvantage.
6. In general it is not all findings in a study that may form part
of contribution to knowledge rather it is those that are novel,
striking, fill observed gap in literature, or raise surprises.
1. Effect is often used in cause and effect studies, ex post
facto and causal comparative studies, or
experimental/quasi-experimental design studies, eg Effect of cognitive
reasoning ability and prior exposure to content on Upper
Basic two students’ achievement/performance in Basic
Science.
Conclusion
What matters most in research is central to this paper, namely the
choice of an appropriate statistic and the interpretation of data. The
paper also explained some confusing terminologies, including
those whose use in some contexts is a subject of debate. The fact
remains that research is dynamic and knowledge about research
must be constantly updated to remain relevant. Though this
chapter is not all inclusive, there is much to be consulted once you
know what you are looking for.