LIST OF CONTRIBUTORS
Sunita S. Ahlawat
Philip R. Beaulieu
Chris H. Durden
Dann G. Fisher
Timothy J. Fogarty
Alan H. Friedberg
Joanne P. Healy
Theresa Libby
Karen S. McKenzie
Marybeth M. Murphy
Jeffrey J. Quirin
John T. Reisch
Andrew J. Rosman
John T. Sweeney
Brad Tuttle
Mark J. Ullrich
(Deceased)
Brett R. Wilkinson
Katherine J. Wilkinson
REVIEWER ACKNOWLEDGMENTS
The Editor and Associate Editors at AABR would like to thank the many excellent
reviewers who have volunteered their time and expertise to make this an outstanding publication. Publishing quality papers in a timely manner would not be possible
without their efforts.
Elizabeth Dreike Almer
Portland State University, USA
Roger Debreceny
Nanyang Technological University,
Singapore
John C. Anderson
San Diego State University, USA
William N. Dilla
Iowa State University, USA
Philip R. Beaulieu
University of Calgary, Canada
Alan S. Dunk
University of Tasmania, Australia
Jean Bedard
Northeastern University, USA
Jennifer D. Goodwin
University of Queensland, Australia
James Bierstaker
University of Massachusetts, Boston,
USA
Glen Gray
California State University,
Northridge, USA
Dennis M. Bline
Bryant College, USA
Heather Hermanson
Kennesaw State University, USA
Robert H. Chenhall
Monash University, Australia
Freddie Choo
San Francisco State University, USA
Karen L. Hooks
Florida Atlantic University, USA
Christie L. Comunale
Long Island University C.W. Post
Campus, USA
James E. Hunton
Bentley College, USA
Charles Cullinan
Bryant College, USA
Mike Kirschenheiter
Columbia University, USA
Elizabeth Davis
Baylor University, USA
Stacy Kovar
Kansas State University, USA
Kip R. Krumwiede
Brigham Young University, USA
Robert J. Parker
University of South Florida, USA
Theresa Libby
Wilfrid Laurier University, Canada
Will Quilliam
University of South Florida, USA
Daryl Lindsay
University of Saskatchewan, Canada
John Reisch
East Carolina University, USA
Timothy J. Louwers
Louisiana State University, USA
Michael Roberts
University of Alabama, USA
Nace Magner
Western Kentucky University, USA
Andrew J. Rosman
University of Connecticut, USA
James Maroney
Northeastern University, USA
Steve G. Sutton
University of Connecticut, USA and
University of Melbourne, Australia
Lokman Mia
Griffith University Gold Coast,
Australia
Linda Thorne
York University, Canada
Venky Nagar
University of Michigan, USA
Sandra Vera-Munoz
University of Notre Dame, USA
Marcus Odom
Southern Illinois University, USA
Sally A. Webber
Northern Illinois University, USA
Ed O'Donnell
Arizona State University, USA
Kristin Wentzel
La Salle University, USA
William R. Pasewark
Texas Tech University, USA
Patrick Wheeler
University of Missouri, USA
Laurie Pant
Suffolk University, USA
Stephen W. Wheeler
University of the Pacific, USA
EDITOR'S COMMENTS
Welcome to Volume 6 of Advances in Accounting Behavioral Research. This
issue contains an eclectic collection of behavioral research papers that examine
several very important issues. Several of the papers focus on various aspects
of auditors' decisions, such as professional commitment in public accounting
firms, mitigating bias via group decision making, and appropriately using sample
information to estimate errors in governmental auditing. The decisions of other
professionals that use accounting information such as commercial lenders and
divisional managers are also examined. Two papers examine how accounting
information impacts the behaviors of individuals within an organization under
various incentive structures. Two other papers provide perspectives on overall
research with one developing a classification scheme for new assurance services
and the other examining factors that impact research productivity of accounting
faculty members. Overall, this is a very enlightening group of papers that provide
insight into the behaviors of various users of accounting information.
Vicky Arnold
Editor
MANUSCRIPT SUBMISSION
Manuscripts should be forwarded to the editor, Vicky Arnold, at
Vicky.Arnold@business.uconn.edu via e-mail. All text, tables, and figures should be incorporated into a Word document prior to submission. The manuscript should also
include a title page containing the name and address of all authors and a concise
abstract. Also, include a separate Word document with any experimental materials
or survey instruments. If you are unable to submit electronically, please forward
the manuscript along with the experimental materials to the following address:
Vicky Arnold, Editor
Advances in Accounting Behavioral Research
Department of Accounting U41A
School of Business
University of Connecticut
Storrs, CT 06269-2041, USA
References should follow the APA (American Psychological Association) standard. References should be indicated by giving (in parentheses) the author's name
followed by the date of the journal or book; or with the date in parentheses, as in
"suggested by Earley (2000)."
In the text, use the form Rosman et al. (1995) where there are more than two
authors, but list all authors in the references. Quotations of more than one line
of text from cited works should be indented and citation should include the page
number of the quotation; e.g. (Dunbar, 2001, p. 56).
Citations for all articles referenced in the text of the manuscript should be shown
in alphabetical order in the reference list at the end of the manuscript. Only articles
referenced in the text should be included in the reference list. Format for references
is as follows:
For Journals
Dunn, C. L., & Gerard, G. J. (2001). Auditor efficiency and effectiveness with
diagrammatic and linguistic conceptual model representations. International
Journal of Accounting Information Systems, 2(3), 140.
For Books
Ashton, R. H., & Ashton, A. H. (1995). Judgment and decision-making research
in accounting and auditing. New York, NY: Cambridge University Press.
For a Thesis
Smedley, G. A. (2001). The effects of optimization on cognitive skill acquisition
from intelligent decision aids. Unpublished doctoral dissertation, University.
INTRODUCTION
The accounting scandals that have marked the dawn of the 21st century, such
as Enron, MCI, and Global Crossing, have damaged the credibility of the audit
report and the reputation of the public accounting industry. Perhaps more than
ever, commitment to the ideals and standards of the auditing profession is vital
Advances in Accounting Behavioral Research, Volume 6, 3–25
Copyright © 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06001-0
Political Ideology
"Socialization encourages persons to become similar to their profession, not only
as it is embodied by other organizational members, but also as it is defined by
the profession's espoused ideals" (Fogerty, 1992, p. 139). This description of
the socialization process implies the existence of a prototypic public accountant
embodying desirable characteristics, values and attitudes. The more effective the
socialization processes, the greater the correspondence between the prototype
and the professional member. Some values and attitudes (e.g. commitment,
identification) may be more readily influenced and inculcated by the socialization process than others (e.g. religious preferences). It is also possible that
some prototypic characteristics are not amenable to socialization (e.g. gender,
race).
A particularly appropriate theory for examining the influence of prototypes
on socialization processes in the auditing profession is self-categorization theory
(SCT) (Chatman et al., 1998; Hogg & Terry, 2000; Tajfel & Turner, 1985).2 SCT
focuses on the process whereby individuals define their self-concept in relation
to their membership in social groups. Prototype-based comparisons, whereby
social categorization of the individual into favorable in-group or unfavorable
out-group membership occurs, lie at the heart of SCT processes (Hogg &
Terry, 2000, p. 122). Prototypes are cognitive representations of the defining
and stereotypical features of in-groups, embodying exemplary or ideal types
and capturing characteristics that differentiate them from other groups. These
characteristics include demographic attributes, behaviors, attitudes and values.
Critical to the notion of prototypes is that they accentuate similarities within and
differences between groups (Hogg & Terry, 2000). For example, because the prototypical partner in public accounting is male, an in-group characteristic may be
masculinity and an out-group characteristic femininity (Maupin, 1993; Maupin &
Lehman, 1994).3
Prototype-based self-categorization is relevant for modeling professional
commitment as a socialization process directed towards cultivating professional
values (Jeffery & Weatherholt, 1996; Larson, 1977) for several reasons. First, in-group members, reflecting prototypic characteristics, are more likely to cooperate
with each other and to compete with out-group members (Chatman et al., 1998).
Second, in-group members are likely to receive favorable treatment compared to
out-group members (Ashforth & Mael, 1989). This favoritism may be reflected
in work assignments, performance evaluations, receipt of voluntary mentoring, or
through informal signals of preference relative to out-group members. As a result,
in-group members are likely to maintain more favorable attitudes towards their
profession and be more readily socialized than out-group members. Third, SCT
implies that a prototypically homogeneous audit profession is likely to develop,
Control Paths
Prior research has documented that partners in public accounting are typically
male (Hooks & Cheramy, 1994; Hull & Umansky, 1997) and, on average,
have developed to the conventional level of moral reasoning (Sweeney, 1995).
Researchers have suggested that masculinity (Maupin, 1993; Maupin & Lehman,
1994) and conventional moral reasoning (Ponemon, 1992) represent prototypes
in public accounting. Since the influence of both gender and moral reasoning on
professional commitment has been examined in prior research, these variables
are included as control paths in the model of professional commitment.
Although the literature suggests that gender barriers in public accounting may
preclude women from attaining the same level of commitment to the profession as
men (Maupin, 1993; Maupin & Lehman, 1994), the results of empirical research
have been equivocal. Gaffney et al. (1993) found that family obligations increased
the professional commitment of men in public accounting but had no effect on
women's professional commitment. Street et al. (1993), after controlling for
positional level, did not find a difference in professional commitment between
female and male public accountants.
Covaleski et al. (1998) contend that although women may have broken the
glass ceiling to attaining partnership in Big 6 firms, there is still a paucity of
high-level female partners. Women who are unable or unwilling to adopt the masculine characteristics required by the male-dominated culture of public accounting
may encounter obstacles in making partner (Maupin & Lehman, 1994). Given
the predominance of male partners and the difficulties that women may
encounter in adopting in-group male qualities, women in public accounting may
represent an out-group and have correspondingly less professional commitment
than men.
H3. Male auditors will have greater professional commitment than will female
auditors.
Ethics researchers in accounting have consistently found that the ethical development of auditors, as measured by the P score of the Defining Issues Test
(DIT) (Rest, 1986, 1993), most commonly reflected conventional reasoning
and was inversely related to positional level (Lampe & Finn, 1992; Ponemon
& Gabhart, 1993; Shaub, 1994). This result seemingly contradicts Kohlberg's
(1969) moral development theory, which holds that development is sequential
and progressive but not regressive. Ponemon (1992) contended that the inverse
relationship between P scores and rank in public accounting organizations was
the result of a selection-socialization process whereby firms prefer to hire and
then promote individuals with a shared set of ethical values and beliefs. He found
METHOD
Sample
Prior to collecting data, management representatives from offices of multiple
public accounting firms agreed to participate in the study and to provide
auditor subjects. Three international firms (Big 5: large), two national firms
(medium), and six local or regional firms (small) participated in the study. The
appropriate office representative indicated the approximate number of available
auditor subjects. The office representative was then provided with the required
number of research instruments to distribute to the participants. Each research
instrument consisted of a questionnaire, the six-story DIT and instructions
enclosed in a stamped, return envelope addressed to the researchers. Participation

                      Firm Size
Position      Small   Medium   Large   Totals
Staff            23       22      55      100
Senior           15       10      63       88
Supervisor       11        8      19       38
Manager          10       14      39       63
Partner          29        9      22       60
Totals           88       63     198      349

Males: 230                 Females: 119
Liberals: 63               Conservatives: 286
Average age:               Average experience:

                           Mean     S.D.     Range
Professional Commitment    75.51    11.73    41–103
P Score                    42.14    12.53    8.3–73.3

Measures
Professional commitment (PC) was measured with the 15-item scale adapted by
Aranya et al. (1981) from the Porter et al. (1974) organizational commitment
questionnaire. This scale has been utilized extensively by accounting researchers
to measure professional commitment (Aranya et al., 1982; Gaffney et al., 1993;
Harrell et al., 1986; Jeffery & Weatherholt, 1996; Street et al., 1993). Researchers
have indicated that the scale has good internal consistency, with Cronbach's
alpha reported in the high 0.80s (Aranya et al., 1981; Aranya & Ferris, 1984;
Bline et al., 1991).
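
As a rough illustration of the internal-consistency statistic cited above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix. The function name and the small data set are hypothetical and are not taken from the study, which used a 15-item scale; only the standard alpha formula is assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed scale score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 4 items (not the study's data)
scores = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 7],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why values in the high 0.80s are read as good internal consistency.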
Bline et al. (1991), in an extensive examination of the psychometric properties
of the professional commitment questionnaire, report that the scale measures a
construct distinct from organizational commitment. Their tests indicated that the
professional commitment scale has adequate reliability and validity. Furthermore,
the professional commitment construct correlated positively with job satisfaction
and negatively with intent to leave the profession. Other accounting researchers
have reported negative correlations between the professional commitment scale
and organizational-professional conflict (Aranya et al., 1981; Harrell et al., 1986)
and positive correlations with favorable work attitudes in public accounting
(Aranya et al., 1982).6
Ethical development was measured by each respondent's P score
from the 6-story DIT (Rest, 1979, 1986, 1993). The P score is a continuous
measure, ranging from 0 to 95, reflecting the relative importance a subject gives
to principled moral reasoning in resolving moral dilemmas (Rest et al., 1997,
p. 498). Rest (1993) reports an average P score of 45 for college graduates,
although accounting researchers have generally found that public accountants
score lower than adults from the general population at similar educational levels
(Ponemon, 1992; Sweeney, 1995). Rest (1986, pp. 176–179) contends that the P
score correlates most strongly with educational level but only weakly with gender,
intelligence and ethnic background. Gender, however, appears to have a stronger
influence on accountants' P scores than it does in the general population, with
females attaining significantly higher scores (Bernardi & Arnold, 1997; Enyon
et al., 1997; Shaub, 1994; Sweeney, 1995).
The DIT has been subjected to extensive reliability and validity tests with
generally good results (Rest, 1979, 1986; Rest et al., 1999). Some researchers
(Emler et al., 1983), however, contend that the DIT contains a political bias. In
studies with accounting subjects, Sweeney and Fisher (1998, 1999) found that the
DIT contained an imbedded political content that tended to overstate the scores
of political liberals and to understate the scores of political conservatives. They
suggest that researchers utilizing the DIT control for subjects' political ideology in
order to more clearly interpret the relationship between P scores and the variable
of interest.
Subjects indicated their political ideology in response to the following question: "Regarding important social and political issues, would you classify your
opinion or perspective as primarily conservative or liberal?" Forcing subjects to
identify their positions as primarily liberal or conservative is consistent with prior
research (Sweeney, 1995) and eliminates the ambiguity of a "political moderate"
classification.
EMPIRICAL RESULTS
Correlations
Table 2 presents correlation coefficients for professional commitment and
variables of interest. Subjects' professional commitment is negatively associated
with the size of their respective firm and positively associated with their positional
level. Political ideology and gender are associated with professional commitment
and DIT P scores. Political ideology is not correlated with gender, position, or
firm size. The significant association between gender and position results from the
underrepresentation of female auditors at the higher ranks. The association between
firm size and position is an apparent artifact of the non-random sample selection
process.
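
The one-tailed significance levels attached to the correlations can be approximated from the coefficient and the sample size alone. The sketch below is illustrative only: it uses the printed magnitudes of three coefficients with N = 349 and substitutes a normal approximation for the t distribution (an assumption that is reasonable at this sample size but is not necessarily what the authors' software did). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def correlation_t(r, n):
    """t statistic for H0: rho = 0, given Pearson r from n observations."""
    return r * sqrt((n - 2) / (1 - r * r))


def one_tailed_p_approx(t):
    """Upper-tail p-value using a normal approximation to the t distribution."""
    return 1 - NormalDist().cdf(abs(t))


n = 349  # sample size reported with the correlation table
for r in (0.087, 0.116, 0.132):  # magnitudes as printed
    t = correlation_t(r, n)
    print(f"r = {r:.3f}  t = {t:.2f}  one-tailed p ~ {one_tailed_p_approx(t):.4f}")
```

Under these assumptions the three coefficients fall, respectively, above 0.05, between 0.01 and 0.05, and below 0.01, which matches the pattern of unstarred, single-starred and double-starred entries.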

Table 2. Correlation coefficients for professional commitment and variables of interest.

                               (1)       (2)       (3)       (4)       (5)       (6)
Professional Commitment (1)    1.000
Firm Size (2)                  0.246**   1.000
Political Ideology (3)         0.132**   0.087     1.000
P Score (4)                    0.017     0.080     0.194**   1.000
Position (5)                   0.234**   0.146**   0.046     0.105*    1.000
Gender (6)                     0.116*    0.054     0.055     0.205**   0.353**   1.000

N = 349.
* p < 0.05 (one-tailed significance).
** p < 0.01 (one-tailed significance).

be problematic since several of the commonly used fit indices are sample size
dependent. For this reason, multiple measures of overall model fit are reported in
this study.
The Normed Fit Index (NFI) (Bentler & Bonett, 1980) has an index range
from 0 to 1, with values over 0.9 indicating a good fit. This index may be viewed
as the percentage of observed-measure covariation explained by a given model.
The disadvantage of the NFI is that it can underestimate goodness-of-fit in small
samples. Bentler's (1990) revised Normed Comparative Fit Index (CFI) is based
upon the Bentler and Bonett (1980) NFI but with a correction for sample-size
dependency. CFI values always lie between 0 and 1, with values over 0.9 indicating
a relatively good fit (Bentler, 1990). Finally, the Adjusted Goodness of Fit Index
(AGFI), devised by Joreskog and Sorbom (1984), is an additional fit index that
ranges from 0 to 1, with values above 0.9 indicating acceptable fit. Specifically,
in addition to the traditional Goodness of Fit Index (GFI), the Adjusted Goodness
of Fit Index (AGFI), the Normed Fit Index (NFI), and the Comparative Fit Index
(CFI) are reported in this study. This lends some assurance that the measures of
fit produced are not spurious.

Table 3. Results of the structural equation analysis.

Dependent    Independent           Associated    Path
Variable     Variable              Hypothesis    Coefficient    t-Value    p-Value
PC           Firm size             H1            0.236          4.65       0.001
PC           Political ideology    H2            0.154          2.98       0.002
PC           Gender                H3            0.042          0.76       0.224
PC           P Score               H4            0.059          1.13       0.132
PC           Position              H5            0.184          3.43       0.001
Position     Gender                              0.353          7.04       0.001
P Score      Gender                              0.195          3.78       0.001
P Score      Political ideology                  0.183          3.55       0.001

Figurative depictions of the results of the structural equation analysis are
presented in Fig. 2. With GFI, AGFI, NFI, and CFI values exceeding 0.9 in
all instances, the theoretical model appears to provide a very good fit with the
dataset.
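
For readers unfamiliar with how two of the reported indices are built, the sketch below computes the NFI and CFI from model and baseline (null-model) chi-square statistics using their commonly cited definitions. The chi-square values and degrees of freedom are hypothetical; the study does not report them in this section, and the function names are illustrative.

```python
def nfi(chi2_model, chi2_null):
    """Normed Fit Index: proportional reduction in chi-square versus the null model."""
    return (chi2_null - chi2_model) / chi2_null


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index, based on noncentrality and bounded to the 0-1 range."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    if d_null == 0:
        return 1.0
    return 1.0 - d_model / d_null


# Hypothetical inputs, chosen only to illustrate the arithmetic
chi2_null, df_null = 400.0, 15
chi2_model, df_model = 12.0, 7

print(f"NFI = {nfi(chi2_model, chi2_null):.3f}")
print(f"CFI = {cfi(chi2_model, df_model, chi2_null, df_null):.3f}")
```

Because the CFI subtracts degrees of freedom before forming the ratio, it is less sensitive to sample size than the NFI, which is the correction the text attributes to Bentler (1990).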
Tabular results of the structural equation analysis including a listing of each
hypothesis and its corresponding path coefficient are presented in Table 3.
Consistent with the relatively high model fit indices, results in Table 3 indicate
that an overwhelming majority of the associations hypothesized in the current
study and suggested by prior literature were significant, providing further support
for the proposed theoretical model of professional commitment.
Tests of Hypotheses
Hypothesis 1 predicts a negative relationship between firm size and professional
commitment. The path coefficient for this theoretical link is 0.236 and is
significant at the p < 0.001 level. Thus, smaller firms tend to have employees
who possess higher levels of professional commitment.
Hypothesis 2 predicts that conservative auditors will demonstrate higher
professional commitment than liberal auditors. For the full sample, a one-tailed
t-test indicated that the professional commitment of politically conservative auditors was higher than that of liberal auditors (76.2 vs. 72.2;
p < 0.017). The path coefficient for this theoretical link is 0.154 and is
significant at the p < 0.002 level. This result provides support for H2 and
implies that political ideology is an influential socialization variable in public
accounting.
Male auditors are predicted in H3 to have higher professional commitment
than female auditors. For the full sample, males reported higher commitment
than females (76.5 vs. 73.6; p < 0.016) but the association between positional
level and gender must be considered before drawing any conclusions regarding
the gender-professional commitment relationship. The control path between
gender and position has a coefficient of 0.353 and is significant at the p < 0.001
level, implying that male auditors in the sample are more likely to occupy
higher level positions. After controlling for the influence of positional level,
the path coefficient linking gender and professional commitment is 0.042 and
not statistically significant. This result suggests that gender does not play a direct role in the
development of an auditor's professional commitment.
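
One way to read this result is through the usual path-tracing rule, under which an indirect effect is the product of the coefficients along the path. The sketch below applies that rule to the gender, position and professional commitment paths. It assumes the coefficients are standardized and uses the magnitudes as printed in the text; it considers only the route through position and is not a calculation reported by the authors.

```python
def indirect_effect(*path_coefficients):
    """Product of standardized coefficients along one indirect path."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product


# Magnitudes as printed in the text (signs and coding are not recoverable here)
gender_to_position = 0.353
position_to_pc = 0.184
gender_to_pc_direct = 0.042

via_position = indirect_effect(gender_to_position, position_to_pc)
print(f"indirect via position: {via_position:.3f}")
print(f"direct + indirect:     {gender_to_pc_direct + via_position:.3f}")
```

On these assumptions the route through position is larger than the direct path, which is consistent with the interpretation that gender relates to commitment mainly through positional level.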
Hypothesis 4 predicts that there is a positive relationship between ethical
development, as measured by DIT P scores, and professional commitment.
In order to unambiguously interpret this path, the associations between P
score and gender and P score and political ideology must be considered. The
corresponding coefficient for the path between gender and P score is 0.195
and significant ( p < 0.001). This result suggests that female auditors attain
higher P scores than their male counterparts. Additionally, the path coefficient
linking political ideology with P score is 0.183 and also significant ( p < 0.001),
suggesting that politically liberal auditors attain higher P scores than politically
conservative auditors. After controlling for the influence of gender and political
ideology, the path coefficient between P score and professional commitment
is 0.059 and not statistically significant. H4 is therefore not supported, as an auditor's ethical
development does not appear to directly influence his or her professional
commitment.
Hypothesis 5 predicts that there is a positive relationship between position and
professional commitment. The path coefficient linking these two constructs is
0.184 and is significant at the p < 0.001 level. This result provides support for H5
and suggests that auditors employed at higher levels within their respective firms
exhibit higher levels of professional commitment, although the relationship is not
necessarily linear. Furthermore, it is not clear from the analysis whether auditors
with higher levels of professional commitment are more likely to be promoted, or
whether auditors develop higher professional commitment as they advance within
the profession.
Additional Analysis
Table 4 examines the influence of the significant main effects on auditors'
professional commitment, partitioned by firm size, position and political ideology.
Professional commitment scores are highest in the regional firms and, as expected,
at the partner level. Senior auditors in regional and national firms also demonstrate
relatively high commitment. The influence of political ideology is evident, as
conservative auditors have higher commitment than liberal auditors for every cell
containing at least two liberal auditors.

Table 4. Mean professional commitment (PC) by firm size, position and political ideology.

Firm Size   N     Mean PC   Position     N    Mean PC   Political Ideology   N    Mean PC
Small       88    80.02     Staff        23   75.57     Conservative         17   77.06
                                                        Liberal              6    71.33
                            Senior       15   80.20     Conservative         11   81.27
                                                        Liberal              4    77.25
                            Supervisor   11   77.91     Conservative         9    80.11
                                                        Liberal              2    68.00
                            Manager      10   76.90     Conservative         8    77.13
                                                        Liberal              2    76.00
                            Partner      29   85.34     Conservative         21   86.38
                                                        Liberal              8    82.63
Regional    63    76.44     Staff        22   76.64     Conservative         16   78.31
                                                        Liberal              6    72.17
                            Senior       10   80.3      Conservative         9    78.89
                                                        Liberal              1    93.00
                            Supervisor   8    72.75     Conservative         8    72.75
                                                        Liberal              0
                            Manager      14   72.93     Conservative         13   71.92
                                                        Liberal              1    86.00
                            Partner      9    80.44     Conservative         8    79.25
                                                        Liberal              1    90.00
Big 6       198   73.21     Staff        55   73.47     Conservative         48   74.13
                                                        Liberal              7    69.00
                            Senior       63   69.08     Conservative         48   70.04
                                                        Liberal              15   66.00
                            Supervisor   19   73.89     Conservative         17   75.35
                                                        Liberal              2    61.50
                            Manager      39   72.23     Conservative         32   73.31
                                                        Liberal              7    67.29
                            Partner      22   85.55     Conservative         21   85.10
                                                        Liberal              1    95.00

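
The cell means in Table 4 can be cross-checked, since each position-level mean should equal the count-weighted average of its conservative and liberal subgroup means, up to rounding. The following sketch performs that check for two cells using the counts and means printed in the table; the function name is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of subgroup means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (n, mean PC) for conservative and liberal auditors, as printed in Table 4
small_firm_staff = [(17, 77.06), (6, 71.33)]       # reported cell mean: 75.57
regional_firm_manager = [(13, 71.92), (1, 86.00)]  # reported cell mean: 72.93

print(round(pooled_mean(small_firm_staff), 2))
print(round(pooled_mean(regional_firm_manager), 2))
```

Small discrepancies in the second decimal place can arise in other cells because the subgroup means are themselves rounded.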
An objective of socialization is to ensure that management promotes those
individuals who reflect the culture and values of the organization (Fogerty, 1992;
Kanter, 1977; Ponemon, 1992). If conservative ideology is a strongly held value
in the culture of public accounting, then politically conservative auditors should
perceive greater opportunities for advancement than politically liberal auditors.
To provide further evidence of the socializing influence of political ideology
in public accounting, subjects who were not partners were asked to respond
to the following question: "Please indicate what you believe are your chances
(likelihood) of making partner in your present firm." The Likert response scale for
the question ranged from 1 (very low) to 7 (very high). Conservative auditors, on
average, perceived their opportunities for advancement to partner as significantly
greater than did liberal auditors (3.68 vs. 2.96; p < 0.0003).
NOTES
1. Former Securities and Exchange Commission (SEC) Commissioner Arthur Levitt questioned
whether the expansion into more lucrative services compromises the traditional audit
function (Covaleski, 1999). Suggesting that the audit has merely become a conduit for
selling other services, Levitt contends that auditors may not be sufficiently committed to
societal expectations and professional standards.
2. SCT is an extension of social identity theory (SIT) (Ashforth & Mael, 1989; Brown,
2000; Tajfel & Turner, 1985). SIT maintains that one's social identity is derived primarily
from group membership, that people strive to maintain a positive identity, and that this
positive identity largely results from favorable comparisons between relevant in-groups
and out-groups (Ashforth & Mael, 1989).
3. Fogerty (2000, p. 13) described the socializing influence of prototypes in public
accounting firms when he stated: "Experienced organizational members selectively provide
reinforcement, communicate the approved range for action, and serve as examples of
achievement."
4. An individualist orientation supports the notion of capitalism in viewing people as
independent economic actors, as opposed to a collectivist orientation that is more aligned
with a socialist perspective (Burns, 1992, p. 352).
5. After controlling for political ideology and gender, Sweeney (1995) did not find a
significant relationship between rank and DIT P scores. Therefore, we do not control for
the influence of rank on ethical development.
6. Dwyer et al. (2000) examined the dimensionality of the Aranya et al. (1981) professional commitment scale with a broad sample of practicing accountants and concluded
that the 15-item scale could be parsimoniously reduced to a five-item measure. In light of
this research, we performed a principal components, orthogonal rotation factor analysis of
the instrument. Results of the factor analysis indicated that 14 of the 15 items possessed
loadings of 0.40 or greater on a single factor. Item 7 of the instrument, which possessed a
loading of 0.15, was the lone item not contributing to the factor. The resulting eigenvalue
for the 14-item factor was 5.49. The Cronbach alpha for the 15-item measure was 0.88.
Supplemental analyses utilizing the reduced 5-item scale from Dwyer et al. (2000) were also
performed and the results were essentially identical to those incorporating the full scale.
ACKNOWLEDGMENTS
We gratefully acknowledge the helpful comments of the participants in the 2001
Annual Meeting of the Accounting, Behavior & Organizations Section, the 2002
Critical Perspectives in Accounting Conference, and the accounting research workshops at the Australian National University and at Washington State University.
REFERENCES
Adler, A., & Aranya, A. N. (1984). Comparison of the work needs, attitudes and preferences of professional accountants at different career stages. Journal of Vocational Behavior (August), 45–57.
Fogerty, T. J. (2000). Socialization and organizational outcomes in large public accounting firms.
Journal of Managerial Issues, 12(Spring), 12–33.
Gaffney, M. A., McEwen, R. A., & Welsh, M. J. (1993). Gender effects on commitment of public
accountants: A test of competing sociological models. Advances in Public Interest Accounting,
5, 45–73.
Goetz, J. F., Morrow, P. C., & McElroy, J. C. (1991). The effect of accounting firm size and member
rank on professionalism. Accounting, Organizations and Society, 16, 159–165.
Harrell, A., Chewning, E., & Taylor, M. (1986). Organizational-professional conflict and the job
satisfaction and the turnover intentions of internal auditors. Auditing: A Journal of Practice
and Theory, 5(Spring), 109–121.
Hogg, M. A., & Terry, D. J. (2000). Social identity and self-categorization processes in organizational
contexts. Academy of Management Review, 25(1), 121–140.
Hooks, K. L., & Cheramy, S. J. (1994). Facts and myths about women CPAs. Journal of Accountancy,
178(October), 79–86.
Hull, R. P., & Umansky, P. H. (1997). An examination of gender stereotyping as an explanation for
vertical job segregation in public accounting. Accounting, Organizations and Society, 22(6),
507–528.
Jeffery, C., & Weatherholt, N. (1996). Ethical development, professional commitment, and rule observance attitudes: A study of CPAs and corporate accountants. Behavioral Research in Accounting,
8, 8–31.
Joreskog, K., & Sorbom, D. (1984). LISREL VI user's guide (4th ed.). Mooresville, IN: Scientific
Software.
Kanter, R. (1977). Men and women of the corporation. New York: Basic Books.
Kohlberg, L. (1969). Stage and sequence: The cognitive developmental approach to socialization. In:
D. A. Goslin (Ed.), Handbook of Socialization Theory and Research (pp. 347–480). Chicago:
Rand McNally.
Lampe, J., & Finn, D. (1992). A model of auditors' ethical decision process. Auditing: A Journal of
Practice & Theory (Suppl.), 121.
Larson, M. S. (1977). Rise of professionalism: A sociological analysis. Berkeley: University of California
Press.
Maupin, R. J. (1993). How can women's lack of upward mobility in accounting organizations be
explained? Group and Organization Management, 18(June), 132–152.
Maupin, R. J., & Lehman, C. R. (1994). Talking heads: Stereotypes, status, sex-roles and satisfaction
of female and male auditors. Accounting, Organizations and Society, 19, 427–437.
Norris, D. R., & Niebuhr, R. E. (1983). Professionalism, organizational commitment and job satisfaction
in an accounting organization. Accounting, Organizations and Society, 9, 49–59.
Ponemon, L. A. (1992). Ethical reasoning and selection-socialization in accounting. Accounting,
Organizations and Society, 17, 239–258.
Ponemon, L. A., & Gabhart, D. (1993). Ethical reasoning in accounting and auditing. Vancouver,
Canada: Canadian General Accountants Research Foundation.
Porter, L. W., Steers, R. M., Mowday, R. T., & Boulian, P. V. (1974). Organizational commitment,
job satisfaction, and turnover among psychiatric technicians. Journal of Applied Psychology,
59(October), 603–609.
Pratt, J., & Beaulieu, P. (1992). Organizational culture in public accounting: Size, technology, rank,
and functional area. Accounting, Organizations and Society, 17, 667–684.
Rest, J. R. (1979). Development in judging moral issues. Minneapolis, MN: University of Minnesota
Press.
25
Rest, J. R. (1986). Moral development: Advances in research and theory. New York: Prager Press.
Rest, J. R. (1993). Guide for the dening issues test. Version 1.3. Minneapolis, MN: University of
Minnesota.
Rest, J., Narvaez, D., Bebeau, M. J., & Thoma, S. J. (1999). Postconventional moral thinking: A
neo-kohlbergian approach. New Jersey: Lawrence Erlbaum Associates.
Rest, J., Thoma, S. J., & Edwards, L. (1997). Designing and validating a measure of moral judgment:
Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1),
528.
Schroeder, R. G., & Imdieke, L. F. (1977). Local-cosmopolitan and bureaucratic perceptions in public
accounting firms. Accounting, Organizations and Society, 1, 3945.
Shaub, M. (1994). An analysis of factors affecting the cognitive moral development of auditors and
auditing students. Journal of Accounting Education, 12, 126.
Shaub, M., Finn, D., & Munter, P. (1993). The effects of auditors ethical orientation on commitment
and ethical sensitivity. Behavioral Research in Accounting, 5, 145169.
Siegal, P., Blank, M., & Rigsby, J. (1991). Socialization of the accounting professional: Evidence of the
effect of educational structure on subsequent auditor retention and advancement. Accounting,
Auditing and Accountability Journal, 4, 5870.
Sorenson, J. E. (1967). Professional and bureaucratic organization in the public accounting firm. The
Accounting Review, 42(July), 553565.
Sorenson, J. E., & Sorenson, T. C. (1974). The conflict of professionals in bureaucratic organizations.
Administrative Science Quarterly (March), 98106.
Street, D. L., Schroeder, R. G., & Schwartz, B. (1993). The central life interests and organizational
professional commitment of men and women employed by public accounting firms. Advances
in Public Interest Accounting, 5, 201229.
Sweeney, J. T. (1995). The moral expertise of auditors: An explanatory analysis. Research on Accounting Ethics, 1, 213234.
Sweeney, J. T., & Fisher, D. G. (1998). An examination of the validity of a new measure of moral
judgment. Behavioral Research in Accounting, 10, 138158.
Sweeney, J. T., & Fisher, D. G. (1999). Politics, faking, and self-presentation: How valid is the P score
of the Defining Issues Test? Research on Accounting Ethics, 5, 5175.
Tajfel, H., & Turner, J. C. (1985). The social identity theory of intergroup behavior. In: S. Worchel &
W. G. Austin (Eds), Psychology of Intergroup Relations (2nd ed., pp. 724). Chicago: NelsonHall.
Watts, R. L., & Zimmerman, J. L. (1986). Positive accounting theory. Englewood Cliffs, NJ: PrenticeHall.
Wheeler, R., Felsig, R. M., & Reilly, T. (1987). Large or small CPA firms: A practitioners perspective.
CPA Journal (April), 2933.
AN ANALYSIS OF GROUP INFLUENCES ON GOING CONCERN AUDITOR JUDGMENTS
Sunita S. Ahlawat and Timothy J. Fogarty
ABSTRACT
Studies indicating that the processing of audit evidence results in judgment
bias may reflect their focus on individual decision-making.
Building on work that suggests important differences between individual
and group decision-making, this paper evaluates decision-making attributes
of audit groups. Experienced auditors from offices of Big-Five firms in the
U.S. served as the participants in an experiment involving the going concern
judgment. Results show that recency does affect the judgments of individual
auditors but disappears as an important effect when groups make judgments.
Group responses are less extreme and exhibit greater confidence than those
of individuals.
Advances in Accounting Behavioral Research, Volume 6, 27–51
2003 Published by Elsevier Ltd.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06002-2
INTRODUCTION
The descriptive theory of belief updating proposed by Hogarth and Einhorn (1992)
posits that the order in which evidence is received has a significant and predictable
influence on a person's final judgment. Most of the attention generated by this
discovery has focused on recency effects. Recency refers to the tendency to
place greater weight on evidence received later in a sequence. Accordingly, an
over-reliance on information presented last may occur. A number of experimental
studies utilizing various conditions suggest that significant recency effects exist in
accountants' and auditors' belief revisions (e.g. Asare, 1992; Ashton & Ashton,
1988; Dillard et al., 1991; Pei et al., 1992; Trotman & Wright, 1996; Tubbs
et al., 1990). However, recent research has questioned the prevalence of recency
in auditing. Cushing and Ahlawat (1996) suggested that such effects may not be
common in audit practice. Other studies also have produced evidence that recency
effects do not always occur, or occur only under certain circumstances (Kennedy,
1993; Messier & Tubbs, 1994; Trotman & Wright, 1996).
This paper builds on the growing recognition that contextual factors (e.g.
accountability, cognitive involvement, experience, and task realism) might
mitigate judgment bias in audit judgment. Another potential factor is group
influence. Many auditing situations involve either formal or informal group
consultation (Gibbins & Emby, 1985). For example, a team of audit staff and
seniors typically conducts audit fieldwork. The group expands as managers and
partners review this work prior to the issuance of an audit report. However, the
growing recognition that cognitive heuristics and biases in auditors' judgments
can lead to different outcomes, including different types of audit reports (e.g.
Asare, 1992), has developed with little consideration of group influences.
This research investigates the potential for group processes to overcome
weaknesses in accountants' judgment. In addition to the recency bias, this paper
also examines the related attributes of decision confidence and belief revision
that vary between audit groups and individual auditors. This research finds
fundamental differences between groups and individuals in their exposure to
recency effects, the nature of their belief revision processes, and their confidence in
decisions. Four sections follow. The first reviews the literature on group decision-making and judgment biases as a prelude
to stating the research hypotheses. The second describes the empirical study.
The last two sections present the results and discuss their implications and
limitations.
these more generic aspects, groups also were found to influence decision-making.
Although individuals come to the group with some degree of pre-discussion
preferences and unique decision-relevant information that continue to influence
group decisions (Winquist & Larson, 1998), the group resists reduction to the
sum of its members. Groups are believed to produce substantively different
decisions than individuals (Hill, 1982; Miner, 1984). The improved accuracy of
groups that has been reported in many areas may be attributable not only to the
increased perspectives contributed by members, but also to the heightened caution
as consensus processes tend to eschew extreme solutions (Myers & Lamm,
1976). Although the balance of evidence suggests net gains for group decisions
over those of individuals, a full explanation of their origin remains elusive. The
extent to which groups are effective at reducing the random error associated with
individual choice may depend on the effectiveness with which feedback can be
incorporated. Group advantages may also center on the reduction of individual
variability. However, the importance of these conditions varies with the context
of the decision.
Hypotheses
The studies discussed above suggest that the tools that enhance cognitive involvement can mitigate order effects. Group decision-making can serve to enhance
effort and involvement. Group assistance can also be useful in lessening task
demands. Groups have collective experience to draw from, whereas individuals
work alone. Studies in social psychology have found that livelier interaction
among group members was associated with superior performance (e.g. Valacich
& Schwenk, 1995). Interacting groups also reduced belief perseverance (Wright
et al., 1990). These findings suggest that the interaction process itself may have
a positive effect on judgment.
Two aspects of group process could contribute to superior performance. The
group tends to broaden the information set that is brought to bear upon a choice
(Stasser, 1992). This information set includes perspectives on what factual data
means and what limitations it possesses. Group processes also reduce individual
inconsistency or extremity (Schultz & Reckers, 1981). As information exchange
between members occurs, group interaction becomes a corrective function when
individual members have initially incomplete or biased information (Stasser &
Titus, 1985) and are encouraged to alter opinions in order to reach a collective
judgment (Stasser & Davis, 1981).
The complexities of some audits make group processes even more salient.
Auditors are aware of the importance of group work and the need to share and
integrate expertise (Schultz & Reckers, 1981). The audit requires considerable
knowledge about industries and competitive factors in order to ascertain the consequences of account balance fluctuations. Fisher and Ellis (1990) suggested that
social pressures created by the group interaction process would moderate extreme
or divergent views held by group members as they work to accommodate each
other's views. In an audit setting, groups may be useful in preventing anecdotal
experience about certain business conditions from being overly generalized.
about their decision because it takes into account a wide set of perspectives on
importance. Lower confidence would be inconsistent with the social pressures
that support the participatory consensus formation around the group's choice. As
such, the group interaction process may lead to higher group confidence compared
to the individual members' pre-group confidence (Sniezek & Henry, 1989, 1990).
The greater confidence may also reflect individuals' recognition that groups can
potentially recognize, evaluate, and process more information than individuals.3
In an accounting study, Bloomfield et al. (1996) showed that interaction that
inspired group confidence contributed to group performance. In a different vein,
Allwood and Granhag (1996) found that groups inspired not only confidence,
but also realistic confidence.
The level of confidence is particularly important for the going concern decision
made by auditors. The evaluation of business survival is inherently oriented toward
the future and therefore is more uncertain than most auditing decisions. Since the
going concern decision has distinct adverse consequences for the client, high levels
of confidence are called for to withstand the client resistance that is likely to result.
Accordingly, the following hypothesis will be considered:
H2. Audit groups will exhibit greater confidence than individual auditors about
going concern decisions.
Research over the last thirty years has identified many reasons to depart from
the belief that the direction of influence in decision-making is symmetrical.
Human beings are not bound to strict mathematical consistency when dealing
with information that points to one conclusion relative to information that leads
to an opposite result. Pivoting around a baseline (zero), positive movements and
negative cues of equal magnitude have often been shown to be processed in a
qualitatively different way. However, the reasons that individuals are influenced
by these frames of reference are imperfectly understood (Newman, 1980).
If group-based reasoning is capable of integrating more information and wider
perspectives, it also may be capable of altering the tendency to treat categories of
cues in ways that are inconsistent with Bayesian logic. The more varied experiences
available to the group as input to their decision may work against the tendency
to over-weigh the negative or the positive. If framing effects are psychological in
nature, forcing them into open discussion may have the effect of exposing their
inconsistency. In other words, there may be more balance in how groups react to
positive and negative types of information than there would be in how individuals
react to that same information.
Auditing has been described as the attempt to confirm a series of interrelated
hypotheses about the client's accounting records (Church & Schneider, 1993).
Evidence that the accounts are correct as stated therefore can be logically
THE EXPERIMENT
An experiment was designed to test the hypotheses in a context where auditors
are asked to evaluate a client's ability to continue as a going concern. This type
of context has been employed frequently in prior studies of recency effects in
audit judgment. The specific task in the experiment involves making a series of
judgments about a firm's going-concern status and a recommendation about the
type of audit report to issue.
The experiment was conducted in the offices of the participating international
public accounting firm over a four-week period. In each office, arrangements were
made for subjects to participate as individuals or as members of three-person
groups. Judgments were made privately by individuals or collaboratively in
groups. Although the assignment of participants to conditions was random, group
composition was subject to member availability at the pre-established time for
the exercise.4 The only qualifying stipulation was that participants were primarily
engaged in the auditing activities of the firm and that they had at least two years
of experience. A researcher distributed and collected all materials in person. For
groups, the researcher was present outside the meeting room for the duration of
the deliberations. Individuals completed the task in their offices, but without the
physical proximity of the researcher.
return the task to the appropriately labeled envelope, and to seal the envelope at
the end of each task.
In Task 1, participants were first asked to provide their general threshold level
for substantial doubt, such that a modified audit opinion would be recommended
for any entity whose likelihood for continued existence fell below the threshold
level. This established, in quantified terms, participants' baseline threshold for
substantial doubt before they considered the hypothetical client in particular.
Group members had to agree to a single baseline. The scale used for pinpointing
participants' threshold levels ranged from 0 to 100, with endpoints labeled
"certain not to continue" (0) and "certain to continue" (100).
Participants then dealt with case-specific questions. They were asked to: (1)
assess the likelihood of the client's continued existence through the end of the
current fiscal year; (2) recommend the type of audit report to be issued; and (3)
indicate their confidence in the audit report recommended. A 0–100 scale with
endpoints labeled "certain not to continue" (0) and "certain to continue" (100)
was used for this and each subsequent likelihood judgment. A similar 0–100 scale
with endpoints labeled "not confident at all" (0) and "very confident" (100) was
used to elicit participants' confidence level. The audit report categories were Unqualified, Modified, and Disclaimer. Under U.S. auditing standards, the modified
opinion would be appropriate if there were substantial doubt about the entity's
continuation (AU 341, AICPA, 1990). At this point, participants did not know that
they would receive additional information or have an opportunity to revise their
previous judgments. In addition to familiarizing the participants with the client's
overall operations and financial conditions, Task 1 allowed them to set their own
decisional anchor points.
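The threshold rule described for Task 1 can be sketched in a few lines of Python. This is only an illustration, not the instrument used in the study. The function name `implied_report` is hypothetical, the treatment of a likelihood exactly equal to the threshold is an assumption, and the Disclaimer category is not modeled.

```python
def implied_report(likelihood: float, threshold: float) -> str:
    """Return the report implied by comparing a likelihood judgment with a threshold.

    Both arguments use the 0-100 scale described in the text
    (0 = certain not to continue, 100 = certain to continue).
    """
    for name, value in (("likelihood", likelihood), ("threshold", threshold)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must lie between 0 and 100, got {value}")
    # Assumption: a likelihood equal to the threshold does not trigger a
    # modified opinion, because the text refers to likelihoods that fall
    # *below* the threshold.
    return "modified" if likelihood < threshold else "unqualified"


if __name__ == "__main__":
    print(implied_report(40, 50))  # likelihood below the threshold
    print(implied_report(70, 50))  # likelihood above the threshold
```

A participant's actual recommendation need not coincide with this implied report; the sketch shows only the comparison that the stated threshold makes possible.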
Task 2 of the case sequentially presented six additional pieces of evidence.
Three of the evidence items were classified as Contrary with regard to the going
concern status of the hypothetical company. Contrary information is defined as
any evidence or issue that raises doubts about the entity's ability to continue in
existence. Specifically, the contrary items related to: (1) the upcoming expiration
of a patent that had consistently generated approximately 25% of total sales;
(2) the departure of one of the company's key sales executives; and (3) the
non-renewal of the company's line of credit. The other three evidentiary items
could be considered Mitigating in nature, since they might quell traditional
auditor going concern doubts. The mitigating factors were: (1) the receipt of a
favorable marketing research report on a new product line; (2) the successful
deferment of an account payable over a three-year period; and (3) a successfully
concluded contract negotiation with an employee labor union. Following the
presentation of each of these pieces of evidence, participants were asked to
provide a revised assessment of the likelihood that the client would continue in
existence through the end of the current fiscal year. After providing the last of these
assessments, participants were again asked to recommend the type of audit report
to be issued and to indicate their confidence in the appropriateness of that report.
The six items were presented in two orders. In the condition labeled MMMCCC
on Fig. 1, the three mitigating factors (MMM) were presented first, followed by the
three pieces of contrary information (CCC). The order of evidence was reversed
in the second condition, labeled CCCMMM. The variation in the order of cues
was the recency manipulation. Each of these items was presented on a new page
contained in an envelope. Participants were asked to complete a new, sealed
assessment (on the 0–100 scale) of the hypothetical company's continuation as a going concern
before examining the next item of evidence. After the last piece of evidence was
revealed, participants were again asked about their confidence about the opinion
type they recommended, with a question identical to that used in Task 1.
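The order manipulation can be illustrated with a short Python sketch that expands a condition label into the sequence of evidence items a participant would see. The item descriptions are paraphrased from the text. The function name `evidence_sequence` is hypothetical, and the within-type ordering of items is an assumption, since the text states only that all three items of one type preceded the three of the other.

```python
# Evidence items paraphrased from the case description.
CONTRARY = [
    "upcoming expiration of a patent",
    "departure of a key sales executive",
    "non-renewal of the line of credit",
]
MITIGATING = [
    "favorable marketing research report",
    "deferment of an account payable",
    "concluded labor union contract negotiation",
]


def evidence_sequence(order: str) -> list[tuple[str, str]]:
    """Expand an order label ('CCCMMM' or 'MMMCCC') into (type, item) pairs."""
    if order not in ("CCCMMM", "MMMCCC"):
        raise ValueError(f"unknown order condition: {order!r}")
    # Assumption: items within a type appear in the order listed in the text.
    pools = {"C": iter(CONTRARY), "M": iter(MITIGATING)}
    return [(code, next(pools[code])) for code in order]


if __name__ == "__main__":
    for position, (code, item) in enumerate(evidence_sequence("MMMCCC"), start=1):
        print(position, code, item)
```

Each position in the sequence corresponds to one revised likelihood judgment (J1 through J6).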
Task 3 of the case required all participants to complete a questionnaire regarding
their background and auditing experience. Since these questions concerned their
individual attributes, all participants, even those that had worked in groups for
Tasks 1 and 2, were asked to work alone on Task 3.
Task 4 obtained data for a manipulation check. Nine pieces of evidence (including the six items presented in the experiment) were used to check respondents'
perceptions. They were asked to classify these nine items as contrary, mitigating,
or neither, in relation to a going concern question. Individuals that had worked
in groups for Task 1 and Task 2 also performed this task collectively in keeping
with the intent to study the difference between groups and individuals.5
Participants
Ninety-one auditors from a Big-Five CPA firm participated in the experiment. Of
the 91 auditors, 49 were managers, and 42 were seniors. There were 21 groups,
each consisting of one manager and two seniors. The 28 people who worked as
individuals were all managers. This design feature was motivated by a desire to
have at least one experienced individual in each decision-making unit.6 Table 1
presents auditor experience by rank and treatment conditions.
On average, managers had 8.45 years of experience (range 5–15 years) while
seniors had 3.26 years of experience (range 2–5 years). The sample of individuals
had, on average, more experience (7.93 years) than auditors in the group
condition (5.24 years). However, the groups had managers with more experience
(9.19 years) as members. The extent to which group members had previous
experience working with each other on actual audit engagements was not available
information.
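Table 1. Auditor Experience by Rank and Treatment Condition
                                 Group                             Individual
Information Order(a)             CCCMMM        MMMCCC              CCCMMM           MMMCCC
Years of experience
  Manager                        9.09 (1.38)   9.30 (3.09)         8.23 (1.09)      7.67 (1.40)
  Senior                         3.36 (0.85)   3.15 (0.99)
[row label lost in extraction]
  Manager                        [lost]        139.5 (122.91)      102.3 (69.63)    105.67 (70.12)
  Senior                         [lost]        26.4 (17.27)
[row label lost in extraction]
  Manager                        3.27 (1.79)   3.60 (3.89)         0.38 (2.87)      2.47 (1.99)
  Senior                         0.68 (0.99)   1.15 (1.27)
(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.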
Most participants indicated that, as members of audit teams, they had been
involved in engagements in which an opinion other than unqualified was either
seriously considered (81 of 91), or actually issued (63 of 91). This suggests that
participants were familiar with non-standard audit reports in the real world of
audit practice. Of the 63 who had been on audits in which a going concern opinion
was issued, 42 were managers and 21 were seniors.
Experiment Design
Participants were assigned to one of four experimental conditions according to a
2 (decision unit) × 2 (order of evidence) design. Thus, the four treatment conditions for the first hypothesis were: Individual, CCCMMM; Individual, MMMCCC;
Group, CCCMMM; and Group, MMMCCC. The dependent variable for the first
hypothesis (H1) and the third hypothesis (H3) was the change in the assessed
likelihood of the client's continued existence. The change was measured based on
assessments made after the initial review of the case in Task 1 (labeled J0) and
after the review of all six additional items of evidence in Task 2 of the experiment
RESULTS
Descriptive Results
The results of the manipulation check in Task 4 were very satisfactory. Participants
overwhelmingly reacted in the expected direction. Only 3 (1.02%) of the 294
possible cases (6 items each from 28 individuals and 21 groups) were incorrectly
classified. Despite these few misclassifications, participants always revised
their probability assessments in the expected direction during Task 2 (downward in
response to contrary information and upward in response to mitigating factors).
The average likelihood judgments (J0–J6) are reported in Table 2. The average
initial judgments (J0) of individuals (68.92 points) and groups (69.05 points) were
not significantly different (p > 0.10). Table 2 shows how each subsequent informational unit altered the progressive going concern estimation in the predicted
direction. The average downward belief revision for contrary information was
39.16 points. The average upward belief revision was 15.82 points for mitigating
information. This magnitude difference is consistent with prior findings that
auditors are particularly sensitive to disconfirming evidence (Ashton & Ashton,
1988; McMillan & White, 1993). The average downward revision for contrary
information was less for groups (31 points) than for individuals (45 points).
Similarly, the average upward revision for mitigating information was 11 points
for groups and 19 points for individuals. Consistent with the literature that suggests
that groups function to taper extreme member positions, group responses were less
polarized than individual responses in both the positive and the negative direction
in this audit context.
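The overall revision figures quoted above (39.16 and 15.82 points) are consistent with a sample-size-weighted average of the separate individual and group means. The Python sketch below checks this under two assumptions: that the unrounded means are those reported in Table 5 (45.21 and 31.09 for contrary information, 19.18 and 11.33 for mitigating information), and that each of the 28 individuals and 21 groups counts as one decision unit. The function name `pooled_mean` is illustrative.

```python
N_INDIVIDUALS = 28  # decision units working alone
N_GROUPS = 21       # three-person groups, each counted as one decision unit

# Mean belief revisions in points, taken from Table 5.
contrary = {"individuals": 45.21, "groups": 31.09}
mitigating = {"individuals": 19.18, "groups": 11.33}


def pooled_mean(means: dict) -> float:
    """Average the two decision-unit means, weighting by number of units."""
    total_units = N_INDIVIDUALS + N_GROUPS
    weighted_sum = (N_INDIVIDUALS * means["individuals"]
                    + N_GROUPS * means["groups"])
    return weighted_sum / total_units


if __name__ == "__main__":
    print(round(pooled_mean(contrary), 2))
    print(round(pooled_mean(mitigating), 2))
```

Under these assumptions the two pooled values round to the 39.16 and 15.82 points reported in the text.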
Tests of Hypotheses
The first hypothesis specified that the groups would exhibit weaker recency effects
than individuals. A 2 (decision unit) × 2 (order) ANOVA was conducted with
percent change in cumulative belief revision, (J6 − J0)/J0, as the dependent variable.
Table 2. Mean (Standard Deviation) of Initial (J0) and Revised (J1 Through J6) Likelihood Assessments
Treatment Conditions
Decision Unit        Order(a)  J0             J1             J2             J3             J4             J5             J6
Group (N = 11)       CCCMMM    69.54 (22.63)  59.54 (24.54)  47.72 (26.77)  38.82 (27.76)  42.73 (26.49)  45.91 (25.18)  51.36 (23.88)
Group (N = 10)       MMMCCC    68.50 (16.67)  71.50 (18.86)  73.50 (15.47)  78.50 (14.35)  64.50 (17.55)  54.20 (22.75)  47.00 (21.24)
Individual (N = 13)  CCCMMM    66.92 (21.27)  41.92 (22.03)  34.85 (16.62)  23.46 (16.88)  36.92 (16.40)  52.07 (20.38)  52.84 (18.76)
Individual (N = 15)  MMMCCC    70.67 (14.12)  76.53 (14.89)  76.73 (13.23)  81.00 (8.70)   60.00 (8.45)   41.80 (15.36)  34.27 (14.28)
(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.
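Table 3
Panel A: 2 (decision unit) × 2 (order) ANOVA on percent change in cumulative belief revision
Source                  df   Mean Square   F       Sig. of F
Order                   1    0.464         9.085   0.004
Decision unit           1    0.033         0.654   0.423
Order × Decision unit   1    0.278         5.432   0.024
Residual                45   0.051
Panel B: Cell means by decision unit and order
Decision Unit   CCCMMM   MMMCCC   t      p
Individual      0.1921   0.5180   4.13   0.000
Group           0.2921   0.3133   0.20   0.847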
The results are presented in Panel A of Table 3. The significance of the order
variable (F = 9.085, p < 0.01) shows that recency effects are present in auditors'
going concern decisions. More importantly, however, the results reveal a significant interaction (F = 5.43, p < 0.05) between order and decision unit. This result
suggests that judgments were not only influenced by the order in which evidence
was evaluated, but also by whether judgments were made individually or in groups.
The decision unit does not have a direct effect and is important only in terms of
altering the impact of order effects. This suggests that groups act as a debiaser
in eliminating recency in auditor going concern judgments. H1 is supported.
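As a rough arithmetic check on the ANOVA, each F statistic is the ratio of the effect's mean square to the residual mean square. The Python sketch below applies that ratio to the rounded mean squares in Panel A of Table 3. Because the inputs are rounded to three decimals, the ratios reproduce the reported F values only approximately. The dictionary keys and the function name `f_ratio` are illustrative.

```python
# Rounded mean squares as reported in Panel A of Table 3.
MS_RESIDUAL = 0.051
MEAN_SQUARES = {
    "order": 0.464,
    "decision_unit": 0.033,
    "order_x_decision_unit": 0.278,
}


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic for an effect: effect mean square over error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


if __name__ == "__main__":
    for effect, ms in MEAN_SQUARES.items():
        print(effect, round(f_ratio(ms, MS_RESIDUAL), 3))
```

The small gaps between these ratios and the reported values (9.085, 0.654 and 5.432) are of the size expected from rounding the mean squares.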
Another test of recency among individual auditors shows that individuals in
the MMMCCC condition made a greater average downward adjustment in their
going-concern likelihood judgments (from 70.67 to 34.27, a change of 36.40
points) than individuals in the CCCMMM condition (from 66.92 to 52.84, a change of 14.08
points). This difference in average belief revisions was significant (t = 3.96,
p < 0.001). In contrast to the individual results, likelihood judgments of audit
groups exhibited no recency. Here, the average downward adjustment was 21.50
points (from 68.50 to 47.00) for the MMMCCC condition, and an almost identical
18.18 points (from 69.54 to 51.36) for the CCCMMM condition. This difference
was not significant (t = 0.47, p > 0.65).8 Hence, as expected, groups mitigated
the recency effect. These results also support H1.
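The comparison above can be restated as a simple contrast. For each decision unit, the gap between the downward adjustment in the MMMCCC condition and that in the CCCMMM condition gives a rough measure of recency. The Python sketch below computes this from the J0 and J6 cell means in Table 2. The names `downward_adjustment` and `recency_gap` are illustrative, and the calculation works on cell means only, so it says nothing about statistical significance.

```python
# (initial J0, final J6) mean likelihood judgments from Table 2.
CELL_MEANS = {
    ("individual", "MMMCCC"): (70.67, 34.27),
    ("individual", "CCCMMM"): (66.92, 52.84),
    ("group", "MMMCCC"): (68.50, 47.00),
    ("group", "CCCMMM"): (69.54, 51.36),
}


def downward_adjustment(unit: str, order: str) -> float:
    """Points by which the mean judgment fell between J0 and J6."""
    j0, j6 = CELL_MEANS[(unit, order)]
    return j0 - j6


def recency_gap(unit: str) -> float:
    """Extra downward adjustment when contrary evidence comes last."""
    return (downward_adjustment(unit, "MMMCCC")
            - downward_adjustment(unit, "CCCMMM"))


if __name__ == "__main__":
    for unit in ("individual", "group"):
        print(unit, round(recency_gap(unit), 2))
```

On these means the gap is about 22 points for individuals and about 3 points for groups, which matches the pattern described in the text.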
The second hypothesis asserted a relationship between decision unit and going
concern judgment confidence. Specifically, audit groups were predicted to have
greater confidence in their going concern decisions. For these purposes, decision
Table 4
Panel A: ANOVA with final confidence as the dependent variable
Source                  df   Mean Square   F       Sig. of F
Order                   1    138.641       0.623   0.434
Decision unit           1    923.415       4.149   0.048
Order × Decision unit   1    0.025         0.000   0.992
Residual                45   22.587
Panel B: Average confidence by decision unit
Average Confidence   Individual   Group   t      p
Initial              63.57        75.57   2.27   0.028
Final                71.25        80.23   2.25   0.029
confidence at the end of the case was used as the dependent variable. Final
confidence is important because it reflects the processing of all the information
in the case, either by groups or individually. Table 4 offers an ANOVA to test the
second hypothesis. Information order and decision unit are included as possible
effects upon final confidence consistent with H2. The significance of decision unit
at p < 0.05 suggests that groups have higher levels of confidence.9 The failure of
order effects, and the interaction between order and decision unit, to be significant
suggests that only how the decision-making unit was structured influenced
confidence.
Although H2 pertains to the existence of group differences, the change in
confidence that occurred during the experiment was also considered. Groups
exhibited significantly higher initial confidence than individuals (t = 2.27,
p < 0.03). A 2 × 2 ANCOVA with final confidence as the dependent variable,
initial confidence as the covariate, and decision unit and order as the independent
variables was conducted. In results not shown, the initial confidence covariate
was significant (p < 0.05). Neither of the two main effects nor their interaction was significant. This suggests that the differential confidence in the final
decision was driven by the initial differences, and not by the differential processing of information. Nonetheless, groups maintained a significant difference
in confidence over individuals throughout the entire process of belief revision.
Groups begin more confidently and stay that way, as further information is made
known about relevant events. However, the group does not progressively become
significantly more confident. The confidence difference appears to adhere to
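Table 5. Belief Revisions in Response to Contrary and Mitigating Information
Information Type   Individuals     Groups          t      p
Contrary           45.21 (13.44)   31.09 (17.05)   3.13   0.003
Mitigating         19.18 (18.66)   11.33 (15.19)   1.57   0.122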
the mere existence of the group, rather than its continued information handling
abilities.
The final hypothesis concerns different processing by groups and individuals
of the contrary and mitigating information. In the test of H3, the six
opportunities provided to participants to revise their probability beliefs were
distinguished into contrary and mitigating types. As shown in Table 5, there
is a significant difference between individual and group responses to contrary
information (t = 3.13, p < 0.01), with individuals reacting more severely. This
is consistent with H3a. No significant differences exist between audit groups
and individual auditors when presented with mitigating information (t = 1.57,
p > 0.12). This does not support H3b.
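The t statistics in Table 5 can be approximated from the reported means, standard deviations and sample sizes (28 individuals, 21 groups). The text does not state which form of the two-sample t-test was used, so the Python sketch below computes both the unequal-variance (Welch) and the pooled-variance versions. The function names are illustrative, and the inputs are rounded, so the results are approximate.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic without assuming equal variances."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# (mean, standard deviation, n) for individuals, then for groups (Table 5).
CONTRARY = (45.21, 13.44, 28, 31.09, 17.05, 21)
MITIGATING = (19.18, 18.66, 28, 11.33, 15.19, 21)

if __name__ == "__main__":
    for label, stats in (("contrary", CONTRARY), ("mitigating", MITIGATING)):
        print(label,
              "Welch:", round(welch_t(*stats), 2),
              "pooled:", round(pooled_t(*stats), 2))
```

With these inputs, the Welch statistic for contrary information is close to the reported 3.13, while the pooled statistic for mitigating information is close to the reported 1.57. One possible reading is that the variance assumption differed between the two tests, but this is an inference from the arithmetic, not something the text states.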
Other Analyses
In Hypothesis H1, the dependent variable was the revision of the assessment of the
likelihood that the client firm will continue as a going concern. As Asare (1992)
points out, it is also important to learn whether the differences in audit judgments
induced by the recency effect are likely to lead to differences in substantive audit
decisions. Accordingly, an additional analysis was performed to examine whether
judgment differences were sufficient to influence the audit report decisions in this
particular case setting.
Table 6 reports the recommended audit opinion of participants in each of the
four treatment conditions, both at the initial stage (Task 1) of the experiment, and
after reviewing all six additional items of information (the conclusion of Task 2).
Since none of the groups or individuals selected the disclaimer of an opinion
recommendation at any point in the experiment, the audit opinion variable was
binary. At the initial point, individuals are no more likely than groups to recommend a
modified opinion (χ² = 0.92, p > 0.50). However, individuals show a stronger
tendency to switch to a modified opinion during the course of the case. When final
decisions are considered, individuals are more likely than groups to recommend a
Table 6. Recommended Audit Opinions by Treatment Condition
                                Initial Opinion            Final Opinion
Decision Unit   Order     N     Unqualified   Modified     Unqualified   Modified
Individual      CCCMMM    13    8             5            4             9
Individual      MMMCCC    15    11            4            2             13
Groups          CCCMMM    11    9             2            6             5
Groups          MMMCCC    10    7             3            4             6
Total                     49    35            14           16            33
                                Initial                    Final
Opinion Chosen                  Unqualified   Modified     Unqualified   Modified
Individuals     Unqualified     18a           1            4a            2
                Modified        2             7a           3             19a
Groups          Unqualified     14a           2            9a            1
                Modified                      5a                         11a
a Indicates
modified opinion (χ² = 5.029, p < 0.05). In results not shown, individuals in the
CCCMMM condition tended to recommend more unqualified and fewer modified
opinions than individuals in the MMMCCC condition at the end of the experiment.
This comparison, however, is not significant (χ² = 2.24, p > 0.05). A comparison
of the distribution of final recommended opinions to the distribution of initial
opinions shows that 4 of 8 individuals in the CCCMMM condition changed their
recommendation from unqualified to modified, while 9 of 11 in the MMMCCC
condition changed from unqualified to modified. A much less severe pattern
existed for groups. Only 6 of 21 groups (3 in each order condition) changed
their recommendation from unqualified to modified. However, neither of these
comparisons is significant (χ² = 0.962, p > 0.05 and χ² = 0.829, p > 0.05 for
individuals and groups, respectively). Contrary to the expected effect of recency
on audit opinions, the number of modified opinions increased in both individual
and group CCCMMM conditions. Although revisions of belief toward modified
opinions may align with the aforementioned heightened sensitivity of auditors
to adverse news, these results also suggest possible differences between binary
(unqualified, modified) and continuous (percentage probability) outcomes.10
groups tend to sustain, but not significantly increase, their confidence advantage
over individuals. This suggests that the advantages of the group mode in an audit
setting occur early in the deliberative process. The fact that the confidence of
groups did not increase over time also may indicate that this collective mode is
not necessarily prone to overconfidence.
The results suggest that one of the main differences that groups may offer is
their willingness to reduce extreme reactions to particular pieces of information
that push toward extreme solutions. In the going concern situation, further evidence
of financial distress would logically make the going concern question more salient.
However, the contribution toward this conclusion for groups is relatively small.
Groups appear to be more willing to suspend judgment or to put each additional
piece of information in a broader context. Individuals demonstrate more sensitivity
to bad news by making larger belief revisions. This difference between groups
and individuals is not observed for information that tended to lessen the going
concern problem. Individuals did not react more strongly to facts that suggested
that the hypothetical business would remain financially viable. Further research is
needed to test possible reasons that the two decision units processed good news
and bad news differently.
The results should redirect the attention of auditing organizations and academic
accountants to group dynamics. Groups appear to process information in ways less
affected by its order. Groups are also more confident about decisions and less likely
to overreact to bad news about a client. Auditing firms should be comfortable
about the ability of groups to avoid recency bias but be somewhat concerned about
the tendency to perhaps react too little to going concern issues. In light of recent
sudden corporate bankruptcies, the latter tendency needs to be guarded against.
This research did not attempt to evaluate the importance of degrees of
confidence. The superior confidence of groups does not necessarily imply that
groups made more technically correct decisions about the going concern status of
the hypothetical client. The hypothetical nature of the client prevents any proof
of superiority. A necessary prelude to the confidence that constituents might
have about auditing outcomes is the confidence that auditors themselves have
in auditing inputs. Nonetheless, subsequent research should be directed at the
specific value of confidence in auditing judgments.
The findings of this study are subject to certain limitations. One stems from
the unavailability of data regarding the extent to which group members actually
had experience working together on previous engagements. The effectiveness
of group processes may depend on such experience, as individuals learn to
systematically respect or discount the judgments of others. The importance of
working histories of groups may not be as high in auditing as in other business
settings. As firms get larger and centralize control over their human resources,
individual assignments become less predictable and stable. No attention was given
to hierarchical differences among the participants who were assigned to groups.
In the attempt to ensure sufficient going concern expertise, auditors of different
ranks were mixed in the groups. No evidence exists on the question of whether
participants of higher rank dominated group decisions. A more systematic attempt
to isolate the power of more highly ranked individuals would have been necessary
to shed light on this question.11 Another potential limitation stems from the
fact that auditors in the group condition are more experienced than auditors in
the individual condition. Although the groups also included auditors with lesser
experience than those that worked individually, an experience effect may have
resulted if the more experienced group member dominated the group decisions.
NOTES
1. The professional nature of the work mitigates the fact that these groups often consist
of individuals at different levels within the organization. However, the empirical regularities
created by this professionalism need further investigation.
2. The expected ability of groups to make better-informed decisions does not take
into account situations where individuals first make judgments and then enter groups for
the reevaluation of the decision. This may cause groups to move towards more extreme
positions, as shown by Marxen (1990).
3. Group confidence might be lowered by cases where individuals strongly disagreed
with group positions. Therefore, the expectation that group confidence will be higher than
individual confidence implicitly asserts that these situations will be rare. This study does
not measure the degree to which satisfaction is related to confidence.
4. Group composition could be very important to the dynamics of group decision-making. Since this research could not tightly control the composition dimension,
interpersonal issues such as charisma and persuasiveness could not be measured. On
more objective dimensions such as experience and rank, a suitable mixture of people was
achieved. See Table 1 and the discussion of participants in the Results section.
5. The researcher did not inquire about the decision processes of the groups after the
experiment was completed. Investigating this in a way that did it justice would require
another study.
6. This choice on group composition creates an alternative interpretation about the extent
of influence lower level employees can have on higher ones. See Graen and Uhl-Bien (1995).
7. The measure J0 - J6 was also examined in raw change terms. Since no differences in
the substantive results occurred, these were not shown.
8. Other tests were conducted to clarify the interpretation of the results presented in
Table 3. An ANCOVA with experience as the covariate (p > 0.05) was considered. A
significant order/decision-unit interaction (F = 4.674, p < 0.05) again resulted. This
suggests that these findings are not attributable to an experience effect. Another analysis
used J6 as the dependent variable, J0 as the covariate, and order and decision-unit as the
independent variables. This model captures belief revision in a different way by more
explicitly controlling for the initial anchoring point (J0). It also shows results similar
to those that are reported above. Specifically, the interaction between order effects and
decision-unit was significant (F = 4.35, p < 0.05). Another covariate that could be
important is the threshold for substantial doubt. The point at which the decision-maker is
confronted with a reportable going concern issue may present a matter independent from
the quantifiable belief revision variable. Using the probability estimate for this general
threshold specifically collected from the participants in Task 1 as a covariate, the order
effect/decision unit interaction term was again significant (F = 4.89, p < 0.05). The
results suggest the acceptance of the first hypothesis. Audit groups making going concern
decisions are less prone than individual auditors to recency effects.
9. As shown in Panel B of Table 4, this relationship was also analyzed using t-tests.
The results show that the difference between the final confidence of individuals (71.25)
and groups (80.23) was significant (t = 2.25, p < 0.03). This result is consistent with the
expectation in H2.
10. The bottom portion of Table 6 reports whether the participants' recommended opinions were consistent with the final probability ratings (J6) and with their initial threshold
judgment provided at the beginning of Task 1, apart from the consideration of case materials.
An auditor's opinion-type decision was considered consistent if the likelihood rating was
below the threshold judgment, and a modified report was chosen. Alternatively, consistency
could also be achieved with the recommendation that the opinion be unqualified if likelihood
was above the given threshold. Table 6 reports the results of these comparisons. In total,
only 7% (3 of 42) of the group recommendations of audit opinions were inconsistent. Nearly
twice as many, 14% (8 of 56), of the individual recommendations were inconsistent. An even
more telling process unfolds when initial and final likelihood positions are differentiated.
Groups become more consistent with their original threshold over time. Initially, 90% of the
groups are consistent. This increases to 95% consistency after the last piece of information
has been processed. Individuals become less consistent. The percentage of individuals that are
consistent changes from 89% to 82% over the course of the decision-making.
11. Conversations with practitioners about this did not reveal any consistent practice.
Some firms had a more hierarchical approach than others almost to the point of resting
this decision on the engagement partner after the other auditors had collected the relevant
information and suggested an outcome. Other firms had a more participatory process
wherein the decision cascaded from the lower levels to the top.
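
The consistency rule described in Note 10 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the treatment of a likelihood rating exactly equal to the threshold (counted here as inconsistent) is an assumption, since the note does not address that case. The two percentages are recomputed from the counts reported in the note.

```python
def is_consistent(likelihood, threshold, opinion):
    """Classify one recommendation under the rule described in Note 10.

    A modified opinion is consistent when the likelihood rating is below the
    participant's threshold; an unqualified opinion is consistent when it is
    above. A rating equal to the threshold is treated as inconsistent
    (an assumption; the note is silent on ties).
    """
    if opinion == "modified":
        return likelihood < threshold
    if opinion == "unqualified":
        return likelihood > threshold
    raise ValueError("opinion must be 'modified' or 'unqualified'")


def percent_inconsistent(inconsistent, total):
    """Share of inconsistent recommendations, rounded to a whole percent."""
    return round(100 * inconsistent / total)


# Counts as reported in Note 10 (initial and final recommendations pooled).
group_rate = percent_inconsistent(3, 42)
individual_rate = percent_inconsistent(8, 56)
print(group_rate, individual_rate)
```

Pooling initial and final recommendations, as the note does, doubles the denominators (42 group and 56 individual recommendations).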
REFERENCES
Allwood, C. M., & Granhag, P. (1996). Realism in confidence judgments as a function of working in
dyads or alone. Organizational Behavior and Human Decision Processes, 64, 277-289.
American Institute of Certified Public Accountants (1990). Statement on auditing standards No. 59:
The auditor's consideration of an entity's ability to continue as a going concern. (AU 341) New
York, NY: AICPA.
Anderson, C. A., & Sechler, E. (1986). Effects of explanation and counter-explanation on the development and use of social theories. Journal of Personality and Social Psychology, 50, 24-34.
Asare, S. K. (1992). The auditor's going-concern decision: Interaction of task variables and the
sequential processing of evidence. The Accounting Review, 67, 379-393.
Ashton, A. H., & Ashton, R. (1988). Sequential belief revision in auditing. The Accounting Review,
63, 623-641.
Bloomfield, R., Libby, R., & Nelson, M. (1996). Communication of confidence as a determinant
of group judgment accuracy. Organizational Behavior and Human Decision Processes, 6,
287-300.
Chow, C., McNamee, A., & Plumlee, D. (1987). Practitioners' perceptions of audit step difficulty and
criticalness: Implications for audit research. Auditing: A Journal of Practice & Theory, 6,
123-133.
Church, B. (1991). An examination of the effect that commitment to a hypothesis has on auditors'
evaluations of confirming and disconfirming evidence. Contemporary Accounting Research, 7,
513-534.
Church, B., & Schneider, A. (1993). Auditor generation of diagnostic hypotheses in response to a
superior's suggestion: Influence effects. Contemporary Accounting Research, 10, 333-350.
Cushing, B., & Ahlawat, S. (1996). Mitigation of recency bias in audit judgment: The effect of documentation. Auditing: A Journal of Practice & Theory, 16, 134-146.
Dillard, J. N., Kauffman, N., & Spires, E. (1991). Evidence order and belief revision in management
accounting decisions. Accounting, Organizations and Society, 7, 619-633.
Fisher, B. A., & Ellis, D. (1990). Small group decision-making: Communication and the group process.
New York, NY: McGraw-Hill.
Gibbins, M., & Emby, C. (1985). Evidence on the nature of professional judgment in public accounting.
In: A. R. Abdel-khalik & I. Solomon (Eds), Auditing Research Symposium (pp. 181-212).
Champaign, IL: University of Illinois.
Graen, G. B., & Uhl-Bien, M. (1995). Relationship-based approach to leadership: Development of
leader-member exchange (LMX) theory of leadership over 25 years: Applying a multi-level
multi-domain perspective. Leadership Quarterly, 6, 219-247.
Hill, G. W. (1982). Group versus individual performance: Are n + 1 heads better than one? Psychological Bulletin, 19, 517-539.
Hogarth, R. M., & Einhorn, H. (1992). Order effects in belief updating: The belief adjustment model.
Cognitive Psychology, 24, 1-55.
Kennedy, J. (1993). Debiasing audit judgment with accountability: A framework and experimental
results. Journal of Accounting Research, 31, 231-245.
Luus, C. A. E., & Wells, G. (1994). The malleability of eyewitness confidence: Co-witness and perseverance effects. Journal of Applied Psychology, 79, 714-723.
Marxen, D. (1990). A behavioral investigation of time budget preparation in a competitive audit environment. Accounting Horizons, 4, 47-57.
McMillan, J., & White, R. (1993). Auditors' belief revisions and evidence search: The effect of
hypothesis frame, confirmation bias, and professional skepticism. The Accounting Review, 68,
443-465.
Messier, W., & Tubbs, R. (1994). Mitigating recency effects in belief revision: The impact of audit
experience and the review process. Auditing: A Journal of Practice & Theory, 14, 57-72.
Miner, F. (1984). Group versus individual decision-making: An investigation of performance measures, decision strategies and process. Organizational Behavior and Human Performance, 39,
112-124.
Myers, D., & Lamm, H. (1976). The group polarization phenomenon. Psychological Bulletin, 82,
602-627.
Newman, D. (1980). Prospect theory: Implications for information evaluation. Accounting, Organizations and Society, 5, 217-230.
Pei, B. K., Reed, S., & Koch, B. (1992). Auditor belief revisions in a performance auditing setting:
An application of the belief-adjustment model. Accounting, Organizations and Society, 17,
169-183.
Reckers, P. M. J., & Schultz, J. (1993). The effect of fraud signals, evidence order, and group-assisted
counsel on independent auditor judgment. Behavioral Research in Accounting, 5, 124-144.
Schultz, J. J., & Reckers, P. (1981). The impact of group processing on selected audit disclosure
decisions. Journal of Accounting Research, 19, 482-501.
Sniezek, J. A., & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organizational
Behavior and Human Decision Processes, 43, 1-28.
Sniezek, J. A., & Henry, R. (1990). Revision, weighting, and commitment in consensus group judgment.
Organizational Behavior and Human Decision Processes, 45, 66-84.
Solomon, I. (1987). Multi-auditor judgment/decision-making research. Journal of Accounting
Literature, 6, 1-25.
Stasser, G. (1992). Information salience and the discovery of hidden profiles by decision-making
groups? A thought experiment. Organizational Behavior and Human Decision Processes,
52, 156-181.
Stasser, G., & Davis, J. (1981). Group decision-making and social influence: A social interaction
sequence model. Psychological Review, 88, 523-551.
Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision-making: Biased
information sampling during discussion. Journal of Personality and Social Psychology, 48,
1467-1478.
Tetlock, P. (1983). Accountability and the perseverance of first impressions. Social Psychology
Quarterly, 46, 285-292.
Trotman, K., & Wright, A. (1996). Recency effects: Task complexity, decision-mode, and task-specific
experience. Behavioral Research in Accounting, 8, 175-193.
Tubbs, R., Messier, W., Jr., & Knechel, W. (1990). Recency effects in the auditor's belief-revision
process. The Accounting Review, 65, 452-460.
Valacich, J. S., & Schwenk, C. (1995). Devil's advocacy and dialectical inquiry effects on face-to-face
and computer-mediated group decision-making. Organizational Behavior and Human Decision
Processes, 63, 158-173.
Vance, R., & Biddle, T. (1985). Task experience and social cues: Interactive effects on attitudinal
reaction. Organizational Behavior and Human Performance, 35, 252-265.
Weiss, H., & Shaw, J. (1979). Social influences in judgments about task. Organizational Behavior and
Human Performance, 24, 126-140.
White, S., Mitchell, T., & Bell, C. (1977). Goal setting, evaluation apprehension and social cues as
determinants of job performance and job satisfaction in a simulated organization. Journal of
Applied Psychology, 52, 665-673.
Winquist, J., & Larson, J. (1998). Information pooling: When it impacts group decision-making. Journal
of Personality and Social Psychology, 74, 371-378.
Wright, E., Luus, C., & Christie, S. (1990). Does group discussion facilitate the use of consensus
information in making causal attribution? Journal of Personality and Social Psychology, 59,
261-269.
Zarnoth, P., & Sniezek, J. (1997). The social influence of confidence in group decision-making. Journal
of Experimental Social Psychology, 33, 345-367.
INTRODUCTION
In 2000, state and local governments in the U.S. generated over $1.2 trillion
in revenues; they also spent over $1.1 trillion, accounting for over 9% of the
U.S. gross domestic product (28% of gross domestic product when the federal
government is included) (OMB, 2001). The magnitude of this economic activity
accentuates the need for proper oversight of the sources and uses of the funds,
including the audits of state and local governments. Despite the extent to which
state and local government activity impacts the economy, relatively little behavioral auditing research has been conducted on the effectiveness and efficiency of
the auditors employed by these entities. This study addresses the issue of auditor
effectiveness by empirically testing the professional judgments of state auditors
in a context-rich environment; specifically, it examines the subjective assessment
of sample evidence by state auditors.
Sampling is one area where the evaluation of evidence may be largely
affected by subjective differences in auditors' judgments. The auditing standards
explicitly state that in a variables sampling context, the auditor should project
the misstatement results of the sample to the items from which the sample was
selected (AICPA, 2001, AU350.26).1 However, in addition to the quantitative
task of projecting sample misstatements, the standards also note that auditors
should consider the qualitative aspects of the misstatements (AICPA, 2001,
AU350.27), and that the actions that might be taken in light of the nature and
cause of particular misstatements are left to the discretion of the auditor (AICPA,
2001, AU350.06). Thus, some discord exists as to whether misstatements should
always be projected; and if not, under what conditions they should be isolated. If
an auditor inappropriately isolates misstatements found in a sample, the likelihood
of a non-representative, or biased, estimate of the account balance being tested
increases. More specifically, failure to project sample misstatements generally
results in an underestimation of the aggregate misstatement in the underlying
population, thereby increasing the auditor's risk of incorrect acceptance. In the
case of state auditors, this implies a failure to satisfy an essential element of public
control and accountability.
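
The underestimation that results from isolating a sample misstatement can be illustrated with a small worked example. The sketch below assumes ratio projection (scaling the sample misstatement rate to the population book value), which is only one of several projection methods an auditor might use; the dollar figures and function name are hypothetical and are not taken from the study's cases.

```python
def project_ratio(sample_misstatement, sample_book_value, population_book_value):
    """Ratio projection: scale the sample misstatement rate to the population."""
    if sample_book_value <= 0:
        raise ValueError("sample_book_value must be positive")
    return sample_misstatement * population_book_value / sample_book_value


# Hypothetical figures for illustration only.
population_book_value = 1_000_000.0
sample_book_value = 50_000.0
misstatements = [400.0, 250.0, 350.0]  # misstatements found in the sample

# Projecting every misstatement found in the sample.
projected_all = project_ratio(
    sum(misstatements), sample_book_value, population_book_value
)

# Isolating the first misstatement: it is counted once at face value
# and left out of the projection.
isolated = misstatements[0]
estimate_with_isolation = isolated + project_ratio(
    sum(misstatements[1:]), sample_book_value, population_book_value
)

print(projected_all, estimate_with_isolation)
```

Under these assumed figures, isolating a single item lowers the estimated aggregate misstatement from 20,000 to 12,400, which is the direction of bias the paragraph above describes.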
The extent to which state auditors do not project sample misstatements of
account balances and the potential consequences of inappropriately isolating
misstatements are important research topics. State auditors often conduct financial
statement audits, the results of which are used in a variety of ways, including the
allocation of resources among programs and personnel, monitoring compliance
with fiscal laws, and even bond ratings. This study focuses on non-sampling
risk,2 and extends existing literature in three ways. The first contribution is the
misstatements in audit populations would prompt auditors to isolate an irregularity rather than project it to the underlying population being tested. However,
auditing standards suggest that irregularities or intentional misstatements warrant
additional consideration when uncovered. In fact, the standards specifically state:
Generally, an isolated, immaterial error in processing accounting data or applying accounting
principles is not significant to the audit. In contrast, when fraud is detected, the auditor should
consider the implications for the integrity of management or employees and the possible effect
on other aspects of the audit (AICPA, 2001, AU312.08).
a more conservative approach than isolation; this is consistent with the findings
of Dusenbury et al. (1994) who found that intentional misstatements, in the
presence of containment information, were more likely to be projected than
unintentional misstatements. Increased attention to intentional misstatements may
be particularly warranted in the case of governmental auditors since generally
accepted governmental auditing standards state that the threshold for audit risk
may be lower in governmental audits than in audits of commercial entities (GAO,
1994, 4.9), and because various legal and regulatory requirements faced by
governmental auditors may require reporting on any intentional misstatement,
regardless of its materiality. Thus, the following research hypothesis is proposed:
H1. The propensity of state auditors to project intentional sample misstatements
to the underlying population being tested will be greater than their propensity
to project unintentional misstatements.
Systematic Versus Non-systematic Misstatements Hypothesis
Misstatements, whether intentional or not, may occur systematically or nonsystematically. Systematic misstatements can be defined as those that are likely to
be repeated because of some characteristic(s) associated with a transaction or class
of transactions. Systematic misstatements may occur frequently or infrequently
depending on the persistence of the underlying cause. However, the presence
of a systematic error in an audit sample, by definition, implies that other errors
may be present in the underlying population due to the same causal condition.
Thus, normatively, systematic misstatements discovered in a sample should be
projected to the entire population. The following research hypothesis related to
auditors' behavior in the evaluation of sample findings is proposed:
H2. The propensity of state auditors to project systematic sample misstatements
to the underlying population being tested will be greater than their propensity
to project non-systematic misstatements.
EXPERIMENTAL METHODOLOGY
Experimental Task
To test the hypotheses, a series of sampling cases (see Appendix) that incorporated
the experimental manipulations was developed. These cases enabled both across-scenario
and within-scenario analyses. Burgstahler and Jiambalvo's (1986)
cases served as a basis for comparability with other studies; however, precise
replication of cases used in these other studies was not tractable given the
manipulations employed. In addition, although comparisons can be made between
the results of this study and other studies on sample projections, the pressures and
incentives encountered by governmental auditors are believed to vary from those
encountered by external auditors. Until research addresses environmental factors,
perceptions of differences are all that distinguish the governmental auditor from
the non-governmental auditor.
The experimental instrument included four treatment cases, each representing
a different account balance (e.g. sales and accounts receivable) and sampling
situation; thus, we are able to capture the four treatments of the 2 × 2 design in
which we manipulate two independent variables: (1) the type of misstatement as
either intentional or unintentional (INT); and (2) the nature of the misstatement
in terms of potential recurrence, operationalized as being either systematic or
non-systematic (SYS).4 Each of the four cases should be projected according to
the guidance provided by SAS No. 39, Audit Sampling. When noted in the cases,
the employees responsible for the misstatements were intentionally kept at lower
levels (e.g. clerical employees or warehouse workers) to reduce the saliency of the
individuals involved, especially for manipulations containing intentional misstatements. The dollar amounts of the misstatements were made immaterial since most
misstatements discovered by auditors are not individually material (Elder & Allen,
1998). In addition, keeping the materiality of the misstatements constant (i.e.
immaterial) enhanced control over the manipulations being tested by minimizing
potential confounding effects from the materiality of the misstatements.
The cases used were pretested at a chapter meeting of the Institute of Internal
Auditors. Most of the internal auditors in this chapter were governmental auditors,
and no feedback was received that indicated a problem understanding any of
the case scenarios. In addition, the results of the pretest suggested that the
experimental manipulations worked as intended.
This study is based on a repeated measure design in which each subject received
every possible combination of the 2 × 2 manipulation of INT and SYS. Each
combination was incorporated randomly into one of the four different treatment
scenarios. Each case scenario in the experimental instrument was included on an
individual page and participants were requested not to return to a scenario after it
was completed. The presentation of the case scenarios was randomized to minimize
potential order effects. After reading each scenario, participants made a decision as
to whether they would project the sample to the account population being tested or
isolate the sample result from the population. Subjects were then asked to complete
a ten-point Likert-type scale, which measured their comfort level with the decision.
Subjects were instructed to consider each scenario independently and assume that
the samples were selected at random from the populations being tested.
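
The assignment procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' actual procedure: the function name and the use of Python's `random` module are assumptions, and the scenario labels follow the four case topics named in the Fig. 1 caption.

```python
import itertools
import random

# The four cells of the 2 x 2 design (INT x SYS).
CONDITIONS = list(
    itertools.product(
        ("intentional", "unintentional"), ("systematic", "non-systematic")
    )
)
# Case topics as named in the Fig. 1 caption.
SCENARIOS = ("sales", "inventory", "receivables", "unknown receipts")


def build_instrument(rng):
    """Return one participant's four pages as (scenario, condition) pairs."""
    conditions = CONDITIONS[:]
    rng.shuffle(conditions)  # randomly pair each condition with a scenario
    pages = list(zip(SCENARIOS, conditions))
    rng.shuffle(pages)  # randomize presentation order
    return pages


rng = random.Random(7)  # fixed seed only so the illustration is repeatable
for scenario, (intent, recurrence) in build_instrument(rng):
    print(scenario, intent, recurrence)
```

Each simulated participant sees every INT/SYS combination exactly once, with both the scenario pairing and the page order randomized, which mirrors the repeated measure design and the order-effect control described in the text.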
Fig. 1. Illustration of Data Analysis Across Cases (Direct Comparison of the Four
Experimental Manipulations Without Taking the Individual Case Scenarios into Consideration). The Two Manipulated Variables were: (1) Intentional or Unintentional Misstatement;
and (2) Systematic or Non-systematic Misstatement. The Four Case Scenarios Involved
Misstatements in Sales, Inventory, Receivables, and Unknown Receipts.
The differences in the auditors' decisions are analyzed both across all cases and
by individual case, as illustrated in Fig. 1. The analysis across the four treatment
cases tests the experimental conditions of the 2 × 2 manipulation of INT and SYS,
with each subject receiving each of the four conditions in the 2 × 2 design once.
In the analysis by individual case, all like cases (e.g. all sales scenarios, Case 1 in
Appendix) received by the participating auditors are tested in the aggregate.
A major difference between the experimental design used in this study and
the research designs of most other studies investigating the isolation/projection
decisions of auditors (e.g. Burgstahler & Jiambalvo, 1986; Dusenbury et al.,
1994; Hermanson, 1997) is that in this study, the effects of the two independent
variables are isolated in the four combinations of the 2 × 2 design. The other
studies tested factors that affect auditors' decisions by observing differences
Subjects
A total of 100 experimental instruments were distributed to governmental auditors
from ten different state audit departments, representing all regions of the United
States. Seventy-eight were completed and returned. Department managers of the
participating states administered the distribution of instruments to members of
their audit departments, and collected and returned the completed instruments to
the authors. Each participant was assured complete anonymity.
As Table 1 indicates, 65 of the 78 participants (83%) had some professional
certification (e.g. CGFM, CPA), with the CPA designation being the most common
(62% of the participants). Further analysis shows that only seven participants
without some professional certification are from states hiring entry level audit staff
from a variety of educational backgrounds (i.e. the state auditors are not required
to have an accounting education or minimum number of accounting hours). Thus,
a maximum of 7 subjects may have been responding to the instrument as newly
hired audit staff without accounting knowledge (8.9%).5 The appropriateness
of the participating auditors for the experimental task was further indicated by
their response to a question asking the auditors how frequently they use sampling
procedures on audit engagements, with 80% being the mean response. In addition,
only four participants indicated that they did not use sampling or they did not
respond to this particular inquiry. Analyses of the data excluding these four
auditors indicate no significant differences from the results reported below.
At first glance, the case context might seem inappropriate for the governmental
environment investigated (e.g. "it is company policy to . . ." and "a review of
the company's internal audit workpapers reveal . . ."); however, the context
is appropriate because auditors of state or local government (SLG) financial
                        Number of
                        Subjects
Central                    15
North Central              11
Northeast                   7
Northwest                  15
South                      15
Southeast                  20
Total                      78

Professional            Number of
Certification           Subjectsb
CPA                        48
CGFM                       21
CIA                         3
Other                       7
None                       13

Panel B: Means
Item    Mean    Standard Deviation
Frequency of sampling, % of audit engagements
79.9    25.8    26.3    73.3    30.7    30.7
a States
b Does
Parameter Estimate    Odds Ratio    Wald χ²    p-Value
                                     10.71     0.0011
                                      1.59     0.2077
                                     24.90     0.0001

Model χ²    p-Value    c
27.47       0.0001     0.660
64.28       0.0001     0.741

a The dichotomous dependent variable is the auditors' decisions with regard to sample misstatement
findings, defined as 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an
intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic
misstatement. INTCK and SYSCK refer to the participants' assessments of intentional and systematic
misstatements.
scenario. To the extent that one or more of the scenarios would produce a different
response in the dependent variable (the decision of isolating or projecting the
sample misstatement), the analysis would be biased against finding a significant
result; thus, collapsing the scenarios into a single group is a conservative approach
to the data analysis.
Table 2 presents the results of the logistic regression with all of the cases collapsed into one group. Panel A shows the results using the manipulated variables
INT and SYS. In Panel B, the manipulated explanatory variables INT and SYS
have been replaced in the logistic regression model with INTCK and SYSCK,
the subjects' assessments of whether the misstatements were intentional or
systematic, respectively. In both across-treatment models, the chi-square statistics
are significant at the 0.01 level, suggesting the models are good predictors of
the auditors' propensity to isolate or project sample misstatements. In addition,
goodness of fit for the logistic regression models was assessed with the c statistic,
which is somewhat analogous to the coefficient of multiple correlation (Kane
et al., 1996).7 In both models, the c statistic is greater than 0.65.
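
For readers unfamiliar with the c statistic, it is commonly defined as the share of (project, isolate) pairs in which the model assigns the higher predicted probability to the observation that was actually projected, with ties counted as one half; a value of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination. The sketch below assumes that definition. The function name and the toy data are hypothetical and do not reproduce the study's models.

```python
def c_statistic(outcomes, predicted):
    """Concordance (c) statistic for a binary outcome.

    outcomes: iterable of 0 (isolate) or 1 (project)
    predicted: model-predicted probabilities of outcome 1, in the same order
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    nonevents = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one observation of each outcome")
    concordant = 0
    ties = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event == p_nonevent:
                ties += 1
    return (concordant + 0.5 * ties) / (len(events) * len(nonevents))


# Toy data for illustration only.
toy_outcomes = [1, 1, 0, 0, 1, 0]
toy_predicted = [0.8, 0.6, 0.6, 0.3, 0.4, 0.5]
toy_c = c_statistic(toy_outcomes, toy_predicted)
print(round(toy_c, 3))
```

The pairwise loop is quadratic in the number of observations, which is acceptable for samples of the size used here (78 auditors, four decisions each).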
The analysis across treatments does not support H1 regardless of whether INT
or INTCK is included in the regression models, indicating that no difference exists
in the auditors' isolation/projection decisions whether the sample misstatements
are intentional or not.
The independent variable SYS, which is used to operationalize the manipulation
of systematic misstatements, is highly significant as indicated in Table 2. As a
Projected      Isolated
85 (54%)       71 (46%)
75 (48%)       81 (52%)
160 (51%)      152 (49%)

102 (65%)      54 (35%)
58 (37%)       98 (63%)
160 (51%)      152 (49%)

51 (65%)       27 (35%)
51 (65%)       27 (35%)
34 (44%)       44 (56%)
24 (31%)       54 (69%)
160 (51%)      152 (49%)

               73 (45%)
               79 (54%)
156 (51%)      152 (49%)

113 (72%)      44 (28%)
39 (27%)       106 (73%)
152 (50%)      150 (50%)

65 (77%)       19 (23%)
48 (66%)       25 (34%)
20 (27%)       54 (73%)
19 (27%)       52 (73%)
152 (50%)      150 (50%)
Parameter
Estimate
Odds Ratio
0.26
9.00
12.64
21.65
1.55
2.12
1.63
1.58
Wald χ²                  p-Value                    Model χ²   p-Value   c
 4.79    5.82   13.41    0.0287  0.0158  0.0003     20.72      0.0001    0.778
 2.10    2.24    5.63    0.1475  0.1345  0.0177      9.58      0.0083    0.696
11.95   10.21   14.81    0.0005  0.0014  0.0001     33.87      0.0001    0.827
15.25    8.41   20.99    0.0001  0.0037  0.0001     40.42      0.0001    0.872
 0.24    0.78    2.43    0.6207  0.3766  0.1164      3.01      0.2225    0.610
 2.65    0.07    9.75    0.1475  0.1345  0.0177     11.61      0.0030    0.700
 0.29    1.09    0.95    0.5913  0.2974  0.3291      2.33      0.3113    0.596
 5.25    1.03   13.13    0.0220  0.3107  0.0003     15.80      0.0004    0.747
The dichotomous dependent variable is the auditors' decision with regard to sample misstatement
findings, defined as 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an
intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic
misstatement. INTCK and SYSCK refer to the participants' assessments of intentional and systematic
misstatements.
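One plausible way to write the model described in this note is given below. It is a sketch that assumes a main-effects logistic specification with no interaction term; the source does not print the equation, and the coefficient symbols are illustrative.

```latex
\[
\ln\!\left(\frac{\Pr(\text{Project})}{1-\Pr(\text{Project})}\right)
  = \beta_0 + \beta_1\,\mathrm{INT} + \beta_2\,\mathrm{SYS}
\]
```

Under the same assumption, the models based on the participants' own assessments would replace INT and SYS with INTCK and SYSCK, and the odds ratio for a given variable is $e^{\beta}$ for that variable's coefficient.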
Overall, the analysis of the individual case scenarios indicates that auditors'
isolation/projection decisions are not significantly affected by whether sample
misstatements are intentional or not; thus, the within-case analyses do not
support H1. In addition, the results suggest that auditors are more likely to
project systematic sample misstatements to the underlying population than they
are non-systematic misstatements. While the finding largely holds in
the logistic regression models containing the variable SYS, the results are even
stronger when the subjects' assessments of systematic misstatements, SYSCK, are
included in the regression models.
Discussion of Findings
Across case treatments, intentional misstatements were not projected by the
auditors more frequently than they were isolated, failing to support H1. Tests
performed using the within-case analyses also suggest that, overall, the auditors'
isolation/projection decisions are not influenced by whether or not the sample
misstatements were intentional.
The finding that intentional misstatements are generally not isolated contradicts
the representativeness heuristic (Kahneman & Tversky, 1972) and prior empirical
evidence that suggests the uniqueness perception of misstatements is highly significant in auditors' decisions to isolate or project misstatements (e.g. Burgstahler &
Jiambalvo, 1986; Hermanson, 1997). In practice, intentional misstatements occur
infrequently relative to unintentional errors. Application of the representativeness
heuristic suggests that intentional misstatements should be isolated more than they
should be projected because auditors, having seen few, if any, intentional misstatements, may not have a category of irregularity attributes in memory. Authoritative
standards appear to counteract the heuristic; generally accepted governmental
auditing standards direct the state auditors to ". . . design the audit to provide
reasonable assurance of detecting material misstatements resulting from noncompliance with provisions of contracts or grant agreements that have a direct and
material effect on the determination of financial statement amounts" (GAO, 1994,
4.13). In addition, government auditors apply the AICPA's generally accepted
auditing standards, including SAS No. 99, Consideration of Fraud in a Financial
Statement Audit.
The analyses performed both across and within case treatments suggest that
auditors tend to project systematic misstatements more often than they isolate
them, providing support for H2. While this finding was highly significant for
the analyses using the systematic manipulation (SYS), the results were even
stronger when tests were run based on the subjects' perceptions of systematic
misstatements (SYSCK). This is evidenced by the overall higher odds ratios and
Wald χ² values in Tables 2 and 4.
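As a rough check on how the Wald χ² values and p-values reported in the tables relate to one another, the sketch below converts a Wald statistic into a p-value. It assumes that each statistic tests a single coefficient (one degree of freedom), which the source does not state explicitly; the function name `wald_p_value` is hypothetical.

```python
import math


def wald_p_value(wald_chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Assumption (not stated in the source): each Wald statistic tests one
    coefficient, so it is referred to a chi-square distribution with 1 df.
    For 1 df, P(X > w) = erfc(sqrt(w / 2)).
    """
    if wald_chi2 < 0:
        raise ValueError("A chi-square statistic cannot be negative.")
    return math.erfc(math.sqrt(wald_chi2 / 2.0))


# Two Wald statistics that appear in the tabulated results.
for w in (4.79, 5.82):
    print(f"Wald chi-square = {w:.2f}, p = {wald_p_value(w):.4f}")
```

Under that assumption, the computed values agree closely with the tabulated p-values (for example, 5.82 gives roughly 0.016).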
CONCLUDING COMMENTS
In this study, two factors posited to affect governmental auditors' sample
projection decisions were tested: whether sample misstatements are intentional
and/or systematic. The study's research design allowed for the testing of these two
independent variables both across case scenarios and within case scenarios. Prior
studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson,
1997) found evidence of factors affecting the auditors' decisions as to whether
or not sample errors should be projected to the population from which they were
drawn; however, those findings were aggregated across cases and the impact of
the factors was not examined within specific case scenarios (i.e. the studies did
not manipulate variables within case scenarios).
In analyses performed both across and within case treatments, the results of
the study indicate that the state's auditors did not generally project intentional
misstatements more frequently than unintentional misstatements. However, the
results suggest that auditors' isolation/projection decisions are significantly influenced by whether or not the sample misstatements were systematic; specifically,
auditors tend to project systematic misstatements more often than they isolate
them, providing support for H2.
This study also breaks new ground by bringing state auditors into the existing
research performed on auditors' projection decisions regarding the evaluation
of sample findings. No other study investigating auditors' evaluation of sample
findings uses governmental auditors in its empirical tests. Prior research
has focused exclusively on external auditors whose pressures and incentives
are perceived to differ from those of the state auditors who participated in
this study. For example, external auditors have greater litigation risk than do
governmental auditors, while governmental auditors may have lower thresholds
for audit risk and materiality due to various legal and regulatory reporting
requirements.
One unexpected finding of the study is the frequency with which the experimental cases were isolated, especially considering that every manipulation in each of
the four treatment cases, normatively, should have been projected. Across all four
treatment cases, auditors isolated 49% of the sample misstatements. Even when
auditors perceived the misstatements to be systematic, approximately one-quarter
of the sample misstatements were not projected to the underlying populations
from which they were drawn. This finding may indicate that state auditors do not
but it involves a computer system malfunction. The lack of uniformity in the operationalization of systematic misstatements is a weakness of the study. However, the
results were analyzed using both the initial manipulations of systematic misstatements and the participating auditors' assessments of whether the misstatements
were systematic, and in both analyses, the systematic manipulations are almost
all highly significant in explaining the auditors' isolation/projection decisions.
Future research could address how auditors recognize and interpret the systematic
nature of misstatements and how that affects the auditors' decision processes.
The case scenarios were set up in random order to minimize potential order
effects. Once the order of the scenarios had been selected for each participant, a
specific manipulation of the two independent variables was assigned to each of the
four treatment cases in a manner that ensured every participant received each of the
four combinations of the 2 × 2 design (as illustrated in Fig. 1). While the process
of randomizing the research instrument in this manner should have minimized any
order effects, our ability to test for order effects was limited given that 28 different
combinations of the research instruments were distributed. Tests that
compared the results of the different instrument combinations did not indicate the
presence of any order effects. In addition, ad hoc measures were developed that
compared the decisions among the different instruments in multiple ways (e.g.
compared the results based on which case was presented first without regard
to the remaining order). These tests are admittedly imprecise; however, no effects
resulting from the order of the case presentation were noted and the randomization
of both the cases and treatment combinations should have minimized potential
order effects. Nevertheless, the low power of the tests for order effects is a limitation
of the study.
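The assignment procedure described above can be sketched as follows. This is a minimal illustration, assuming four treatment cases and a simple random pairing of case order with treatment combinations; the study's actual assignment scheme is not described in enough detail to reproduce, and all names here are hypothetical.

```python
import random
from itertools import product

# Hypothetical labels for the four treatment cases.
CASES = ["Case 1", "Case 2", "Case 3", "Case 4"]

# The four cells of the 2 x 2 design as (intentional, systematic) flags.
TREATMENTS = list(product([0, 1], repeat=2))


def build_instrument(rng: random.Random) -> list:
    """Return one participant's instrument as (case, intentional, systematic) tuples.

    The case order is shuffled, then the four treatment combinations are
    shuffled and paired with the cases, so each combination appears exactly
    once per participant.
    """
    order = CASES[:]
    rng.shuffle(order)
    cells = TREATMENTS[:]
    rng.shuffle(cells)
    return [(case, intentional, systematic)
            for case, (intentional, systematic) in zip(order, cells)]


if __name__ == "__main__":
    for row in build_instrument(random.Random(2003)):
        print(row)
```

Because both the case order and the treatment pairing are shuffled, many distinct instruments can arise, which is one reason tests for order effects across instruments would have low power.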
Finally, the use of state auditors as the subject pool limits the comparability of
this study to others that used non-governmental auditors as subjects. While both
governmental and non-governmental auditors must decide whether to isolate or
project sample misstatements to the population being evaluated, the experimental
manipulations may have affected the state auditors' isolation/projection decisions
differently than they would have affected non-governmental auditors. Future
research should investigate the differences in audit environments between governmental and non-governmental employers and the impact of those differences
on the actions of the auditors.
NOTES
1. Generally accepted governmental auditing standards (GAGAS) incorporate AICPA
standards relevant to financial statement audits unless the General Accounting Office (GAO)
excludes them by formal announcement (GAO, 1994, p. 32).
2. Auditing standards (AICPA, 2001, AU350.11) divide the risk that a sample may be
non-representative of the population into sampling risk and non-sampling risk. Sampling
risk is the inherent risk of sampling that arises simply because less than the entire population
is examined. Non-sampling risk consists of risks not due to the sample selected but instead
associated with evaluating the sample, such as an auditor's failure to recognize
exceptions in the sample selected, and the auditor's inappropriate or ineffective application
of audit procedures.
3. Only one study conducted on auditors' isolation/projection decisions (Wheeler et al.,
1997) used a complete design. They used a full 3 × 2 design to test the impact of containment
information on auditors' sampling decisions and did not test either factor (intentional or
systematic misstatement) investigated in this study. In addition, Wheeler et al. used a single
case scenario in their study whereas we use four different case scenarios.
4. In this study, the delineation between systematic and non-systematic may be more
precisely described as "more systematic" and "less systematic," because almost every
misstatement will have certain characteristics that could be construed as systematic.
5. Analyses of the data excluding the seven auditors who may potentially lack a
background in accounting indicate no significant differences from the reported results.
6. Because the manipulation of whether a misstatement is intentional is rather well-defined, analyses were also conducted that excluded the participants who initially failed
the INT manipulation check. The results were substantially similar to those presented in
the paper.
7. The c statistic is derived by comparing the number of paired responses (of
observed and predicted responses) in the data set. It is defined by the equation:
$c = \left(n_c + 0.5\,(t - n_c - n_d)\right)/t$, where $t$ is the total number of pairs with different responses,
$n_c$ is the number of concordant response pairs, $n_d$ is the number of discordant response
pairs, and $t - n_c - n_d$ is the number of ties between the response pairs.
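The definition in Note 7 can be illustrated with a short sketch. It assumes that a pair consists of one observation coded 1 (Project) and one coded 0 (Isolate), and that a pair is concordant when the model assigns the higher predicted probability to the observation coded 1; the function name and the toy data are hypothetical and are not taken from the study.

```python
from itertools import product


def c_statistic(observed, predicted):
    """Concordance (c) statistic from 0/1 outcomes and predicted probabilities.

    t   = number of pairs with different observed responses
    n_c = concordant pairs (the observation coded 1 has the higher prediction)
    n_d = discordant pairs (the observation coded 1 has the lower prediction)
    Ties (t - n_c - n_d) count one half each.
    """
    ones = [p for y, p in zip(observed, predicted) if y == 1]
    zeros = [p for y, p in zip(observed, predicted) if y == 0]
    t = len(ones) * len(zeros)
    if t == 0:
        raise ValueError("Both response values must be present.")
    n_c = sum(1 for p1, p0 in product(ones, zeros) if p1 > p0)
    n_d = sum(1 for p1, p0 in product(ones, zeros) if p1 < p0)
    return (n_c + 0.5 * (t - n_c - n_d)) / t


# Toy data: three concordant pairs and one tie out of four pairs.
print(c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.4, 0.1]))
```

A value of 0.5 corresponds to predictions no better than chance, and a value of 1.0 to perfect separation of the two responses.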
8. The odds ratio is calculated by exponentiating the parameter estimate (variable coefficient), that is, by raising e, the base of the natural logarithm, to the power of the estimate (Stokes et al., 1995). For example, if the parameter estimate
is 1.2528, then the odds ratio is 3.50 ($e^{1.2528} = 3.500$).
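The calculation in Note 8 can be reproduced in a couple of lines. This is only a sketch of the arithmetic; the function name is hypothetical.

```python
import math


def odds_ratio(parameter_estimate: float) -> float:
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(parameter_estimate)


# The worked example from the note: an estimate of 1.2528.
print(f"{odds_ratio(1.2528):.2f}")  # prints 3.50
```

An estimate of zero gives an odds ratio of 1, meaning the variable leaves the odds of projection unchanged.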
9. The data were also analyzed using repeated measures analysis of variance (ANOVA)
by combining the auditors' isolate/project decision with the comfort of their decision. The
resulting analysis yielded very similar results to the logistic regression presented.
ACKNOWLEDGMENTS
We appreciate the helpful comments of Richard Dusenbury, Randy Elder,
David Gilbertson, Julia Higgs, Bill Hopwood, Dennis O'Reilly, Steve Wheeler,
participants at the 1999 Southeast Regional AAA and 2000 Auditing Section
Midyear meetings, two anonymous reviewers, and the editor.
REFERENCES
Akresh, A., & Tatum, K. (1988). Audit sampling dealing with the problems. Journal of Accountancy
(December), 58–64.
American Institute of Certified Public Accountants (AICPA) (2001). AICPA Professional Standards
as of June 30th, 2000 (Vol. 1). New York, NY: AICPA.
Anderson, B. H., & Maletta, M. (1994). Auditor attendance to negative and positive information: The effect on experience-related differences. Behavioral Research in Accounting (6),
1–20.
Ashton, A. H. (1991). Experience and error frequency knowledge as potential determinants of audit
experience. The Accounting Review (April), 218–239.
Ashton, A. H., & Ashton, R. H. (1988). Sequential belief revision in auditing. The Accounting Review
(October), 623–641.
Burgstahler, D., & Jiambalvo, J. (1986). Sample error characteristics and projection of error to audit
populations. The Accounting Review (April), 233–248.
Dusenbury, R., Reimers, J., & Wheeler, S. (1994). The effect of containment information and error
frequency on projection of sample errors to audit populations. The Accounting Review (January),
257–264.
Elder, R. S., & Allen, R. D. (1998). An empirical investigation of the auditor's decision to project
errors. Auditing: A Journal of Practice and Theory (Fall), 71–87.
General Accounting Office (GAO) (1994). Government Auditing Standards: 1994 Revision.
Washington, DC: Comptroller General of the United States.
Green, S. L. (1992). Behavioral research in governmental and nonprofit accounting: An assessment of
the past and suggestions for the future. Research in Governmental and Non-profit Accounting
(7), 53–78.
Hermanson, H. M. (1997). The effects of audit structure and experience on auditors' decisions to isolate
errors. Behavioral Research in Accounting, Suppl. (9), 76–93.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology (July), 430–454.
Kane, G. D., Richardson, F. M., & Graybill, P. (1996). Recession-induced stress and the prediction of
corporate failure. Contemporary Accounting Research, 13(2), 631–642.
Kida, T. (1984). The impact of hypothesis-testing strategies on auditors' use of judgment data. Journal
of Accounting Research (Spring), 332–340.
Libby, R. (1985). Availability and the generation of hypotheses in analytical review. Journal of
Accounting Research (Autumn), 648–667.
Libby, R., & Frederick, D. M. (1990). Experience and the ability to explain audit findings. Journal of
Accounting Research (Autumn), 348–367.
MacDonald, E. (2000). Whats Wevenue? Auditors Miss a Fraud and SEC tries to put them out of
business scam at California Micro was well-hidden, says lawyer for Coopers duo CFOs
misleading resume. Wall Street Journal (January 6), A1.
Office of Management and Budget (OMB) (2001). A citizen's guide to the federal budget, fiscal year
2002. Washington, DC: U.S. Government Printing Office.
Random House Webster's College Dictionary (1991). New York, NY: McGraw-Hill.
Stokes, M. E., Davis, C. S., & Koch, G. G. (1995). Categorical data analysis using the SAS system.
Cary, NC: SAS Institute.
Trotman, K. T., & Sng, J. (1989). The effect of hypothesis framing, prior expectations and cue
diagnosticity on auditors' information choice. Accounting, Organizations and Society, 14(5/6),
565–576.
Wheeler, S., Dusenbury, R., & Reimers, J. (1997). Projecting sample misstatement to audit populations: Theoretical, professional, and empirical considerations. Decision Sciences (Spring),
261–278.
APPENDIX
The treatment cases included in the experimental instrument are given below.
Cases 1–4 represent the four combinations of the complete 2 × 2 design that
tests two sample misstatement manipulations: intentional or unintentional misstatement and systematic or non-systematic misstatement. The unintentional and
non-systematic misstatement manipulations are italicized first, followed by the
manipulations for intentional and systematic misstatements that are also italicized
but in parentheses.
Case 1 (Sales)
Sales Account No. 77491 was understated by $945.16. It was determined that
a temporary clerical employee, who worked during a two-week period in April,
mistakenly (deliberately) misfooted sales invoices for the account. The client's
controller indicated that this was the only temporary employee (one of 25
temporary employees) used to process sales transactions.
Case 2 (Inventory)
During a physical inventory observation, it was discovered that inventory item No.
245-0672 (cleaning chemicals) was understated by 23 items valued at $50 each.
Further investigation revealed that a warehouse employee temporarily placed
the items in the breakroom to restock the company's supplies closet (temporarily
placed the items in the breakroom with the intent of taking them home for his personal use) (the breakroom is adjacent to the company's supplies closet). A review
of the company's internal audit workpapers for the last two years, which report
on periodic surprise inventory test counts, revealed no similar instances (revealed
several similar instances) in which inventory was improperly segregated by
warehouse workers.
Case 3 (Receivables)
Receivables Account No. 16788 was overstated by $59. The misstatement was
discovered when the auditor compared the price on the selected sales invoice to the
client's approved master price list in effect at the date of the sale. An investigation
into the matter revealed that a salesperson overcharged the customer for the item
when she inadvertently read the price of the next item on the master price list (to
increase her sales commission). The client's accounting system was temporarily
down when the item was ordered and the transaction had to be manually processed.
When the system is operating, it cannot process transactions (it allows overrides
of transactions) if the price of the item is not within the approved master price
range. It was estimated that the system was down 35% of the time during the year.
INTRODUCTION
While many agree that source credibility is important to lending decisions, how
negative source credibility impacts lender decisions is less understood. Some suggest that loan structure (i.e. collateral and covenants) can be used to compensate
for negative source credibility (e.g. Mather, 1999; Oldham, 1998), while others
Advances in Accounting Behavioral Research, Volume 6, 79–94
2003 Published by Elsevier Ltd.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06004-6
maintain that loan officers should not trade off perceived weaknesses in source
credibility with tighter structure. The risk of attempting to counterbalance flawed
character with loan structure is too great; a safer approach would be to avoid a
business relationship rather than to trust the applicant's financial representations (e.g.
Pace & Simonson, 1977).
Research on whether lenders compensate for perceived weakness in source
credibility by imposing tighter loan structure requires joint study of loan approval
and loan structure decisions. However, the literature on how loan officers react to
negative source credibility has focused on loan approval (e.g. Beaulieu, 1994) or
loan structure (Mather, 1999), but not both. Thus, the primary contribution of this
paper is to determine whether the tradeoff exists.
Source credibility was manipulated in the experiment to be either positive
(suggesting a credible source) or negative (suggesting a non-credible source).
Data were collected using a computerized process-tracing program, which
collected information on decision effort, perceptions of the credibility of projected
accounting information, loan approval/denial, and loan structure. The results
indicate that loan officers will deny loans to less credible clients rather than
restructure the conditions of the loan, and that they will not structure loans more
restrictively regardless of whether they were in the positive or negative character
condition or whether they approved or denied the loan.
treated as a more subtle, complex, and practical issue than is done in most prior
research and participants are given more freedom to judge source credibility.
Prior research in lending has found that source credibility affects loan approval
(Beaulieu, 1994) and loan structure (Mather, 1999) judgments. However, loan
approval and loan structure are not separate, independent judgments even though
they have been examined separately. To more completely understand the effect of
source credibility on lending decisions, both judgments need to be simultaneously
examined. Doing so provides a more comprehensive understanding of how
loan officers react to negative source credibility, and in particular, whether
they compensate for negative source credibility by restrictive loan structure or
whether they simply deny the loan request at the outset. This is our basic research
question.
This research question is important because it focuses on shortcomings in the
current literature and helps resolve the debate on how loan officers react to negative
source credibility. Framing the research question in terms of a tradeoff between
loan approval and structure allows us to investigate whether Mather's (1999) findings
that source credibility affects loan structure would hold if loan officers
were permitted to deny loans. Similarly, while Beaulieu (1994) documented that
more loan officers denied loan applicants with negative source credibility than
those with positive source credibility, there is no evidence whether loan candidates
with negative source credibility who were approved received more restrictive loan
structures than those who were denied or those who had positive source credibility.
If lenders do not structure approved loan candidates with negative source credibility
more restrictively, then there is no consequence to candidates with negative source
credibility that would protect lenders.
Loan Approval/Denial and Loan Structure
Commercial lending experts recommend that loan officers evaluate source credibility, in the form of a character judgment, as soon as contact with a prospective
borrower has been made. If character is not of sufficient quality, then analyzing
credit further or considering alternative loan structures may not be worthwhile.
This preliminary character judgment is the first hurdle of lending (McDonald &
McKinley, 1981; Pace & Simonson, 1977). Stephens (1980) confirmed that loan
officers want information about the applicant before examining the details of the
loan. This position can also be inferred from Eisenreich (1981, p. 9):
Since the majority of information will come from the borrower . . . the lender must have confidence in the raw material of the judgment. If not or if critical facts cannot be verified, the lender
cannot make the decision. It would be a gamble rather than a calculated risk.
The above direct quote conflicts with the advice offered by other commercial
lenders cited earlier (Eisenreich, 1981; McDonald & McKinley, 1981; Pace &
Simonson, 1977). It seems to advocate both screening borrowers of questionable
credibility and using loan structure to work with them. While prudently this should
be the exception rather than the rule, loan officers may use the exception to rationalize loan approvals.
Which reaction is more likely to occur is an open issue. Beaulieu (1994)
found that character had a significant main effect on loan officers' loan decisions
(approval or denial) and that it interacted with accounting information to affect
both decisions and estimates of risk of nonpayment. Specifically, loan decisions
and risk estimates responded significantly to a change in the strength of accounting information when character was positive, but not when it was negative.
Participants in Beaulieu's study were told to assume, in a loan application case,
that structure of the proposed loan would be determined by the bank's policy
at competitive terms and that collateral would be available to meet the bank's
guidelines for that type of loan. They had no opportunity to adjust loan covenants
or collateral. In contrast, Mather (1999) instructed his subjects that loans had
already been approved, so that only the loan structure task was required. Under
these conditions, Mather found that loan officers set more restrictive loan structure
when credibility was unknown than when it was positive.
An objective of the current study is to help resolve the debate by providing
evidence as to whether lenders simply deny a loan (H1), consistent with Beaulieu
(1994), or select collateral and covenant levels to compensate for weaknesses in
source credibility (H2), consistent with Mather (1999). Essentially, H1 and H2
are competing hypotheses. Because the guidance in the literature is at odds, the
hypotheses are stated in the null form.
H1. There will be no difference in the proportion of loan officers who will
approve loans when character of the borrower is positive versus when character
is negative.
H2. There will be no difference in proposed loan structure between loan officers
receiving negative and positive character information.
Process Effort
Loan officers make a critical decision regarding how much effort to expend
when they evaluate a loan candidate. Rosman and Bedard (1999) find evidence
that lenders will structure loans more restrictively when they expend less effort.
However, Rosman and Bedard do not consider the relationship between effort
and loan structure restrictiveness in light of weaknesses in a potential borrower's
character.
When character is perceived to be weak but not entirely non-credible, the lender
may pour more effort into the file to check on the initial negative impression
of character and to relate character judgments to other information provided,
especially accounting information. This possibility is motivated by the fact that
initial impressions of character and personality can be incorrect (Korem, 1997).
That is, loan officers may consider approving a loan if no aspect of presentation
in the financial statements encourages caution, even though assessments of
management's credibility raise doubts about their character.2 Increasing decision
effort in such situations reduces concerns raised by initial negative character
judgments that do not push loan officers past a threshold where they feel that they
must deny loans. Increased processing effort, as a response to negative (but not
extremely so) character information, is consistent with Shaub's (1996) finding
that auditors lacking trust in a client will recommend more work in their audit
plans. It is also consistent with Beaulieu (2001), in which recommended evidence
collection was negatively related to a CFO's integrity.
The other option available to loan officers when character judgments are
sufficiently negative is to deny loans because such credits do not clear the
first hurdle of commercial lending (Pace & Simonson, 1977). This implies
that information processing will be terminated quickly when the character of
borrowers is so negative that they are considered non-credible.
Options one and two (checking initial impressions of character and relating
it to accounting information, and denying loans without checking) require more
and less processing effort, respectively, than an average or baseline credit with
positive character information. It may not be obvious to loan officers whether the
METHOD
Procedure
Decision process and outcome data were collected using Search Monitor, which is a
computerized process-tracing program (Biggs et al., 1993; Brucks, 1988; Rosman
& Bedard, 1999). Search Monitor is interactive, menu-driven software that presents
case materials to participants and captures a complete trace of selected processes
including cue acquisition, acquisition order, and time to examine cues.
Subjects were advised at the beginning of the Search Monitor task that a
commercial loan applicant was seeking a loan package that included short- and
medium-term financing. The case used in this study integrates the case materials
used by Beaulieu (1994), which validated the source credibility measures, and
Rosman and Bedard (1999), which validated the realism of the lending task and
related measures.
The loan applicant, a manufacturer of chemical products, was briefly described,
including the contact person with the firm, its CFO. Further information about the
firm was accessed via a menu having six categories of financial and qualitative
data: profitability, inventory turnover, liquidity, and financial leverage & capital
structure (financial); and management and industry & product (qualitative). Each
of the four categories of financial data consisted of three ratios (and the dollar
values of numerators and denominators), divided into historical (years −2, −1
and 0) and projected (years +1 and +2) information. Case information indicated
that the historical information was given a clean audit opinion, while no opinion
had been expressed regarding the projected figures.
For example, the following menu was presented to participants who selected
profitability information.
(1)
(2)
(3)
(4)
(5)
(6)
The order of the six cues was randomized differently each time a participant
returned to the menu. Participants could move both within each of the six
categories of information and between categories as they wished. When they
indicated that they had finished selecting and viewing information, they were
given a series of screens to register their recommendations about the loan.
Approval or denial of the loan was requested, assuming an interest rate set at one
percentage point above prime, followed by loan structure recommendations.3
Participants who recommended denial were told that although they did not
recommend approval, they had been asked to provide input on how to structure the
loan in the event that the loan committee recommended approval. This step was
necessary so that H2 could be examined. That is, even if a loan did not pass the initial
character judgment hurdle (see Pace & Simonson, 1977, discussed earlier), this
step ensures a test of the tradeoff between structure and character that is suggested
by some of the literature, including positive accounting theory. Combined, H1 and
H2 provide a stronger test of the two competing points of view that have been
expressed in the literature.
Four loan structure recommendations were requested (see below). Twelve
responses were provided for each, corresponding to ranges of percentages that
varied, depending on the item.4,5
(1) Percentage of loan principal for which an equivalent amount of assets will be
collateralized.
(2) Level of profitability (ratio of net income to average equity) to be maintained.
(3) Level of liquidity (ratio of cash flows to fixed cash commitments) to be maintained.
(4) Level of leverage (ratio of total liabilities to equity) to be maintained.
The loan structure recommendations were followed by a question asking
participants to indicate confidence in their structure judgments on a nine-point
scale. Finally, two questions asked participants to rate the credibility of historical
financial information and management's financial projections, also on nine-point
scales.
The character information used in this study was adapted from Beaulieu
(1994), which contains a complete description of the development and validation process. As shown in Table 1, character was manipulated between-subjects in
two places in the Search Monitor program. First, either positive or negative character information regarding the CFO was provided in an introductory screen and was
seen by all participants in either condition of the experiment. Second, participants
could select more information about the CFO via the management information
menu. Those selecting the additional information received either a positive or
negative description, depending on the condition to which they had been assigned.
Positive Character    Negative Character
Introductory screen viewed by all participants
a The sentences in italics were rated as neutral, not providing information about character, in Beaulieu
(1994). They were not written in italics in Search Monitor.
Participants
Twenty-five bankers representing 11 banks in New England participated in the
study. There were no statistically significant differences between the 14 bankers
in the positive source credibility condition and the 11 bankers in the negative
condition on the following dimensions: years in banking, education level, and
loan size experience. On average, the 25 participants had 17.8 years of banking
experience (range of 8-26 years). All but three of the bankers had a college
education. The bankers had experience with loans that ranged from $1,000,000
RESULTS
Manipulation Check
The potential for source credibility to impact the perception of the credibility of
accounting projections is important because projected accounting information is a
standard component of loan applications (Danos et al., 1989), and is not audited.
This type of credibility judgment is different from other credibility judgments that
are made in equity markets, because the latter are objective assessments of the
accuracy of management forecasts (e.g. Hirst et al., 1999). In contrast, source
credibility in the lending context is a subjective consideration of the prior behavior
of management that is made because there is no objective public record of management forecast accuracy. The credibility of projected unaudited information is a
judgment that precedes loan approval and loan structure and is used to assess the
success of the manipulation.
Mean source credibility ratings of projected accounting information were
evaluated on a nine-point scale (1 = low, 9 = high). Subjects rated the credibility
of projected information to be higher in the positive condition than in the negative
condition (5.43 vs. 4.18, t = 1.63, p = 0.06, one-tailed). Credibility of the
historical, audited financial information was also judged on the same nine-point
scale. The mean ratings were 6.27 in the negative character condition and 7.14
in the positive condition (t = 1.13, p = 0.27; see Note 6). Therefore, any effects of
the manipulation of information about the CFO's character on loan decisions,
structure recommendations, and processing effort result from changes in the
credibility of projected, rather than historical, accounting information.
Hypothesis Testing
H1 investigated whether loan officers would simply deny loans if they became
sufficiently concerned about character and source credibility. All loan officers
given the positive character information about the CFO approved the loan (100%
of 14), as did 8 of the 11 given the negative version (73%). The χ² statistic is 4.34
(p = 0.037). Thus, the null hypothesis is rejected. H2 investigated whether loan
officers would adjust loan structure to compensate for negative source credibility.
Table 2 reports the four mean loan structure recommendations (collateral and
Table 2 (mean loan structure recommendations).a

                      Negative      Positive      t-Statistic (p)
                      Character     Character
Collateral            10.5          10.9          0.60 (0.56)
Covenant 1            3.6           3.1           0.82 (0.42)
Covenant 2            7.5           8.7           1.08 (0.29)
Covenant 3            5.0           6.1           0.84 (0.41)
Total covenants       16.1          17.9          0.94 (0.36)

a As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants)
consisted of twelve responses. Each response represented a range of percentages, for example 10-20%
of assets collateralized, but the ranges of percentages differed among the four scales. For all four scales,
response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of
collateral and covenants possible is 48 (4 × 12).
b These scores have been converted as described in Note 4.
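The χ² statistic reported for H1 can be checked from the approval counts given in the text (14 of 14 approvals in the positive condition, 8 of 11 in the negative condition). The sketch below is illustrative only: it assumes a Pearson chi-square without continuity correction, which the text does not state, and the function name is hypothetical.

```python
import math

def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: positive, negative character; columns: approved, denied (counts from the text).
decisions = [[14, 0], [8, 3]]
chi2, p_value = pearson_chi_square_2x2(decisions)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Under that assumption the computed values land close to the reported 4.34 and 0.037. With expected cell counts this small, an exact test would be a common alternative, although the text does not report one.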
Table 3. Additional Analyses: Mean Structure Recommendations for those Who did not Deny the Loan.a

Within Negative Condition
                                  Did Not Deny    Deny        t-Statistic (p)
                                  (n = 8)         (n = 3)
Collateral                        10.3            11.0        0.66 (0.53)
Covenant 1                        3.5             3.7         0.19 (0.85)
Covenant 2                        8.1             6.0         1.21 (0.26)
Covenant 3                        4.7             5.7         0.40 (0.70)
Total covenants                   16.4            15.3        0.35 (0.73)
Total collateral and covenants    26.6            26.3        0.09 (0.93)

Negative vs. Positive Character
                                  Negative        Positive    t-Statistic (p)
                                  Character       Character
                                  (n = 8)         (n = 14)
Collateral                        10.3            10.9        0.66 (0.53)
Covenant 1                        3.5             3.1         0.64 (0.53)
Covenant 2                        8.1             8.7         0.53 (0.60)
Covenant 3                        4.7             6.1         0.96 (0.35)
Total covenants                   16.4            17.9        0.72 (0.48)
Total collateral and covenants    26.6            28.8        0.85 (0.40)

a As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants) consisted of twelve responses. Each response
represented a range of percentages, for example 10-20% of assets collateralized, but the ranges of percentages differed among the four scales. For all
four scales, response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of collateral and covenants possible
is 48 (4 × 12).
b These scores have been converted as described in Note 4.
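[Table: processing effort by character condition; only the "Positive Character" column heading was recovered. Entries in extracted order: 18.3, 14.3, 6.3-52.6 and 21.3, 7.5, 10.7-35.2 (F = 3.64, p = 0.032); 20.1, 8.4, 7-33 and 21.2, 10.5, 9-47 (F = 1.55, p = 0.490).]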
and 10.5, respectively (F = 1.55, p = 0.490). Thus, the variance of effort choices
was similar with respect to the quantity of information examined, but not with
respect to time spent examining it.
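The two F statistics reported above are consistent with variance-ratio tests. The sketch below assumes that the second entry in each group of table values (14.3 and 7.5; 8.4 and 10.5) is a standard deviation, and that the first pair refers to time and the second to quantity of information. Both readings are inferences from the arithmetic and the surrounding sentence, not statements in the source.

```python
def variance_ratio(sd_a, sd_b):
    """F statistic for equality of two variances: larger variance over smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

# Assumed standard deviations, read from the extracted table entries.
f_time = variance_ratio(14.3, 7.5)
f_quantity = variance_ratio(8.4, 10.5)
print(f"F (time) = {f_time:.2f}, F (quantity) = {f_quantity:.2f}")
```

The small gap between the computed quantity ratio and the reported 1.55 would be expected if the published standard deviations were rounded to one decimal place.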
loan officers react to negative source credibility, and they do so by denying loans,
while the majority do not react in terms of the final decisions to approve a loan
or to structure it restrictively. In short, proportionally few loan officers reacted to
negative source credibility, but when they did, they denied loans rather than accept
the loan and handle their concerns with loan structure.
In hindsight, these results mirror the reaction of the stock analyst community
to Enron. The analysts who doubted Enron a year before its bankruptcy were
few and far between, and they arrived at their doubts by using their assessment of source credibility as the lens through which to analyze the numbers. Enron's management
was notorious for dealing arrogantly with analysts and being unable to produce
financial information. This created an environment of distrust in which patterns of
questionable transactions could be pieced together. The advice of one
analyst who sold Enron stock short was simple: "Test what a company says; don't
take it at face value." In other words, it is necessary to assess the credibility of the
source of the information in order to be able to understand the information itself
(Bailey, 2001, p. F1).
As is true of experimental research, the ability to generalize results both
to other tasks and other financial statement users (in commercial lending and
elsewhere) is limited. In particular, although the indicators of character used in
this experiment have been validated in other research (Beaulieu, 1994, 1996),
subtle changes in the apparent financial strength of firms, task or context may
encourage financial statement users to select other signals of source credibility.
Other sources of credibility, especially external audits, may become relatively
more or less important, depending upon task and context. For example, concerns
about accounting for intangible assets may upset the current balance of users'
reliance upon source credibility vs. credibility derived from audits. Our objective
is to encourage thought and research about this balance, and about the type of
credibility information that different users employ.
NOTES
1. Hirst et al. (1999) did not explain to participants how forecast accuracy was calculated.
2. An example of a presentation that encourages caution is writing off all bad debts in a
single period, making it difficult to chart profitability (Ruth, 1987).
3. We do not examine pricing, that is, charging interest sufficiently above prime rates to
accommodate even the worst credit risks. It is difficult for loan officers in the United States
to price-protect themselves, because the commercial lending market is very competitive
and there is as little as a two-point spread separating prime from high-risk borrowers
(Emmanuel, 1989).
4. Consistent with Rosman and Bedard (1999), collateral was represented to the lenders
on a 12-point scale, which ranged from 0% to more than 100% in 10% increments.
Profitability ranged from 0% to more than 50% of the ratio of net income to average
equity, identified in 5% increments. Liquidity ranged from 0% to more than 150% of
the ratio of cash flows to fixed cash commitments, in 15% increments. Leverage ranged
from 0% to more than 70% of the ratio of total liabilities to equity, in 7% increments.
The upper bounds differ due to variation in the normal range of these ratios. The leverage
covenant was converted to a revised measure (i.e. 13 - x, where x is the value selected
by the participant) so that the direction of each scale was similar.
5. In contrast, Mather (1999) asked subjects to make judgments as to the number of
covenants they would seek and how tightly they would be imposed. However, the nature of
the covenants was not specified.
6. A potential concern regarding the experiment is that some participants may not have
seen all of the character information. As explained in Table 1, two facts in each condition
of the experiment were viewed only if selected. If a number of participants did not select
the additional screen about the CFO, the strength of the character manipulation would not
have been consistent. Ten of the 11 participants in the negative condition and 13 of 14 in
the positive condition accessed the optional CFO information. In total, 23 of 25 participants
investigated the CFO, evidence that the character manipulation was consistent across conditions, and that character and source credibility were important to the participants. Both
participants who did not access the additional character information, one in the negative
condition and one in the positive condition, approved the loan.
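The scale construction described in Note 4 can be sketched as follows. The mapping of responses 2 through 11 to specific ranges is an assumption chosen to agree with the table footnote (response 1 is 0%, response 12 is the maximum, and 10-20% is one of the collateral ranges); the function and constant names are hypothetical.

```python
# Step size (in percentage points) of each 12-point scale, from Note 4.
INCREMENTS = {"collateral": 10, "profitability": 5, "liquidity": 15, "leverage": 7}

def response_range(scale, response):
    """Return (lower, upper) percent for a response; upper is None when open-ended."""
    if not 1 <= response <= 12:
        raise ValueError("response must be between 1 and 12")
    step = INCREMENTS[scale]
    if response == 1:
        return (0, 0)                      # response 1 indicates 0%
    if response == 12:
        return (10 * step, None)           # "more than" the upper bound
    return ((response - 2) * step, (response - 1) * step)

def converted_score(scale, response):
    """Apply the 13 - x conversion to leverage so all scales share a direction."""
    return 13 - response if scale == "leverage" else response

max_total = sum(converted_score(scale, 12 if scale != "leverage" else 1)
                for scale in INCREMENTS)
print(response_range("collateral", 3), converted_score("leverage", 1), max_total)
```

The maximum total of 48 matches the figure given in the table footnotes (4 × 12).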
ACKNOWLEDGMENTS
The authors thank Jean Bedard, Karla Johnstone, Marlys Lipe, Inshik Seol, Kathy
Wilkicki and two anonymous reviewers.
REFERENCES
Bailey, S. (2001). Right on the money. The Boston Globe (December 5th), F1.
Beach, L. R., Mitchell, T., Deaton, M., & Prothero, J. (1978). Information relevance, content and source
credibility in the revision of opinions. Organizational Behavior and Human Performance, 21,
1-16.
Beaulieu, P. (1994). Commercial lenders' use of accounting information in interaction with source
credibility. Contemporary Accounting Research, 10(Spring), 557-585.
Beaulieu, P. (1996). A note on the role of memory in commercial loan officers' use of accounting and
character information. Accounting, Organizations and Society, 21(August), 515-528.
Beaulieu, P. (2001). The effects of judgments of new clients' integrity upon risk judgments, audit
evidence, and fees. Auditing: A Journal of Practice & Theory (Fall), 85-99.
Biggs, S., Rosman, A., & Sergenian, G. (1993). Methodological issues in judgment and decision-making research: Concurrent verbal protocol validity and simultaneous trace of process. Journal
of Behavioral Decision Making, 6, 187-206.
Brucks, M. (1988). Search monitor: An approach for computer-controlled experiments involving consumer information search. Journal of Consumer Research, 15, 117-121.
Coleman, D., & Irving, G. (1997). The influence of source credibility attributions on expectancy theory
predictions of organizational choice. Canadian Journal of Behavioural Science, 29(April),
122-131.
Danos, P., Holt, D., & Imhoff, E. (1989). The use of accounting information in bank lending decisions.
Accounting, Organizations and Society, 14, 235-246.
Eisenreich, D. (1981). Credit analysis: Tying it all together Part I. Journal of Commercial Bank
Lending (December), 2-13.
Emmanuel, C. (1989). Limiting exposure to fraudulent financial reporting. The Journal of Commercial
Bank Lending (September), 16-27.
Gotlieb, J., & Sarel, D. (1991). Comparative advertising effectiveness: The role of involvement and
source credibility. Journal of Advertising, 20(1), 38-45.
Grewal, D., Gotlieb, J., & Marmorstein, H. (1994). The moderating effects of message framing and
source credibility on the perceived price-risk relationship. Journal of Consumer Research,
21(June), 145-153.
Hirst, D. E., Koonce, L., & Miller, J. (1999). The joint effect of management's forecast accuracy and
the form of its financial forecasts on investor judgment. Journal of Accounting Research, 37,
101-123.
Kelley, H. (1972). Attribution in social interaction. Morristown, NJ: General Learning Press.
Korem, D. (1997). The art of profiling: Reading people right the first time. Richardson, TX: International
Focus Press.
Maines, L. (1990). The effect of forecast redundancy on judgments of a consensus forecast's expected
accuracy. Journal of Accounting Research, 28(Suppl.), 29-47.
Mather, P. (1999). Financial covenants and related contracting processes in the Australian private debt
market: An experimental study. Accounting and Business Research, 30(1), 29-42.
McDonald, J., & McKinley, J. (1981). Corporate banking: A practical approach to lending.
Washington, DC: American Bankers Association.
Oldham, J. (1998). The killer character component. The Secured Lender, 54(November/December),
62-66.
Pace, E., & Simonson, D. (1977). The four hurdles of lending. The Journal of Commercial Bank
Lending (March), 10-15.
Rosman, A., & Bedard, J. (1999). Lenders' strategy selection in loan structure decisions. Journal of
Business Research, 83-94.
Ruth, G. (1987). Commercial lending. Washington, DC: American Bankers Association.
Shaub, M. (1996). Trust and suspicion: The effects of situational and dispositional factors on auditors'
trust of clients. Behavioral Research in Accounting, 8, 154-174.
Stephens, R. (1980). Uses of financial information in bank lending decisions. Ann Arbor, MI: UMI
Research Press.
Watts, R., & Zimmerman, J. (1986). Positive accounting theory. Englewood Cliffs, NJ: Prentice-Hall.
1. INTRODUCTION
Arthur Levitt, while chair of the Securities and Exchange Commission (SEC),
announced a focus on firms that manage earnings (Levitt, 1998). He unfolded an
action plan to address earnings management. Initiatives included better accounting
practices, standards and interpretative guidelines, stricter SEC focus on earnings
management, a review of audit practices, and a call for a cultural change in the
business world regarding the acceptance of earnings manipulations. While the
SEC can address most of these concerns with better standards and practices,
changing the culture of business is more complex. It involves changing the behavior of individuals. Research needs to be conducted that addresses why individuals
manage earnings. Such research is important to future accounting practices.
The purpose of this study was threefold. First, earnings management was
experimentally examined in a managerial accounting setting. Previous empirical
research has examined earnings management at the corporate level indirectly
through the analysis of financial results.1 Researchers typically study discretionary
management decisions (e.g. write-downs of impaired assets) via publicly available
information and infer whether earnings management has occurred based on a
comparison of actual financial results to some expectation (Rees et al., 1996;
Zucca & Campbell, 1992). Rather than taking this approach to identifying
earnings management behavior, this study behaviorally examines whether bonus
plans influence managers' decisions.
The second purpose of the study was to investigate earnings management at
the divisional level rather than the overall corporate view, looking at what occurs
within the firm.2 A survey by Buck Consultants of Fortune 1000 companies
found that 61% of U.S. companies offer variable compensation plans below the
executive level, and another 27% are considering them (Wilson, 2001). This
increase in bonus type plans creates greater incentive for earnings management.
Earnings management occurs at the corporate level due, in part, to managers'
efforts to achieve incentive compensation based targets (Watts & Zimmerman,
1978). Schipper (1989) states, "Clearly, compensation schemes and divisional
managers' private information create a potential incentive to manipulate internal
managerial accounting reports." If performance of managers at the lower levels
of the firm is also measured based on these types of targets, then the possibility
exists that earnings management could occur at these levels. Managers could use
various means to manipulate earnings, from writing off low value inventory items
to controlling the timing of shipments to customers. The outcome of some of
these methods could be buried in the results of normal operations and therefore
might not be obvious at the corporate level. Alternatively, the consolidation of
this manipulated divisional income could result in significantly greater earnings
management at the corporate level than previously estimated. This division
level earnings management could be a potential intervening variable, which
has led to conflicting results in at least one published earnings management
study.3
The third purpose was to examine the effects of framing on earnings management. Subjects were presented with information pertinent to a discretionary
managerial decision from both a negative and positive viewpoint. Kahneman
and Tversky (1979) theorize that the way information is framed can impact
decision-making. This study looks at the potential impact of the information
frame on the decision to write off inventory.
The results support both earnings management and framing hypotheses.
Findings suggest that management accountants are more apt to write off inventory
when: (1) their personal wealth is unaffected; and (2) information is framed
negatively. An important contribution of this research is the finding that information
framing can affect the earnings management decision. The probability
of writing off inventory was higher, although not significantly so, for participants with
negatively framed information whose personal wealth would decrease than for
those with positively framed information who were not eligible for a bonus. The
management accounting implication of these results is that managers' decisions
could be influenced by the way information is presented.
This paper is organized in the following manner. Background and hypotheses
are developed and presented in Section 2. The research design and methodologies
used to test the hypotheses are presented in Section 3. Results are shown in
Section 4 and finally, Section 5 presents contributions and implications for further
study.
these decisions affect a firm's cash flows and reported net income. The following
hypotheses examine the decision making behavior of managers.
Fig. 1. Predicted Outcomes Based on Bonus Maximization Theory. Note: Adapted from
Healy (1985).
a positive manner, the manager is expected to be less likely to classify the item
as obsolete and write it off. Conversely, when information about the item is
presented from a negative viewpoint, the item is more likely to appear obsolete and
be written off.
was set at $1,502,000, just above the threshold, so that an inventory write-off
would reduce net income below the threshold, eliminating the manager's bonus.
With the negative initial reference point, a statement was included implying that
early results indicate that actual income will be lower than budgeted income.
Net income was set at $1,400,000, well below the threshold, to remove any
possibility that the threshold could be achieved. This operationalization simulated
a situation where the manager had the opportunity to reduce the current year's
earnings with no impact on personal wealth and improve prospects for subsequent
years.
The italicized line in the second paragraph of the scenario shown in the Appendix
indicates where these statements were placed in the survey instrument, with the
negative initial reference point shown in brackets. The first independent variable,
INCOME, was a result of the manipulation of this initial reference point. Analysis
of this variable was conducted as a between-subjects design.
3.2.2. Selection of the Discretionary Decision
Previous research has hypothesized that many types of discretionary decisions are
used to manage earnings at the corporate level. Studies have examined the timing
of recognition of extraordinary items (Barnea et al., 1976; Ronen & Sadan, 1975;
Walsh et al., 1991), write down of impaired assets (Zucca & Campbell, 1992), the
provision for bad debts (McNichols & Wilson, 1988) and non-recurring charges
(Elliott & Shaw, 1988). All have found support for earnings management at the
corporate level.
Hepworth (1953) suggests that the inventory valuation process can be used as a
less obvious method of income smoothing. In a study of business unit managers,
Guidry et al. (1999) tested an inventory model of earnings management along with
two other previously tested models. Evidence of earnings management was found
to be the strongest in the analysis of the inventory reserve account. They suggest
that this occurs due to information asymmetry that exists between these managers
and upper level management related to inventory valuation.
The decision to write off inventory involves considerable management discretion. Accounting Research Bulletin 43 (FASB, 1992) addresses the inventory
write-off in the following manner:
Thus, in accounting for inventories, a loss should be recognized whenever the utility of goods
is impaired by damage, deterioration, obsolescence, changes in price levels, or other causes.
The measurement of such losses is accomplished by applying the rule of pricing inventories at
cost or market, whichever is lower (Stmt. 5, Para. 8).
managers across the firm (all trying to achieve income targets) decide to write off
small valued inventory items, the write-offs have the potential to be material in
the aggregate. In addition, they are probably the most common method of writing
off inventory (Hepworth, 1953). Therefore, immaterial inventory write-offs were
selected as the discretionary decision in this study.
To operationalize this inventory write-off, the value of the inventory item
involved was set at $15,000. A number of factors were considered when setting the
dollar level of the potential discretionary decision. The amount was set at about
4% of inventory, considered immaterial in value. An immaterial value was chosen
for a number of reasons. First, if written off, the amount would be buried in cost
of goods sold. Therefore it would not be obvious to outsiders, and probably not be
detected by auditors. These decisions would be the type described by Hepworth
(1953). Second, most managers would be expected to act conservatively and
write off the amount. Therefore, differences in decision-making would be
due to bonus implications, information framing, or both. In addition to
materiality considerations, the amount is large enough that, if the write-off took
place, income would fall below budget expectations for those
receiving information that income was above the threshold. For those
below the threshold, the write-off would have no impact on bonuses this year.
3.2.3. Expected Payoffs
The independent variables were designed with specific expected payoffs in mind.
In the case where income is greater than budget, estimated net income was
specifically established at a level at which the write-off of the inventory item
would result in net income falling below budgeted levels, thus eliminating the
manager's bonus. Where income is less than budget, since budgeted net income
had not been achieved, the write-off of inventory would have no impact on the
bonus. Figure 4 indicates the values of these expected payoffs in the year depicted
in the scenario. These payoffs are based on a bonus equal to 0.5% of plant net
income ($7,500 if budgeted net income is met). This dollar value was selected to
approximate bonus compensation for plant controllers.6
The only subjects to receive a bonus in the current year were those with income
greater than budget who did not write off the inventory. Since their expected
payoff is $7,510 ($1,502,000 at 0.5%), this group had the greatest opportunity
cost from writing off the inventory. Based on this payoff, these subjects were
expected to be the least likely to write off inventory, in accordance with the bonus
maximization theory.
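The expected payoffs described above can be worked through with a short sketch. It assumes a budgeted net income of $1,500,000 (implied by the $7,500 bonus at budget) and that the bonus is paid only when net income is at or above budget; the exact threshold rule and all names below are assumptions for illustration.

```python
BONUS_RATE = 0.005               # bonus equals 0.5% of plant net income
BUDGETED_NET_INCOME = 1_500_000  # assumption: implied by the $7,500 bonus at budget
WRITE_OFF = 15_000               # value of the inventory item in the scenario

def bonus(net_income):
    """Bonus for the year; assumed to be paid only when net income meets budget."""
    if net_income < BUDGETED_NET_INCOME:
        return 0.0
    return round(net_income * BONUS_RATE, 2)

def payoffs(estimated_net_income):
    """Bonus with and without the write-off for one income condition."""
    return {
        "keep_inventory": bonus(estimated_net_income),
        "write_off": bonus(estimated_net_income - WRITE_OFF),
    }

above_budget = payoffs(1_502_000)  # income greater than budget
below_budget = payoffs(1_400_000)  # income less than budget
print(above_budget, below_budget)
```

Only the above-budget manager who keeps the inventory on the books earns a bonus, so the write-off carries an opportunity cost of $7,510 for that group and none for the other.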
3.2.4. Framing Operationalization
In general, previous framing research suggests that management may consider
variables that may be unrelated to the actual decision at hand (Johnson et al.,
1991; Kahneman & Tversky, 1984; Lipe, 1993; O'Clock & Devine, 1995; Puto,
1987). As part of the decision to write off inventory, managers in this study were
presented with various pieces of information about the current status of inventory.
Because framing has been found to affect the decision-making process, it could also play
a role in the decision to write off obsolete materials. Each subject was given an
inventory statement to review. The second independent variable, INVENTORY,
was operationalized as a statement about inventory expressed in either a positive
frame or a negative frame. The italicized sentence in the Appendix that is part of the
item description indicates the frame. The negative frame is shown in brackets. The
manipulation of the inventory information presentation frame was also conducted
as a between-subjects design.
3.2.5. Statistical Analysis
These two independent variables, INCOME and INVENTORY, result in subjects
being assigned to one of four possible treatments shown previously in Fig. 2. The
dependent variable selected for this experiment measured the percent likelihood
that the subject would recommend the write-off of inventory (PROBWO). A 2 × 2
ANOVA and/or the Kruskal-Wallis Multiple Comparison Tests were utilized to
conduct the analyses of the hypotheses.
3.3. Pretesting
Prior to mailing the survey instruments, two pretests were conducted to provide
evidence for content validity as well as to improve the experimental task. The
first pretest was conducted during the monthly meeting of a local IMA chapter.
Comments provided by the participants were incorporated to improve the scenario
before the second pretest was undertaken. In general, participants of the initial
test found the scenario to be incomplete. Specifically, they requested additional
information on the relationship between the value of the write-off and total
inventory. Participants also inquired if the parts could be resold as replacement
parts. A line was added that indicated that no such market existed. The second
pretest was conducted during the monthly meeting of another IMA chapter five
months later. Again, all comments were considered and minor grammatical
changes were made to the experiment. The responses from these 38 pretest
participants have not been included in the final sample.
3.4. Procedure
Dillman's Total Design Method (1978) was employed in the design and mailing
of the questionnaires. Each envelope and cover letter was printed with the
individual's name and address to make the request more personal. Questionnaires
were numerically coded to determine which subjects had responded to the
mailing. The cover letter indicated that this coding was for mailing purposes
only and individual responses would not be associated with names of subjects.
Participants were asked to complete the experiment and were provided with a
stamped, self-addressed envelope for its return.
4. RESULTS
4.1. Response Rate
Table 1 indicates the number of responses. There were 242 (24.2%) responses
from the initial mailing. A second mailing was sent to the non-respondents; an
additional 149 questionnaires were returned, increasing the overall response rate
to 39.1%. Of the total 391 responses received, 27 were returned either unanswered
or incomplete. Another 48 were from non-accountants. The remaining 316 were
used for the analyses.

Table 1. Description of Questionnaire Responses.

                        PP     NP     PN     NN     Total
Total respondents       85     86     105    115    391
Returned to sender      0      0      2      3      5
Returned incomplete     6      4      6      6      22
Non accountants         14     10     8      16     48
Total usable            65     72     89     90     316
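The counts in Table 1 can be reconciled with the figures quoted in the text. The sketch below assumes 1,000 questionnaires were mailed, which is implied by 242 responses being reported as 24.2% but is not stated directly; the dictionary keys are hypothetical names for the table rows.

```python
# Counts transcribed from Table 1 (columns PP, NP, PN, NN).
CELLS = {
    "PP": {"total": 85, "returned_to_sender": 0, "incomplete": 6, "non_accountants": 14},
    "NP": {"total": 86, "returned_to_sender": 0, "incomplete": 4, "non_accountants": 10},
    "PN": {"total": 105, "returned_to_sender": 2, "incomplete": 6, "non_accountants": 8},
    "NN": {"total": 115, "returned_to_sender": 3, "incomplete": 6, "non_accountants": 16},
}
MAILED = 1000  # assumption: implied by 242 responses being reported as 24.2%

def usable(cell):
    """Responses left after removing unusable returns and non-accountants."""
    return (cell["total"] - cell["returned_to_sender"]
            - cell["incomplete"] - cell["non_accountants"])

usable_by_cell = {name: usable(cell) for name, cell in CELLS.items()}
total_responses = sum(cell["total"] for cell in CELLS.values())
total_usable = sum(usable_by_cell.values())
unanswered_or_incomplete = sum(
    cell["returned_to_sender"] + cell["incomplete"] for cell in CELLS.values())
response_rate = total_responses / MAILED
print(usable_by_cell, total_usable, f"{response_rate:.1%}")
```

The 27 unanswered or incomplete questionnaires mentioned in the text correspond to the "returned to sender" and "returned incomplete" rows taken together.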
4.2. Test for Non-response Bias
Tests for non-response bias were conducted on the final sample of 316 participants.
Mean responses to the participants' probability of write-off question from the first
mailing were compared to those of the second. Kruskal-Wallis tests indicated no
significant differences between the two mailings (Prob > χ² = 0.236). t-Tests were
conducted for years of experience, number of certifications, firm type, type of
degree, and years on the current job. No significant differences were noted.
4.3. Manipulation Checks
The manipulation of the earnings management situation was tested utilizing the
response to the question, "Did you achieve the operating budget prior to the
inventory write-off decision?" This ensured that the subjects knew the position
of estimated net income relative to the budget. Approximately 86% of the
respondents answered the manipulation check for the operating budget correctly.7
The success of the manipulation of the inventory frame was confirmed by analyzing the subjects' response to the following question, "How risky do you feel it
is for the inventory to remain on the books?" Subjects were asked to respond on a
7-point Likert-type scale with "Very Risky" and "Not Risky" at opposite ends. A
Mann-Whitney test found a significant difference between the means of the positive
(3.88) and negative (3.49) inventory frames at the 5% probability level, indicating
that the frame manipulation had succeeded (p = 0.0458).
4.4. Demographics
Table 2 provides overall information about the respondents in this study. The
respondents held positions in a fairly diverse set of industries, with
46.9% of respondents employed by manufacturing firms. Subjects employed by
service-oriented firms composed the next largest group (12.7%), followed by those
from public accounting firms (10.4%). The remaining subjects (30.0%) worked
in a variety of environments from banking, retailing, non-profits, consulting, to
distribution.
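Table 2 (respondent demographics). Row labels are shown only where the text identifies them.

                              Number of Respondents    Percent of Total
Industry
  Manufacturing               148                      46.9
  Service-oriented            40                       12.7
  Public accounting           33                       10.4
  (not recovered)             8                        2.5
  (not recovered)             12                       3.8
  (not recovered)             11                       3.5
  (not recovered)             6                        1.9
  (not recovered)             8                        2.5
  (not recovered)             50                       15.8
  Total                       316                      100.0

Education
  (not recovered)             8                        2.5
  (not recovered)             185                      58.6
  Advanced degree             122                      38.6
  (not recovered)             1                        0.3
  Total                       316                      100.0

Certifications
  (not recovered)             149                      47.2
  One certification           124                      39.3
  (not recovered)             39                       12.3
  (not recovered)             3                        0.9
  (not recovered)             1                        0.3
  Total                       316                      100.0

(panel heading not recovered)
  (not recovered)             20                       6.3
  (not recovered)             56                       17.7
  (not recovered)             44                       13.9
  (not recovered)             100                      31.7
  (not recovered)             95                       30.1
  (not recovered)             1                        0.3
  Total                       316                      100.0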
The respondents were well educated with over 97.5% holding a bachelor's
degree. An additional 38.6% held advanced degrees. More than half the group
possessed some form of certification. Thirty-nine percent held one certification,
while 13.5% had obtained two or more. The most common certifications were the
Certified Public Accountant (CPA) and the Certified Management Accountant
(CMA).
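Mean likelihood of inventory write-off (PROBWO, percent) by inventory frame and income condition:

                            Income target met       Income target not met    Total
Positive inventory frame    57.5 (4.12), n = 65     63.13 (3.54), n = 89     60.5, n = 154
Negative inventory frame    67.64 (3.92), n = 72    76.88 (3.50), n = 90     73.1, n = 162
Total                       63.1, n = 137           69.9, n = 179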
income target was met, the probability of writing off inventory was 57.5% for
the positive inventory frame compared to 67.64% for the negative frame. The
average likelihood of write-off when the income target was met was 63.1%. In
the condition where income targets were not met, respondents who received the
positive inventory frame indicated that there was a 63.13% likelihood that they
would write off inventory, where the negative frame indicated a 76.88% likelihood.
The average likelihood of write-off when the income target was not met was
69.9%. The results can also be viewed from the inventory frame. For the positive
frame, the average likelihood of inventory write-off was 60.5%. For the negative
frame, the average likelihood of write-off was 73.1%.
Source          DF      Sum-Squares     F-Ratio     Prob > F
Income          1       11875.56        10.88       0.0010
Inventory       1       3780.09         3.46        0.0627
Interaction     1       374.19          0.34        0.5582
Error           312     1091.31
Total           315     357089.80
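The F-ratios in the ANOVA table can be checked against the other entries. The sketch below treats the Error entry (1091.31) as a mean square rather than a sum of squares. That reading is an inference from the arithmetic, since each effect has one degree of freedom and dividing its sum of squares by 1091.31 reproduces the reported ratios; the source does not state it.

```python
# Effect sums of squares (1 degree of freedom each) as listed in the table.
EFFECT_SUM_SQUARES = {"Income": 11875.56, "Inventory": 3780.09, "Interaction": 374.19}
EFFECT_DF = 1
ERROR_MEAN_SQUARE = 1091.31  # assumption: the Error entry is a mean square

f_ratios = {
    name: (ss / EFFECT_DF) / ERROR_MEAN_SQUARE
    for name, ss in EFFECT_SUM_SQUARES.items()
}
for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f}")
```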
                                           (A)         (B)         (C)         (D)
Positive Inventory, Positive Income (A)    0.0000      1.9465**    0.8606      4.0379*
Negative Inventory, Positive Income (B)    1.9465      0.0000      1.2152      2.0507*
Positive Inventory, Negative Income (C)    0.8606      1.2152      0.0000      3.4575*
Negative Inventory, Negative Income (D)    4.0379*     2.0507*     3.4575*     0.0000
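The asterisks in the matrix of z-values are consistent with unadjusted two-sided normal critical values. The sketch below assumes that one asterisk marks significance at the 5% level and two asterisks at the 10% level; the legend did not survive in the text, so that reading, like the names used here, is an assumption.

```python
from statistics import NormalDist

# Pairwise z-values from the matrix (upper triangle; groups A-D as labeled there).
Z_VALUES = {
    ("A", "B"): 1.9465, ("A", "C"): 0.8606, ("A", "D"): 4.0379,
    ("B", "C"): 1.2152, ("B", "D"): 2.0507, ("C", "D"): 3.4575,
}

CRITICAL_05 = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value
CRITICAL_10 = NormalDist().inv_cdf(0.95)   # two-sided 10% critical value

def flag(z):
    """Return the assumed significance marker for a z-value."""
    if z > CRITICAL_05:
        return "*"
    if z > CRITICAL_10:
        return "**"
    return ""

flags = {pair: flag(z) for pair, z in Z_VALUES.items()}
for pair, marker in flags.items():
    print(pair, Z_VALUES[pair], marker)
```

A multiple-comparison adjustment such as Bonferroni would raise the critical value and could change which pairs are flagged; the pattern of asterisks suggests no such adjustment was applied.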
information is framed negatively, managers are more likely to write off inventory
even when there would be an adverse effect on their income, than when information is framed positively and their income would be unaffected by their decision.
However, since the Kruskal-Wallis z-value comparing the difference between
the means of these two groups was not significant (z = 1.2152), H3c cannot be
supported.
4.8. Discussion
Previous earnings management research has been conducted empirically at the
corporate level, inferring managerial actions from financial data. This study
examines the theory in a behavioral setting at a divisional or plant level. As
predicted by the results of empirical research, earnings management appeared to
have occurred. Plant managers in this scenario were more apt to write off inventory
if its negative impact on income did not unfavorably affect their bonus. These
subjects had already missed their income targets, and therefore their bonuses, so
they risked little from a personal financial perspective by the write-off of the item.
Subjects who were above budgeted levels (and still had a bonus at stake) were
more cautious about their willingness to write off the inventory and therefore
miss their income target and risk losing their bonus. The results strongly support
the empirical research that suggests that some managers manipulate earnings by
expensing costs in fiscal years where net income expectations are not realized.
The impact of the inventory frame creates the potential for interesting research.
Many might have suggested that the results of the test of earnings management
would be a given. However, the probability of a write-off is actually higher
when the information is framed negatively and the executive would lose his/her
bonus by taking the write-off (mean = 67.64), than when the information is
framed positively and there is no chance the executive would receive a bonus
(mean = 63.13). While the difference in results is not statistically significant
and could be the result of random fluctuation, it does have interesting behavioral
implications. The differences in information frames were designed to be subtle
and not necessarily meant to mislead the reader. If these small changes in
information presentation could yield differences in decision making in a situation
where outcomes were more or less expected, how might the information frame
affect other, less obvious decisions? This suggests that the impact of
framing could be important in other accounting decisions as well.
Managers receive and communicate information about decisions every day. If the
frame impacts the decision making in such a seemingly predictable decision as in
the earnings management situation, it has the potential to impact other decisions.
Are managers aware of the potential impact of framing on their decision making?
What should they be alert to in the decision-making analysis?
Another interesting outcome was the effect of the frame on the write-off decision.
It is interesting to note how something as simple and as indirect as the frame of
the information presentation (inventory frame) could have a significant impact on
results. These results raise the question of what other behavioral factors could
influence managers' decisions to manage earnings and provide a basis for future
research into the effects of framing on discretionary managerial decisions.
NOTES
1. Burgstahler and Dichev (1997), Wu (1997), Cahan et al. (1997), Rees et al. (1996),
Healy (1996), Amir and Livnat (1996), Bernard and Skinner (1996), Dechow et al. (1996),
Dechow et al. (1995) are just a few of the most recent examples.
2. Schipper (1989) states that although there is a potential incentive for earnings
management at the divisional level, research in that area is sparse to non-existent.
3. White (1970) found no evidence of earnings management.
4. Watts and Zimmerman (1978) suggest that political costs and debt violations also
affect managers' motivations to manipulate earnings. These factors would most likely
impact earnings management at the corporate level. The current research examines earnings
management at the plant level, and does not explicitly test for these other factors.
5. Student members or members reporting their employment status as retired were
excluded from the population.
6. Based on 1998 salaries and total compensation reported by Schroeder and Reichardt
(1999).
7. ANOVA tests were conducted on the sample excluding those individuals who
answered this question incorrectly. Results did not differ greatly from the entire test
sample. The p-value for the variable INCOME was p = 0.0005 for this group and 0.0010
for the full sample; for INVENTORY p = 0.0671 and 0.0627 respectively.
ACKNOWLEDGMENTS
We would especially like to thank Elizabeth Cole, Tim Fogarty, Pete Poznanski,
Ray Stephens and Linda Zucca for their helpful comments and assistance,
and the Institute of Management Accountants for its support. We
gratefully acknowledge the financial support received from the Research Council
of Kent State University.
REFERENCES
Amir, E., & Livnat, J. (1996). Multiperiod analysis of adoption motives: The case of SFAS No. 106.
The Accounting Review, 71(4), 539-553.
Ayers, S., & Kaplan, S. E. (1993). An examination of the effect of hypothesis framing on auditors'
information choices in an analytical task. Abacus, 29(2), 113-131.
Barnea, A., Ronen, J., & Sadan, S. (1976). Classificatory smoothing of income with extraordinary
items. The Accounting Review, 52(2), 110-122.
Beeler, J. D., & Hunton, J. E. (2002). Contingent economic rents: Insidious threats to audit independence. Advances in Accounting Behavioral Research, 5, 21-50.
Bernard, V. L., & Skinner, D. J. (1996). What motivates managers' choice of discretionary accruals?
Journal of Accounting and Economics, 22(1-3), 313-325.
Burgstahler, D., & Dichev, I. (1997). Earnings management to avoid earnings decreases and losses.
Journal of Accounting and Economics, 24(1), 99-126.
Cahan, S. F., Chavis, B. M., & Elemendorf, R. G. (1997). Earnings management of chemical firms in
response to political costs from environmental legislation. Journal of Accounting, Auditing &
Finance, 12(1), 37-65.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1995). Detecting earnings management. The Accounting
Review, 70(2), 193-225.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1996). Causes and consequences of earnings
manipulations: An analysis of firms subject to enforcement actions by the SEC. Contemporary Accounting Research, 13(1), 1-36.
Dillman, D. A. (1978). Mail and telephone surveys: The total design method. New York, NY: Wiley.
Elliott, J. A., & Shaw, W. H. (1988). Write offs as accounting procedures to manage earnings. Journal
of Accounting Research, 26(Suppl.), 91-119.
Financial Accounting Standards Board (1992). Original pronouncements accounting standards
Volume II. Norwalk, CT.
Guidry, F., Leone, A. J., & Rock, S. (1999). Earnings-based bonus plans and earnings management by
business unit managers. Journal of Accounting and Economics, 26(1-3), 113-142.
Healy, P. M. (1985). The effect of bonus schemes on accounting decisions. Journal of Accounting &
Economics, 7(1-3), 85-107.
Healy, P. M. (1996). Discussion of a market-based evaluation of discretionary accrual models. Journal
of Accounting Research, 34(3), 107-115.
Hepworth, S. R. (1953). Smoothing periodic income. The Accounting Review (January), 32-39.
Johnson, P. E., Jamal, K., & Berryman, R. G. (1991). Effects of framing on auditor decisions. Organizational Behavior and Human Decision Processes, 53(2), 75-105.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist (April),
341-350.
Levitt, A. (1998). The numbers game (September 28th). New York, NY: NYU Center for Law and
Business.
Lipe, M. G. (1993). Analyzing the variance investigation decision: The effects of outcomes, mental
accounting and framing. The Accounting Review, 68(4), 748-764.
McNichols, M., & Wilson, G. P. (1988). Evidence of earnings management from the provision for bad
debts. Journal of Accounting Research, 26(Suppl.), 1-31.
O'Clock, P., & Devine, K. (1995). An investigation of framing and firm size on the auditor's going
concern decision. Accounting and Business Research, 25(99), 197-201.
Puto, C. P. (1987). The framing of buying decisions. Journal of Consumer Research, 14(3), 301-315.
Rees, L., Gill, S., & Gore, R. (1996). An investigation of asset write-downs and concurrent abnormal
accruals. Journal of Accounting Research, 34(3), 157-169.
Ronen, J., & Sadan, S. (1975). Classificatory smoothing: Alternative income models. Journal of
Accounting Research, 3(4), 133-149.
Rutledge, R. W. (1995). The ability to moderate recency effects through framing of management
accounting information. Journal of Mathematical Economics, 11(2), 27-40.
Schipper, K. (1989). Commentary on earnings management. Accounting Horizons, 3(4), 91-102.
Schroeder, D., & Reichardt, K. (1999). IMA '98 Salary Guide. Strategic Finance, 8(20), 28-41.
Shields, M. D., Solomon, I., & Waller, W. S. (1987). Effects of alternative sample space representation
on the accuracy of auditors' uncertainty judgments. Accounting, Organizations and Society,
12(4), 375-385.
Walsh, P., Craig, R., & Clarke, F. (1991). Big bath accounting using extraordinary items adjustments:
Australian empirical evidence. Journal of Business Finance and Accounting, 18(2), 173-189.
Watts, R., & Zimmerman, J. (1978). Towards a positive theory of the determination of accounting
standards. Accounting Review, 53(1), 112-134.
White, G. E. (1970). Discretionary accounting decisions and income normalization. Journal of
Accounting Research, 8(2), 260-273.
Wilson, T. B. (2001). What's hot and what's not: Key trends in total compensation. Compensation &
Benefits Management, 17(2), 45-50.
Wu, Y. W. (1997). Management buyouts and earnings management. Journal of Accounting, Auditing,
and Finance, 12(4), 373-389.
Zucca, L. J., & Campbell, D. R. (1992). A closer look at discretionary write-downs of impaired assets.
Accounting Horizons, 6(3), 30-41.
APPENDIX
Scenario
You are the plant accountant for a Cleveland area plant of the Spring Wire Company.
The responsibilities of your position include the processing of payroll, payments
to vendors (accounts payable), inventory accounting, preparation of budget and
estimates, and analysis of actual plant operating results. All members of the plant
staff (including yourself) are given a bonus contingent on achieving or exceeding
the plant's operating budgeted net income of $1,500,000. If the budgeted operating
income is achieved, 0.5% of the current year's net income will be paid to you in the
form of a bonus (e.g., if net income is $1,510,000, your bonus would be $7,550).
It is January 1, and you have received estimated net income for the year of
$1,502,000 [$1,400,000]. In past years, these early results have proved to be
accurate, with few unexpected adjustments made after this date.
You have one last chance to review the status of your inventory that was taken on
December 31st to determine if any potentially obsolete inventory items should be
written off. You are presented with the following information from the Inventory
and Materials Manager (also a staff manager) concerning the inventory item in
question.
Part Number PX23415 is sold to computer manufacturers. It has a current inventory of 5,000 units on hand with a total current inventory value of $15,000. Your
plant's total inventory, including Part Number PX23415, is $350,000. The demand
for this product is 15% of last year's demand [Industry sales of this product have
demonstrated an 85% decline in both volume and dollar amounts in the last year].
The inventory turnover ratio for this item has declined substantially from the
prior year. Of the original market for the product, about 20% of your competitors
remain [Approximately 80% of your competitors in the market for this product
have ceased production and sales]. No sales occurred during the months of
November or December for your company. Because of the nature of this product,
the potential for this part to be sold in the replacement parts market does not exist.
(1) Please indicate the percent probability in your opinion that this inventory will
be sold. (0-100%)
(2) Please indicate the percent probability that you would write off Part Number
PX23415 from inventory. (0-100%)
For questions 3 through 5, place an X on the box that best indicates your opinion.
(3) How risky do you feel it is for the inventory to remain on the books?
Very Risky (1) (2) (3) (4) (5) (6) (7) Not Risky
(4) Indicate on the scale below your perception of what is occurring in the
marketplace to the demand for this part?
Significantly Decreased
No Change
Not Important
(1) (2) (3) (4) (5) (6) (7)
(6) Did you achieve the operating budget prior to the inventory write-off decision?
Yes or
No
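The bonus rule in the scenario can be worked through as a short sketch to show what is at stake in each income condition. It assumes, beyond what the scenario states, that writing off the part reduces net income by the full $15,000 inventory value; the function names are hypothetical.

```python
BUDGETED_NET_INCOME = 1_500_000   # bonus threshold from the scenario
INVENTORY_VALUE = 15_000          # inventory value of Part Number PX23415


def bonus(net_income):
    """Return 0.5% of net income if the budget is met, otherwise 0."""
    if net_income >= BUDGETED_NET_INCOME:
        return net_income * 5 // 1000   # 0.5% using integer arithmetic
    return 0


def bonus_change_from_write_off(estimated_net_income):
    """Bonus after the write-off minus bonus before it.

    Assumption: the write-off lowers net income by the full inventory value.
    """
    after = bonus(estimated_net_income - INVENTORY_VALUE)
    before = bonus(estimated_net_income)
    return after - before


# The two estimated net income figures used in the scenario.
for estimate in (1_502_000, 1_400_000):
    print(estimate, bonus(estimate), bonus_change_from_write_off(estimate))
```

Under these assumptions, a respondent told that estimated net income is $1,502,000 would give up the entire bonus by writing off the part, while a respondent told $1,400,000 has no bonus to lose either way.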
INTRODUCTION
This study is motivated by today's competitive business environment that requires
individuals to give their attention to many areas, all of which compete for their time.
Recent innovations in management accounting control systems, such as Kaplan
and Norton's (1992, 1996) Balanced Scorecard, reflect this situation and attempt
to influence individuals to balance their time among multiple areas through the
establishment of goals, incentives, and accounting systems. While a great deal of
research has been conducted regarding the effects of incentives and goal difficulty
in relation to a single task (cf. Bonner, Hastie, Sprinkle & Young, 2000; Camerer
& Hogarth, 1999; Cameron & Pierce, 1994; Jenkins, Gupta, Mitra & Shaw, 1998;
Wood & Locke, 1990), very little is known about the effects of these variables
on behavior in relation to accomplishing multiple tasks (Ashford & Northcraft,
in press; Locke & Latham, 1990) as addressed by the Balanced Scorecard.
Research into the effects of incentives and goal difficulty on behavior within
a Balanced Scorecard framework is needed for several reasons. Foremost is the
fact that the kinds of incentive structures that are possible when multiple tasks are
involved have received scant attention in the literature. For instance, incentives
associated with a Balanced Scorecard can be structured so that rewards are
received only after meeting the goals in all areas. Or, Balanced Scorecard areas
can be decoupled so that rewards are provided after meeting goals associated with
individual areas. Furthermore, achieving the goals in one area may be easy while
it may be very challenging in another. The combinations of these possibilities add
a level of complexity to the Balanced Scorecard environment that has received
scant attention in the existing literature. For these reasons, Ashford and Northcraft
(in press) call for more research into decision-making when multiple tasks
compete for an individual's time and attention. The use of the Balanced Scorecard
as a management tool has increased the need for this research.
Naylor, Pritchard and Illgen (1980) posit a theory, hereafter NPI theory,
suggesting that when individuals are faced with multiple objectives, how they
allocate their time among the areas that compete for their time is more important
to achieving overall satisfactory results than the total amount of time spent
working on all of their goals. This distinction has been termed direction of effort
versus level of effort (Blau, 1986, 1993). Because the many studies that examine
goal difficulty and incentives typically use only a single goal and single task,
they address only level of effort. The effects of incentives and goal difficulty on
direction of effort remain largely unexplored.1
As a practical issue, organizations would benefit from a better understanding
of how incentives and goal difficulty interact to influence how individuals expect
to use their time among their areas of responsibility. The Balanced Scorecard
System (Kaplan & Norton, 1992, 1996) is based on the premise that overall
performance is improved when goals in all areas are reached together. Failure
on one dimension cannot be completely compensated by success in others.
Conceptually then, organizations may desire to reward individuals only when
they achieve satisfactory performance in all of the Balanced Scorecard areas.
One finding of goal research, however, is that while challenging goals generally
motivate more effort than easy goals (Wood & Locke, 1990), unattainable goals
often do not and can sometimes have large negative consequences (Fatseas &
Hirst, 1992; Lee, Locke & Phan, 1997; Mowen, Middlemist & Luther, 1981;
Wright, 1992). This being the case, basing rewards on areas coupled together via
a comprehensive control system may produce unintended consequences when
information suggests that goals in one or more areas are unattainable. Research
is clearly needed to answer these types of practical questions.
A theoretical justification for this study is that, for Balanced Scorecard systems
to work, they must affect the plans of individuals. Without premeditated, goal-directed planning, individuals do not control their environments but are controlled
by them. This notion is consistent with the idea that closely related constructs like
goal commitment, goal motivation, and intentions affect goal-related performance
(Locke, Latham & Erez, 1988). If the Balanced Scorecard does not motivate
individuals sufficiently to alter their plans about where they will spend their time,
it is difficult to argue that they are committed to it (cf. Naylor & Illgen, 1984, p. 98),
and achieving its objectives is unlikely. Hence, this study builds on the theoretical
foundation from prior studies by looking at the time planning decisions of
the subjects.
Studies that examine the effects of planning and intentions on performance
generally conclude that these variables have a stronger effect than most other
variables. For instance, Chesney and Locke (1991) find that identifying an
appropriate strategy for completing a complex task in the initial planning stage
has a greater effect on performance than does goal difficulty. Early, Wojnaroski
and Prest (1987) find that planning is positively associated with performance
in both the laboratory and the field. In a study by Cotton and Tuttle (1986),
intentions predicted subsequent behavior more reliably than any other variable
they identified in the literature. McAllister, Mitchell and Beach (1979) find
that individuals who planned to spend more time on a task actually did spend
more time on it and thus conclude that intentions are positively related to
performance.
Also from a theoretical viewpoint, this research extends the findings of many
goal studies that employ tasks having a production-line orientation to a context that
more closely resembles those encountered by individuals in management roles.
Managers operate in environments that inherently place many demands on their
time at once. Although prior goal research has examined complex as well as simple
tasks (Chesney & Locke, 1991; Wood & Locke, 1990; Wood, Mento & Locke,
1987), subjects have typically worked towards only a single goal. Settings characterized by single objectives are more characteristic of unskilled or process-oriented
jobs, not management-level positions. On the other hand, the task of allocating
one's time and attention among various demands is highly consonant
with what managers do. That is, a manager's time is his or her most valuable and scarce
resource and how that resource is allocated likely makes the most difference
to what gets accomplished (Miodonski, 1999; Plack, 2000). Few studies have
looked into factors that influence time allocation between tasks in a managerial
context.
Investigating the difficulty of the goal is also important in a Balanced Scorecard
framework. Information about goal difficulty is an integral, if not a necessary,
component to the successful achievement of most important goals (Wood &
Locke, 1990) and is a major rationale for the existence of the Balanced Scorecard.
Simply put, having a goal without also having the ability to assess one's position
relative to it is not much of a goal. Notwithstanding, only a very small portion
of the goal literature examines behavior in a setting where information about the
level of goal difficulty in one area permits subjects to shift their time to or from
other relevant, work-related areas. Yet, this is exactly what is possible within a
Balanced Scorecard system.
and, therefore, are less motivating than easy or challenging goals (Fatseas &
Hirst, 1992; Lee, Locke & Phan, 1997; Mowen et al., 1981; Wright, 1992). Erez,
Gopher and Arzi (1990) partially extend these conclusions to multiple tasks and
find that proportionately more attention is allocated to more difficult tasks. To the
extent that these findings generalize to time planning decisions by individuals,
they suggest that individuals will plan to spend more time on challenging goals
and less time on easy or unattainable goals.
incentives are based on goal attainment in each area separately than when incentives are based on goal attainment for all areas as a set.
H4. Shifts in time from Balanced Scorecard areas that information indicates are
unattainable to areas that information indicates are challenging will be greater
when incentives are based on goal attainment in each area separately than when
incentives are based on goal attainment for all areas as a set.
Note that H1 and H2 combine to suggest that a manager's time allocation will
follow an inverted U-shaped function in relation to goal difficulty for a single
Balanced Scorecard area. Furthermore, as a result of shifting time to and from
competing areas, the time allocated to these other areas will resemble a righted
U-shaped function in relation to the goal difficulty of the single target area (holding
goal difficulty constant for the other competing areas). Figure 1 expresses this
relationship and is consistent with several models of motivation beginning as
early as Atkinson (1958). H3 and H4 imply that both U-shaped functions will be
flatter when incentives are provided only when goals are achieved in all areas in
comparison to when incentives are based on goal attainment for each area of the
Balanced Scorecard individually.
METHOD
Experimental Design
A decision-making experiment was conducted in which participants were
randomly assigned to one of six cells in a 2 × 3 design. Incentive structure was
manipulated between subjects at two levels by making the likelihood of promotion
and receiving a 20% bonus contingent upon achieving goals in four Balanced
Scorecard areas either: (1) individually; or (2) as a set. Goal difficulty was
manipulated between subjects at three levels as being: (1) easy; (2) challenging
but attainable; or (3) not attainable.
Oil's domestic marketing and oil refining division (Kaplan, 1997a, b). All subjects
received the same four areas and goals.
The participants were informed that they were being considered for a promotion
and that the corporation offered a performance bonus of up to 20% of their salary,
both of which were linked to their goals. About half of the participants were
informed that the likelihood of promotion and bonus depended upon "how many
goals you achieve," while the remaining participants were informed that their
promotion and bonus depended upon achieving all four goals together. This
constituted the incentive structure manipulation, with one group's bonus based
on achieving goals in individual areas and the other group's bonus based on
achieving goals in the entire set of Balanced Scorecard areas. Thus, all subjects
were provided with a possibility to achieve the same reward; only the manner in
which the incentive was structured varied between groups.
Goal difficulty was manipulated for the Customer area by providing the subjects
with reliable feedback suggesting that the Customer goal is easily attainable,
challenging but attainable, or not attainable, depending on their experimental
condition. This resulted in three conditions: Easy, Challenging, and Unattainable.
Goal difficulty was held constant for the other three areas (Financial, Internal
Business, and Learning & Growth) at a challenging but attainable level for all
participants.
All participants were informed that they could work as many hours per week
as they wished and that they were free to allocate their work hours as they saw
fit except that they must spend 15 hours per week on tasks unrelated to their four
goals.2 The rest of their time at work was to be devoted to achieving the goals in the
four areas. The participants were then asked how they would allocate their hours
at work to achieve the goals in each of the four Balanced Scorecard areas. Thus,
the hours-per-week the subjects intended to work were collected for each goal
resulting in planned time to spend on Customer, Financial, Internal Business, and
Learning & Growth goals. The sum of these four responses is the total goal-related
hours per week. To measure the relative amounts of time allocated to achieving the
various goals, the difference in time allocated to the (manipulated) customer area
and the average time allocated to the three other areas was computed. Positive numbers reflect more time to the customer goal in relation to the average time allocated
to the other three goals. Negative numbers reflect more time to the three competing
goals, on average, than to the customer goal. Hence, this measure reflects the
relative emphasis that the subjects placed on the manipulated goal in comparison
to other challenging goals that are competing for their time.3
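The two measures described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, and the hours are made-up values for one imaginary participant rather than data from the study.

```python
from statistics import mean


def total_goal_hours(customer, financial, internal_business, learning_growth):
    """Sum of planned weekly hours across the four Balanced Scorecard areas."""
    return customer + financial + internal_business + learning_growth


def relative_customer_time(customer, financial, internal_business, learning_growth):
    """Customer hours minus the mean of the other three areas' hours.

    Positive values mean more time for the manipulated customer goal;
    negative values mean more time, on average, for the competing goals.
    """
    return customer - mean([financial, internal_business, learning_growth])


# Hypothetical participant (illustrative hours, not data from the study).
plan = {"customer": 12, "financial": 9, "internal_business": 8, "learning_growth": 10}

total = total_goal_hours(**plan)
relative = relative_customer_time(**plan)
print(total, relative)
```

For this imaginary participant the relative measure is positive, which would indicate more emphasis on the customer goal than on the average competing goal.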
After the dependent measures were collected, subjects responded to a goal difficulty manipulation check in which they selected the goal difficulty information
that they received in the case from among the three possibilities. Likewise, in
order to check the incentive manipulation, the participants selected the incentive
manipulation statement that they received in the case. Next, the participants
were asked two questions regarding the valence of the incentives and their
effort-to-performance expectancy. The valence question asked how attractive the
bonus and promotion were, using a nine-point Likert scale anchored by 1 = very
unattractive and 9 = very attractive. The effort-to-performance expectancy
question asked the subjects to rate how likely they would be to accomplish all four
goals if they exerted maximum effort, using a nine-point Likert scale anchored by
1 = very unlikely and 9 = very likely.4
Finally, the participants were asked to provide demographic information. The
data were gathered during regularly scheduled classes. Participation was voluntary
and anonymous and the experiment took about 15 minutes to complete.5
Participants
One hundred and ninety-three professional MBA students participated in the
study. Of these, 10 provided incomplete responses and 18 failed one or both
manipulation checks and were deleted, leaving 165 usable responses. Because the
professional MBA is a 12-month program that requires an undergraduate degree in business
and work experience, the subjects were quite homogeneous and well qualified
to perform the task. About 26% of the participants were women. The typical
participant was 30.3 years old with 7.8 years of work experience. On average,
the participants had supervised a maximum of 22 people. The participants tended
to agree with the statement, "My advancement and/or compensation at work is
contingent upon achieving a goal or goals . . ." in the same four areas used in
the experimental materials. On a scale of 1 = disagree and 7 = agree, the mean
response for this question is 4.7 for the customer area, 4.9 for the financial area,
5.3 for the internal business process area, and 4.9 for the learning and growth area.
Approximately 42% of the subjects stated that they are paid a bonus in addition to
their salary. These data suggest that the subjects have had exposure to the kinds of
goals, incentives and issues in the decision task and that the participants were, as
a result of their education and work experience, capable of providing meaningful
responses.
RESULTS
Preliminary Analysis
The demographic variables were tested to determine if systematic differences exist
across cells. Chi-square tests show no significant differences across treatment
conditions (all p-values > 0.10) for any of the three categorical demographic
variables: gender, educational degree program, and current compensation plan.
Separate 2 × 3 ANOVAs were conducted for each continuous demographic
variable: the number of years of work experience, the maximum number of
individuals supervised, and the degree to which compensation at work was contingent upon
achieving a goal or goals in each of the four business areas. Incentive structure at
two levels and goal difficulty at three levels served as the independent variables.
The ANOVA results show no significant differences (all p > 0.10) for any
continuous demographic variable across cells. Thus, results from the analysis
of the demographic data suggest that randomization was effective and that the
subjects are homogeneous across treatment conditions.
The attractiveness of the incentives and the expectancy of accomplishing
the goals in all four areas (given maximum effort) should not differ between
incentive structures. The data support this proposition in that the attractiveness
of the incentives based on each area separately (mean = 8.11) does not differ
from the attractiveness of the incentives based on achieving the goals in all areas
(mean = 8.28, t = 1.0441, p = 0.2980). Incentive structure was not predicted to
affect goal challenge but to interact with goal challenge to affect motivation. Consistent with this notion, the expectancy of accomplishing the goals in all four areas
when incentives were based on each area (mean = 6.94) does not differ significantly from when incentives were based on all areas (mean = 7.24, t = 0.8874,
p = 0.3762).
The expectancy of accomplishing all goals, however, should differ by goal
difficulty so that the expectancy should decrease with goal difficulty. The results
generally support this proposition in that easy and challenging goals (means 7.84
and 7.81, respectively) are seen as more likely to be accomplished (p = 0.0001)
than unattainable goals (mean = 5.78).
The total amount of time the subjects planned to work in a week, as shown in
Table 1, was not affected by incentives or goal difficulty. The finding that subjects
do not adjust their workweek for incentives or goal difficulty is consistent with
Naylor, Pritchard and Illgen (1980) who assert that total work effort is stable
across most conditions other than those associated with individual differences.
Hypotheses Testing
The first hypothesis predicts that individuals will shift time from goals that
information indicates are easy to goals that information indicates are challenging.
Recall that goal difficulty was manipulated only for the customer goal and that
goal difficulty was held constant (i.e. challenging but attainable) for the other
three goals. To measure the relative amounts of time allocated to achieving the
various goals, the difference in time allocated to the (manipulated) customer
area and the average time allocated to the three other areas was computed.
The hypothesis predicts that the difference in time allocations should be larger
(more positive) when the customer goal is challenging compared to when
it is easy.

Table 1. Total hours per week the subjects planned to work
Panel A: ANOVA
Source                    df       F         p
Incentive structure        1     1.75    0.1882
Goal difficulty            2     0.74    0.4771
Interaction                      0.14    0.8714
Model                      5     0.66    0.6521
Error                    159
Panel B: Cell values by goal difficulty (Easy, Challenging, Unattainable) and incentive structure (Each Goal Evaluated Separately versus as a set), listed in extracted order because the cell labels are not recoverable
38.73 (7.12), n = 30
40.68 (7.51), n = 25
37.74 (6.57), n = 27
37.23 (7.04), n = 26
38.79 (6.21), n = 28
38.69 (9.12), n = 29
The hypothesis was tested using a 2 × 2 ANOVA with the difference in time
allocated between the customer goal and the average of the other three goals as the
dependent measure. Goal difficulty (easy versus challenging but attainable) and
incentive structure (separate versus set) served as the independent variables. As
can be seen from Panel A in Table 2, the main effect for goal difficulty is highly
significant (F = 33.82, p = 0.0001). When the customer goal is challenging, then
all four goals are challenging. In the situation where all goals are challenging,
Panel B of Table 2 shows that the subjects allocated more time to the customer
area than to the other areas (mean difference = +1.77 hours) possibly reflecting
a bias towards taking care of customers or a belief that this area requires a greater
time commitment. In contrast, when the customer goal is easy, the subjects shift
time toward accomplishing the three other challenging goals (mean difference =
−3.10 hours). Hypothesis 1 is strongly supported.

Table 2
Panel A: ANOVA
Source                                    df      F        p
Goal difficulty                            1    33.82    0.0001
Incentive structure                        1     2.32    0.1304
Goal difficulty × incentive structure      1     0.01    0.9243
Model                                      3    12.19    0.0001
Error                                    104

Panel B: Means (a)
                              Goal Difficulty
Incentive Structure      Easy     Challenging     Incentive average
Separate                −2.51        2.44              −0.04
Set                     −3.69        1.10              −1.29
Average                 −3.10        1.77

(a) The difference is calculated as the time allocated to the customer area less the average time allocated
to other areas so that positive numbers reflect more relative time spent in the manipulated customer
area.
The second hypothesis predicts that individuals will shift time from areas where
information indicates the goals are unattainable to areas where information indicates
the goals are challenging. The second hypothesis was tested in a like manner
to H1 using a 2 × 2 ANOVA with the difference in time allocated between the
customer area and the average of the other three areas as the dependent measure.
For this test, goal difficulty (challenging versus unattainable) and incentive
structure (separate versus set) served as the independent variables. As can be seen
from Panel A in Table 3, the main effect for goal difficulty is highly significant
(F = 10.14, p = 0.0019). This result is modified, however, by a significant goal
difficulty by incentive interaction as discussed below.
Table 3
Panel A: ANOVA
Source                                    df      F        p
Goal difficulty                            1    10.14    0.0019
Incentive structure                        1     2.96    0.0882
Goal difficulty × incentive structure      1     8.93    0.0035
Model                                      3     7.44    0.0001
Error                                    104

Panel B: Means (a)
                              Goal Difficulty
Incentive Structure      Challenging     Unattainable     Incentive average
Separate                    2.44            −4.07              −0.82
Set                         1.10             0.90               1.00
Average                     1.77            −1.59

(a) The difference is calculated as the time allocated to the customer area less the average time allocated
to other areas so that positive numbers reflect more relative time spent in the manipulated customer
area.

Overall results for H1 and H2 support the prediction that subjects shift the
time they are willing to spend from one area of responsibility to another due to
goal difficulty as described in Fig. 1. Figure 2 shows the results from the study
in the same graphic form as Fig. 1. Recall that the information indicated that all
non-customer goals (i.e. goals from competing areas) are challenging. As can be
seen, individuals react to goal difficulty information by shifting their time from
areas associated with easy goals to those associated with challenging goals, and
from unattainable goals to challenging goals in a manner that supports our overall
prediction.
H3 and H4 suggest that incentive structure modifies the relationship between
goal difficulty and planned time, leading to the prediction that the interaction terms
reported in Tables 2 and 3 should be significant. As can be seen in Panel A of
Table 2, incentive structure does not interact with goal difficulty (F = 0.01,
p = 0.9243) thus failing to support H3. Hence, the data do not suggest that
incentive structure modifies the amount of time subjects plan to spend on Balanced
Scorecard areas associated with easy versus challenging goals.
Panel B of Table 3 shows that when incentives are based on each goal separately,
information indicating that the customer goal is unattainable caused individuals
those areas differ in terms of whether their associated goals are unattainable versus
challenging or easy.
Supplemental Analysis
Predicting separate differential effects for the manipulations on the time allocated
to each of the three non-customer goals is not possible. Nevertheless, in the spirit
of the study's main premise that individuals consider all areas of the Balanced
Scorecard together as they plan their time, supplemental analysis of these data
is reported. Panel A of Table 4 shows the results of a MANOVA in which hours
allocated to the Financial goal, the Internal Business goal, and the Learning and
Growth goal are dependent variables with goal constituting a within-subject variable. Customer goal difficulty at three levels (easy, challenging, and unattainable)
and incentive structure at two levels (separate versus set) served as the independent
variables. As can be seen from the table, the analysis shows a three-way interaction
between goal, customer goal difficulty, and incentive structure (F(4, 318) = 2.44,
p = 0.0468) making the interpretation of other effects difficult.
Some insights into the interaction of these variables are possible by examining
the mean hours allocated towards attaining each of the three non-customer
goals, as well as hours allocated to the customer goal, as shown in Panel B of
Table 4. Consider first the case in which incentives are based on achieving each
goal separately. Here, the time allocated to the customer (manipulated) goal
follows the predicted inverted-U shaped pattern (Fig. 1) and the time allocated to
each competing goal generally follows the predicted righted-U shaped pattern. As
hours are shifted to and from the customer goal according to its difficulty, the hours
are spread relatively consistently across the three competing goals. In contrast,
consider the case in which incentives are based on achieving all goals as a set. Here,
no perceptible difference in time allocation occurs between the challenging and
unattainable conditions across any of the four goals. That is, when the customer
goal is challenging, the subjects allocated 10.1 hours to this goal, and they allocated a similar
10.3 hours when the goal is unattainable. When the customer goal is challenging
versus unattainable, hours allocated to the other three goals correspond closely
as well: financial goal = 8.6 and 9.6; internal business goal = 9.9 and 9.9; and
learning and growth goal = 8.6 and 8.9, respectively. These observations suggest
that the pattern of results shown in Fig. 2 is driven by the condition in which
incentives reward each goal separately, a conclusion that is consistent with H4.
This incentive structure more closely resembles those used in the prior studies
upon which the predictions were based (in contrast to incentives in which rewards
are received only after achieving an entire set of distinct, competing goals).
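Table 4. Time Allocated to Balanced Scorecard Areas Other than the Customer Area.
Panel A: MANOVA
Source                                                   df      F        p
Between subjects
  Customer goal difficulty                                2     3.03    0.0513
  Incentive structure                                     1     2.71    0.1015
  Goal difficulty × incentive structure                   2     1.20    0.3040
Within subjects
  Goal                                                    2     6.18    0.0023
  Goal × goal difficulty                                  4     2.11    0.0794
  Goal × incentive structure                              2     0.71    0.4940
  Goal × goal difficulty × incentive structure            4     2.44    0.0468

Panel B: Mean hours allocated
                                    Customer Goal Difficulty
Incentive Structure / Goal      Easy    Challenging    Unattainable    Average
Separate
  Customer                       7.8       12.0            6.6           8.7
  Financial                     11.5       10.5            9.6          10.5
  Internal Business             10.5        9.9           11.3          10.5
  Learning & Growth              9.0        8.4           11.3           9.5
Set
  Customer                       6.7       10.1           10.3           9.1
  Financial                     10.4        8.6            9.6           9.5
  Internal Business             11.2        9.9            9.9          10.3
  Learning & Growth              9.5        8.6            8.9           9.0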
DISCUSSION
Some strengths and limitations to the study should be mentioned before discussing
its findings. The study was conducted in the laboratory using a written exercise
designed to capture the essentials of managers' time allocation decisions. As such,
care should be taken when extrapolating the results to other contexts and situations.
On the other hand, the study employs a strong design that contributes to its internal
validity and allows us to examine the proposed causal relationships. It also benefits
from a high level of experimenter control and uses a task that corresponds more
closely to the kinds of tasks performed by managers than many of the previous
goal studies. Furthermore, the materials that the subjects used are based on the
Balanced Scorecard of an actual company. These factors increase the study's
external validity.
This is one of a very few studies to examine the effects of goal difficulty and
the effects of incentive structure in a Balanced Scorecard context where multiple
demands vie for the subjects' time. Based on Naylor, Pritchard and Illgen's (1980)
NPI theory, we predicted that subjects would shift their time between areas based on
the goal difficulty information they received and as influenced by their incentives.
These predictions were generally supported. Although we found considerable
support that incentive structure and goal difficulty affect how individuals allocate
their time between areas, we found no evidence that either influences the total
amount of time the subjects said they would work to achieve satisfactory results in
all the Balanced Scorecard areas. This also supports NPI theory's assumption that
in a work-related situation, people do not change their total level of effort except
under very unusual situations. Rather, individuals shift their time from easy goals
to more challenging goals and from unattainable goals to challenging goals.
These findings suggest that organizations should consider incentives and
management control variables such as goal difficulty information as ways to
change or refocus individuals' time and not as ways to induce more effort. This has
implications for the kinds of effort attributions that are sometimes made during
performance evaluation. Evaluators should be careful about attributing negative
performance to a lack of effort unless they have first ruled out misdirected effort.
It also highlights the importance of receiving timely and accurate information in
order for individuals to appropriately direct their time. The findings also imply
that individuals are sensitive to variables that are under organizational control and
which are susceptible to manipulation. Organizations could improve appropriate
goal directed behavior by making sure that their incentives and reporting systems
focus individuals' time on their important organizational goals.
One of the major insights of the study is that individuals react differently to
goal difficulty under different incentive structures. When a particular goal is
easy compared to challenging, the time allocated to achieving goals that are
competing for the manager's attention does not differ according to incentive
structure. However, when information indicates that one goal is unattainable,
the incentive structure makes a substantial difference. When monetary incentives
are based on the extent to which the subjects met each goal individually, the
subjects shifted approximately 6.51 hours from the area with unattainable goals
to alternative areas. However, when the monetary incentives are provided only
upon achieving the goals in all areas, the subjects did not plan to shift any hours
from the area with unattainable goals to alternative areas. Envisioning situations
in which either result is desirable is certainly possible. If goals are somewhat
arbitrary, in that just missing a goal is still beneficial, then basing rewards on
individual goal achievement could be counterproductive. Once missing a goal
becomes obvious, individuals will dramatically decrease their planned effort in
that area and redirect it towards meeting challenging but still attainable goals.
On the other hand, there are situations in which organizations want to discourage
individuals from working on unattainable goals. In this case, they should base
monetary rewards on attaining individual goals rather than all goals.
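The size of the shift discussed above can be checked against the cell means in Panel B of Table 3. The sketch below is illustrative arithmetic only. It assumes the unattainable-goal mean under separately evaluated goals is negative (−4.07), which is the sign that reproduces the 6.51-hour shift reported in the text; the variable and function names are hypothetical.

```python
# Mean difference scores (customer hours minus average of other areas),
# as reported in Panel B of Table 3.
cell_means = {
    ("separate", "challenging"): 2.44,
    ("separate", "unattainable"): -4.07,  # sign is an assumption, see lead-in
    ("set", "challenging"): 1.10,
    ("set", "unattainable"): 0.90,
}


def shift_away(incentive):
    """Hours moved out of the customer area when its goal becomes unattainable."""
    return (cell_means[(incentive, "challenging")]
            - cell_means[(incentive, "unattainable")])


shift_separate = shift_away("separate")
shift_set = shift_away("set")

# The interaction contrast is the difference between the two shifts.
interaction_contrast = shift_separate - shift_set

print(f"separate: {shift_separate:.2f} hours")
print(f"set:      {shift_set:.2f} hours")
print(f"contrast: {interaction_contrast:.2f} hours")
```

Under these assumptions the shift is about 6.5 hours when goals are rewarded separately and about 0.2 hours when they are rewarded as a set, which is the pattern behind the significant interaction.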
We note that individuals will plan to spend time working on a goal despite
receiving reliable information that the goal is unattainable. This suggests that
individuals consider more than goal difficulty when planning their time. For
instance, individuals may continue to be psychologically committed to goals
that they have previously accepted despite receiving negative goal difficulty
information. In addition, they may feel a need to justify their actions and believe
that missing a goal is easier to justify if effort has been expended than if one
quits altogether. They may also wish to come as close as possible to achieving
the goal in order to preserve their reputations as best they can; coming close
may not be viewed as badly as being way off the mark. Also, individuals know
that in most cases, goal achievement in future periods is tied to the level of effort
exerted this period. Hence, they may be reluctant to completely cease working on
an unattainable goal in order to avoid beginning in a hopeless situation the next
period. These conjectures are fruitful topics for future research.
Together, the findings from the study strongly suggest that when multiple
areas compete for attention, as in the Balanced Scorecard, the way incentives
are structured influences how individuals plan their time between areas rather
than their total level of effort. We have argued that planning one's time to be
successful in multiple areas is a crucial aspect of what individuals, and particularly
managers, do. For these reasons, this study represents an important contribution
to knowledge about ways incentives can be structured in a Balanced Scorecard
framework to help organizations achieve their goals. Hopefully others will find
the approach taken by this study useful in examining these issues.
NOTES
1. Effort includes both time and intensity components; however, Larson and Callahan
(1990) argue that individuals are more likely to differentially allocate their time than vary
their intensity between tasks. They argue that individuals "groove in" to an overall level of
intensity, which they strive to maintain over time.
2. In pretests, subjects were concerned about duties other than those directly tied to
Balanced Scorecard areas. Inclusion of the 15 hours per week on tasks unrelated to their
four goals controls for differences in the amount of time that subjects would otherwise
have assumed needed to be spent on these tasks.
3. One reviewer suggested analyzing the data using proportions rather than difference
scores. The results are equivalent using either method (cf. Tuttle & Harrell, 2001).
4. The manipulation checks were presented with the original case materials, and the
subjects were asked not to look back. A stronger test would have been to administer the
post-experimental materials separately from the case.
5. A small number of students, whom we did not count, chose not to participate. No
monetary incentive was provided.
ACKNOWLEDGMENTS
The authors would like to thank workshop participants at the University of Utah
and the University of South Carolina for their helpful comments.
REFERENCES
Ajzen, I. (1987). Attitudes, traits, and actions: Dispositional prediction of behavior in personality
and social psychology. In: L. Berkowitz (Ed.), Advances in Experimental Social Psychology
(Vol. 20, pp. 1–63). San Diego, CA: Academic Press.
Ajzen, I., & Madden, T. J. (1986). Prediction of goal-directed behavior: Attitudes, intentions,
and perceived behavioral control. Journal of Experimental Social Psychology, 22(5),
453–474.
Anthony, R., & Govindarajan, V. (1998). Management control systems. Homewood, IL: Irwin/McGraw-Hill.
Ashford, & Northcraft, G. (2002). Robbing Peter to pay Paul: Feedback environments and enacted
priorities in response to competing task demands. Human Resource Management Review,
forthcoming.
Atkinson, J. W. (1958). Motives in fantasy, action, and society: A method of assessment and study.
Princeton, NJ: Van Nostrand.
Awasthi, V., & Pratt, J. (1990). The effects of monetary incentives on effort and decision performance:
The role of cognitive characteristics. The Accounting Review, 65(4), 797–811.
Blau, G. (1986). The relationship of management level to effort level, direction of effort, and managerial
performance. Journal of Vocational Behavior, 29, 226–239.
Blau, G. (1993). Operationalizing direction and level of effort and testing their relationship to individual
job performance. Organizational Behavior and Human Decision Processes, 55, 152–170.
Bonner, S. E., Hastie, R., Sprinkle, G. B., & Young, S. M. (2000). A review of the effects of financial incentives on performance in laboratory tasks: Implications for management accounting.
Journal of Management Accounting Research, 12, 19–64.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review
and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis.
Review of Educational Research, 64, 363–423.
Chesney, A. A., & Locke, E. A. (1991). Relationships among goal difficulty, business strategies, and
performance on a complex management simulation task. Academy of Management Journal,
34(2), 400–424.
Cotton, J. L., & Tuttle, J. M. (1986). Employee turnover: A meta-analysis and review with implications
for research. Academy of Management Review, 11(1), 55–70.
Covey, S. R. (1989). The seven habits of highly effective people: Restoring the character ethic. New
York, NY: Simon and Schuster.
Early, P. C., Wojnaroski, P., & Prest, W. (1987). Task planning and energy expended: Exploration of
how goals influence performance. Journal of Applied Psychology, 72, 107–114.
Erez, M., Gopher, D., & Arzi, N. (1990). Effects of goal difficulty, self-set goals, and monetary
rewards on dual task performance. Organizational Behavior & Human Decision Processes,
47(2), 247–270.
Fatseas, V. A., & Hirst, M. K. (1992). Incentive effects of assigned goals and compensation schemes
on budgetary performance. Accounting and Business Research, 22(88), 347–355.
Gollwitzer, P. M., & Bargh, J. A. (1996). The psychology of action. New York, NY: Guilford Press.
Jenkins, G. D., Gupta, N., Mitra, A., & Shaw, J. D. (1998). Are financial incentives related to performance? A meta-analytic review of empirical research. Journal of Applied Psychology, 83(5),
777–787.
Kaplan, R. S. (1997a). Mobil USM&R (A): Linking the balanced scorecard. Boston, MA: Harvard
Business School Publishing.
Kaplan, R. S. (1997b). Mobil USM&R (B): New England sales and distribution. Boston, MA: Harvard
Business School Publishing.
Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive performance.
Harvard Business Review (January–February), 71–79.
Kaplan, R. S., & Norton, D. P. (1996). Translating strategy into action: The balanced scorecard. Boston,
MA: Harvard Business School Publishing.
Komaki, J. L., Coombs, T., & Schepman, S. (1996). Motivational implications of reinforcement theory.
In: R. M. Steers, L. W. Porter & G. A. Bigley (Eds), Motivation and Leadership at Work (pp.
34–52). New York, NY: McGraw-Hill.
Larson, J. R., Jr., & Callahan, C. (1990). Performance monitoring: How it affects work productivity.
Journal of Applied Psychology, 75(5), 530–538.
Lee, T. W., Locke, E. A., & Phan, S. H. (1997). Explaining the assigned goal-incentive interaction:
The role of self-efficacy and personal goals. Journal of Management, 23(4), 541–559.
Libby, R., & Lipe, M. G. (1992). Incentives, effort, and the cognitive processes involved in accounting-related judgments. Journal of Accounting Research, 30(2), 249–273.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs,
NJ: Prentice-Hall.
Locke, E. A., Latham, G. P., & Erez, M. (1988). The determinants of goal acceptance and commitment.
Academy of Management Review, 13, 23–39.
McAllister, D. W., Mitchell, T. R., & Beach, L. R. (1979). The contingency model for the selection of decision strategies: An empirical test of the effects of significance, accountability, and
reversibility. Organizational Behavior and Human Decision Processes, 24(2), 228–244.
Miodonski, B. (1999). Time management is key to juggling multiple jobs. Contractor, 46(2), 5.
Mowen, J., Middlemist, R., & Luther, D. (1981). Joint effects of assigned goal level and incentive
structure on task performance: A laboratory study. Journal of Applied Psychology, 66,
598–603.
Naylor, J., & Illgen, D. (1984). Goal setting: A theoretical analysis of a motivation technology. Research
in Organizational Behavior, 6, 95–140.
Naylor, J., Pritchard, R., & Illgen, D. (1980). A theory of behavior in organizations. New York, NY:
Academic Press.
Plack, H. (2000). Managing time can be crucial. Baltimore Business Journal, 17(40), 27.
Sprinkle, G. B. (2000). The effect of incentive contracts on learning and performance. The Accounting
Review, 75(3), 299–326.
Stone, D. N., & Zeibart, D. A. (1995). A model of financial incentive effects in decision making.
Organizational Behavior and Human Decision Processes, 61(3), 250–261.
Tuttle, B., & Burton, F. G. (1999). The effects of a modest incentive on information overload in an
investment analysis task. Accounting, Organizations and Society, 24, 673–687.
Tuttle, B., & Harrell, A. M. (2001). The impact of unit goal priorities, economic incentives, and interim
feedback on the planned effort of information systems professionals. Journal of Information
Systems, 15(2), 81–98.
Vroom, V. H. (1964). Work and motivation. New York, NY: Wiley.
Wood, R. E., & Locke, E. A. (1990). Goal setting and strategy effects on complex tasks. Research in
Organizational Behavior, 12, 73–109.
Wood, R. E., Mento, A. J., & Locke, E. A. (1987). Task complexity as a moderator of goal effects: A
meta-analysis. Journal of Applied Psychology, 72(3), 416–425.
Wright, P. M. (1991). Goals as mediators of the relationship between monetary incentives and performance: A review and NPI theory examination. Human Resource Management Review, 1(1),
1–22.
Wright, P. M. (1992). An examination of the relationships among monetary incentives, goal level, goal
commitment, and performance. Journal of Management, 18(4), 677–693.
APPENDIX
Sample Decision Case
Columbia Corporation
Assume that you are a unit level manager employed by the Columbia Corporation.
Columbia's senior management has identified a competitive strategy that is linked
to goals in four important business areas. All unit managers have the same goals. In
addition, performance measures were developed for each business area as follows:
Area                   Feedback
Customer               Goal is easily attainable
Financial              Goal is challenging but attainable
Internal Business      Goal is challenging but attainable
Learning & Growth      Goal is challenging but attainable

Example performance measures across the four areas: Mystery shopper ratings; Return on
capital employed (ROCE); Net margin; Profit per business unit; Employee attitude survey;
Number of inventory stock-outs; Quality assessment score; Employee skill development;
Customer complaints; Customer compliments; Sales & growth rate; Timely access to
decision making information.
Notice that you have received reliable interim feedback suggesting that the Customer goal is easily attainable and that the other three goals are challenging but
attainable.
Bonus and Promotion: Two items are of particular interest. First, a division
manager is retiring and you are being considered for his replacement. Second,
Columbia provides a performance bonus of up to 20% of your salary.
Both your promotion and bonus depend on how many goals you achieve. The
more goals you achieve the greater your bonus and likelihood of promotion.
Decision: Like most managers, assume that you can work as many hours as you
want and you can allocate the hours as you see fit. Further, assume that during the
next performance evaluation period, you must spend 15 hours per week working on
administrative and other responsibilities that are not directly related to achieving
your goals in the four business areas (e.g. personnel issues, travel). Also, assume
that you will devote all your remaining work time towards achieving your goals in
the four business areas. Given the information in the case, please indicate below
how you would allocate your hours at work to achieve the goals in each business
area:
Goal Area                   Hours
Customer                    ____ Hours/week
Financial                   ____ Hours/week
Internal business           ____ Hours/week
Learning & growth           ____ Hours/week
Administrative & other        15 Hours/week
Total work hours            ____ Hours/week
THERESA LIBBY
Advances in Accounting Behavioral Research, Volume 6, 145–169
Copyright © 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06007-1
INTRODUCTION
In large, decentralized organizations, accounting information often forms the basis
for budget estimates used in strategic planning, in coordinating work between
organizational divisions, and in setting targets used in performance evaluation
(Merchant, 1985). The accuracy of budget estimates is key to the effectiveness of
these short-run and long-run planning activities. Even so, prior research indicates
budget estimates are rarely accurate (Otley, 1985). The lack of accuracy of budget
estimates may be the result of the manager's inability to forecast accurately
operational input-output relationships due to uncertainty inherent in the task.
In addition, the organization may operate in an environment characterized by
uncertainty. The manager may respond by building a buffer against uncertainty in
the environment or in the task into his or her budget estimate (Davila & Wouters,
2000).1
Alternatively, inaccuracy in budget estimates may be motivated by budget-constrained performance evaluation and reward systems (Jensen, 2001). Results
of several studies in the accounting literature indicate that budget-constrained
performance evaluation systems that emphasize variances in budget-to-actual
results lead to budget gaming (Bart, 1988; Hopwood, 1972; Merchant, 1985;
Walker & Johnson, 1999). One form of budget gaming that has been the focus of
significant study is the creation of budgetary slack (Young & Lewis, 1995).
Budgetary slack is defined as the intentional incorporation of budget amounts
that make the budget easier to attain (Dunk, 1993). Budgetary slack is created
when managers build excess resources into their budgets or knowingly understate
their productive capabilities (Baiman & Evans, 1983; Young & Lewis, 1995).
Budgetary slack is often manifested through overstated expenses or understated
revenues and production plans (Kren & Liao, 1988).
While budgetary slack may play a positive role by facilitating flexibility in
dealing with uncertainty (Cyert & March, 1963; Van der Stede, 2000), this paper
focuses on the alternative negative role budgetary slack plays when budgets
are used to set targets for performance evaluation.2 Budgetary slack created
when budget estimates are intentionally set at a level that is easy to attain can
be detrimental to management control system effectiveness, especially when
responsibility center managers are held accountable for meeting budget targets
and these targets are used to coordinate activities between organizational divisions
and to compensate managers for high performance.
According to Jensen (2001) and Murphy (2000), the typical pay-for-performance compensation contract includes a fixed salary plus a bonus
increasing in performance above a pre-specified budget target. When a manager
is compensated under this type of contract, holds private information about the
productive capability of his/her division and participates in setting his/her own
budget target, incentives for slack creation exist. Consequently, this type of
contract has been labeled slack-inducing (Waller, 1988). A significant stream
of research has developed using the agency framework to test the ability of
other forms of budget-based incentive contracts to encourage managers to reveal
their private information while limiting the amount of budgetary slack managers
create (Baiman, 1982). These types of contracts have been labeled truth-inducing
(Waller, 1988). Truth-inducing contracts typically include a penalty for performance that differs from a participatively set budget target (Weitzman, 1976).
Although theoretically sound, truth-inducing contracts are rarely used in practice (Baker et al., 1988), perhaps because the costs of implementation outweigh
their benefits. An alternative to truth-inducing contracts generating slack-reducing
effects would therefore be valuable. The objective of the present study is to
determine whether the utilization of fair contracting processes combined with an
otherwise slack-inducing incentive contract provides a feasible alternative.3
For the purposes of this study, fair contracting processes are defined according
to procedural justice theory (Leventhal, 1980; Lind & Tyler, 1988). Procedural
justice theory suggests organizational members will perceive a process to be fairer
the greater the degree to which the decision-maker creates a positive atmosphere
of cooperation and compromise even when . . . the values, desires and concerns
of the decision-maker and affected parties may not always agree (Hunton, 1996,
p. 650).
This paper describes the results of an experiment in which subjects performed
a production task and received compensation under a budget-based incentive
contract of either the slack-inducing or truth-inducing form. In addition, subjects
received information about a contracting process designed to be either fair or
unfair. Results indicated subjects compensated under the slack-inducing contract
and assigned to the fair contracting process condition created significantly less
budgetary slack than subjects assigned to the unfair contracting process condition.
While subjects compensated under the truth-inducing contract, on average, created
less slack than subjects compensated under the slack-inducing contract, the fairness
of the contracting process had no effect on the amount of slack they created.
The remainder of the paper is organized as follows. In the next section,
hypotheses are developed followed by a description of the experimental design
and experimental method. The results of the statistical analyses are then reported
followed by discussion of the experimental findings, their limitations and their
implications for future research.
accurate budget targets (Demski & Feltham, 1978; Melumad & Reichelstein, 1989;
Namazi, 1985). A major concern of this literature is that participation in setting
budget targets allows for information sharing, but also increases the potential for
the creation of budgetary slack if managers are then compensated based on meeting
or exceeding the budget that was participatively set (Antle & Eppen, 1985). Truth-inducing contracts have been constructed to address this problem.
Truth-inducing contracts impose a penalty for misrepresentation, usually
scaled by the difference between budgeted and actual performance, providing an
incentive for subordinates to reveal their private information through the budget
targets they set (Kirby et al., 1991; Weitzman, 1976). The particular form of
truth-inducing contract studied here was developed theoretically by Reichelstein
and Osband (1984) and adapted to the budgeting context by Kirby et al. (1991).
The contract was further adapted by Kirby (1992) to a context in which the
manager selects a budget target and focuses effort on maximizing output to meet
or exceed that target. The contract is of the following form:
H(A, B) = v(B) + w(B)(A - B)
subject to
v(B) is increasing and convex (v' > 0, v'' > 0) and w(B) = v'(B) for all B.
In this context, H(A, B) represents the manager's total compensation, B represents the productivity estimate (or budget target) for the period, and A represents
the actual level of productivity for the period. The manager's total compensation
(H) is therefore made up of an ex ante payment, v(B), and a bonus or penalty,
w(B)(A - B), whose value depends on the variance between budget and actual
performance.
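A short derivation may help show why a contract of this form rewards an accurate budget target. It is a sketch under stated assumptions rather than the authors' own argument: v is taken to be twice differentiable, increasing and convex, w(B) = v'(B), and the manager is assumed to know A when choosing B.

```latex
% Assumptions: v'(B) > 0, v''(B) > 0, w(B) = v'(B); A is treated as known when B is chosen.
\begin{align*}
H(A, B) &= v(B) + v'(B)\,(A - B) \\
\frac{\partial H}{\partial B} &= v'(B) + v''(B)\,(A - B) - v'(B) = v''(B)\,(A - B) \\
\frac{\partial H}{\partial A} &= v'(B) > 0
\end{align*}
```

Because v'' > 0, the derivative with respect to B is positive when B < A and negative when B > A, so for any given A compensation is highest at B = A. The second line of reasoning, that pay rises with A, means the manager still gains from producing more once the target is set.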
The truth-inducing properties of this contract have been tested empirically by
Kirby (1992), Reichelstein (1992), and Chow et al. (2000). While the theoretical
design of the contract relies on the assumption that managers are strict utility
maximizers, Kirby (1992) finds the contract maintains its truth-inducing properties
even when this assumption is relaxed. Reichelstein (1992) reports a successful
application of this contract form by the German Department of Defense. Finally,
Chow et al. (2000) experimentally test several mechanisms designed to motivate
truthful upward communication of private information including this contract
form. They find this truth-inducing contract led to significantly less misrepresentation of private information than a slack-inducing linear profit sharing scheme.
Accordingly, in the context of the current study, individuals compensated under
this form of truth-inducing contract are expected to create a relatively low amount
of budgetary slack. This prediction is stated formally as follows:
H1. Individuals compensated under a truth-inducing contract will create less
budgetary slack than individuals compensated under a slack-inducing contract.
budgeting processes are unfair; that is, employees who perceive budgeting
processes to be unfair will reciprocate this unfair treatment by acting in their own
rather than the organization's best interests by creating a relatively high amount
of budgetary slack.
Penalty-framed truth-inducing contracts tend to be completely specified in
economic terms at the beginning of the period due to difficulties in enforcing
the penalty after the fact (Luft, 1994). As a result, a shorter-term economic
exchange relationship between the individual and the organization may become
salient under the penalty-framed truth-inducing contract. If so, procedural fairness
becomes less important and employees may then focus on the economic benefits
obtainable in the current period to a greater degree than consideration of any
future benefits that may accrue. Fehr and Gachter (2002) refer to this effect
as a "crowding out" of agents' incentives to voluntarily cooperate. Results of
their experimental study indicate that incentive contracts that include a penalty
for shirking (i.e., the agent provides less than the agreed-upon level of effort)
are less efficient than a fixed-fee contract because they discourage agents from
focusing on the longer-term employment relationship and therefore reduce the
agents' interest in reciprocating fair treatment. Thus, individuals compensated
under the penalty-framed truth-inducing contract may not respond to the fairness
or unfairness of budgeting processes when selecting a budget target because
economic incentives imbedded in the contract will be most salient to them.
In summary, this review of the literature implies that the relation between
fairness in contracting and budgetary slack creation is moderated by the form
of budget-based incentive contract employed. That is, fairness in contracting
will influence the amount of budgetary slack individuals create when they are
compensated under a slack-inducing, but not a truth-inducing, incentive contract.
This line of reasoning leads to the following hypothesis:
H2. When a slack-inducing contract is employed, budgetary slack will be lower
when the contracting process is fair than when the contracting process is unfair;
however, when a truth-inducing contract is employed, budgetary slack will be
low irrespective of the contracting process employed.
Truth-inducing contract:

\[
\text{Payment} = \frac{(\text{Budget})^2}{100} + \frac{2(\text{Budget})}{100}\,(\text{Actual} - \text{Budget})
\]

Subjects assigned the truth-inducing contract were provided a table in which the
total compensation under this contract for various pairs of budgeted and actual
outcomes was calculated. This table is reproduced in Appendix A. All subjects
were given sample budget and actual amounts and asked to calculate the related
compensation that would be received. They then checked these calculations to
ensure that they understood the relationship between their payment, their budget
and their actual performance.
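The payment calculation that subjects practiced can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the actual output of 25 words, and the range of candidate budgets are assumptions chosen for the example, not values taken from the experimental materials.

```python
def truth_inducing_payment(budget: float, actual: float) -> float:
    """Tickets earned: Budget^2/100 + (2*Budget/100) * (Actual - Budget)."""
    ex_ante = budget ** 2 / 100                           # payment fixed once the budget is chosen
    adjustment = (2 * budget / 100) * (actual - budget)   # bonus if actual > budget, penalty if below
    return ex_ante + adjustment


actual = 25  # hypothetical number of words translated

# Compare an understated, an accurate, and an overstated budget.
for budget in (20, 25, 30):
    print(budget, round(truth_inducing_payment(budget, actual), 2))

# Search a hypothetical range of budgets for the one that pays the most.
best_budget = max(range(10, 41), key=lambda b: truth_inducing_payment(b, actual))
print("best budget:", best_budget)
```

For a given actual output, the payment peaks when the budget equals that output and falls off symmetrically on either side, which is the property the contract relies on.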
Experimental Procedures
Subjects first completed a five-minute practice period to become familiar with the
translation task. They earned a piece rate of one raffle ticket for every three words
correctly translated. At the end of this practice period, the subjects verified their
work and calculated the number of words that they had correctly translated in the
practice period (Note 7). After practicing the task and being informed of the probability
distribution of words of different lengths, but before experimental manipulations
were introduced, subjects recorded their best estimate of next period performance;
that is, their best estimate of the number of words they expected to be able to
translate if given another five minutes in which to work. Subjects placed this
completed "Best Estimate of Production" sheet in an envelope and sealed it.
Subjects kept this sealed envelope with them until the experiment was complete
and consequently, this information was unknown to the researcher until subjects
had completed the experiment.
Subjects then read a description of the incentive contract under which they
were to work and the information about the fair or unfair contracting process.
They provided the researcher, acting as the division manager, with the budget
they wished to use in calculating the number of tickets earned in the work
period. Subjects were told the budget would also be used by the division
manager to co-ordinate production between divisions. Information asymmetry
was controlled at a relatively high level by informing all subjects they were new
to the organization and their manager was therefore unsure of their productive
capability and did not have access to the "Best Estimate of Production" forms.
Subjects wrote down their budgets and then performed the translation task for
five minutes.
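The two quantities collected so far, the sealed best estimate and the budget submitted to the division manager, are what a slack measure would be built from. The sketch below assumes slack is the best estimate minus the submitted budget, a common operationalization in this literature that this excerpt does not itself confirm. The function name and the sample values are hypothetical.

```python
from typing import Optional


def budgetary_slack(best_estimate: Optional[int], budget: Optional[int]) -> Optional[int]:
    """Assumed measure: privately recorded best estimate minus submitted budget.

    Returns None when either value is missing, since slack cannot be computed.
    """
    if best_estimate is None or budget is None:
        return None
    return best_estimate - budget


# Hypothetical subjects: (best estimate, submitted budget)
subjects = [(25, 22), (30, 30), (24, 27), (None, 20)]

for estimate, budget in subjects:
    slack = budgetary_slack(estimate, budget)
    if slack is None:
        label = "not computable"
    elif slack > 0:
        label = "budget set below expected performance"
    elif slack < 0:
        label = "budget set above expected performance"
    else:
        label = "budget equals expected performance"
    print(estimate, budget, slack, label)
```

Under this assumed definition, a positive value means the budget understates expected output, and a negative value corresponds to a budget target set above expected performance.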
The third part of the experiment involved filling out a post-experimental
questionnaire. The experimental materials were then collected and one week
later, subjects received a performance report and the tickets that they had earned.
Tickets were collected and placed in a container from which one of the subjects
drew a winning ticket in each group. A cash prize of $150 was paid to the winning
subject in each group and the goals of the experiment were discussed (Note 8). These
experimental procedures are summarized in Fig. 1.
Twelve subjects, approximately equally distributed across cells, failed to provide information necessary to calculate budget slack and were therefore dropped
from the final sample. In addition, twenty-eight subjects (twenty-three of whom
were assigned the truth-inducing contract) unexpectedly chose a budget target
higher than their expected future performance. The economic incentives imbedded
in the budget-based contracts used in this study were meant to encourage subjects
RESULTS
Manipulation Check for Contracting Process
To ensure that subjects assigned the scenarios designed to represent fair and unfair
incentive contracting processes actually perceived these processes to be fair or
unfair, respectively, subjects were asked to answer the following questions on
a scale of one (completely unfair) to five (completely fair): "How fair would
you judge the procedures used to set the formula on which your earnings were
based?" and "How fair would you judge the process of setting the budget used
to calculate your earnings?" These questions were based on measures reported
in Tyler and Lind (1992) (Note 9). Each subject's score was their mean score across the
two questions included in the scale. The overall mean score on this scale was 3.60
(std. dev. = 0.81, Cronbach's alpha = 0.67). Means and standard deviations for
perceived fairness of the contracting process are presented in Table 1, Panel A.
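The scale construction described above, a per-subject mean of two items plus Cronbach's alpha as the reliability check, can be illustrated as follows. The responses below are invented for the example and are not the study's data. The alpha formula is the standard one, k/(k − 1) × (1 − sum of item variances / variance of summed scores), with sample variances.

```python
from statistics import variance


def scale_scores(item1, item2):
    """Per-subject score: mean of the two fairness items."""
    return [(a + b) / 2 for a, b in zip(item1, item2)]


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-response lists (one list per item)."""
    k = len(items)
    sum_item_variances = sum(variance(responses) for responses in items)
    totals = [sum(values) for values in zip(*items)]  # summed score per subject
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical 1-5 responses from six subjects to the two questions.
q1 = [4, 3, 5, 2, 4, 3]
q2 = [4, 4, 5, 3, 3, 3]

scores = scale_scores(q1, q2)
alpha = cronbach_alpha([q1, q2])
print(scores)
print(round(alpha, 3))
```

With only two items, alpha depends entirely on how strongly the two questions covary, so a moderate value such as the reported 0.67 is not unusual for a scale this short.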
A 2 × 2 analysis of variance was performed on subjects' perceptions of the
fairness of the contracting process (see Table 1, Panel B). Results indicated a
significant difference in subjects' perceptions of the fairness of the contracting
process depending on whether they read the scenario describing the contracting
process designed to be fair or unfair, F(1, 138) = 5.75 (p < 0.05), but perceptions
of fairness did not differ depending on contract type, F(1, 138) = 0.38, or the
contract by process interaction, F(1, 138) = 1.08, indicating subjects responded
to the manipulation of process fairness as expected (Note 10).

Table 1. Perceived Fairness of the Contracting Process.
Panel A: Mean (Standard Deviation) of Perceived Fairness

                              Slack-Inducing    Truth-Inducing
                              Contract          Contract          Marginals
Fair contracting process      3.71              3.77              3.73
                              (0.63)            (0.79)            (0.70)
                              n = 42            n = 31            n = 73
Unfair contracting process    3.41              3.56              3.47
                              (0.99)            (0.75)            (0.90)
                              n = 41            n = 28            n = 69
Marginals                     3.55              3.67              3.60
                              (0.84)            (0.77)            (0.81)
                              n = 83            n = 59            n = 142

Panel B: Analysis of Variance

                       SS        df     MS      F
Contract type          0.30      1      0.30    0.38
Contracting process    4.48      1      4.48    5.75**
Contract × process     0.84      1      0.84    1.08
Error                  107.46    138    0.78

** p < 0.05.
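As a quick consistency check on Table 1, Panel B, each F statistic should equal the effect's mean square divided by the error mean square. The sketch below recomputes those ratios from the reported sums of squares and degrees of freedom. The function and variable names are illustrative, and small differences from the printed values are expected because the published figures are rounded.

```python
def f_ratios(effects, error_ss, error_df):
    """Return F = (SS_effect / df_effect) / (SS_error / df_error) for each effect."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}


# Sums of squares and degrees of freedom as reported in Table 1, Panel B.
table1_effects = {
    "Contract type": (0.30, 1),
    "Contracting process": (4.48, 1),
    "Contract x process": (0.84, 1),
}

ratios = f_ratios(table1_effects, error_ss=107.46, error_df=138)
for name, f_value in ratios.items():
    print(f"{name}: F(1, 138) = {f_value:.2f}")
```

The recomputed ratios should land within rounding error of the reported 0.38, 5.75, and 1.08.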
In the fair contracting process condition, perceived fairness may also have
manifested itself as a felt social pressure to adhere to the existing norms or culture
of fairness within this organizational division (Naumann & Bennett, 2000).
This perspective may be implied in subjects' response to a post-experimental
question asking them to rate the fairness of the work environment. A 2 × 2
analysis of variance indicated a significant main effect of contracting process on
subjects' evaluation of the fairness of the work environment, F(1, 138) = 54.58
(p < 0.001), with subjects in the fair process condition rating the work environment as significantly fairer (mean = 3.34, std. dev. = 1.03) than subjects
in the unfair process condition (mean = 2.35, std. dev. = 1.08). No significant
differences were found based on contract form or the contract by process
interaction.
Hypothesis Tests
A 2 × 2 analysis of variance with adjustment for non-orthogonality (regression
approach) was used to determine the statistical significance of the differences in
the mean amount of slack created in each experimental condition. The one hundred
and forty-two usable observations were included in this analysis. Cell means for
slack created by experimental condition are presented in Table 2 (Panel A) and
Fig. 2. Results of the analysis of variance are presented in Table 2 (Panel B).
Based on previous theoretical and empirical results, H1 predicted subjects
assigned the truth-inducing contract would create less budget slack than subjects
assigned the slack-inducing contract. Results of the analysis of variance reveal a
significant main effect of contract type, F(1, 138) = 26.99 (p < 0.001), indicating
Table 2. Budgetary Slack by Contract Type and Contracting Process.
Panel A: Mean (Standard Deviation) of Budgetary Slack

                              Slack-Inducing    Truth-Inducing
                              Contract          Contract
Fair contracting process      2.40              0.80
                              (2.38)            (1.47)
                              n = 42            n = 31
Unfair contracting process    3.85              0.71
                              (3.97)            (1.49)
                              n = 41            n = 28

Panel B: Analysis of Variance

                       SS        df     MS        F
Contract type          193.21    1      193.21    26.99***
Contracting process    15.85     1      15.85     2.21
Contract × process     20.44     1      20.44     2.86*
Error                  987.79    138    7.16

* p < 0.10.
*** p < 0.001.

                df    Significance
4.09            81    p < 0.05
0.06            57    p = 0.94
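The last two rows of Table 2 have lost their labels in this copy. One plausible reading, offered here as an assumption rather than something the text confirms, is that they compare the fair and unfair process cells within each contract type: 81 and 57 equal n − 2 for the slack-inducing (42 + 41) and truth-inducing (31 + 28) cells. The sketch below computes a pooled-variance t statistic from the Panel A summary statistics and squares it. The function name is illustrative, and the code does not attempt to reproduce the reported significance levels.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / std_error, df


# Panel A cells: fair process first, unfair process second.
t_slack, df_slack = pooled_t(2.40, 2.38, 42, 3.85, 3.97, 41)   # slack-inducing contract
t_truth, df_truth = pooled_t(0.80, 1.47, 31, 0.71, 1.49, 28)   # truth-inducing contract

print(df_slack, round(t_slack ** 2, 2))
print(df_truth, round(t_truth ** 2, 2))
```

Under this reading, the squared statistics come out close to the tabulated 4.09 and 0.06, which is consistent with slack differing by process fairness under the slack-inducing contract but not under the truth-inducing one.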
DISCUSSION
This study explores the relationship between fair and unfair contracting processes,
budget-based compensation contracts, and the creation of budgetary slack. Prior
research examines the effectiveness of a variety of forms of truth-inducing
contracts in reducing budgetary slack. The current study contributes to the
literature by examining the effectiveness of two specific contract forms when
between fair contracting process and the creation of budgetary slack in the
incentive-contracting setting studied here.
NOTES
1. Task and environmental uncertainty are fundamental issues faced by managers
(Thompson, 1967). Task uncertainty refers to the difficulty of the task, its degree of variability and the extent to which successful completion of the task depends on the successful
completion of other tasks (Tushman & Nadler, 1978). Environmental uncertainty has many
dimensions, the most important of which may be the degree to which the organization is connected to and relies on other entities in its environment for information and/or resources and
the extent to which these other entities are undergoing change (Lawrence & Lorsch, 1967).
2. While it may be difficult to distinguish between slack created as a buffer against uncertainty and slack created to game the performance evaluation system in real organizations, the
current study benefits from the control allowed in the laboratory environment. Specifically,
in the laboratory setting described here, uncertainty is held constant across experimental
conditions allowing for an analysis of slack created for budget gaming purposes.
3. While some attention has been paid to the fairness construct in previous accounting-related studies (Ehlen & Welker, 1996; Hunton & Gibson, 1999; Libby, 1999; Lindquist,
1995; Magner & Welker, 1994; Moser et al., 1995), a search of the literature failed to
indicate any other studies examining the relevance of the fairness construct to the creation
of budgetary slack.
4. Although this task is relatively simple, it is not unlike the typical simple, repetitive
production task for which a piece-rate and/or bonus compensation would be paid in actual
organizations in order to motivate performance. The simplicity of the task means that it is
easily understood by subjects and is easy for them to learn in a relatively short period of
time. Therefore, the task gains in terms of experimental realism what may be lost in terms
of mundane realism.
5. The compensation parameters in both the slack-inducing and truth-inducing contracts
were set based on the results of a pre-test in which average output on the experimental task
for subjects similar in background to those taking part in the experiment was 25 words
translated in five minutes (minimum of 14 words, maximum of 38 words).
6. "Voice" is a generic term indicating the ability for subordinates to communicate their
interests to their superiors in an organization in order to exert some influence over the
decisions their superiors make (Folger, 1977). Budget participation could be viewed as
a context-specific form of "voice" defined as the process by which managers communicate
information about their productive capabilities to their superiors in order to influence the
setting of targets in their budget-based incentive contracts (Kren, 1992).
7. Before subjects were paid, their practice period and work period performance was
verified and their total compensation was recalculated.
8. Note that subjects' probability of winning the prize is dependent not just on their own
performance, but on the performance of other subjects in the group. Due to the one-period
nature of the experiment and the setting in which the experiment took place, subjects
had no opportunity to collude or act in any strategic way. Also, note that the perceived
attainability of the target is important. If the target had been viewed as unattainable,
subjects would have conserved energy by not performing the task and taking the fixed
portion of the payment available under each of the incentive schemes. No subjects took
this strategy indicating that the compensation scheme was motivational and that subjects
viewed the target as difficult, but attainable.
9. This scale also includes questions about outcome fairness. The outcome-related
questions were adapted from Tyler and Lind (1992) as "How would you judge the formula
itself that will be used to calculate your earnings for the work period?" and "How fair would
you judge the budget itself?" Perceptions of outcome fairness did not differ depending
on contract type, F(1, 138) = 2.08, process, F(1, 138) = 1.50, or the contract by process
interaction, F(1, 138) = 0.06.
10. As an additional check on subjects' perceptions of fairness, subjects were asked to
answer the following question: "Think about the information you received about the negotiation process between the workers and managers in this organization that was involved
in setting the earnings formula. On a scale of 1 to 5, where 1 means completely unfair
and 5 means completely fair, how fair would you judge the negotiation process?" Results
of a 2 × 2 analysis of variance indicated a significant main effect of contracting process
on subjects' evaluation of the fairness of the negotiation process, F(1, 138) = 39.67,
p < 0.001, with subjects in the fair process condition rating the negotiation process
as fairer (mean = 3.71, std. dev. = 0.86) than subjects in the unfair process condition
(mean = 2.68, std. dev. = 1.09). No significant differences were found based on contract
form or the contract by process interaction.
ACKNOWLEDGMENTS
I would like to thank John Waterhouse, Bill Scott, Duane Kennedy, and Jane
Webster for their guidance in the development and execution of this project. I also
wish to thank Glenn Feltham, Joseph Fisher, Kathryn Kadous, Kevin Kelloway,
Robert Mathieu, Don Moser, Steve Salterio, participants at the 1999 Management
Accounting Research Conference, and the accounting research workshops at
HEC (Montreal) and the University of Alberta for their many helpful comments
and suggestions. I gratefully acknowledge the School of Accountancy, University
of Waterloo and CGA-Canada for their financial support of this project. Data
available from the author upon request.
REFERENCES
Antle, R., & Eppen, G. D. (1985). Capital rationing and organizational slack in capital budgeting.
Management Science (Feb), 163–174.
Atkinson, A. (1985). Truth-inducing schemes in budgeting and resource allocation. Cost & Management
(May/June), 38–42.
Baiman, S. (1982). Agency research in management accounting: A survey. Journal of Accounting
Literature, 1, 154–213.
Baiman, S., & Evans, J. H. (1983). Pre-decision information and participative management control
systems. Journal of Accounting Research, 21, 371–395.
Baker, G. P., Jensen, M. C., & Murphy, K. J. (1988). Compensation and incentives: Practice vs. theory.
Journal of Finance, 43(3), 593–617.
Bart, C. (1988). Budgeting gamesmanship. Academy of Management Executive, 285–294.
Blau, P. (1964). Exchange and power in social life. New York, NY: Wiley.
Chow, C. W. (1983). The effects of job standard tightness and compensation scheme on performance:
An exploration of linkages. The Accounting Review, 58, 667–685.
Chow, C. W., Cooper, J. C., & Waller, W. S. (1988). Participative budgeting effects of truth inducing
pay schemes. The Accounting Review, 63, 111–123.
Chow, C. W., Hwang, R. N., & Liao, W. (2000). Motivating truthful upward communication of private
information: An experimental study of mechanisms from theory and practice. Abacus, 36(2),
160–179.
Cropanzano, R., & Folger, R. (1991). Procedural justice and worker motivation. In: R. M. Staw &
L. W. Porter (Eds), Motivation and Work Behavior (5th ed., pp. 131–143). New York, NY:
McGraw-Hill.
Cyert, R. M., & March, J. G. (1963). A behavioral theory of the firm. Englewood Cliffs, NJ: Prentice-Hall.
Davila, T., & Wouters, M. (2000). Meeting budgets: Budget emphasis and the release of budgetary
slack. Working Paper: Stanford University, Stanford, CA.
Demski, J., & Feltham, G. (1978). Economic incentives in budgetary control systems. The Accounting
Review, 53, 336–359.
Dunk, A. S. (1993). The effect of budget emphasis and information asymmetry on the relation between
budgetary participation and slack. The Accounting Review, 68(2), 400–410.
Ehlen, C. R., & Welker, R. B. (1996). Procedural fairness in the peer and quality review programs.
Auditing: A Journal of Practice and Theory, 15(1), 38–52.
Eisenberger, R., Fasolo, P., & Davis-LaMastro, V. (1990). Perceived organizational support and
employee diligence, commitment and innovation. Journal of Applied Psychology, 75,
51–59.
Evans, J. H., Hannan, R. L., Krishnan, R., & Moser, D. V. (2001). Honesty in managerial reporting.
The Accounting Review, 76(4).
Evans, J. H., & Sridhar, S. S. (1996). Multiple control systems, accrual accounting, and earnings
management. Journal of Accounting Research, 24(1), 45–65.
Fehr, E., & Gachter, A. (2002). Do incentive contracts crowd out voluntary cooperation? Institute for
Empirical Research in Economics, Working Paper No. 34, University of Zurich.
Fehr, E., Klein, A., & Schmidt, K. M. (2001). Fairness, incentives and contractual incompleteness.
CESifo Working Paper No. 445: Center for Economic Studies, Munich.
Folger, R. (1977). Distributive and procedural justice: Combined impact of voice and improvement on
experienced inequity. Journal of Personality and Social Psychology, 35, 108–119.
Folger, R., & Cropanzano, R. (1998). Organizational justice and human resource management. Thousand Oaks, CA: Sage Publications.
Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied
Psychology, 71(2), 340–342.
Greenberg, J. (1987). Reactions to procedural injustice in payment distributions: Do the means justify
the ends? Journal of Applied Psychology, 72(1), 55–61.
Hopwood, A. G. (1972). An empirical study of the role of accounting data in performance evaluation.
Journal of Accounting Research, 10, 156–182.
Hunton, J. (1996). Involving information system users in defining system requirements: The influence
of procedural justice perceptions on user attitudes and performance. Decision Sciences, 27(4),
647–671.
Hunton, J., & Beeler, J. D. (1997). Effects of user participation in systems development: A longitudinal
field experiment. MIS Quarterly, 21(4), 359–388.
Hunton, J., & Gibson, D. (1999). Soliciting user-input during the development of an accounting information system: Investigating the efficacy of group discussion. Accounting, Organizations and
Society, 24, 597–618.
Jennergren, L. P. (1980). On the design of incentives in Soviet firms: A survey of some research.
Management Science (Feb), 193–197.
Jensen, M. C. (2001). Corporate budgeting is broken – let's fix it. Harvard Business Review, 79(10),
94–101.
Kim, W. C., & Mauborgne, R. A. (1993). Procedural justice, attitudes, and subsidiary top-management
compliance with multinationals' corporate strategic decisions. Academy of Management
Journal, 36(3), 502–526.
Kirby, A. J. (1992). Incentive compensation schemes: Experimental calibration of the rationality
hypothesis. Contemporary Accounting Research, 8, 374–408.
Kirby, A. J., Reichelstein, S., Sen, P. K., & Paik, T. (1991). Participation, slack, and budget-based
performance evaluation. Journal of Accounting Research, 29, 109–128.
Kren, L. (1992). Budgetary participation and managerial performance: The impact of information and
environmental volatility. The Accounting Review, 67(3), 511–526.
Kren, L., & Liao, W. M. (1988). The role of accounting information in the control of organizations: A
review of the evidence. Journal of Accounting Literature, 7, 280–309.
Lawrence, P. R., & Lorsch, J. W. (1967). Organization and environment: Managing differentiation and integration. Boston: Graduate School of Business Administration, Harvard
University.
Leventhal, G. S. (1980). What should be done with equity theory? In: K. J. Gergen, M. S. Greenberg
& R. H. Willis (Eds), Social Exchange: Advances in Theory and Research. NY: Plenum Press.
Libby, T. (1999). The influence of voice and explanation on performance in a participative budgeting
setting. Accounting, Organizations and Society, 24(2), 125–138.
Lind, E. A., Kanfer, R., & Earley, P. C. (1990). Voice, control and procedural justice: Instrumental and
non-instrumental concerns in fairness judgments. Journal of Personality and Social Psychology,
59(5), 952–959.
Lind, E. A., & Tyler, T. R. (1988). The social psychology of procedural justice. NY: Plenum.
Lindquist, T. M. (1995). Fairness as an antecedent to participative budgeting: Examining the effects of
distributive justice, procedural justice and referent cognitions on satisfaction and performance.
Journal of Management Accounting Research, 7, 122–147.
Loeb, M., & Magat, W. (1978). Soviet success indicators and the evaluation of divisional management.
Journal of Accounting Research (Spring), 103–121.
Luft, J. (1994). Bonus and penalty incentives: Contract choice by employees. Journal of Accounting
and Economics, 18, 181–206.
Magner, N., & Welker, R. B. (1994). Responsibility center managers' reactions to justice in budgetary
resource allocation. Advances in Management Accounting (Vol. 3, pp. 237–253). Greenwich,
CT: JAI Press.
Melumad, N. D., & Reichelstein, S. (1989). Value of communication in agencies. Journal of Economic
Theory, 47, 334–368.
Merchant, K. A. (1985). Budgeting and the propensity to create budgetary slack. Accounting, Organizations and Society, 10(2), 201–210.
Moorman, R. H., Blakely, G. L., & Niehoff, B. P. (1998). Does perceived organizational support mediate
the relationship between procedural justice and organizational citizenship behavior? Academy
of Management Journal, 41, 351–368.
Moser, D. V., Evans, J. H., III, & Kim, C. K. (1995). The effects of horizontal and exchange inequity
on tax reporting decisions. The Accounting Review, 70(4), 619–634.
Murphy, K. J. (2000). Performance standards in incentive contracts. Journal of Accounting and Economics, 30(3), 245–278.
Namazi, M. (1985). Theoretical developments of principal-agent employment contract in accounting:
The state of the art. Journal of Accounting Literature, 4, 113–163.
Naumann, S. E., & Bennett, N. (2000). A case for procedural justice climate: Development and test of
a multilevel model. Academy of Management Journal, 43(5), 881–889.
Otley, D. T. (1985). The accuracy of budgetary estimates: Some statistical evidence. Journal of Business
Finance and Accounting, 12(3), 415–425.
Reichelstein, S. (1992). Constructing incentive schemes for government contracts: An application of
agency theory. The Accounting Review, 67, 712–731.
Reichelstein, S., & Osband, K. (1984). Incentives in government contracts. Journal of Public Economics, 24, 257–270.
Rousseau, D. M., & Parks, J. M. (1993). The contracts of individuals and organizations. In: L. L.
Cummings & B. M. Staw (Eds), Research in Organizational Behavior (Vol. 15). JAI Press.
Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill.
Tushman, M. L., & Nadler, D. A. (1978). Information processing as an integrating concept in organizational design. Academy of Management Review, 3(3), 613.
Tyler, T. R. (1989). The quality of dispute resolution processes and outcomes: Measurement problems
and possibilities. Denver University Law Review, 66, 419–436.
Tyler, T. R., & Lind, E. A. (1992). A relational model of authority in groups. In: L. Berkowitz (Ed.),
Advances in Experimental Social Psychology (Vol. 25, pp. 115–191). Academic Press.
Van der Stede, W. A. (2000). The relationship between two consequences of budgetary controls:
Budgetary slack creation and managerial short-term orientation. Accounting, Organizations
and Society, 25(6), 609–622.
Waller, W. S. (1988). Slack in participative budgeting: The joint effects of a truth-inducing pay scheme
and risk preferences. Accounting, Organizations and Society, 87–98.
Waller, W. S., & Chow, C. W. (1985). The self-selection and effort effects of standard-based employment contracts: A framework and some empirical evidence. The Accounting Review, 60(3),
458–476.
Walker, K. B., & Johnson, E. N. (1999). The effects of budget-based incentive compensation scheme
on the budgeting behavior of managers and subordinates. Journal of Management Accounting
Research, 11, 1–28.
Weitzman, M. (1976). The new Soviet incentive model. Bell Journal of Economics (Spring),
251–257.
Young, S. M. (1985). Participative budgeting: The effects of risk aversion and asymmetric information
on budgetary slack. Journal of Accounting Research, 23(2), 829–842.
Young, S. M., & Lewis, B. (1995). Experimental incentive contracting research in management
accounting. In: R. H. Ashton & A. H. Ashton (Eds), Judgment and Decision-making
Research in Accounting and Auditing (pp. 55–75). Cambridge, NY: Cambridge University
Press.
APPENDIX A
Sample Payments Under the Truth-inducing Contract
Cells of the table below represent the number of tickets earned under different
combinations of budgeted and actual performance. Diagonal cells were shaded
to emphasize that the maximum payments would be earned when the budgets
subjects selected were equal to their actual performance.
APPENDIX B
Fair and Unfair Contracting Process Scenarios
Fair Process:
You have learned that your supervisor has held this supervisory position for many
years. You have also noted that your supervisor appears to be very popular with
your co-workers. Your supervisor's philosophy is that the employees of the division
are the experts when it comes to the work that they do and that much can be learned
from listening to their suggestions.
The formula that is used to calculate your earnings, as was described above, is a
relatively new innovation within this division. The form of the contract was agreed
upon approximately one year ago based on negotiations between representatives
of the employee group and management. The management negotiation group was
headed by your supervisor.
Although you have been told that the negotiation process led to a degree of tension
between your co-workers and your supervisor, your co-workers seem to be fully
supportive of the contract as it now stands. You have been told by one of your
co-workers, whose opinion you respect, that this is mainly due to strong communication between the employee and management groups during the negotiation
process.
You have also noticed that the majority of your co-workers with whom you have
talked about the negotiation process feel that the management team was sincerely
interested in their opinions about the earnings formula. Before the formula was
finalized, the management team performed an informal poll of the employees who
would be affected by it and found that the majority supported it. Whenever an issue
came up on which there was disagreement, the worker and management groups
were able to talk out their differences and come to a satisfactory solution, although
the management group also offered to allow any unresolved issues to be passed on
to an objective third-party decision maker.
Many of the employees of this division have held positions within the division
for many years. While increasing their overall pay is, of course, very important to
your co-workers, providing accurate budgets to management and increasing overall
production efficiency in order to ensure the long-term survival of the organization
also seems to be high on their list of priorities. You have heard four or five of
them say that they would have to be given a pretty large raise in pay before they
would be willing to move to a job in another division mainly due to the positive
atmosphere between employees and managers in this division.
Unfair Process:
You have learned that your supervisor has held this supervisory position for many
years. You have also noted that, although your co-workers are polite and do as the
supervisor asks, he does not seem to be very popular with them. The supervisor's
philosophy is that employees should work hard to receive higher pay and leave
all other decisions to him. You have been told that the supervisor feels that his
long-term position as supervisor of the division makes him the best judge of how
the work should be done and he is not really interested in receiving feedback or
suggestions from the employees that he supervises.
The formula that is used to calculate your earnings, as was described above, is a
relatively new innovation within this division. The form of the contract was agreed
upon approximately one year ago based on negotiations between representatives
of the employee group and management. The management negotiation group was
headed by your supervisor.
You have been told that the negotiation process led to a great deal of tension between
your co-workers and your supervisor. Your co-workers seem to be quite bitter about
the contract as it now stands. You have been told by one of your co-workers, whose
opinion you respect, that this is mainly due to the lack of communication between
the employee and management groups during the negotiation process.
You have also noticed that the majority of your co-workers with whom you have
talked about the negotiation process feel that the management team appeared to
be completely uninterested in their opinions about the earnings formula. Before
the formula was finalized, the employee group suggested that an informal poll be
taken of employees who would be affected by it to measure their degree of support.
This suggestion was ignored by the management group. Whenever an issue came
up on which there was disagreement, the worker and manager groups found it
difficult to come to a satisfactory solution and generally, the solution was imposed
by the person in charge of the management group, who happens to have been your
supervisor.
Many of the employees of this division have held positions within the division for
only a year or two. Receiving the highest possible earnings at the end of each work
period seems to be of utmost importance to your co-workers. You have heard four
or five of them say that they view their position as only a stepping stone to a
better position within another division of the organization. Increasing production
efficiency and the long-term health of the division by providing accurate budgets
to management does not seem to be high on their list of priorities. A few of your
co-workers have commented that they would not have to be given a very large raise
in pay, or any raise at all, to convince them to move to a job in another division
of the organization where the atmosphere between the workers and the supervisor
was more positive.
1. INTRODUCTION
Research is an integral function of any university and a key determinant of
academic reputation (Baden-Fuller et al., 2000). Primarily, a university's research
Advances in Accounting Behavioral Research, Volume 6, 173–186
Copyright 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06008-3
Understanding the factors that impact the level of research (measured in publications) achieved by faculty members is very important from a university perspective.
This is particularly relevant when recruiting new and inexperienced faculty, where
the existing faculty must rely on indicators of possible future publication success
rather than on an observed publication stream. The issue is even more salient given
that "Most academics publish very little, or not at all" (Demski & Zimmerman,
2000, p. 346). However, research studies in accounting investigating factors that
indicate future publishing output levels are relatively limited (e.g. Cargile &
Bublitz, 1986; Gee & Gray, 1989; Gray & Helliar, 1994; Maranto & Streuly, 1994).
These studies suggest that various factors, such as the institutional setting of the
researcher and possessing a Ph.D., impact the level of research output. A difficulty
with conducting this form of research is that only a relatively small number of
the factors that are likely to influence publication output are observable (e.g.
research interests, Ph.D. qualification). Other relevant factors are likely to be more
difficult to accurately measure (e.g. ability, ambition) (Gray & Helliar, 1994).
This study examines the research behavior of Australian and New Zealand
accounting faculty to determine the characteristics that influence research
productivity. In essence, the study asks what factors will predict the desired
research behavior, namely papers published in quality academic journals. It builds
on the work of Wilkinson and Durden (1998) and Durden et al. (1999) who
measured research outputs in an attempt to enable comparisons of performances
across universities. Those studies served to provide a basis for ranking university
departments, but did not seek to explain in any comprehensive sense the observed
differences between individual faculty performances. This study develops a Tobit
model to explain publishing output behavior. The findings indicate that two key
factors contribute to publishing performance: holding a Ph.D. qualification
and having an academe-orientation and background rather than an extensive
professional background. Other indicators of publishing productivity were having
stated research interests in the financial accounting, managerial accounting and
auditing fields. This may also reflect a bias in the higher-ranked journals toward
these areas of interest. That is to say, researchers may focus their research efforts in
financial accounting, managerial accounting and auditing because the more highly
ranked journals are more open to accepting research in these areas than in newer
subdisciplines. This is consistent with Daigle and Arnold's (2000) suggestion that
many of the accounting information systems researchers are forced to develop
and promote research interests in other subdisciplines because research in these
other areas (financial, managerial and tax accounting) is more likely to result in
the highly-ranked journal publications required for tenure purposes.
The remainder of the paper is organized as follows. Section 2 develops the
hypotheses in the context of the extant literature. Section 3 outlines the model
development and data analysis. Results are shown in Section 4 and conclusions
and limitations are discussed in Section 5.
2. HYPOTHESIS DEVELOPMENT
Based on an analysis of prior literature several important characteristics appear to
impact research output. First, possessing a Ph.D. impacts research productivity.
Since the Ph.D. comprises by definition an intensive research preparation process,
a positive relationship likely exists between research productivity and possession
of a Ph.D. degree. Arguments about the importance of the Ph.D. are often based
on theories of human capital (Long et al., 1998; Maranto & Streuly, 1994). In this
sense the Ph.D. provides students with higher levels of intellectual capital which
should result in higher levels of research output and career success. This may exist
among graduates from a range of Ph.D. programs rather than being restricted only
to those with high status academic origins (Long et al., 1998). Other research has
also indicated an association between holding a Ph.D. and research productivity
(Gray et al., 1987; Gray & Helliar, 1994). The Australian and New Zealand
context provides an opportunity to further explore the role of the Ph.D. because
only a relatively small proportion of faculty in these two countries hold a Ph.D.
At the time this study was undertaken only 24% of faculty members were Ph.D.
qualified. H1, in the alternate form, is as follows:
Table 1. Ratings Derived from Zeff (1996) for Journals in the Sample.
Abacus
Accounting and business research
Accounting and finance
12 12 7 8 2 10 0
Accounting horizons
12 12 12 7 6 3 10 3 7 11 11 6 12 12 3 1 11 11 12 11 11 6 9 3 7 7
where:
years employed = years employed at current institution;
financial, managerial, auditing, tax, theory, education and other = stated faculty
areas of research interest;
U.S. qualified = a dummy variable taking the value of 1 if the highest educational qualification is from a U.S. institution and zero otherwise;
Ph.D. = a dummy variable taking the value of 1 if a Ph.D., DBA or D.Phil.
qualification is held and zero otherwise;
Membership = a dummy variable taking the value of 1 if professional body
membership is held and zero otherwise;
Academe = a dummy variable taking the value of 1 if the individual has less
than 5 years of experience outside academe and zero otherwise.
Variable                                  N      Mean    Standard Deviation    Minimum    Maximum
Weighted publications                     716    0.29    0.68                  0          5.625
Years employed at current institution     716    9.61    6.77                  0          39.000

Panel B: Dummy Variables
Variable                                    Frequency    Percent
Interest
  Financial accounting research interest    160          22.4
  Managerial accounting research interest   126          17.6
  Auditing research interest                84           11.7
  Taxation research interest                41           5.7
  Accounting education research interest    68           9.5
  Accounting theory research interest       17           2.4
  Other research interests                  398          55.6
Qualifications from US                      30           4.2
Ph.D.                                       172          24.0
Professional membership                     551          77.0
Academic orientation                        466          65.1
Since a large number of faculty members had no publications during the period
of measurement, there is a high proportion of zeros in the weighted publications
measure. Thus, while data for the independent variables is available, the data for the
dependent variable is of a censored nature. One possibility would be to estimate the
model via OLS using only those faculty for whom the dependent variable is nonzero. However, as noted by Judge et al. (1988), this results in biased and inconsistent
estimators. A more appropriate approach is to estimate a Tobit regression model.
McDonald and Moffitt (1980, p. 318) identify the Tobit model as assuming an
underlying, stochastic index equal to $(X_t\beta + u_t)$ which is observed only when it is
positive, and hence qualifies as an unobserved, latent variable. They express the
stochastic model as follows:
$$
Y_t = \begin{cases} X_t\beta + u_t & \text{if } X_t\beta + u_t > 0 \\ 0 & \text{if } X_t\beta + u_t \le 0 \end{cases}
$$
where: t = 1, 2, . . ., N.
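The censoring rule above implies a two-part likelihood: observations with positive weighted publications contribute a normal density, while zero observations contribute the probability that the latent index falls at or below zero. A minimal Python sketch of that log-likelihood follows. It is illustrative only and is not the SAS LIFEREG routine used in the study; the function name `tobit_loglik`, the coefficient values and the toy data are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model left-censored at zero.

    beta  : list of coefficients (hypothetical values, not the paper's estimates)
    sigma : standard deviation of the normal error term u_t
    X     : list of regressor rows, one row per faculty member
    y     : observed outcomes; zeros are treated as censored
    """
    total = 0.0
    for x_row, y_t in zip(X, y):
        index = sum(b * x for b, x in zip(beta, x_row))  # X_t * beta
        if y_t > 0:
            # Uncensored case: normal density of y_t around the index.
            z = (y_t - index) / sigma
            total += math.log(STD_NORMAL.pdf(z) / sigma)
        else:
            # Censored case: P(X_t*beta + u_t <= 0) = Phi(-index / sigma).
            total += math.log(STD_NORMAL.cdf(-index / sigma))
    return total


# Toy data: an intercept column plus one dummy regressor (illustrative only).
X_demo = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
y_demo = [0.0, 0.5, 1.2, 0.0]
demo_value = tobit_loglik([-0.5, 1.0], 1.0, X_demo, y_demo)
print(round(demo_value, 4))
```

An estimation routine would search for the coefficients and scale that maximize this quantity; restricting the sample to non-zero outcomes drops the second branch entirely, which is the source of the bias noted above.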
A Tobit model is estimated via the SAS LIFEREG procedure, using the normal
probability distribution for the error term. As noted by McDonald and Moffitt
(1980), the estimated regression parameters cannot be interpreted in the usual
sense. They will, however, enable us to ascertain the independent variables that
significantly impact publishing performance. As noted later in the paper, the Tobit
1.00000
U.S. Qualifications
0.02101
(0.5747)
1.00000
Membership
Ph.D.
Academe
0.11230
(0.0026)
0.06764
(0.0705)
1.00000
0.04369
(0.2429)
0.15979
(<0.0001)
0.01835
(0.6241)
1.00000
0.01701
(0.6496)
0.06543
(0.0802)
0.13644
(0.0003)
0.11011
(0.0032)
1.00000
Academe
Parameter Estimate
p Value
0.67
0.003
0.57
0.57
0.46
0.42
0.38
0.01
0.17
0.11
0.09
1.29
0.55
1.15
606.08
57.02
0.12
18.26
13.63
7.44
2.00
1.30
0.00
2.06
0.22
0.48
114.74
20.07
<0.0001
0.72
<0.0001
0.0002
0.006
0.16
0.25
0.96
0.15
0.64
0.49
<0.0001
<0.0001
this study calls into question the extent to which such individuals will be likely
to achieve quality research outputs, a critical determinant of a university's
reputation.
H2 (years of employment at current institution) and H3 (membership of
professional body) were not supported. The failure of professional membership
to explain productivity may be related to the fact that a high number of faculty
hold such membership. Faculty may derive significant benefits from such
membership (for example, insurance benefits) such that even faculty with a low
professional interest may maintain membership. The insignificance of years of
employment is surprising but may indicate that faculty with a strong research
interest maintain that interest over time and may derive sufficient reward within
their own institutions (Gray & Helliar, 1994).
Also of note were the significant coefficients for faculty research interests in
financial accounting, managerial accounting and auditing. This may reflect a bias
in the higher-ranked journals toward these areas of interest (Hasselback et al.,
2000). Some concerns could be raised with respect to the poor performance of
faculty with stated tax research interests. Here, there is a negative, though not
significant, relationship between an expressed interest in taxation and weighted
publications. This may reflect the tendency, particularly in Australia, for tax
faculty to be concentrated in law/business law disciplines. Tax publishing has
accordingly trended toward legal-based research rather than empirical accounting
research. This is consistent with comments by Schulman et al. (1996) concerning
the low level of empirical research into the policy implications of tax integration,
a reform that has been implemented in Australia, New Zealand, Canada and the
U.K., along with a range of other countries outside the U.S. The holding of U.S.
qualifications was also non-significant. This may be the result of the small number of
individuals holding such qualifications (30 out of 716 faculty).7

Table 5.
                 Zero    Zero to One    One to Two    More than Two
Without Ph.D.    0.64    0.27           0.08          0.01
With Ph.D.       0.23    0.37           0.29          0.10
As noted earlier, the Tobit model parameters cannot be interpreted in the
same manner as those derived from ordinary least squares. However, the Tobit
model can be used to estimate the probability that an individual with a given
set of characteristics will publish at a certain level. In fact, an entire probability
distribution can be developed for an individual with a given set of characteristics.
For example, consider an individual with 5 years of employment at their current
institution who has a stated interest in financial accounting, is not a member
of a professional organization, has less than five years of experience
outside the academic environment, and is U.S. qualified with no Ph.D. For such an
individual, the probability distribution shown in Table 5, row 1 would arise.8 If, by
way of contrast, we consider an individual who is equivalent with respect to the
stated characteristics but who also possesses a Ph.D., the probability distribution
shown in Table 5, row 2 arises. Thus, the model predicts a higher probability of
increased publishing performance across the board, and a much reduced probability
of having no publications, for an individual with a Ph.D. relative to one without.
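The calculation behind such a probability distribution can be sketched in a few lines of Python. The sketch below uses the coefficients printed in note 8 and rests on two assumptions that should be read as such: the intercept is taken to be negative (sign characters appear to have been lost in the printed note, and a positive intercept would be inconsistent with the reported probabilities), and the bins are taken to be zero, zero to one, one to two, and above two weighted publications. The function name `publication_distribution` is hypothetical, and no Ph.D. coefficient is included because note 8 does not state one.

```python
from statistics import NormalDist

# Coefficients as printed in note 8. The minus sign on the intercept is an
# assumption: signs appear to have been lost in extraction, and a positive
# intercept would be inconsistent with the reported probabilities.
INTERCEPT = -1.66455
B_YEARS = 0.00279
B_FINANCIAL = 0.54979
B_US_QUALIFIED = 0.11835
B_ACADEMIC = 0.54851
SIGMA = 1.15438  # scale of the normal error term


def publication_distribution(index, sigma=SIGMA, cutoffs=(0.0, 1.0, 2.0)):
    """Return bin probabilities for a given latent index X*beta.

    Bins: zero, (0, 1], (1, 2], and above 2 weighted publications.
    """
    z = NormalDist()
    cdf_at = [z.cdf((c - index) / sigma) for c in cutoffs]
    probs = [cdf_at[0]]                                   # mass at zero
    probs += [hi - lo for lo, hi in zip(cdf_at, cdf_at[1:])]  # interior bins
    probs.append(1.0 - cdf_at[-1])                        # upper tail
    return probs


# Individual from the text: 5 years employed, financial accounting interest,
# U.S. qualified, academic background, no professional membership, no Ph.D.
index_no_phd = (INTERCEPT + B_YEARS * 5 + B_FINANCIAL
                + B_US_QUALIFIED + B_ACADEMIC)
dist_no_phd = publication_distribution(index_no_phd)
print([round(p, 3) for p in dist_no_phd])
```

Under these assumptions the probability of zero publications comes out near 0.65, in line with the value reported for the individual without a Ph.D. Adding a positive Ph.D. coefficient to the index would shift probability mass away from zero and toward the higher bins.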
NOTES
1. Employment background was coded as professional for individuals with 5 years or
more experience in a non-academic role, and as academic for those with less than 5 years
experience outside academe.
2. This assessment ignores migration of U.S. citizens already holding Ph.D. qualifications to Australia and New Zealand, about which no a priori belief is held. Further, the
study uses highest qualification from U.S. rather than Ph.D. specifically. A subsequent
test using only U.S. Ph.D. qualification resulted in no qualitative differences.
3. The directory also included part-time doctoral teaching assistants and assistant
lecturers, neither of whom would be considered permanent faculty, and were excluded
accordingly. Tutors were also excluded on the basis that their role is explicitly teaching
based, and on grounds that they also tend not to be regarded as permanent faculty.
4. Although the directory is primarily accounting specific, it does include some non-accounting faculty. Where possible, such faculty were identified and eliminated based on
qualifications, teaching responsibilities and research interests. It is possible, however, that
in some instances non-accounting faculty may not have been identifiable as such and hence
were included. For example, finance faculty listed in the directory that held professional
accounting memberships might not have been clearly distinguishable from accounting
faculty. It is likely, however, that most departments registered only accounting faculty
in the directory and that most non-accounting faculty that were included were identified
and deleted.
5. Limited other deletions were made including the deletion of a dean. Details can be
found in Wilkinson and Durden (1998) and in Durden et al. (1999).
6. Only published articles were included. Hence, published book reports and monographs
were excluded from the study.
7. As a further check, this was restricted to U.S. Ph.D. qualifications. The estimated
coefficient was negative but not significant and there was no qualitative change in the other
estimated coefficients.
8. Probabilities are calculated as follows:
$$P(\text{publications} \le WP) = P\left(Z \le \frac{WP - X_t\beta}{\sigma}\right)$$
For example, the probability that the individual in Table 5 without a Ph.D. will
publish zero publications is:
$$P(\text{publications} = 0) = P\left(Z < \frac{0 - \bigl(-1.66455 + 0.00279 \times 5\,(\text{YEARS}) + 0.54979\,(\text{FINANCIAL}) + 0.11835\,(\text{U.S. QUALIFIED}) + 0.54851\,(\text{ACADEMIC})\bigr)}{1.15438}\right)$$
or
ACKNOWLEDGMENTS
The authors wish to thank Peter Westfall for his assistance with the methodological
development. We also thank the editor, Vicky Arnold, and an anonymous reviewer
for helpful comments and suggestions in revising the paper.
REFERENCES
Abdolmohammadi, M. J., Menon, K., Oliver, T. W., & Umapathy, S. (1985). The role of the doctoral
dissertation in accounting research careers. Issues in Accounting Education, 3, 59–76.
Baden-Fuller, C., Ravazzolo, F., & Schweizer, T. (2000). Making and measuring reputations: The
research rankings of European business schools. Long Range Planning, 33(5), 621–650.
Bairam, E. I. (1996). Research productivity in New Zealand university economics departments,
1988–1995. New Zealand Economics Papers, 30, 229–241.
Beresford, D. R. (2001). Guest editorial: If I could do it over again . . .. The CPA Journal, 71(7), 80.
Blaxter, L., Hughes, C., & Tight, M. (1998). Writing on academic careers. Studies in Higher Education,
23(3), 281–295.
Brinn, T., Jones, M. J., & Pendlebury, M. (1996). U.K. accountants' perceptions of research journal
quality. Accounting and Business Research, 26(3), 265–278.
Brown, L. D., & Huefner, R. J. (1994). The familiarity with and perceived quality of accounting
journals: Views of senior accounting faculty in leading U.S. MBA programs. Contemporary
Accounting Research, 11(1), 223–250.
Cargile, B. R., & Bublitz, B. (1986). Factors contributing to published research by accounting faculties.
The Accounting Review, 61(1), 158–178.
Daigle, R., & Arnold, V. (2000). An analysis of the research productivity of AIS faculty. International
Journal of Accounting Information Systems, 1, 106–122.
Davidson, S. (1957). Research and publication by the accounting faculty. The Accounting Review,
32(1), 114–118.
Demski, J. S., & Zimmerman, J. L. (2000). On Research vs. Teaching: A long-term perspective.
Accounting Horizons, 14(4), 343–352.
Doyle, J. R., & Arthurs, A. J. (1995). Judging the quality of research in business schools: The U.K. as
a case study. Omega International Journal of Management Science, 23(3), 257–270.
Durden, C. H., Wilkinson, B. R., & Wilkinson, K. J. (1999). Publishing productivity of Australian
accounting units based on current faculty composition. Pacific Accounting Review, 11(1),
1–27.
Englebrecht, T. D., Govind, S. I., & Patterson, D. M. (1994). An empirical investigation of the publication productivity of promoted accounting faculty. Accounting Horizons, 8(1), 45–68.
Gee, K. P., & Gray, R. H. (1989). Consistency and stability of U.K. academic publication output criteria
in accounting. British Accounting Review, 21(1), 43–54.
Gray, R. H., Haslam, J., & Prodham, B. K. (1987). Academic departments of accounting in the U.K.:
A note on publication output. British Accounting Review, 19(1), 53–71.
Gray, R., & Helliar, C. (1994). U.K. accounting academics and publication: An exploration of
observable variables associated with publication output. British Accounting Review, 26(3),
235–254.
Hasselback, J. R., Reinstein, A., & Schwan, E. S. (2000). Benchmarks for evaluating the research
productivity of accounting faculty. Journal of Accounting Education, 18(2), 79–97.
Hull, R. P., & Wright, G. B. (1990). Faculty perceptions of journal quality: An update. Accounting
Horizons, 4(1), 77–97.
Imhoff, E. A. (1988). Planning academic accounting careers. Issues in Accounting Education, 3(2),
286–301.
Judge, G. G., Hill, R. C., Griffiths, W. E., Lutkepohl, H., & Lee, T.-C. (1988). Introduction to the theory
and practice of econometrics (2nd ed.). New York: Wiley.
Long, R. G., Bowers, W. P., Barnett, T., & White, M. C. (1998). Research productivity of graduates
in management: Effects of academic origin and academic affiliation. Academy of Management
Journal, 41(6), 704–714.
Maranto, C. L., & Streuly, C. A. (1994). The determinants of accounting professors' publishing productivity: The early career. Contemporary Accounting Research, 10(2), 387–407.
Mautz, R. K. (1988). Editorial: Fifty years of accounting. Accounting Horizons, 2(1), 126–129.
McDonald, J. R., & Moffitt, R. A. (1980). The uses of Tobit analysis. Review of Economics and
Statistics, 62(2), 318–321.
Meyer, M. J., & Titard, P. L. (2000). Those who can . . . teach. Journal of Accountancy, 190(1), 49–58.
Newell, G., Langsam, S., & Kreuze, J. (1996). Accounting faculty profiles: Demographics and perceptions of academia. Journal of Education for Business, 72(2), 87–94.
Nobes, C. W. (1985). International variations in perceptions of accounting journals. The Accounting
Review, 60(4), 702–705.
Otley, D. (2002). British research in accounting and finance (1996–2000): The 2001 research assessment exercise. British Accounting Review, 34(4), 387–417.
Schulman, C. G., Thomas, D. W., Sellers, K. F., & Kennedy, D. B. (1996). Effects of tax integration
and capital gains tax on corporate leverage. National Tax Journal, 46(1), 31–54.
Wiley, J. (1998). Jacaranda Wiley directory of accounting: 1998–1999. Brisbane, Australia: Jacaranda
Wiley.
Wilkinson, B. R., & Durden, C. H. (1998). A study of accounting faculty publishing productivity in
New Zealand. Pacific Accounting Review, 10(2), 75–95.
Zeff, S. A. (1996). A study of academic research journals in accounting. Accounting Horizons, 10(3),
158–177.
Zivney, T. L., Bertin, W. J., & Gavin, T. A. (1995). A comprehensive examination of faculty publishing.
Issues in Accounting Education, 10(1), 1–25.
CLASSIFICATION OF CUSTOMIZED
ASSURANCE SERVICES BY DECISION
MAKERS: THE CASE OF SysTrust
Philip R. Beaulieu
ABSTRACT
When decision makers encounter new assurance services that can be
customized for individual clients, they must include them in their pre-existing
categorization of assurance, a cognitive task known as postclassification.
This paper draws upon three literatures (classification research in accounting, theory of assurance, and cognitive psychology) in order to suggest how
this task might be modeled and studied empirically, using the example of
SysTrust. The role of a necessary condition for successful postclassification
called the category use effect (Ross, 2000), in which decision makers are
reminded of pre-existing categories when they learn to use new categories,
is explained.
1. INTRODUCTION
New forms of assurance1 provided by public accountants have proliferated in the
last decade due to both supply and demand factors. On the supply side, public
accounting firms have sought to generate revenue in growth areas of assurance and
related consulting activities because growth opportunities in the mature market
for traditional financial statement assurance are limited. Demand for innovation in
assurance stems partly from technological innovation, which has led to concerns
Advances in Accounting Behavioral Research, Volume 6, 189–215
2003 Published by Elsevier Ltd.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06009-5
a going-concern firm. Recall was also the dependent variable used by Moeckel
(1990) and Libby and Trotman (1993); see Libby (1995) for a review of research
in auditing involving knowledge structures and memory.
Memory models and recall have been featured in research concerning external
users of financial statements, although less has been done in this area than in
auditing. Beaulieu (1996) posited that commercial loan officers use a classification
system based on the Five Cs of Credit (character, capacity, capital, conditions
and collateral), a classification system used to teach loan officers to process information and make loan decisions. Greater recall of decision-consistent character
and accounting (capacity and capital) information than decision-inconsistent
information provided evidence that the classification system resided in long-term
memory and biased recall in favor of decision-consistent information. Another
example of research involving users of financial statements is Kida et al. (1998),
who proposed that managers making stock investment and financial difficulty
decisions encode (classify) accounting information according to affect, a positive
or negative response to numerical data. Recall and decision results supported the
existence of an affect-based classification system in long-term memory. A fair
question to ask is whether auditing and accounting classification systems really
exist in the minds of auditors and financial statement users, psychologically and
neurologically, or whether they exist only as conventions that are convenient for
research purposes. Cohen (2000, p. 2) proposed two types of evidential requirements (logical and behavioral) for hierarchical classification systems, in which
lower level items inherit the properties of higher level items. Logical evidence
requires a convincing argument that a hierarchy is more efficient than alternative
methods of organizing and accessing knowledge. The argument for an efficient
system asserts that it enables economical storage and access to information,
and that representation of factual knowledge at different levels of generality
facilitates the identification of useful analogies (p. 5). Behavioral evidence
consists of experiments in which different hierarchical levels are presented,
causing effects in response times, error rates, and quality of responses.3 Bonner
et al. (1997) illustrate how these two criteria can be used to evaluate potential
classification systems.
Bonner et al. (1997) studied how accounting students learn to estimate the
frequency of financial statement errors. Subjects in their experiment were taught
either: (1) the relationship between financial statement errors and three categories
of transaction cycles: sales and receipts, inventory/purchases, and investments;
or (2) the relationship between errors and three categories of audit objectives:
proper cutoff, validity, and valuation. Subjects then observed a sequence of errors
and finally were asked for frequency estimates. The first hypothesis of Bonner
et al. (p. 391) was:
Subjects receiving transaction cycle (audit objective) category instruction prior to experiencing
frequencies will make frequency estimates which more closely reflect experienced error frequencies for transaction cycle (audit objective) categories than for audit objective (transaction
cycle) categories.
The term "assurance services" was not part of the auditing lexicon prior to the
1990s. For example, the classic book on the philosophy of auditing by Mautz and
Sharaf (1961) does not mention levels or categories of audit services in any of its
eight postulates of auditing or five primary concepts of auditing (evidence, due
audit care, fair presentation, independence and ethical conduct), let alone mention
assurance services. The term appeared in auditing textbooks after the AICPA
definition in 1996, a year later in the case of Arens and Loebbecke (1997).
Around that time, audit partners began calling themselves "assurance partners."
The meaning of new concepts is adjusted by usage until a generally accepted
meaning is established. The most relevant examples for the purposes of this paper
are the concepts of review and compilation services defined in 1978 by the AICPA.
Statement on Standards for Accounting and Review Services (SSARS) No. 1 stated
that in a review engagement, the CPA's report would indicate limited assurance,
or negative assurance, that nothing came to the attention of the CPA indicating a
material misstatement (Kinney, 2000). A compilation was defined as providing no
opinion and no assurance regarding departures from GAAP, although the CPA is
still associated with the financial statements and has some responsibility (Kinney,
2000). Research regarding the financial statement users most affected by this
classification system, commercial lenders, has provided mixed evidence on their
understanding and use of reviews and compilations. Bandyopadhyay and Francis
(1995) found that loan officers' interest rate recommendations and loan decisions
were affected by the level of attestation (including audit, review, and compilation).
Martin et al. (1988) reported that lenders do not generally differentiate between
audits and reviews, but their acceptance of compilations depends on a number of
factors, including the level of owners' equity and term of the loan. Johnson et al.
(1983) found that level of attestation (audit, review, compilation, and no attestation) did not affect loan decisions; Wright and Davidson (2000) similarly found no
effect on loan risk assessments.
In the United States, a gap between users' and practitioners' expectations of
audits led to the adoption of many Statements on Auditing Standards (SAS),
including SAS Nos. 52–60, as well as SAS No. 82 on consideration of fraud in
a financial statement audit. Thus, in addition to the research conducted between
1983 and 2000 on financial statement users' perceptions of audit, review, and
compilation services, other papers addressed the expectations gap related solely to
audit-level attestation. Some of this research suggests that expectations gap standards might effectively narrow the gap (e.g. Bamber & Stratton, 1997; Campbell &
Mutchler, 1988; Jennings et al., 1993; Kinney & Nelson, 1996). However, a paper
by Houston and Taylor (1999) on WebTrust indicated that users of that assurance
service incorrectly inferred that additional assurance regarding product quality
was provided.
Although the research cited in the preceding paragraphs offers the hope that
users can be educated in order to calibrate their expectations of assurance services
consistently with practitioners, it also discourages the assumption that decision
makers have any particular classification system in mind. To be conservative, this
paper will assume nothing about the classification hierarchies that decision makers
might have adopted since 1996 to accommodate customized assurance. Instead,
two theoretical classification systems, the AICPA (2000) and Kinney (2000), will
be examined for their potential in assisting decision makers to classify customized
assurance efficiently.
In addition to defining assurance services in terms of improvements to the
quality and context of information, the Special Committee (AICPA, 2000)
related them to attestation and consulting services in a framework of categories.
Attestation is a subcategory of assurance with detailed standards, whereas there is
some overlap between the categories of assurance and consulting activities. The
primary distinction between assurance and consulting is the goal of the service;
assurance improves decision makers' output indirectly, through provision of better
information, whereas consulting aims to aid decision makers directly through
research and findings. The AICPA's positioning of the assurance, attestation, and
management consulting categories is shown in Fig. 1. Essential features of these
categories are described in Table 1; the hierarchical relationship between attestation and assurance is evident in the table. For example, the objective of assurance
is better decision making, which subsumes the narrower objective of attestation,
reliable information. The level of assurance is defined as examination, review, or
agreed-upon procedures in the attestation category, but the assurance category is
flexible with regard to levels, which may range from explicit assurance about the
usefulness of information for specific purposes to implicit assurance resulting from
CPA involvement.
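The hierarchical relationship described in this paragraph, in which lower-level categories take on the properties of the categories above them, can be illustrated with a small lookup structure. The following Python sketch is an assumption-laden illustration rather than a representation of the AICPA framework itself: the category names and objectives come from the text, while the dictionary layout and the helper `lookup` are hypothetical.

```python
# Category names and objectives follow the text; the dict layout and the
# lookup helper are hypothetical and serve only to illustrate inheritance.
HIERARCHY = {
    "assurance": {"parent": None, "objective": "better decision making"},
    "attestation": {"parent": "assurance", "objective": "reliable information"},
    "examination": {"parent": "attestation"},
    "review": {"parent": "attestation"},
    "agreed-upon procedures": {"parent": "attestation"},
}


def lookup(category, prop):
    """Walk up the hierarchy until some ancestor defines the property."""
    node = category
    while node is not None:
        entry = HIERARCHY[node]
        if prop in entry:
            return entry[prop]
        node = entry["parent"]
    return None


def ancestors(category):
    """Return the chain of categories from the given one up to the root."""
    chain = []
    node = category
    while node is not None:
        chain.append(node)
        node = HIERARCHY[node]["parent"]
    return chain


print(lookup("review", "objective"))
print(ancestors("review"))
```

Storing the objective once at the attestation level, rather than repeating it for each level of service, is the kind of economical storage that the logical-evidence criterion refers to.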
The test of logical evidence advocated by Cohen (2000) requires that the
hierarchical system in Fig. 1 be more efficient than alternative classification
systems in terms of information storage and access, and identification of useful
analogies. The system is economical in that there are just seven categories at
three levels; a hierarchy that could accommodate the complexity and variety of
assurance services in fewer categories is hard to conceive. The attestation category
is parsimonious because when decision makers encounter a service that they
expect is attestation, they only have to consider coding it as audit examination,
review, or agreed-upon procedures. The system might help decision makers think
Result
Objective
Parties to the
engagement
Independence
Substance of CPA
output
Attestation
Assurance
Consulting
Written conclusion
about the reliability of
the written assertions
of another party.
Recommendations
based on the
objectives of the
engagement.
Reliable information.
Not specified, but
generally three (the
third party is usually
external); CPA
generally paid by the
preparer.
Required by
standards.
Conformity with
established or stated
criteria.
Recommendations might
be a byproduct.
Better decision making.
Generally three (although
the other two might be
employed by the same
entity); CPA paid by the
preparer or user.
Better outcomes.
Generally two; CPA
paid by the user.
Included in definition.
Not required.
Assurance about
reliability or relevance of
information.
Recommendations;
not measured
against formal
criteria.
Written.
Critical information
developed by
Information content
determined by
Level of assurance
Asserter.
Criteria might be
established, stated, or
unstated.
Some form of
communication.
Either CPA or asserter.
Preparer (client).
CPA.
Examination, review,
or agreed-upon
procedures.
No explicit
assurance.
Written or oral.
CPA.
198
PHILIP R. BEAULIEU
Fig. 2. Information Quality Assurance Services (Adapted from Kinney, 2000, p. 12). Source: This figure is reproduced from Information Quality Assurance and Internal Control for Management Decision Making (2000, Irwin/McGraw-Hill) by W. Kinney and is reproduced with permission of The McGraw-Hill Companies.
to their individual circumstances, we proceed in the next section to show how they
could be revised to include SysTrust, an example of customized assurance.
The four principles used to judge whether a system is reliable, mentioned in the
above quotation, are defined as follows (AICPA/CICA, 2000, pp. 11-13).
Availability: The system is available for operation and use at times set forth in service-level
statements or agreements.
Security: The system is protected against unauthorized physical and logical access. This principle also addresses privacy concerns related to use of confidential information.
Integrity: System processing is complete, accurate, timely, and authorized.
Maintainability: The system can be updated when required in a manner that continues to provide
for system availability, security, and integrity.
In a SysTrust engagement, a practitioner collects evidence about the effectiveness of controls over the principles for a defined period (see Note 4). Version 2.0 lists
over 200 illustrative controls, covering all four principles, whose effectiveness
practitioners may test. The result is a report on whether management maintained
effective controls over the SysTrust principles addressed by the engagement, or
on management's assertion about the effectiveness of controls. Any system may
be addressed by a SysTrust engagement, not just Internet-related systems as in
the case of WebTrust. For example, a corporation's financial services system
may be defined for the purposes of a SysTrust engagement as its data center,
including infrastructure such as a CPU and peripherals, software, data, employees,
and procedures.
SysTrust engagements are generally considered attestation because they are
performed under Statement on Standards for Attestation Engagements (SSAE)
No. 1, found in Section 100 of the AICPA's Professional Standards. However,
The examples that follow this quote are reporting on selected SysTrust principles, engagements for systems in the preimplementation phase, agreed-upon
procedures engagements, and consulting engagements (review level assurance is
not allowed). Thus, Exposure Draft Version 2.0 enables practitioners to customize
SysTrust assurance in several ways to meet specific client needs, but these
adjustments require a great deal of diligence on the part of decision makers to
understand. First, management defines the boundaries of the system in question
in a System Description attached to the management assertion regarding the
effectiveness of its controls, which in turn is attached to the assurance report.
Management can choose to define a system in any way it sees fit; the system
might be narrowly defined, as in the case of a data center, or broadly defined, as
in the case of an outsourced finance and accounting function or ERP system.
A second significant aspect of customization is that Version 2.0 allows reporting
on any one of the four SysTrust principles. Thus, an engagement could address
only the integrity principle, and provide no assurance regarding availability,
security, or maintainability. The accountant's report would list all four principles
and state that integrity is the sole principle covered, but it would be left to
decision makers to search for a definition of the integrity principle. As defined
by SysTrust, integrity consists of complete, accurate, timely, and authorized
processing, but the auditor's report refers the user to the AICPA (or CICA) Web
site for the definition; it does not appear in the report itself.
Customization under SysTrust (Version 2.0) extends even further than the
definition of system boundaries and reporting on selected principles. There can
also be engagements for systems in the pre-implementation phase, i.e. systems that
have not yet been placed in operation. Here, the practitioner tests the suitability
of the design of controls at a point in time, rather than the operating effectiveness
of controls for a period of time, as is the case for other SysTrust reports. For
pre-implementation phase engagements, the system description attached to the
practitioner's report would require additional detail, such as the version of the
system and other appropriate identifiers (AICPA/CICA, 2000, p. 20).
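
The customization options described above can be summarized as a small data structure. The following is only an illustrative sketch in Python: the class and field names (SysTrustEngagement, tested_object, and so on) are hypothetical and do not come from the Exposure Draft; only the four principles, the permitted engagement types, and the pre-implementation distinction are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Principle(Enum):
    AVAILABILITY = "availability"
    SECURITY = "security"
    INTEGRITY = "integrity"
    MAINTAINABILITY = "maintainability"


class EngagementType(Enum):
    EXAMINATION = "examination"
    AGREED_UPON_PROCEDURES = "agreed-upon procedures"
    CONSULTING = "consulting"
    # Review-level assurance is deliberately absent: the text states it is not allowed.


@dataclass(frozen=True)
class SysTrustEngagement:
    """Hypothetical record of one customized engagement (names are assumptions)."""
    system_description: str           # system boundaries are defined by management
    principles: frozenset             # the principles covered by the report
    engagement_type: EngagementType
    pre_implementation: bool = False  # True if the system is not yet in operation

    def __post_init__(self):
        if not self.principles:
            raise ValueError("at least one principle must be covered")

    def tested_object(self) -> str:
        # Pre-implementation engagements test design at a point in time;
        # other reports cover operating effectiveness over a period.
        if self.pre_implementation:
            return "suitability of design of controls at a point in time"
        return "operating effectiveness of controls for a period of time"

    def principles_without_assurance(self) -> frozenset:
        # Principles listed in the report but not covered by the engagement.
        return frozenset(Principle) - self.principles


engagement = SysTrustEngagement(
    system_description="data center",
    principles=frozenset({Principle.INTEGRITY}),
    engagement_type=EngagementType.EXAMINATION,
)
```

In this sketch, an integrity-only engagement leaves three principles without assurance, which is the feature a decision maker would need to notice when reading the report.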
There are few limits to customization of assurance under the proposed
SysTrust, making it relatively difficult to perceive as a single product. However,
it has been trademarked and servicemarked in the United States and Canada, and
the brand appears in independent accountants' or auditors' reports, as in the phrase
"SysTrust Principles and Criteria." SysTrust users have some alternatives
when they consider how to integrate it into their existing conceptual frameworks,
a process called postclassification in the cognitive psychology literature (Ross,
1999). They range from creating a single category for SysTrust, with features
of all customized options attached, to creating many SysTrust categories under
pre-existing assurance categories with features matching customization. These
choices are considered below in three possible postclassifications, two using the
AICPA's classification system and one based on Kinney's (2000) hierarchy.
Figure 3 revises Fig. 1 (the AICPA system) by including a category for
SysTrust that spans the attestation and management consulting categories, and
the attestation subcategories of audit and agreed-upon procedures (excluding
review), as defined by Exposure Draft Version 2.0. It might be a challenge for
decision makers to add the SysTrust category because it intersects different
levels and types of assurance, but at least there are concrete reference points
(audit, agreed-upon procedures, and consulting) in the classification system. This
postclassification is also likely to foster the brand-name awareness of SysTrust
among decision makers that the AICPA desires by creating a single category for it.
A difficulty with this postclassification is that the subject matter and customization
features of SysTrust, such as reporting on selected system reliability principles,
are not primary identifiers of the category.
An alternative postclassification to that shown in Fig. 3 would be to create three
separate SysTrust subcategories for audit examination, agreed-upon procedures,
and management consulting, as shown in Fig. 4. This option allows decision
makers to compare the subject matter of SysTrust with other forms of assurance,
matched according to level of assurance. For example, within the category of
audit examination, the categories of financial statements and SysTrust explicitly
recognize that assertions regarding financial information and systems are involved.
However, breaking SysTrust down into three subcategories of other concepts
might sacrifice brand recognition among decision makers and, because SysTrust
is distributed among several subcategories, increase the cognitive effort required
to classify each new SysTrust engagement. Using the single-category approach
(Fig. 3), more effort is likely expended initially in identifying the breadth of the
category, but less effort might be needed to store and access new information
once postclassification is complete.
Postclassification of SysTrust according to the Kinney (2000) system is
pictured in Fig. 5, which is restricted to the reliability improvement category,
the relevant portion of Fig. 2. SysTrust would be excluded from the category
of audits of financial statements and would constitute a subcategory of "audits"
of internal control quality, business processes, etc. The meaning of "audits" in
quotations would necessarily expand to include both true audits and quasi-audits.
There is less emphasis on levels of assurance at the top of the Kinney hierarchy
than in the AICPA's classification system, so decision makers would be required
to recognize them at a lower point in the hierarchy, perhaps constructing subcategories of SysTrust for audit, agreed-upon procedures, and consulting (not
shown in Fig. 5). Kinney's system is similar to the single-category approach based
on the AICPA's system in that there is a relatively high initial postclassification
cost in creating a comprehensive category having many features. The cost may
be even greater under Kinney's system because analogs of financial statement
assurance levels are further removed from SysTrust. The advantage of Kinney's
classification system is that decision makers could quickly classify SysTrust as
an assurance service that improves reliability of business processes (systems).
Table 2. Summary of the Ross (2000) experiment. Rows: Classification learning; Learning of use; Final task; Result; Conclusion. [Table body not recovered.]
task separately for each disease. A symptom would be scored correct if it did
diagnose the disease, and incorrect if it indicated the other disease or was one of
the four symptoms that did not diagnose either disease. In the condition where
classification by disease was required in the learning of use phase, the ratio of
correct to incorrect relevant-use symptoms listed (0.80) was significantly higher
than the corresponding ratio for irrelevant-use symptoms (0.58). In the condition
where classification was not required during learning of use, the ratios of correct to
incorrect symptoms were lower and did not differ significantly between relevant-use
(ratio = 0.40) and irrelevant-use (ratio = 0.38) symptoms. This result indicates a
category use effect: using the disease categories while learning subclassifications
of the system (relevant-use versus irrelevant-use symptoms) improved the ability
of subjects to list symptoms in general, but particularly symptoms critical to the
subclassification being learned. The critical condition for the category use effect to
occur, identified in this experiment, is that the original categories must be activated
so that they can be revised. In plain language, people must be reminded of original
categories while they learn to use new, related categories in order for their use of
the original categories to be changed in the correct or intended manner.
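
The pattern in these ratios can be made explicit with a little arithmetic. The sketch below simply restates the four ratios reported above and computes the relevant-use advantage within each condition; the dictionary keys and the function name are illustrative assumptions, and the code does not reproduce the significance tests in Ross (2000).

```python
# Ratios of correct to incorrect symptoms listed, as reported in the text.
ratios = {
    "classification required": {"relevant_use": 0.80, "irrelevant_use": 0.58},
    "classification not required": {"relevant_use": 0.40, "irrelevant_use": 0.38},
}


def relevant_use_advantage(condition):
    """Difference between relevant-use and irrelevant-use ratios in one condition."""
    r = ratios[condition]
    return round(r["relevant_use"] - r["irrelevant_use"], 2)


for condition in ratios:
    print(condition, relevant_use_advantage(condition))
```

The advantage is 0.22 when classification was required during learning of use and 0.02 when it was not, and both ratios are higher in the former condition, which is the pattern described as the category use effect.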
In four other experiments, Ross (2000) ruled out alternative explanations of
the category use effect and found that it applied to a reverse-order task, in which
subjects were given one or two symptoms and asked to name the disease most
likely for a patient with the symptom(s). In other research, Ross found that the
category use effect applies to a problem-solving task in which formulas must be
learned (Ross, 1999). In short, the effect is robust across variables, measures, and
tasks, although the experiment described above is most relevant to the task of
learning features of customized assurance reports.
Applied to assurance services, the category use effect requires that decision
makers be reminded of initial categories of assurance as they encounter new
services, including SysTrust. The AICPA assumes that initial categories will be
related to the CPA brand name in some fashion (refer to the quote in Section 3).
In the AICPA's initial classification system (Fig. 1), the relevant categories are
attestation, including the subcategories of audit examination and agreed-upon procedures, and management consulting. Presumably, this reminder would heighten
awareness among decision makers of the customization inherent in SysTrust
with regard to level of attestation, regardless of whether postclassification involved
single (Fig. 3) or multiple (Fig. 4) categories for SysTrust. If decision makers
were taught Kinney's (2000) classification system initially (Fig. 2), the one
essential category would be "audits" of internal control quality, business processes,
etc., because SysTrust is entirely contained in that category. However, reminders
about three higher levels of the hierarchy (audits of financial statements, reliability
improvement, and relevance improvement) may help decision makers
6. RESEARCH IMPLICATIONS
Behavioral evidence supporting the psychological existence of any classification
system will most likely be found in controlled experiments similar to Ross (2000).
More importantly, if the category use effect is studied, then this methodology must
be used. This section begins with a detailed explanation of one possible experiment,
then considers variations in the design.
An experiment very similar to the one by Ross (2000), summarized in Table 2,
would involve classification of assurance engagements instead of diseases. The
initial classification learning task would consist of learning relevant features of
an assurance classification system as applied to traditional financial statement
assurance, such as the level of assurance implied by each category. An associated
characteristic would be the user's risk level, "the risk that an assertion accompanied by a favorable attest report is materially misstated" (Kinney, 2000, p. 270).
Risk level might be rated on a four-point scale including low, medium, high, and
very high. In the case of the AICPA's system (Fig. 1), audit examination would be
labeled as low risk, review as low to moderate, and agreed-upon procedures as low
to very high (Kinney, 2000). After being taught the classification system, subjects
would be given a series of two-part engagement descriptions (corresponding to
patients in Ross, 2000), the first part containing a description of the firm, its industry, its general financial position and performance, and the second part consisting
of an independent accountant's report. Firms would be described as belonging to
different industries, and their financial condition would vary, so that there would
be some uncertainty regarding the risk of using the accounting information for an
investment decision. Subjects would be asked to make an investment decision and
rate user's risk for each firm until a criterion level of agreement with the classification system's ratings was achieved, similar to criterion achievement in diagnosis
in Ross (2000).
In the learning of use task, where the manipulation would occur, all subjects
would learn the essential features of SysTrust, such as the decision situations
in which it could be used, levels of assurance, and customization options. Next,
they would all see cases similar to those seen in the first part of the experiment,
except that these would describe various fictitious SysTrust engagements with
different customization features. However, only subjects in the treatment group
would be asked to classify each SysTrust case according to the initial assurance
classification system and would be able to see a picture of the entire system (e.g.
Fig. 1), perhaps with some description of the categories. This classification task
would likely be more difficult than the corresponding task in Ross (2000) because
it is a challenge to perceive relationships between SysTrust engagements
addressing the reliability of systems and traditional forms of assurance concerning
accounting information used in investment decisions. No criterion level of
achievement would be sought at this point; the purpose is to demonstrate to
subjects through experience the similarities and differences between SysTrust
and all other assurance services, and the range of engagements possible within
SysTrust. Subjects in the control group would read the same descriptions of
SysTrust engagements in order to show them specific examples of the service,
but they would not be required to classify the engagements according to the initial
system and would not see a picture of it.
Still in the learning of use stage of the experiment, subjects would be taught
a postclassification of assurance services that includes SysTrust, for instance
AICPA option 1 as pictured in Fig. 3. Subjects would be shown where in the
revised system various types of engagements would be placed, according to levels
of assurance and customization provided. The final task of the experiment would
require subjects to read examples of SysTrust engagements (including the
independent accountants' reports), rate the reliability of the systems described,
and rate the user's risk for each of them. The treatment group would observe the
entire postclassification system (e.g. Fig. 3), whereas the control group would
be shown only the part of the postclassification system showing SysTrust. In
the case of Fig. 3, this would be only the oval containing SysTrust and the
subcategories of audit examination, agreed-upon procedures, and management
consulting. The categories of review, attestation, compilation, and assurance in
the initial classification system would not be shown to the control group.
In this design, judgments of user's risk replace disease symptoms listed (Ross,
2000) as the dependent variable, but the design follows Ross (2000) in all other
respects as closely as possible. Evidence of a category use effect would be that
subjects in the treatment group rate user's risk closer to the levels intended by
the AICPA (e.g. consistently low risk for audit examination engagements) than
subjects in the control group do. The category use effect would have been caused first
by the treatment group's having attempted to classify SysTrust with the initial
system and being prompted to recall various levels of user's risk associated with
analogous assurance services. Also, they would have been able to observe the
entire postclassification system, not just SysTrust categories, when making
user's risk judgments. Stated as a formal hypothesis, the category use effect
would predict:
H1. Decision makers asked to classify SysTrust engagements according to
an initial classification system when learning to use SysTrust, and to use a
complete postclassification system when asked to rate the user's risk of systems,
will give ratings closer to those intended by the AICPA than decision makers
not asked to use an initial classification system, or a complete postclassification
system, during learning of use tasks.
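
One way the dependent measure in H1 might be scored is as the average distance between subjects' risk ratings and the intended ratings on the four-point scale. The sketch below is an assumption about scoring, not part of the proposed design: the numeric coding of the scale, the function name, and all ratings shown are hypothetical illustration data, not results.

```python
from statistics import mean

# Assumed numeric coding of the four-point risk scale described in the text.
SCALE = {"low": 1, "medium": 2, "high": 3, "very high": 4}


def mean_abs_deviation(ratings, intended):
    """Average distance, in scale points, between given and intended ratings."""
    return mean(abs(SCALE[r] - SCALE[i]) for r, i in zip(ratings, intended))


# Hypothetical ratings for four engagements (illustration only, not data).
intended = ["low", "low", "medium", "high"]
treatment = ["low", "medium", "medium", "high"]
control = ["medium", "high", "low", "very high"]

treatment_score = mean_abs_deviation(treatment, intended)
control_score = mean_abs_deviation(control, intended)
```

Under this scoring, H1 predicts a smaller deviation for the treatment group than for the control group; an actual test would compare the two groups statistically across subjects.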
The experiment described above could provide behavioral evidence of a category
use effect, but fails to identify a more (or the most) efficient assurance classification system because subjects are only required to learn one system in the
classification learning task. In the absence of a compelling logical argument, as
required by Cohen (2000), that there exists an assurance classification system
that is more efficient than alternative methods of organizing and accessing
knowledge, additional behavioral evidence is needed to address the question of
cognitive efficiency. To answer this question, the research could be extended
by teaching another classification system, such as Kinney (2000, Fig. 2), in
the classification learning task and comparing user's risk judgments to results
given the AICPA's system. Other dependent variables measuring cognitive
efficiency, for instance recall of the detailed information about the customized
features of individual SysTrust engagements, could be added to the design.
If one wished to predict that a relatively conceptual classification system is
more efficient than a concrete system, then one possible alternative hypothesis
would be:
H2. Among decision makers given the opportunity to use complete initial
assurance classification systems when learning postclassification systems
that include SysTrust, those using systems based on Kinney (2000) will
later recall more information about SysTrust engagements than those using
AICPA-based systems.
Designing research in assurance classification is inherently more complex than
the experiments conducted by Ross (2000). Not only are there alternative initial
classification systems, there are alternative postclassification systems even for the
same initial system, as shown in Figs 3 and 4. When subjects begin participating
they might (e.g. commercial loan officers) or might not (e.g. students not having
taken an auditing course) have already learned an assurance classification system.
Those in the former group might find it difficult to ignore their preconceptions
if they contradict what is taught in the classification learning task. One means
of dealing with pre-existing classification systems among user groups would be
to survey them regarding concepts such as levels of assurance and to adjust the
systems taught in experiments for the results.
Research could be extended beyond the strictly cognitive domain by introducing a dependent variable not used in extant classification research: price.
Experimental markets could be employed to measure the willingness of subjects
to pay for customized assurance services, although some abstraction from the
details of specific services such as SysTrust might be necessary. A category use
effect would be evident if subjects who were reminded in some way of an original,
generic assurance classification system as they learned to use customized assurance were willing to pay more for a customized service than subjects not reminded
of the initial system. This result would certainly please the AICPA and lend
support to their hope that the CPA brand can be extended to a broader spectrum
of assurance services.
Finally, research could be extended to other customized assurance services
offered under the AICPA brand, such as ElderCare. Services offered by other
providers could also be included. For example, experiments using recall, user's risk,
or price as dependent variables could require subjects to classify websites having
either WebTrust or BBB Online seals as to the level of assurance provided. Regardless of the assurance services and dependent variables addressed, the difficulty
remains that an initial classification system must either be assumed or taught to
participants in the research. Consensus is more likely with relatively homogeneous
user groups. Thus, a sample comprised of either commercial loan officers or
institutional investors will be more likely to share a common classification system
NOTES
1. Assurance will be defined in this paper as defined by the AICPA (2000, p. 1): "independent professional services that improve the quality of information, or its context, for
decision makers." Attestation, including audits, is a subcategory of assurance (see Fig. 1),
and at times in the paper attestation services will be referred to as a type of assurance.
2. The fields of auditing and decision-making uses of accounting information are most
relevant to this paper, but cognitive models of classification also appear in accounting
education literature, e.g. Butler and Mautz (1996) and Bagranoff et al. (1994). There
is considerable discussion of ontologies in information systems literature concerning
databases and artificial intelligence, e.g. Dahlgren (1995), Parsons and Wand (1997),
Terenziani (1995), and Wand and Wang (1996). However, much of this work is based on
culture and language, for example in reproducing users' classifications in artificial systems,
rather than psychology and cognition (the focus of this paper).
3. Cohen (2000) discusses two other types of evidence that are less relevant to this paper.
Neuropsychological evidence shows that damage affects hierarchical levels differently.
Ontogenetic evidence shows that children acquire some hierarchical levels before others.
4. Boritz (2001) pointed out that SysTrust assurance does not pertain to system
reliability itself; it pertains to effectiveness of controls over principles. He questioned
whether this could cause an expectations gap.
ACKNOWLEDGMENTS
Thanks to Karla Johnstone, Janet Morrill, Steve Salterio, Mike Stein, Michael
Wright, and two anonymous reviewers.
REFERENCES
AICPA (1996). Report of the AICPA Special Committee on Assurance Services. http://www.aicpa.org/assurance/index.htm
AICPA (2000). Assurance Services Definition and Interpretive Commentary. http://www.aicpa.org/assurance/scas/comstud/defincom/index.htm
AICPA/CICA (2000). SysTrust Principles and Criteria for Systems Reliability Exposure Draft, Version 2.0. http://www.aicpa.org or http://www.cica.ca
Arens, A., & Loebbecke, J. (1997). Auditing: An integrated approach (8th ed.). Upper Saddle River, NJ: Prentice-Hall.
Bagranoff, N., Houghton, K., & Hronsky, J. (1994). The structure of meaning in accounting: A cross-cultural experiment. Behavioral Research in Accounting (Suppl.), 35-57.
Bamber, E. M., & Stratton, R. (1997). The information content of the uncertainty-modified audit report: Evidence from bank loan officers. Accounting Horizons (June), 1-11.
Bandyopadhyay, S., & Francis, J. (1995). The economic effect of differing levels of auditor assurance on bankers' lending decisions. Canadian Journal of Administrative Sciences, 12, 238-249.
Beaulieu, P. (1996). A note on the role of memory in commercial loan officers' use of accounting and character information. Accounting, Organizations and Society (August), 515-528.
Bonner, S., Libby, R., & Nelson, M. (1997). Audit category knowledge as a precondition to learning from experience. Accounting, Organizations and Society (July), 387-410.
Boritz, J. E. (2001). Information systems assurance. In: V. Arnold & S. G. Sutton (Eds), Researching Accounting as an Information Systems Discipline. Sarasota, FL: American Accounting Association (forthcoming).
Butler, J., & Mautz, R. D., Jr. (1996). Multimedia presentations and learning: A laboratory experiment. Issues in Accounting Education (Fall), 259-280.
Campbell, J., & Mutchler, J. (1988). The expectations gap and going-concern uncertainties. Accounting Horizons (March), 42-49.
Choo, F., & Trotman, K. (1991). The relationship between knowledge structure and judgments for experienced and inexperienced auditors. The Accounting Review (July), 464-485.
Cohen, G. (2000). Hierarchical models in cognition: Do they have psychological reality? European Journal of Cognitive Psychology, 12(1), 1-36.
Dahlgren, K. (1995). A linguistic ontology. International Journal of Human-Computer Studies, 43(5-6), 809-818.
Frederick, D., Heiman-Hoffman, V., & Libby, R. (1994). The structure of auditors' knowledge of financial statement errors. Auditing: A Journal of Practice and Theory (Spring), 1-21.
Houston, R., & Taylor, G. (1999). Consumer perceptions of CPA WebTrust assurances: Evidence of an expectation gap. International Journal of Auditing, 3, 89-105.
Jennings, M., Kneer, D., & Reckers, P. (1993). The significance of audit decision aids and precise jurists' attitudes on perceptions of audit firm culpability and liability. Contemporary Accounting Research (Spring), 489-507.
Johnson, D., Pany, K., & White, R. (1983). Audit reports and the loan decision: Actions and perceptions. Auditing: A Journal of Practice and Theory (Spring), 38-51.
Johnson, K., & Mervis, C. (1997). Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology (September), 248-277.
Kida, T., Smith, J., & Maletta, M. (1998). The effects of encoded memory traces for numerical data on accounting decision making. Accounting, Organizations and Society (July/August), 451-466.
Kinney, W. (2000). Information quality assurance and internal control for management decision making. Boston: Irwin/McGraw-Hill.
Kinney, W., & Nelson, M. (1996). Outcome information and the expectation gap: The case of loss contingencies. Journal of Accounting Research (Autumn), 281-294.
Libby, R. (1995). The role of knowledge and memory in audit judgment. In: R. Ashton & A. H. Ashton (Eds), Judgment and Decision-Making Research in Accounting and Auditing. Cambridge: Cambridge University Press.
Libby, R., & Trotman, K. (1993). The review process as a control for differential recall of evidence in auditor judgments. Accounting, Organizations and Society (August), 559-574.
Lymer, A., Debreceny, R., Gray, G., & Rahman, A. (1999). Business reporting on the internet (Discussion Paper). London: International Accounting Standards Committee (IASC).
Malt, B., Ross, B., & Murphy, G. (1995). Category coherence in cross-cultural perspective. Cognitive Psychology, 29, 85-148.
Martin, C., Handorf, W., & Clewell, W. (1988). Small business lending and levels of report assurance. Akron Business and Economic Review (Summer), 69-84.
Mautz, R., & Sharaf, H. A. (1961). The philosophy of auditing. Sarasota, FL: American Accounting Association.
Moeckel, C. (1990). The effect of experience on auditors' memory traces. Journal of Accounting Research (Autumn), 368-387.
Nelson, M., Libby, R., & Bonner, S. (1995). Knowledge structures and the estimation of conditional probabilities in audit planning. Accounting Review (January), 27-47.
Osherson, D., Smith, E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185-200.
Parsons, J., & Wand, Y. (1997). Choosing classes in conceptual modeling. Communications of the ACM, 40(6), 63-69.
Rosch, E., Mervis, D., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382-439.
Ross, B. (1996). Category representations and the effects of interacting with instances. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1249-1265.
Ross, B. (1997). The use of categories affects classification. Journal of Memory and Language, 37(August), 240-267.
Ross, B. (1999). Postclassification category use: The effects of learning to use categories after learning to classify. Journal of Experimental Psychology: Learning, Memory and Cognition, 25(May), 743-757.
Ross, B. (2000). The effects of category use on learned categories. Memory and Cognition, 28(January), 51-63.
Terenziani, P. (1995). Towards a causal ontology coping with the temporal constraints between causes and effects. International Journal of Human-Computer Studies, 43(5-6), 847-863.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86-95.
Wright, M., & Davidson, R. (2000). The effect of auditor attestation and tolerance for ambiguity on commercial lending decisions. Auditing: A Journal of Practice and Theory (Fall), 67-81.