Oldfather: Against Accuracy (As A Measure of Judicial Performance)

Against Accuracy (as a Measure of
Judicial Performance)
CHAD M. OLDFATHER*
INTRODUCTION
In proposing the inclusion of bench presence among the measures of
judicial performance, Judge William Young and Professor Jordan
Singer have offered a significant advance in our efforts to understand
and assess judicial behavior.1 As Young and Singer observed, past inquiries
into judicial productivity have focused primarily on timeliness of
disposition.2 Yet timeliness is only a portion of what we seek from courts.
To deem a district court productive simply because it clears its docket
expeditiously is to disregard a substantial component of the court's social
and institutional role.3 What is more, because "we manage what we
measure,"4 as the saying goes, the focus on timeliness has created a
preoccupation with speed, likely to the detriment of other valued
components of adjudicative quality.
The purpose of this Essay is not to engage directly with the concept of
bench presence, but rather to address an observation Young and Singer
made in the process of outlining the broader assessment framework within
which they envision bench presence fitting. That observation concerns the
role of accuracy assessments within the project of assessing judicial
behavior. Drawing on the literature concerning public-sector productivity
enhancement, they suggest that an appropriate measure of judicial
performance must take into account the quality of judicial decision-making
in addition to its speed. As they put it, "[a] comprehensive analysis of
federal district court productivity must transcend pure efficiency measures
and account as well for the court's unique role as a public forum for
dispute resolution and its ability to provide accurate results and a visibly
fair process for all parties."5
* Professor of Law, Marquette University Law School. Thanks to Mark Spottswood for
providing comments on an earlier draft, and to Robert Steele for his characteristically
outstanding research assistance.
1 Hon. William G. Young & Jordan M. Singer, Bench Presence: Toward a More Complete Model
of Federal District Court Productivity, 118 PENN. ST. L. REV. 55 (2013).
2 Id. at 57.
3 Id.
4 See Louis Lowenstein, Financial Transparency and Corporate Governance: You Manage What
You Measure, 96 COLUM. L. REV. 1335, 1342 (1996).
New England Law Review
v. 48 | 493
This is quite sensible. Surely we want courts that are not merely fast,
but also good. So what do we mean by quality? Young and Singer are
under no illusion that adjudicative quality is easily distilled to some
concrete measure. Quality has many facets, and the nature of each lies to
some degree in the eye of the beholder.6 Thus, "each of these groups [that
constitute the court's constituency] (and individuals within the groups)
may differ somewhat in their outlook and priorities, and it is not possible
to quantify the extent to which one group's expectations should be given
preference over another."7 Still, they contend, if we view the matter at a
sufficiently high level of abstraction we can identify a pair of
commonalities: "Regardless of their social or economic status or role in the
court system, Americans expect district court adjudication to feature both a
fair outcome and a fair process."8
Young and Singer's focus is of course bench presence, and thus the
process component of quality. But they pause to provide a general (and
sensible) definition of the outcome component: "Fair outcomes mean that
both fact-finding and law application are objectively accurate (or as
accurate as possible given the limitations of human cognition)."9 So far, so
good. But as they recognize, there is a problem:
[A]lthough accuracy is recognized as a core value of adjudication
and a key measure of its quality, scholars and court watchers are
thus far unable to agree upon a reliable and consistent method to
measure the accuracy of district court decisions. For some cases
or case types, the problem may be lack of agreement on what
constitutes an accurate outcome; for others, it may be that a range
of possible outcomes may all be fairly deemed accurate.10
It is this last point that I wish to expand upon.

Briefly stated, my thesis is this: Attempting to assess the accuracy of
judicial decisions in any scalable way is either impossible or imprudent.
Accuracy, in cases where it counts,11 depends on too many assessments
5 Young & Singer, supra note 1, at 69.
6 Id.
7 Id. at 70.
8 Id.
9 Id.
10 Id. at 78–79.
11 To be clear, I am not making any claim of radical indeterminacy here. I acknowledge
the existence of easy cases and issues as to which it would be possible to assess accuracy,
though my instincts are with those who imagine that category of cases and issues to be
smaller than often imagined.
2014]
Against Accuracy
495
that are too contestable or indeterminable in too many respects. Indeed,
our system recognizes this. The familiar concerns of judicial ethics belie
any systemic belief in the determinacy of outcomes in close cases. Put
simply, if we believed that it is easy to determine whether (some significant
subset of) judicial decisions are right or wrong, we would not care about
such things as whether judges own trivial amounts of stock in corporate
parties that appear before them. Because of this, the most we are able to
say in many cases is that an accurate judicial decision is one rendered in
accordance with applicable procedural safeguards. As a result, the
assessment of judicial performance, much like the regulation of judicial
ethics, may best entail a focus solely on process.
I. Inaccuracy: An Example
To illustrate the point, let us consider a case in which the judicial
system reached what appears to be the wrong result. It is a story, to most
observers at least, of a wrongful conviction. In the Wisconsin case of State
v. Edmunds,12 Audrey Edmunds was charged with and convicted of first-degree
reckless homicide for causing the death of an infant in her care. The
prosecution's theory was that this was a case of Shaken Baby Syndrome.
The evidence, as related by the Court of Appeals, was that the child,
Natalie, arrived at Edmunds's house at around 7:25 in the morning.13
Natalie was in a fussy mood, and another parent who dropped off his own
daughter with Edmunds testified that Natalie was crying at that time.14 He
also stated that she was alert and followed the adults with her eyes, as
they moved about the room.15
Edmunds testified that Natalie continued to cry and that at about 8:00
she left Natalie in a bedroom with a bottle while she got her own daughters
dressed.16 When she returned to the bedroom, at around 8:35, the girl was
limp and unresponsive.17 Edmunds called 911 at 8:41, and rescue personnel
arrived three minutes later.18 Natalie died that evening.19
Edmunds denied doing anything to harm the child.20 At trial, she
portrayed herself as an experienced and loving child care provider and
12 598 N.W.2d 290 (Wis. Ct. App. 1999) [hereinafter Edmunds I].
13 Id. at 293.
14 Id.
15 Id.
16 Id.
17 Id.
18 Edmunds I, 598 N.W.2d at 293.
19 Id.
20 Id.
implied that the injury must have been caused by Natalie's parents.21 In
contrast, the prosecution's case relied heavily on medical testimony:
An autopsy showed that [Natalie's] head injuries were extremely
severe. For example, she had extensive retinal hemorrhaging of
both eyes; retinal folds, due to the retinas being torn from the
backs of her eyes; bruising on her scalp from an impact injury;
and extensive subdural and subarachnoid hemorrhages. The
physicians who testified for the State said that her major injuries
resulted from extremely vigorous shaking and as the result of
severe force, comparable to that exerted in an automobile
accident or in falling from a second story window. There was no
evidence that the severe injuries Natalie sustained could have
been the result of an accident, rather than intentional, forceful
conduct, directed specifically at Natalie.22
This was, in other words, a case of Shaken Baby Syndrome, and
according to the then-prevailing understanding of the Syndrome the
person last responsible for the child's care was also responsible for her
injuries.23
Edmunds's primary contention on appeal was that the evidence was
insufficient to support her conviction. Sufficiency arguments, of course,
present an uphill battle: "I didn't do it" is a much more difficult argument
to prevail on than, for example, "I did it but they violated my Fourth
Amendment rights."24 The reason for this, of course, is that the job of an
appellate court is not to review factual matters anew. Instead, it is to view
the evidence in the light most favorable to the verdict and to ask whether it
is so insufficient in probative value and force that as a matter of law no
jury could have found guilt beyond a reasonable doubt.25 In the face of
this, for Edmunds to continue to maintain her defense at trial (that she did
nothing to harm Natalie) would be futile, because the jury certainly had a
sufficient basis to conclude that she shook the child. She instead focused on
contending that the evidence was insufficient to support the jury's finding
as to the "utter disregard" element. Although this gave her the advantage
of a legal hook, this argument was likely undeveloped at trial. This is
understandable, for it would have been tricky at best for Edmunds to argue
at trial both that she did nothing to Natalie and that, if she did, she did not
21 Id.
22 Id. at 294.
23 See Emily Bazelon, Shaken-Baby Syndrome Faces New Questions in Court, N.Y. TIMES MAG.,
Feb. 2, 2011, at MM30, available at http://www.nytimes.com/2011/02/06/magazine/06babyt.html?pagewanted=all.
24 See Chad M. Oldfather, Appellate Courts, Historical Facts, and the Civil-Criminal Distinction,
57 VAND. L. REV. 437, 438–39 (2004).
25 Edmunds I, 598 N.W.2d at 294.
do so with "utter disregard." Unsurprisingly, this argument, too, was
unsuccessful.
Over eight years later, the Wisconsin Court of Appeals again issued an
opinion in Audrey Edmunds's case.26 She had filed a motion for a new trial
in 2006, arguing that there were significant developments in the medical
community around shaken baby syndrome in the ten years since her trial
that amounted to newly discovered evidence.27 One of the doctors who
testified on behalf of the prosecution now testified for Edmunds.28 Yet the
trial judge, the same one who presided over her trial, rejected her claim,
concluding that Edmunds had not established that there was a reasonable
probability of a different result with the new evidence.29 The Court of
Appeals, however, sided with Edmunds, concluding:
The newly discovered evidence in this case shows that there has
been a shift in mainstream medical opinion since the time of
Edmunds's trial as to the causes of the types of trauma Natalie
exhibited. We recognize, as did the circuit court, that there are
now competing medical opinions as to how Natalie's injuries
arose and that the new evidence does not completely dispel the
old evidence. . . . However, it is the emergence of a legitimate and
significant dispute within the medical community as to the cause
of those injuries that constitutes newly discovered evidence. . . .
Now, a jury would be faced with competing credible medical
opinions in determining whether there is a reasonable doubt as to
Edmunds's guilt.30
The court ordered a new trial because this change created the
possibility that a jury would have a reasonable doubt as to guilt.31
The Court of Appeals did not, of course, conclude that Edmunds was
innocent, and the prosecutors likewise did not concede the point either on
or following her second appeal. In the State's motion to dismiss, the
prosecution based its decision not to proceed on the trauma that a new trial
would cause for Natalie's parents. The motion related a meeting between
the prosecution team and the parents in which the district attorneys,
stressed our confidence in the case; that the evidence that Audrey
Edmunds inflicted the traumatic brain injuries resulting in
Natalie's death was stronger today than it was in 1997; that the
diagnosis was strong and that a purported debate among
26 State v. Edmunds, 746 N.W.2d 590 (Wis. Ct. App. 2008) [hereinafter Edmunds II].
27 Id. at 593.
28 See Bazelon, supra note 23.
29 Edmunds II, 746 N.W.2d at 594.
30 Id. at 598–99.
31 Id. at 599.
experts should not undermine the significant medical findings in
the case.32
For her part, Edmunds has tried to rebuild her life. She wrote a book
about her ordeal,33 was interviewed by Katie Couric,34 and figured
prominently in stories in, among many others, the New York Times35 and
ABA Journal.36
Is Audrey Edmunds innocent? She may be the only one who knows for
sure. But even if we assume that she did shake the child, there are other
ways in which we might plausibly characterize the outcome in her case as
wrong. Perhaps a proper weighing of the systemic values underlying the
"beyond a reasonable doubt" standard makes "not guilty" the right answer
given the uncertainty surrounding whether Natalie's death was actually
caused by shaking and, if it was, the identity of the person who shook her.
Perhaps we should regard her as guilty, but of a lesser crime. Absent a
reliable confession from her or someone else, it is impossible to say with
certainty.
II. Edmunds and the Illusion of Accuracy
A case like Edmunds reveals that it is extraordinarily difficult to
determine in any meaningful way what accuracy entails. Are we to focus
on the overall, bottom-line result of the case? On that view, the current
consensus is that the trial and appellate courts in the first iteration of the
case got it wrong. They countenanced the conviction of an innocent
woman. Yet that belief is not universally held. The trial judge did not find
the change in the experts' views sufficient to warrant the ordering of a new
trial.37 The prosecutors declined to retry Edmunds, but as a result of a cost-benefit
analysis, rather than a concession that she was innocent.38 Some of
the medical experts changed their opinion about what happened, but not
32 State's Motion to Dismiss Prosecution at 3, State v. Edmunds, No. 1996CF000555 (Wis.
Cir. Ct. Dane Cnty. Jul. 11, 2008), available at http://www.madisonmagazine.com/MadisonMagazine/July-2009/Oh-Baby/Motion_to_Dismiss_Audrey_Edmunds.pdf.
33 See generally JILL WELLINGTON & AUDREY EDMUNDS, IT HAPPENED TO AUDREY: A
TERRIFYING JOURNEY FROM LOVING MOM TO ACCUSED BABY KILLER (2012) (describing Audrey's
ordeal).
34 Audrey Edmunds, KATIE COURIC, http://katiecouric.com/tag/audrey-edmunds/ (last
visited Jul. 8, 2014).
35 Bazelon, supra note 23.
36 Mark Hansen, Unsettling Science: Experts are Still Debating Whether Shaken Baby Syndrome
Exists, A.B.A. J., Dec. 1, 2011, available at http://www.abajournal.com/magazine/article/unsettling_science_experts_are_still_debating_whether_shaken_baby_syndrome_/.
37 Edmunds II, 746 N.W.2d 590, 593–94 (Wis. Ct. App. 2008).
38 See id.
all. Did the courts get it right? It depends not only on who you ask, but for
at least some of the participants, on when you asked them.
Suppose we tighten our focus to consider not merely the overall result
of the case, but also the rulings that took place within it. Edmunds, like most
if not all adjudication, was replete with zones of both factual and legal
discretion. To take an obvious example, trial judges enjoy a considerable
amount of discretion with respect to evidentiary rulings. In Edmunds, there
was a strong basis for arguing that other-acts evidence was erroneously
admitted, and erroneously condoned by the appellate court.39 Yet reaching
that conclusion requires making contestable judgments about what the law
properly allows (that is, whether it is acceptable to regard a remark in an
opening statement as having opened the door to the introduction of
character evidence by the prosecution) and whether, in the context of this
case, what actually occurred in the trial was sufficient to trip such a trigger.
39 There is a strong argument that the appellate court's analysis relating to the trial court's
admission of a prior act under Wisconsin's equivalent to Federal Rule of Evidence 404(b) is
wrong. The background is this: in his opening statement, Edmunds's counsel told the jury that
it will hear from no one who ever saw Audrey do an unloving act to a child. Edmunds I, 598
N.W.2d at 296. In response, the trial court allowed the State to introduce evidence of an
incident in which Edmunds was caring for a child whom she struck over the head with a
hard-cover book. Id. This was justified, the State contended and the Court of Appeals agreed,
to show Edmunds's motive in acting as she did and to rebut the defense Edmunds put forth
in opening statement. Id. at 297. The jury could have believed that Edmunds struck her in
the head and shook her to change her behavior. The evidence could also have been used to
show absence of mistake or accident for the injury Natalie sustained . . . . Id.
The problem with this reasoning (which is all too common in the context of 404(b)
problems) is that it does not confront the central injunction of Rule 404, which is to prohibit
resort to character-based reasoning. Wisconsin Rule of Evidence 904.04(2), like Rule 404(b),
opens by stating, "[e]vidence of other crimes, wrongs, or acts is not admissible to prove the
character of a person in order to show that he acted in conformity therewith." The next
sentence is not the articulation of exceptions to this injunction, but rather confirmation that
such evidence is not barred if offered to prove something else. In Edmunds, it is difficult to see
how the evidence shows motive except by showing that the defendant is the sort of person
who is cruel to children and therefore was more likely to have been cruel on this occasion.
That is not what the rule allows. See, e.g., GEORGE FISHER, EVIDENCE 158–60 (3d ed. 2013). The
same analysis holds with respect to absence of mistake or accident.
One could conceivably attempt to justify the introduction of the evidence by arguing that
Edmunds's counsel opened the door to it via his opening statement. This would not work
because Wisconsin's Rule 904.05, like Federal Rule 405, allows for specific instances to be
inquired into on cross-examination of a character witness, but not proven via extrinsic
evidence unless character is an essential element, which it was not. Alternatively, one might
suggest that the evidence was necessary to contradict (or impeach) counsel's opening
statement. This seems to be the most promising justification, insofar as it is the only one not
apparently prohibited by the text of the rules. But it is not mentioned in the court's cursory
analysis.
If the ruling was wrong, there is a subsequent question: Was the error
consequential enough to require a new trial, or was it harmless? How can
we say for sure? Moreover, even if the result of that individual ruling is
correct, how should we regard the nature of the courts' analyses? We do
not have full access to the trial court's ruling, but the appellate court's
treatment of the issue was cursory and unsophisticated. Is that an error?
Instruction of the jury is another area in which trial judges have
discretion. As the Edmunds court in the direct appeal stated the standard, a
trial judge enjoys wide discretion in framing the instructions a jury will
receive in each individual case. In exercising its discretion, the court must
fully and fairly inform the jury of the rules of law applicable to the case
and to assist the jury in making a reasonable analysis of the evidence.40
This notion, which one could cynically recast as "close enough is good
enough," is somewhat perplexing given our casual sense that there is a
fixed content to law such that legal questions, or at least most of them, have
right and wrong answers.
Edmunds, one might contend, is atypical, and therefore not a good
illustration of the point. But Edmunds is atypical only in its notoriety, and
in the sense that it shows us how what most observers would once have
regarded as an easy case was ultimately not easy at all. Its more mundane
aspects are instructive as well. There it serves simply to demonstrate how
difficult it is, if it is possible at all, to say that some substantial portion of
legal or factual analyses are correct or incorrect. The resolution of legal
issues is to some degree dependent on perspective. Whether one thinks a
jury was properly instructed in a given case will be influenced by one's
instincts concerning the appropriate roles and capabilities of judges and
juries, one's view of the underlying substantive law, one's predilections
with respect to the type of case at issue, and a host of other factors of which
one might be only dimly aware. Factually driven determinations are
context-dependent, turning on assessments that are perhaps inherently
subjective. The factors that we take into account in deciding whether to
believe a witness, for example, vary from one person to the next and
probably turn on judgments that are difficult to articulate, let alone review.
Against this backdrop, assessing accuracy would require replicating the
entire case, and even then would be subject to the very real possibility that
different people would come to different conclusions about the same
evidence, or even that the same person would come to different
conclusions at different times.41
40 598 N.W.2d at 299 (quoting State v. Dix, 273 N.W.2d 250, 256 (Wis. 1979)).
41 See Shai Danziger, Jonathan Levav, & Liora Avnaim-Pesso, Extraneous Factors in Judicial
Decisions, 108 PROC. OF THE NAT'L ACAD. OF SCI. OF THE U.S. OF AM., no. 17, Apr. 26, 2011, at
6889, available at http://www.pnas.org/content/108/17/6889.full.pdf+html.
One response to this line of reasoning is to suggest that, while we may
not be able to identify right answers, we can at least identify wrong ones.
Put differently, it may be that we can redefine "correct" to mean falling
within the zone of the proper exercise of discretion. That may be. But we
should not lose sight of what happens if we do so. That would be a large
step toward turning a substantive inquiry into a procedural one, because to
ask whether a judge abused his discretion is often to ask whether the judge
took the proper information into consideration.
As suggested above, recognition of this dynamic seems to be engrained
into the system. Judicial codes of ethics demonstrate this. Because it is too
difficult to tell, with respect to some substantial portion of judicial
decisions, whether any given decision is correct,42 we impose an array of
restrictions designed, at their core, to prevent not merely impropriety, but
also the appearance of impropriety.43 We ask judges to disqualify
themselves in situations where their partiality might be questioned,
whether based on personal bias, economic interest, campaign
contributions, or otherwise.44 Judges can be censured or sanctioned for
behaving inappropriately in these ways or others, but we will not punish
judges for mere legal error.45 We require something more. In the words
of the California Supreme Court: a judge who commits legal error which,
in addition, clearly and convincingly reflects bad faith, bias, abuse of
authority, disregard for fundamental rights, intentional disregard of the
42 Judge Posner made this point in a couple of different ways:

American judges today are subject to exquisitely refined and elaborated
rules on disqualification for conflict of interest. The tiniest potential
conflict is disqualifying. This would make no sense if legal reasoning
(including the resolution of factual disputes) were as transparent and
reproducible as scientific reasoning and experimentation, for then an
erroneous decision would be perceived and corrected and the judge
ridiculed or removed for having yielded to temptation. The legal system
must lack confidence in its ability to detect judicial errors.
RICHARD A. POSNER, THE PROBLEMS OF JURISPRUDENCE 127 (1990).

43 The Preamble to the Model Code of Judicial Conduct provides: "Judges should maintain
the dignity of judicial office at all times, and avoid both impropriety and the appearance of
impropriety in their professional and personal lives." MODEL CODE OF JUDICIAL CONDUCT,
pmbl. 2 (2011), available at http://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/2011_mcjc_preamble_scope_terminology.pdf.
Canon 1 echoes this: "A judge shall uphold and promote the independence, integrity, and
impartiality of the judiciary, and shall avoid impropriety and the appearance of impropriety."
Id. at Canon 1, available at http://www.americanbar.org/groups/professional_responsibility/publications/model_code_of_judicial_conduct.html.
44 Id. at Rule 2.11, available at http://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/2011_mcjc_rule2_11.authcheckdam.pdf.
45 Sambhav N. Sankar, Disciplining the Professional Judge, 88 CAL. L. REV. 1233, 1272 (2000).
law, or any purpose other than the faithful discharge of judicial duty, is
subject to investigation.46 What that amounts to, of course, is a failure of
process. We can generalize and suggest that all of the various procedural
guarantees that attend litigation are products of our recognition of this
dynamic. Even the appellate process ends at a court the finality of which is
not based in its infallibility,47 but in the practical need to have an entity
with the last word.
That leaves us in a position to meaningfully assess the quality of many
judicial decisions only by resort to procedural proxies.48 So long as the
judge appeared to be acting in good faith, subject to no inappropriate
influences, and so forth, even what we label error we will treat as
disagreement. Fortunately for scholars of judicial behavior, these proxies
will often be measurable. Such proxies will often be precisely the sorts of
things that Young and Singer discuss in the procedural component of their
measurements. Measures of judicial performance that are limited to such
factors are likely to be the best we can do.
46 Oberholzer v. Comm'n on Jud. Performance, 975 P.2d 663, 680 (Cal. 1999) (citations
omitted).
47 "We are not final because we are infallible, but we are infallible only because we are
final." Brown v. Allen, 344 U.S. 443, 540 (1953) (Jackson, J., concurring).
48 Judge Posner hints at this idea, as well:
Many of the decisions that constitute the output of a court system cannot
be shown to be either good or bad, whether in terms of consequences
or of other criteria, so it is natural to ask whether there are grounds for
confidence in the design of the institution and in the competence and
integrity of the judges who operate it.
RICHARD A. POSNER, HOW JUDGES THINK 3 (2008).
