
Journal of Experimental Psychology: Learning, Memory, and Cognition
1996, Vol. 22, No. 2, 510-524
Copyright 1996 by the American Psychological Association, Inc. 0278-7393/96/$3.00

The Learning Curve as a Metacognitive Tool


Robert A. Josephs, David H. Silvera, and R. Brian Giesler
University of Texas at Austin
A series of 4 experiments demonstrated that when practicing for a test of problem solving, recognition and selection of a particular stopping signal depended on the within-set variability (statistical noise) of a problem set. When noise was high, participants used a nondiagnostic stopping signal based on the initial size of the problem set. However, when noise was low, access to evidence of learning was possible, resulting in the operation of a stopping signal based on learning curve characteristics. By blocking access to local segments of the learning curve through the use of variability masking, it was demonstrated that participants were sensitive to changes throughout the entire curve. In spite of the long history associated with the psychological study of the learning curve, these results provide the first demonstration of the learning curve as a source of metacognitive judgment and regulation.

On the face of it, preparation for an exam appears to be a relatively simple task. The sensible strategy should involve little more than continuing in one's study efforts until the material is learned. Unfortunately, predictions of performance and judgments of learning are often weakly correlated with actual performance (e.g., Glenberg, Sanocki, Epstein, & Morris, 1987; Maki & Berry, 1984; Maki & Serra, 1992; Weaver, 1990). A number of explanations have been given to account for this poor relationship, including the use of general, undifferentiated domain familiarity in place of text-specific knowledge (e.g., Glenberg et al., 1987), the underutilization of normative item difficulty coupled with an overemphasis on individuating factors (e.g., Nelson, Leonesio, Landwehr, & Narens, 1986), the lack of sensitivity in the comprehension measure (e.g., Weaver, 1990), the substitution of valid preparation cues with computationally simpler but often misleading indicators of preparation (e.g., Josephs, Giesler, & Silvera, 1994; Josephs & Hahn, 1995), lack of prior knowledge (e.g., Maki & Serra, 1992; Josephs & Hahn, 1995), and the nature of the predictor (e.g., feeling of knowing: Costermans, Lories, & Ansay, 1992; Metcalfe, 1986; Metcalfe & Wiebe, 1987). This disconnection between judgment and performance has obvious implications for education and human performance. Inadequate preparation can lead to poor performance; overpreparation is inefficient and can lead to poor performance on

other tasks that compete for attention within the limited period of time that characterizes scholastic and work settings (e.g., Josephs & Hahn, 1995; Mazzoni & Cornoldi, 1993). On a more positive note, it should be pointed out that identification of these factors has, in some cases, led to noticeable improvements in the judgment-performance relationship (e.g., Josephs & Hahn, 1995; Maki & Serra, 1992; Weaver, 1990).

Single Items Versus Item Strings

In spite of the impressive array of factors that have been established through investigations of the judgment-performance relationship, the literature has focused by and large on judgments relating to individual items (e.g., predictions and judgments based on individual text passages, single pieces of general knowledge information, individual insight problems, and so on). A moment's reflection should persuade most readers that judgments of learning and preparedness often are based not on individual items, but on strings of items (e.g., the student studying for an exam who judges her preparedness by means of an overall performance trend within a problem set). We propose that there exist metacognitive factors unique to multiple-problem situations that parallel the factors discussed above in terms of their implications for the judgment-performance relationship. In this article, we test the idea that when working on a problem set, the nature of the metacognitive cue that signals readiness and thus leads to a termination of study effort (commonly known as a stopping signal) depends in large part on the overall structure of the problem set, rather than on the nature of the individual problems composing the set. Additionally, although the experiments presented in this article do not include performance outcome measures, we suspect that the connection to and implications for the calibration literature cited above will be self-evident.

Problem Set Noise and Awareness of the Learning Curve

Although there are likely a number of characteristics that may serve to predict the amount of effort devoted to preparing for an exam (e.g., perceived difficulty of the upcoming test, perceived match between practice and test problems, and

Robert A. Josephs, David H. Silvera, and R. Brian Giesler, Department of Psychology, University of Texas at Austin. David H. Silvera is now at the Department of Psychology, University of Tromsø, Tromsø, Norway. R. Brian Giesler is now at the Department of Medicine, Baylor College of Medicine, and at the Houston Center for Quality of Care and Utilization Studies at the Houston Veterans Affairs Medical Center. We would like to thank Larry Cormack, Dan Gilbert, Jay Koehler, Greg Nichols, Jonathan Schooler, Dan Schroeder, Bill Swann, Tom Trainham, Bill von Hippel, and Jacqui Woolley for their comments on a draft of this article. Correspondence concerning this article should be addressed to Robert A. Josephs, Department of Psychology, University of Texas, Mezes 330, Austin, Texas 78712. Electronic mail may be sent via Internet to josephs@mail.utexas.edu.


individual level of aspiration and motivation), we propose that the variability in item difficulty within a problem set (in essence, the statistical noise within the set created by between-item differences in difficulty) plays a central role in determining the nature of the stopping signal used to terminate study effort. When noise is low, individuals presumably are afforded the opportunity to access changes in their performance (i.e., they have access to their learning curves), making it possible to experimentally investigate the relationship between learning curve characteristics and amount of study effort. Effort should cease soon after the learning curve flattens out, signaling to the learner a lack of marginal gain associated with further study. This seems reasonably intuitive.1 However, we hasten to point out that in spite of the long history associated with the psychological study of the learning curve (e.g., Mazur & Hastie, 1978), it appears that investigations of the learning curve (occasionally referred to as the practice curve) have focused exclusively on its function as a descriptive device used to illustrate the relationship between practice and performance. We are aware of no previous research that investigated the explicit metacognitive functions of the learning curve (in essence, a person's awareness of his or her learning curve, and the implications of this awareness for judgment and behavior), nor are we aware of research into the conditions under which access to the learning curve is promoted or obscured. Nevertheless, this lack of empirical and theoretical precedent is quite understandable, given the relatively brief history associated with the study of human metacognition.

A Stopping Signal Based on Problem Set Size

Under conditions in which learning curve access is blocked as a result of a high degree of within-set variability, an individual presumably initiates a search for auxiliary stopping signals. An array of candidates quickly comes to mind. For example, a person can use social comparison information ("I've studied at least as hard as my classmates"), individual base-rate information ("I've studied as much as I usually study"), externally defined context information ("I've completed all of the homework and problem sets assigned by the professor"), information that defies categorization ("I studied until I fell asleep"), and so on. The less than ideal nature of many of these strategies is attested to by the large individual differences found in people's abilities to acquire and master the declarative and procedural knowledge necessary to perform well on such exams (e.g., Lehman, Lempert, & Nisbett, 1988; Staszewski, 1988). For example, for some people completion of an assigned problem set may be more than sufficient to allow for good performance on the exam, whereas for others, a great deal more practice may be needed (e.g., Nelson, 1993). When a test situation becomes familiar and thus predictable (e.g., weekly quizzes), a study strategy that is initially nondiagnostic may become appropriate. For example, a social comparison strategy can be adjusted as the student becomes aware of his or her performance relative to the rest of the class ("Based on the last exam, I probably need to study more than my classmates if I want to perform as well as they perform"). However, there are numerous and important test situations that do not yield this familiarity advantage (e.g., standardized

tests such as the Scholastic Aptitude Test or Graduate Record Examination, final exams, entrance exams, and so on). In these situations, we argue that learning curve information is the ideal information source for successful and efficient study effort. Yet, as we hope to demonstrate, access to the learning curve depends on an infrequently occurring set of problem set characteristics, thus opening the door to a host of strategies whose diagnostic value is questionable. Although any one of a number of these strategies may be used when learning curve access is not possible, we propose that the nature of many study situations is such that one strategy in particular may enjoy disproportionate use. The fact that many study situations are characterized by a well-defined and discrete problem space (e.g., a problem set assigned prior to a final exam, a series of homework assignments assigned prior to a midterm exam, a study manual used in preparation for a standardized exam) suggests that the degree of completion of a particular set of information can provide a readily apparent and computationally simple indicator of preparation. To the extent that the initial information set is equated with optimal preparation, the "dent" placed in the size of the set can be used to judge progress in skill acquisition. Although this implies the use of a proportion heuristic in which preparation is gauged as a function of the proportion of the initial problem set completed, we do not believe that most individuals engaged in study behavior have an implicit rule that would generate an invariant proportion-to-judgment relationship. Not only is this unlikely because of the cognitive effort involved in such a calculation (e.g., Payne, Bettman, & Johnson, 1990), but there is also nothing in our experience to suggest the existence of such a rule. Rather, it is more likely that initial set size serves to anchor subsequent judgments of preparedness. Why should set size serve to anchor preparedness judgments? Although admittedly speculative, it may be that the perception of privileged access attributed by the test taker to the problem set provider (e.g., the professor or the study guide publisher) bolsters the test taker's faith in anchoring onto problem set size as a metacognitive control cue ("The professor knows what will be on the test. Furthermore, I have no reason to distrust her. Therefore, I have confidence in the efficacy of preparing for the test by completing the study materials she has provided for me. If I discharge my responsibilities to her by studying hard using the materials she has provided, I will be rewarded with a high grade.").
1 The term learning curve as used throughout this article is defined not as the formal mathematical function that best captures an individual's practice-performance relationship, but rather in the broad sense as the progressive improvement and subsequent mastery (decrease in marginal improvement) that characterizes the typical practice-performance relationship. Thus, when we state that individuals demonstrate access to and use of their learning curves, we assume only an awareness of improvement and a subsequent leveling-off in performance, leading to a regulation of behavior based on this awareness. We further assume that most people lack the ability to perform the complex mathematical computations necessary to afford them access to the arithmetic functions that formally describe their learning processes.


Indeed, this logic is quite similar to Lerner's (1980) notion of the belief in a just world, in which rewards and punishments are believed to be consistently and justly meted out to those who deserve them. The normative status of the use of set size is likely to vary on a case-by-case basis, determined by individual and situational factors. Thus, unlike the strong rhetoric regarding the nonnormative status of other anchoring effects (e.g., Poulton, 1968; Slovic, Fischhoff, & Lichtenstein, 1982; Tversky & Kahneman, 1974), we prefer to adopt a wait-and-see attitude, allowing objective measures of performance to be the ultimate arbiter of what is normative and what is not.

Overview of Experiments

In Experiment 1, we explored the effects that the within-set variability of a practice set has on the selection of a stopping rule. When variability is low, we suspected that problem solving effort would be influenced by learning curve characteristics. Specifically, we hypothesized that problem solving effort should continue until the participant's performance curve flattens out. When variability is high, we expected effort to be determined by the size of the problem set: the larger the initial set of practice problems, the more problems participants should solve. In Experiment 2, we tested the hypothesis that within-set variability would influence judgments of exam preparedness when solving a predetermined quantity of practice problems. We predicted that judgments of preparedness and predictions of future performance would be determined by the size of the problem set under conditions of high within-set variability. No set-size effects were predicted under low variability conditions, presumably because of the dominance of learning curve information. In Experiment 3, we examined the contributions to metacognitive functioning of the two major components of an anagram-generated learning curve: the descending, or improvement, limb and the flat, or mastery, limb. By independently masking access to each limb, we were able to assess each limb's unique contribution to metacognitive regulation. In Experiment 4, we tested the hypothesis that the heuristic set-size strategy would persist in spite of the obvious lack of diagnostic value of the initial size of the study set.

Experiment 1

Given the lack of empirical precedent to support our primary hypotheses, this first experiment was necessarily exploratory in nature. Participants were told that an anagram solving session they were about to begin served as practice for an anagram test to be administered later in the experimental session. They were instructed to continue solving anagrams until they felt prepared enough to do well on the upcoming test. In a situation such as this, the appropriate stopping signal should be based on a participant's assessment of his or her performance as reflected in a leveling-off of problem solving performance, in essence, the type of information that awareness of one's learning curve would provide. Unfortunately, awareness of the learning curve may require a series of formidable mental computations. Any variability in the difficulty of a study set has to be factored out to obtain a valid assessment of learning. So, for example, if several moderate-difficulty problems are followed by several easier problems, which are, in turn, followed by several very difficult problems, an improvement in problem

solving performance attributable to learning cannot be calculated without controlling for and factoring out the variability of difficulty of the problems contained within the set. In Experiment 1, we sought to manipulate the ease with which participants would be made aware of learning curve information. Participants solved a series of anagrams that either were relatively uniform in difficulty or varied considerably in difficulty. The uniform set was composed exclusively of five-letter anagrams, whereas the variable set was composed of an equivalent mixture of four-, five-, and six-letter anagrams. The logic behind this manipulation was straightforward: performance information (e.g., one's learning curve) should be relatively accessible in the uniform anagram set because of the relative lack of variability in the difficulty of the anagram set. The lack of between-anagram variability in difficulty should give participants a chance to access systematic changes in their performance and regulate their problem solving efforts in accordance with these changes. However, the between-anagram variability in the variable set should make access to the learning curve difficult. Changes in problem solving performance as a result of learning should be overwhelmed by the variability in problem difficulty. In essence, noise should surpass signal. In this case, we hypothesized that participants would be forced to use set-size information to regulate their problem solving effort. Thus, we predicted that the initial size of the problem set would determine time and effort spent in the skill acquisition process. To this end, the number of anagrams initially placed before participants was manipulated, such that some participants were given a set of 25 anagrams and others were given 200 anagrams. We predicted that participants in the variable-set condition would be influenced by the initial size of the problem set, such that participants in the 200-anagram set condition would solve more anagrams than participants in the 25-anagram set condition, in spite of our efforts to make participants aware of the arbitrary methods used to determine the size of these anagram sets. However, we predicted that participants in the uniform-set condition would show a minimal influence of set size. Rather, we entertained the possibility that these participants would cease solving anagrams at about the point at which their learning curves began to flatten out.

Method
Participants. Ninety-five undergraduates (45 men and 50 women) at the University of Texas at Austin participated in exchange for course credit.

Design and procedure. The experimental design took the form of a 2 (set size) × 2 (uniformity of problem set) between-subjects factorial design. Participants were given either a 25-anagram set or a 200-anagram set. To emphasize the arbitrary nature of the size of the problem set, the experimenter left the laboratory at this point, explaining that "I have to go into the next room to grab a bunch of anagrams for you to work on." The experimenter returned to the laboratory with a stack of either 25 or 200 anagrams, depending on the experimental condition. Anagrams were handwritten on index cards, with 1 anagram per card. The experimenter kept track of problem solving performance by the use of a hand-held stopwatch. Participants were not told how many anagrams were contained in

each set, which indicated to participants the ostensibly capricious method by which the quantity of anagrams was selected. To add emphasis to the arbitrary nature of the size of the problem set, the experimenter stated that "We have a virtually unlimited supply of these anagrams, so if you finish the ones in front of you, let me know and I'll go grab another bunch." Each problem set was composed of either all five-letter anagrams or a randomized set of an equal number of four-, five-, and six-letter anagrams. Participants were told that when they felt prepared to perform well on a test of their anagram solving ability, they were to stop work, at which point they would be given a short test of their problem solving ability. Actually, such a test was not administered. Rather, when participants indicated that they felt sufficiently prepared, they were debriefed and dismissed from the experiment.
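To illustrate the signal-versus-noise logic behind this design, the following simulation sketch (ours, not the authors'; all parameter values are assumptions chosen only to mimic the general shape of a practice curve) shows how large between-item difficulty differences can bury an underlying learning trend.

```python
# Illustrative simulation: how between-item difficulty variance can swamp
# a learning trend. Parameter values are assumed, not taken from the study.
import numpy as np

rng = np.random.default_rng(0)
n_anagrams = 30
trial = np.arange(n_anagrams)

# Underlying learning curve: exponential decay in solution time (seconds).
learning = 40 * np.exp(-0.15 * trial) + 60  # improvement limb, then flat limb

# Uniform set: items of similar difficulty contribute only small noise.
uniform_times = learning + rng.normal(0, 5, n_anagrams)

# Variable set: a crude stand-in for mixed four-/five-/six-letter items
# whose difficulty differences add large item-to-item variance.
item_difficulty = rng.choice([-30, 0, 30], size=n_anagrams)
variable_times = learning + item_difficulty + rng.normal(0, 5, n_anagrams)

# Correlation with trial order as a rough index of how visible the trend is.
for label, times in [("uniform", uniform_times), ("variable", variable_times)]:
    r = np.corrcoef(trial, times)[0, 1]
    print(f"{label} set: trend-order correlation r = {r:.2f}")
```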


Results

Performance was defined simply as the time required to solve each anagram. To test the efficacy of the anagram-variability manipulation, we conducted an analysis of variance (ANOVA) to determine whether the overall performance variation in the variable set was greater than that in the uniform set. For each participant, a least-squares regression line was fitted to his or her performance plot (the time required to solve each anagram was plotted against anagram order), with the ensuing residual indicating variability. These residual scores were averaged within condition, and, as we expected, problem solving performance was significantly more variable in the variable-difficulty set (M = 35.3 s) than in the uniform-difficulty set (M = 25.9 s), F(1, 91) = 5.15, p < .05, MSE = 122.62. The initial size of the anagram set had an overall main effect on number of anagrams solved, with an average of 19.2 anagrams solved in the 25-anagram set and an average of 25.7 anagrams solved in the 200-anagram set, F(1, 91) = 4.42, p < .05, MSE = 267.21. This difference could not be explained by a ceiling effect, as attested to by the finding that only 6 out of 49 participants in the 25-anagram set condition solved exactly 25 anagrams (in addition, 4 of the participants in the 25-anagram set condition solved more than 25 anagrams, further weakening the ceiling effect explanation). However, as predicted, the influence of set size was seen most clearly in the variable-difficulty set. A 2 (set size) × 2 (variability condition) ANOVA revealed a marginally significant interaction between the two variables, F(1, 91) = 3.87, p < .06, MSE = 267.21. The nature of this interaction was uncovered in a pair of planned comparisons that revealed that participants solved approximately 40% more anagrams in the variable-difficulty 200-anagram set (M = 30.2) than in the variable-difficulty 25-anagram set (M = 21.6), F(1, 43) = 3.96, p < .06, MSE = 267.21, whereas they solved 27% more anagrams in the uniform-difficulty 200-anagram set (M = 21.2) than in the uniform-difficulty 25-anagram set (M = 16.7), an increase that was not statistically significant, F(1, 48) = 1.38, p < .25, MSE = 267.21. Clearly, the relative lack of between-anagram variability in the uniform set attenuated, but did not completely eliminate, the influence of set size.
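A minimal sketch of this manipulation check, under stated assumptions (the article does not name the exact residual metric; mean absolute residual about the fitted line is used here as one plausible reading, and the function name is ours):

```python
import numpy as np

def variability_score(solution_times):
    """Mean absolute residual about a least-squares line fitted to
    solution time as a function of anagram order."""
    x = np.arange(len(solution_times))
    slope, intercept = np.polyfit(x, solution_times, 1)
    fitted = slope * x + intercept
    return np.mean(np.abs(solution_times - fitted))

# Scores would then be averaged within the uniform- and variable-set
# conditions and compared by ANOVA, as in the text.
```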
Learning curve versus set size as a function of intraset variability. Although the previous set of analyses demonstrated that participants in the uniform condition were less influenced by set size than were participants in the variable condition, the question remains as to what stopping signal was used by participants who did not use set size.

Recall that we hypothesized that access to the learning curve would promote the use of a stopping signal based on the shape of the learning curve. To test this hypothesis, an exponential decay function was fitted to each participant's performance plot by minimizing the error between the fitted values and the raw scores (a power function, a hyperbolic function, and several polynomial functions were also fitted to participants' performance plots, but it was found that for most participants, an exponential decay function expressed as y = a·e^(-b(x - p)) + c resulted in the best fit).2 The first derivative was then calculated for each fitted value, and the fitted value that corresponded to a first derivative (dy/dx) closest to -1 was used as the stopping value.3 We then examined the number of anagrams solved beyond this stopping value as a function of experimental condition.4
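The stopping-value computation described above can be sketched as follows (our code, not the authors'; scipy's curve_fit is assumed as the error minimizer, and the starting values are arbitrary):

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(x, a, b, p, c):
    # y = a·e^(-b(x - p)) + c; p and c shift the curve horizontally and
    # vertically to accommodate individual differences (see Footnote 2).
    return a * np.exp(-b * (x - p)) + c

def stopping_value(solution_times):
    """Index of the fitted point whose first derivative is closest to -1,
    i.e., roughly a 1-s gain per additional anagram (see Footnote 3)."""
    x = np.arange(len(solution_times), dtype=float)
    (a, b, p, c), _ = curve_fit(decay, x, solution_times,
                                p0=(60.0, 0.2, 0.0, 30.0), maxfev=10_000)
    dydx = -a * b * np.exp(-b * (x - p))  # derivative of the fitted curve
    return int(np.argmin(np.abs(dydx + 1.0)))

# Anagrams solved beyond stopping_value(...) are then compared across
# the set size x variability conditions.
```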
Problem solving effort subsequent to dy/dx = -1. If low intraset variability allows learning curve access, then participants in the uniform-set condition should stop at or soon after the point at which problem solving efforts cease to yield significant or noticeable improvement (this point is arbitrarily defined as dy/dx = -1). Importantly, they should not use set size as a stopping signal. We did not predict the exact number of anagrams participants in the uniform-set condition should solve after dy/dx = -1, but rather only that this number should not differ as a function of set size. Participants in the variable-set condition, however, should not have access to their learning curves and should rely on set-size information as their primary stopping signal. Thus, we predicted an interaction between set size and intraset variability.
2 The constants p and c were included in the equation to allow the curve to shift on the horizontal and vertical axes, respectively, to accommodate individual differences in number of anagrams solved and solution speed.

3 To decide on a point beyond which sustained effort yielded little or no anagram solving improvement, we pooled the entire data set, averaged solution time for each successive anagram, and plotted average problem solving time against number of anagrams solved. This resulted in a curve whose shape resembled an exponential decay function that decelerated rapidly beyond the fifteenth anagram (beyond this point, additional anagrams resulted in problem solving improvements of less than 1 s). On the basis of this information, we guessed that for each participant's fitted performance curve, the point at which performance changed by 1 s was the point beyond which, if access to the learning curve is possible, most participants would realize a lack of marginal gain in performance and thus stop working soon after this point had been reached. In support of this 1-s stopping criterion, we observed that within the pooled data set, moving backwards by five anagrams from the 1-s point incurred a 3-s decrease in performance, whereas moving ahead by five anagrams resulted in an improvement of a mere 0.4 s.

4 Six participants were dropped from consideration because none solved more than seven anagrams, and all demonstrated negative learning effects (the linear slopes for these participants ranged from 8 to 16; it is therefore likely that these participants quit in response to frustration). Three of these participants were in the uniform-set condition (2 in the 25-anagram set size condition), and 3 were in the variable-set condition (2 in the 200-anagram set size condition).



Across both set-size conditions, participants with access to their learning curves (those in the uniform-set condition) were expected to solve an equivalent number of anagrams after dy/dx = -1, and thus were predicted to show no effect of set size. However, participants in the variable-set condition were expected to show a relatively large effect of set size, such that those in the 25-anagram set condition were predicted to solve fewer anagrams than those in the 200-anagram set condition, consistent with their hypothesized lack of access to their learning curves. As Figure 1 shows, the data conformed quite well to our expectations. A 2 (set size) × 2 (intraset variability) ANOVA revealed a statistically significant interaction between set size and intraset variability, F(1, 85) = 6.44, p < .05, MSE = 208.93. A set of planned contrasts revealed that set size had no effect among uniform-set participants (Ms = 6.2 in the 25-anagram set, 7.2 in the 200-anagram set), but a large effect among participants in the variable-set condition. Among participants in the variable-set condition, those in the 25-anagram set condition solved, on average, 3.2 anagrams beyond the point at which dy/dx = -1, whereas those in the 200-anagram set condition solved, on average, 13.1 anagrams, F(1, 41) = 13.83, p < .05, MSE = 208.93.

Discussion

In Experiment 1, we sought to explore the role of problem set variability in the selection of stopping signals, and by implication, the importance of problem set construction for academic performance. When the internal variability of a


problem set was low, problem solving effort was unaffected by set size. Rather, participants appear to have been guided by learning curve characteristics. On average, participants who were working under low-variability conditions discontinued their problem solving efforts soon after the point at which their learning curves began to flatten out. When variability was high, problem solving effort was strongly influenced by a heuristic strategy based on the initial size of the problem set. Although the degree to which this heuristic permeates real world situations is not known, the conditions under which set-size effects operate suggest the more general question of the extent to which acquisition set characteristics prevent access to normative stopping signals, and thus open the door to a host of low quality heuristic stopping strategies, of which the set-size effect is but one of many. The importance of problem set variability for successful skill acquisition seems self-evident, yet it has not been a topic of investigation, for reasons discussed earlier. The results of Experiment 1 suggest that the extent to which a heuristic and potentially misleading stopping strategy is employed depends on the amount of statistical noise created by a problem set. When noise was low, participants apparently were aware of changes in their performance and appeared to use this information appropriately. Although no poststudy performance measures were incorporated into the design of Experiment 1, the implications for performance are clear. Use of a misleading stopping signal can result in either premature termination of a study session or a needlessly long study session, thus robbing other tasks of the time they deserve. Finally, although the learning curve as a descriptive function has been studied since the earliest days of modern experimental psychology (e.g., Bryan & Harter, 1897), the use of the learning curve as a metacognitive control cue has not, to the best of our knowledge, been the focus of psychological inquiry. Thus, the fact that Experiment 1 provides evidence suggestive of participants' ability to recognize and correctly apply the information communicated by the learning curve is both heartening and exciting. Although the internal variability of the two problem sets used in Experiment 1 was demonstrably different, the fact remains that this difference in variability was produced at a cost. The sets differed in content, with the variable set containing anagrams of a type and difficulty that distinguished it from the five-letter anagrams exclusively composing the uniform set. Thus, it is conceivable that the behavioral differences that resulted from the two sets may have been due to a characteristic other than intraset variability. To address the possibility of a confounding variable that may have arisen as a result of this procedure, a second experiment was conducted in which intraset variability was manipulated in a manner that did not sacrifice across-set anagram equivalence.

Figure 1. Number of anagrams solved post dy/dx = -1 as a function of set size (25 vs. 200 anagrams) and intraset variability (uniform vs. variable).

Experiment 2

In Experiment 2, all participants, regardless of condition, were supplied with the same anagrams. To achieve this equivalence, a separate group of pretest participants was used to determine the mean time to solution for each anagram. We then used this information to construct two anagram sets that

took on the following characteristics: In one set, the anagrams were ordered from most to least difficult, on the basis of their pretest means. In the other set, these same anagrams were randomly sorted. We hypothesized that as a result of the directional nature of the variance in the descending anagram set, participants in this condition would attribute improvements in performance to learning and would thus ignore set size as a control cue. The progressive decrease in normative difficulty associated with this set should mimic and amplify a participant's natural learning effects. It is important to note that although this manipulation was designed to result in an improvement in performance due in part to factors other than learning, the phenomenal experience of learning was expected to remain. Participants were told that the anagrams selected for inclusion in the problem set were of equivalent difficulty. Thus, awareness of a decreasing marginal gain from practice should communicate the same metacognitive awareness, regardless of the ersatz nature of the performance curve. Similar to the variable-set condition used in Experiment 1, the greater intraset variance of the random-sort set was predicted to block access to learning curve information, thus promoting the use of set-size information. One other important difference existed between Experiments 1 and 2. In Experiment 1, recall that participants were allowed to continue working until they felt prepared enough to perform well on an upcoming exam. In Experiment 2, all participants were stopped after completing a predetermined quantity of problems (20 anagrams) and were then given a short questionnaire asking for judgments of preparedness and predictions of future performance. On the basis of the idea that preparedness judgments underlie metacognitive regulation, we expected that the patterns of data obtained in this experiment would mimic those obtained in Experiment 1. In the random-sort condition, judgments were predicted to depend on initial set size, with judgments of preparedness and predictions of upcoming performance being higher in the 25-anagram condition than in the 200-anagram condition. We expected no judgment differences in the descending-set condition, as the nature of the anagram sequence should allow participants to judge preparedness on the basis of the shape of their performance curves.


Method

Participants. Forty-nine undergraduates (22 men and 27 women) at the University of Texas at Austin participated in exchange for course credit.

Pretest to determine anagram difficulty. We pretested a set of 60 five- and six-letter anagrams on an independent sample of 38 undergraduates and obtained mean solution times for each anagram. For each pretest participant, within-set anagram order was randomized. Average solution times with associated standard deviations were plotted in order of descending magnitude of solution time, and from this plot a set of 20 anagrams was selected with the following characteristics: (a) The 20 anagrams selected for inclusion ranged in average solution time from a high of 106 s to a low of 43 s. (b) Anagrams 1-15 were selected such that each successive anagram had an associated mean solution time that was approximately 4 s faster than the previous anagram (as one would expect, there was some slight variation around this average; for example, Anagram 1 had a mean solution time of 106.0 s, and Anagram 2 had a mean solution time of 101.8 s). (c) The last five (Anagrams 16-20) all had mean solution times that were within 1 s of each other and were, as a group, 5.5 s faster than the last anagram in the descending set (Anagram 15). This setup was designed to mimic a typical exponential decay curve (as in Experiment 1, we found that a decay function expressed as y = a·e^(-b(x - p)) + c resulted in a good fit to these data). (d) We sought to minimize variability within the problem set by excluding anagrams with large standard deviations whenever possible (e.g., if 3 anagrams from the initial set of 60 each had mean solution times of approximately 77 s, the one with the lowest standard deviation was selected). Quite a few of the six-letter anagrams generated faster mean solution times than the five-letter anagrams. This was partly a function of the number of solutions for a given anagram as well as a function of the particular letter order composing a given anagram (letter order was fixed prior to the pretest). So, for example, the most difficult of the 20 anagrams chosen was a five-letter anagram, whereas among the easiest group of 5 anagrams (Anagrams 16-20), 2 were six-letter anagrams. This mixture gave the appearance of uniform difficulty and increased the likelihood that the attempt to deceive participants into believing that the anagram set was uniform in difficulty would meet with success.

Design and procedure. The experimental design took the form of a 2 (set size) × 2 (descending vs. variable set) factorial. Participants were randomly assigned to a set-size condition (either 25 or 200 anagrams) and a variability condition. Participants were told that they were participating in a study of learning, in which they would be given a set of practice problems and would subsequently be tested to ascertain how well they had learned to solve these types of problems. Participants were told that in preparation for the upcoming problem solving test they were being given a set of practice problems to work on and were to continue working until they were told that time had expired. As in Experiment 1, participants were informed of the existence of an abundance of additional practice problems in an adjoining room, such that if they finished the set in front of them before time expired, the experimenter would supply them with additional anagrams. Furthermore, participants were told that the anagrams composing the practice set were all equivalent in difficulty and were also equivalent in difficulty to the anagrams that composed the forthcoming exam. As in Experiment 1, the experimenter, using a stopwatch, timed each participant's anagram performance, supplying a one-letter hint for each 60-s period that the anagram remained unsolved. Upon completing the 20th anagram, the experimenter informed the participant that time had expired. The experimenter then placed the judgment questionnaire in front of the participant and asked the participant to complete the questionnaire before taking the exam. The questionnaire consisted of two questions, each followed by a 15-point Likert scale. Question 1 was "How prepared are you for the upcoming exam?", with the rating scale anchored at not at all prepared and extremely well prepared. Question 2 was "How do you think you will perform on the upcoming exam?", with the rating scale anchored at poorly and extremely well. After completion of the questionnaire, the experimenter informed the participant that the experiment was over. The participant was then debriefed, probed for suspicion, and dismissed.
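The construction of the two sets can be sketched as follows (a hedged reconstruction; the helper name and the example anagrams and norms are hypothetical, since the article specifies only the sorting logic):

```python
# Illustrative reconstruction: the same 20 normed anagrams, either ordered
# hardest to easiest (descending set) or randomly shuffled (random-sort set).
import random

def build_sets(pretest_norms):
    """pretest_norms: list of (anagram, mean_solution_time, sd) tuples
    for the 20 items already selected from the pretested pool of 60."""
    descending = sorted(pretest_norms, key=lambda t: t[1], reverse=True)
    random_sort = descending[:]      # same items, same overall difficulty
    random.shuffle(random_sort)      # shuffling destroys the directional trend
    return ([a for a, _, _ in descending],
            [a for a, _, _ in random_sort])

# Example with hypothetical norms:
norms = [("elbat", 106.0, 20.1), ("hcair", 101.8, 18.4), ("pmal", 43.0, 9.7)]
desc_set, rand_set = build_sets(norms)
```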


Results

Prior to debriefing, the experimenter asked participants to indicate which, if any, aspect of the experimental procedure may have produced feelings of suspicion or incredulity. One participant indicated that he was skeptical of the occurrence of the upcoming exam. He was dropped from all forthcoming analyses. None of the remaining participants expressed skepticism or suspicion about any aspect of the experimental procedure.

Intraset variability. To assess the manipulation of intraset variability, an exponential decay function (y = a·e^(-b(x - p)) + c) was used to assess goodness of fit for each participant's performance function. Average variability was calculated for each participant by taking the average absolute difference between raw and fitted values and then calculating the average within each of the two anagram sets. In other words, variability scores reflected average variability about the fitted decay function. As testament to the effectiveness of the variability manipulation, average variability in the descending anagram set was under 19 s (18.3 s), but greater than 30 s (30.1 s) in the random-sort set, F(1, 46) = 5.49, p < .05, MSE = 313.51. In spite of the differences in variability, no differences in average overall problem solving time between the two conditions were found (F < 1), attesting to the equivalent difficulty of the two problem sets.

Judgment and prediction. We predicted that the influence of set size on judgments of preparedness and predictions of upcoming exam performance would interact with intraset variability. Specifically, set-size effects were predicted to show up only in the random-sort condition. For the question "How prepared are you for the upcoming exam?", this predicted Set Size × Set Variability interaction was supported, F(1, 44) = 6.15, p < .05, MSE = 89.19. As Figure 2 shows, set size had no influence on judgment in the descending-set condition, but a large effect in the variable-set condition. A planned comparison revealed that within the random-sort set, participants in the 25-anagram set condition judged themselves to be better prepared, relative to participants in the 200-anagram set condition, F(1, 22) = 6.76, p < .05, MSE = 89.19 (Ms = 12.9 vs. 8.3, respectively).

Figure 2. Judgments of preparedness as a function of set size (25 vs. 200 anagrams) and intraset variability (descending vs. variable).

As Figure 3 shows, the same pattern of results was observed for the prediction question. For the question "How do you think you will perform on the upcoming exam?", the predicted Set Size × Set Variability interaction was supported, F(1, 44) = 4.99, p < .05, MSE = 117.37. Set size had an influence on prediction only in the variable-set condition. A planned comparison revealed that participants' predictions of their upcoming performance were higher in the 25-anagram set condition than in the 200-anagram set condition, F(1, 22) = 5.38, p < .05, MSE = 117.37 (Ms = 10.3 vs. 7.6, respectively).

Figure 3. Predictions of performance as a function of set size (25 vs. 200 anagrams) and intraset variability (descending vs. variable).

Discussion

Experiment 2 served as a conceptual replication of Experiment 1: the internal variability of the problem set was shown to be crucial in determining the operation of learning-curve-based versus set-size-based preparedness judgments. When variability was high, access to information that would have allowed attributions to learning was masked, forcing participants to resort to the size of the problem set as the primary determinant of preparedness judgments and performance predictions.

This set-size effect was not evident under conditions of low variability. Rather, when variability was low, there was modest, albeit statistically nonsignificant, support for the idea that participants used the location of the dy/dx = -1 stopping signal to judge preparedness and predict performance. It should be noted that in spite of the success we had in tricking participants into attributing learning to a bogus learning curve, the ecological validity of this type of situation is probably quite low. It seems highly unlikely that a problem set engineered to create the experience of learning would be coupled with a deceptive instruction claiming within-set problem equivalence. Obviously, the implementation of such a highly contrived and deceitful situation was deemed necessary in order to defeat the possibility of problems introduced as a result of the variability manipulation used in Experiment 1. In the majority of cases in which a learning curve is generated, it is most likely true that the learning curve does reflect legitimate learning, and thus has the potential to be a high quality source of metacognitive awareness. This ability to recognize and use diagnostic performance information while resisting the influence of a computationally simple judgment heuristic is important for at least two reasons. First, this finding has important and clear-cut practical significance. It offers a clear prescription for effective and efficient learning. Specifically, it suggests that the challenge for students and others who depend on the acquisition of a particular skill or body of knowledge for performance success lies in the identification and/or construction of conditions under which appropriate signaling information can be accessed, rather than in the use and identification of the signaling information itself. The findings from Experiments 1 and 2 suggest that participants have little problem in recognizing and using learning curve information. Rather, the problem lies in the accessibility of such information. It can be argued further that the responsibility for solving this problem lies not with the problem solver, but rather with those who select and construct the practice sets that are used in the service of skill acquisition. A second argument for these findings' importance can be made by viewing them against the context provided by the judgment literature. Compared to the large number of findings that are classified as part of the heuristics and biases research tradition, the demonstration of individual competence in the recognition and use of what is apparently high-quality judgment information, while simultaneously resisting a seductively simple but potentially misleading judgmental shortcut (i.e., set size), is indeed unusual. We would suggest that this demonstration provides a counter to the view of human performance that this literature supports (see, e.g., Koehler, in press). This point is discussed in somewhat greater detail later in the article. In spite of the evidence marshaled thus far demonstrating metacognitive regulation, the strongest case can be made not for use of the learning curve, but rather for use of set size. We can state rather confidently that relative to participants in the variable conditions, participants in the uniform conditions of Experiments 1 and 2 were not influenced by set size. However, the source of their metacognitive regulation remains unclear.
Participants clearly were not using set size, but whether or not they defaulted to the use of their learning curves has not yet


been verified. A stronger test of learning curve regulation would require an examination of the behavioral effects of experimentally manipulating various segments of the learning curve. Thus, the primary purpose of the following experiment was to provide a more direct test of the regulatory effects of learning curve characteristics by blocking participants' access to different segments of an engineered learning curve of the type used in the previous experiment. Because of the shape of the average learning curve as revealed in Experiment 1 (see Footnote 3), it appears reasonable, at least for an initial attempt, to conceptually decompose the anagram-generated learning curve into two relatively discrete components: improvement and mastery. The improvement, or descending, limb corresponds to the initial stage, in which problem solving effort results in a more-or-less monotonic decrease in solution time. The mastery, or flat, limb follows the improvement limb and is characterized by solution times that are relatively unaffected by problem solving effort. Recall that we have argued repeatedly that learning curve awareness should result in problem solving effort continuing throughout the descending limb and stopping soon after encountering the flat limb. By generating a noise mask using anagram variability, it becomes possible to examine the independent effects on behavioral regulation of access to these two limbs of the learning curve. In other words, by masking access to one limb, the effects of the other limb on metacognitive regulation can be isolated. We sought to accomplish this by adopting and modifying the paradigm used in Experiment 2. More specifically, we manipulated intra-anagram variability within each limb, resulting in a 2 × 2 + 1 factorial design, with the presence or absence of the variability mask crossed with each of the two limbs of the learning curve. A fifth condition was added to test the hypothesis that the perception of mastery generated by the flat limb is sensitive to awareness of improvement, but not necessarily sensitive to magnitude of improvement.

Experiment 3
In this experiment, all participants were presented with a set of 200 anagrams. Participants were told that the anagrams had been pretested to ensure that all were roughly equivalent in difficulty. In actuality, each of the five anagram sets was composed of anagrams sequenced to mimic (or mask) a descending and/or flat performance curve. In the no-mask-long condition, both limbs of the learning curve were hypothesized to be accessible. This condition was equivalent in structure to the descending condition used in Experiment 2, in which a series of progressively easier anagrams was followed by a series of anagrams that were roughly equivalent in difficulty. The no-mask-short condition was identical to the no-mask-long condition, except that the descending limb was 50% longer in the no-mask-long condition. The opposite of the no-mask conditions was the full-mask condition, consisting of an equivalent mixture of four-, five-, and six-letter anagrams presented in random order. The full-mask condition was identical to the variable condition used in Experiment 1. The mastery-mask condition consisted of the same series of progressively easier anagrams used in the no-mask-long condition, followed by the


variable difficulty anagrams used in the full-mask condition. In this condition, access to the descending limb should be possible, whereas access to the flat limb should be blocked. In the improvement-mask condition, a series of variable difficulty anagrams (four-, five-, and six-letter) was followed by the same series of equivalent-difficulty anagrams used in both no-mask conditions. In this condition, access to the descending limb should be blocked, whereas access to the flat limb should be possible. In all but the full-mask and no-mask-short conditions, the juncture between improvement and mastery was located at the 16th anagram in the sequence. In the no-mask-short condition, the flat limb began at the 11th anagram (see Figure 4 for a schematic of the five experimental conditions). As in Experiment 1, all participants were instructed to continue to solve anagrams until they felt prepared to do well on an upcoming test of their problem solving ability. In actuality, no such test was given. As soon as participants indicated that they were ready to begin the exam, they were debriefed and dismissed.

Predictions

The experimental design generated the following predictions concerning problem solving effort.

No-mask. Relative to other conditions, both no-mask conditions were predicted to result in the fewest anagrams solved. The relative lack of noise in these conditions should allow participants maximal awareness of improvement and mastery.

The awareness of a lack of marginal improvement associated with the flat limb was predicted to result in participants discontinuing work on the anagram set soon after this limb is encountered. Of course, we expected participants in the no-mask-long condition to solve more anagrams relative to participants in the no-mask-short condition, because of the longer improvement limb associated with the no-mask-long condition. However, subsequent to the juncture between limbs, the point at which participants stop work was predicted not to differ between the no-mask-short and no-mask-long conditions, in spite of the relative difference in the size of the improvement limbs. This prediction squares well with our intuition regarding the phenomenology of learning curve awareness. Whether it takes a person 10 min, 20 min, or 2 hr to reach the point at which practice ceases to yield gains in improvement, the feeling of "finally getting it" should result in a rapid termination of effort. It has been our experience that the hectic pace of modern life (especially college life) fosters little patience in most people for efforts that do not continue to yield tangible gains.

Full mask. The high intra-anagram variability in the full-mask condition was hypothesized to block access to the learning curve signal and thus promote the use of set-size information. The intentionally large set size (n = 200), coupled with the structure of the learning curve as represented in the other experimental conditions (recall that the mastery limb began at Anagram 16 in all but the no-mask-short condition, in which the mastery limb began at Anagram 11), was predicted to result in average problem solving effort in the full-mask condition exceeding average problem solving effort in all other conditions.

Figure 4. Diagram of the five experimental conditions (no-mask-long, no-mask-short, full-mask, improvement-mask, and mastery-mask), with normed solution times on the ordinate and anagram sequence on the abscissa.
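The five sequences can be encoded schematically as follows (an illustrative sketch with our own names; segment lengths follow the Method section, with a 15-item descending limb, a 10-item descending limb in no-mask-short, and the flat limb beginning at the juncture):

```python
# Illustrative encoding (ours, not the authors') of the five Experiment 3
# sequences as splices of three kinds of segments: "descending" (normed
# solution times decreasing), "flat" (near-constant times), and "mask"
# (high-variance mix of four-, five-, and six-letter anagrams).
def build_sequence(condition, descending, flat, mask):
    """descending: 15 anagrams ordered hardest to easiest; flat: anagrams
    of near-equal difficulty; mask: randomly ordered mixed-length anagrams."""
    sequences = {
        "no-mask-long":     descending + flat,        # both limbs accessible
        "no-mask-short":    descending[5:] + flat,    # 10-item improvement limb
        "full-mask":        mask,                     # neither limb accessible
        "mastery-mask":     descending + mask,        # flat limb hidden
        "improvement-mask": mask[:15] + flat,         # descending limb hidden
    }
    return sequences[condition]
```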


Mastery mask. We hypothesized that people are sensitive to both limbs of the learning curve. However, if the predictions associated with the no-mask conditions are confirmed, the results may be explained by the possibility that participants are sensitive only to the flat limb and are not necessarily displaying any unique metacognitive effects of the descending limb. The mastery-mask condition was one of two conditions created to test this possibility. If learning curve awareness is identified exclusively with awareness of lack of marginal performance gain due to practice (the flat limb), then the average number of anagrams solved in the mastery-mask condition should not differ from the average number solved in the full-mask condition. However, if the descending limb does serve a useful metacognitive function by communicating to participants evidence of their improvement, we predicted that individual differences in participants' reactions upon encountering the noise mask would result in an average number of anagrams solved that would fall somewhere between the averages yielded by the no-mask and full-mask conditions. Although Experiment 3 was not designed to elucidate the phenomenal experience generated by each experimental condition, several likely reactions to the mastery mask are believed to underlie this prediction. We expected that a certain percentage of participants, as a result of the presumed confidence-shaking effects of the onset of a significant increase in intra-anagram variability, would switch to a set-size signal, thus behaving like participants in the full-mask condition. It also seemed likely that many participants would behave like no-mask participants, terminating effort soon after the noise mask was encountered, resigned to the knowledge that they no longer had access to their learning curves, in essence "cutting bait" once their practice efforts ceased yielding systematic and noticeable benefits. The combined effect of these individual differences would be to yield an average number of anagrams solved falling somewhere between the averages yielded by the no-mask and full-mask conditions. We hasten to point out that in spite of our descriptions of possible phenomenological reactions, Experiment 3 was not designed to offer a direct glimpse into the black box. For now, the best we can hope for are proxies, namely, conditional differences in problem solving effort and problem solving variability.

Improvement mask. Problem solving effort in the improvement-mask condition was also predicted to fall somewhere between that observed in the full-mask and no-mask conditions. The metacognitive ambiguity created by masking the descending limb was hypothesized to disappear soon after the flat limb was encountered. However, the lack of a clear picture of one's preparatory progress during the masked descending limb was hypothesized to result in a lack of confidence, relative to the clear and steady illusion of progress that is generated by an accessible descending limb. This difference should encourage some participants to solve a somewhat greater number of problems while on the flat limb, relative to the majority of participants in the no-mask conditions.

Combined with the predicted lack of difference between the two no-mask conditions, this result would add further support to the idea that selection of a stopping value along the flat limb of the learning curve is sensitive to awareness of improvement, but not to magnitude of improvement.

A secondary set of predictions targeted within-cell variability. We predicted that the two no-mask conditions would generate the lowest variability in choice of stopping value (stopping value is defined as the number of anagrams solved), on the basis of the hypothesis that individual differences in decisions regarding stopping values are minimized when access to the learning curve is unobstructed. As we have argued repeatedly throughout this article, the signal sent out by the learning curve is believed to enjoy widespread consensus. When practice ceases to affect performance, and when awareness of this relationship exists, most individuals should quickly terminate their efforts. Thus, relative to other conditions, we predicted that choice of stopping values in the no-mask conditions would be characterized by low variability among participants. However, variability among participants in terms of choice of stopping value was predicted to be highest in the full-mask condition. Participants in this condition were hypothesized to rely primarily on set size, a stopping signal that does not yield an obvious stopping value. As a result, individual differences in the interpretation of the set-size signal should result in substantially greater variability in choice of stopping value, relative to other conditions. One cautionary note before proceeding: Because the conditions predicted to generate the lowest variability were also predicted to generate the lowest mean number of anagrams solved, the statistical relationship commonly observed between variance and the mean has the potential to serve as a trivializing explanation for our predicted results concerning within-cell variability. This problem is revisited shortly in the Results section.
Participants. Seventy-eight undergraduates (30 men and 48 women) at the University of Texas at Austin participated in exchange for course credit.

Design and procedure. The experiment used a between-subjects design, with intraset location of the noise mask as the experimental factor and with a fifth condition created by shortening the descending limb of the no-mask-long condition. Participants were randomly assigned to one of the following five conditions: no-mask-long, no-mask-short, full-mask, mastery-mask, and improvement-mask. The masking anagrams consisted of a randomly determined sequence of four-, five-, and six-letter anagrams (the same anagrams that were used in the variable-set condition of Experiment 1). The descending limb used in the no-mask-long and mastery-mask conditions was constructed of the same 15 anagrams used in Experiment 2, in order of descending magnitude of average solution time (recall that this ordering yielded an anagram sequence beginning with an anagram that generated an average solution time of 106.7 s and ending with one having an average solution time of 43.0 s). In the no-mask-long and improvement-mask conditions, Anagrams 16-30 ranged in solution time from 29 to 55 s (these anagrams were obtained from the normed anagram set created in the pretest used in Experiment 2). In the no-mask-short condition, the descending limb was composed of Anagrams 6-15 from the no-mask-long condition, with Anagrams 11-25 in the no-mask-short condition composed of the same anagrams as in the flat limbs of the no-mask-long and improvement-mask conditions. The remaining anagrams in these three conditions were the same four-, five-, and six-letter anagrams that were used in the full-mask and mastery-mask conditions. Unfortunately, the increased variability that would be encountered by participants continuing beyond Anagram 30 (Anagram 25 in the no-mask-short condition) would seriously compromise any interpretation of results in either no-mask condition or in the improvement-mask condition. However, we assumed that the majority of participants in these conditions would terminate problem solving efforts prior to reaching Anagram 31 (Anagram 26 in the no-mask-short condition). As it turned out, only 1 of 49 participants in the conditions in which the mastery limb was not masked solved more than 15 anagrams past the juncture between the improvement and mastery limbs. This participant (from the improvement-mask condition) was included in all of the statistical analyses.

Although they differed in structure, we did not expect the five anagram sets to differ in overall difficulty. To test for differences in difficulty, we compared the pretested solution times for Anagrams 1-30 (beyond Anagram 30, all conditions were identical) across the five experimental conditions. A one-way ANOVA revealed that no condition was, on average, easier or more difficult than any other (F < 1).

In all other respects, the procedure used in Experiment 3 was identical to that used in Experiment 1. To recap briefly, participants were told that the anagrams they were about to solve served as practice for an upcoming test of their anagram-solving ability. After leaving and then quickly returning to the laboratory cubicle, the experimenter handed participants a stack of 200 anagrams, each presented individually on an index card. Participants were told to stop working when they felt prepared enough to perform well on such a test, at which point they would be given the test. In actuality, when participants stopped work, they were debriefed and dismissed from the experiment.
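To make the five set structures concrete, the following Python sketch assembles them from normed mean solution times. It is our illustration, not the authors' materials: only the improvement-limb endpoints (106.7 s down to 43.0 s) and the 29-55 s flat-limb range come from the text above, while the interior values, the noise pool, and the pad helper are hypothetical placeholders.

import random

random.seed(1)

# Hypothetical normed mean solution times (in seconds); only the endpoints
# of the improvement limb and the flat-limb range are reported in the text.
improvement_long = [106.7, 102, 97, 93, 88, 83, 79, 74, 70, 65,
                    61, 56, 52, 47, 43.0]                    # Anagrams 1-15
improvement_short = improvement_long[5:]                     # Anagrams 6-15 only
flat_limb = [random.uniform(29, 55) for _ in range(15)]      # mastery limb
noise_pool = [random.uniform(20, 120) for _ in range(200)]   # mixed 4- to 6-letter anagrams

def pad(seq, total=200):
    # Top a sequence up to the full 200-anagram stack with noisy fillers.
    return seq + noise_pool[: total - len(seq)]

conditions = {
    "no-mask-long":     pad(improvement_long + flat_limb),        # both limbs visible
    "no-mask-short":    pad(improvement_short + flat_limb),       # shorter descent
    "full-mask":        pad(noise_pool[:30]),                     # noise throughout
    "mastery-mask":     pad(improvement_long + noise_pool[:15]),  # flat limb hidden
    "improvement-mask": pad(noise_pool[:15] + flat_limb),         # descent hidden
}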

Results

Problem solving effort. As we had predicted, presence and location of the noise mask on the learning curve had a large effect on the number of anagrams participants solved, F(4, 73) = 12.45, p < .05, MSE = 219.02. As a Tukey's honestly significant difference (HSD) test revealed, participants in the no-mask-short condition solved, on average, significantly fewer anagrams relative to all other groups, with the average stopping point occurring 5.7 anagrams past the point at which the flat limb began.5 Participants in the no-mask-long condition solved significantly fewer anagrams than all but those in the no-mask-short condition, with the average stopping point occurring 4.9 anagrams past the descending-flat limb juncture. Further confirming predictions, there was no difference in postjuncture problem solving effort between the two no-mask conditions (a difference of only 0.8 anagrams; |t| < 1; see Figure 5), in spite of the sizable percentage difference (50%) in the magnitude of the descending limbs between the two conditions. As we hypothesized, participants did not appear to take the magnitude of the descending limb into account when deciding on a stopping point.

In spite of the results observed in the no-mask conditions, it may be that participants were merely demonstrating a sensitivity to the flat limb, rather than a sensitivity to both limbs of the learning curve. If this were true, then problem solving effort in the improvement-mask condition should be equivalent to effort in the no-mask conditions, because the flat limb was equally accessible in each. However, a Tukey's HSD test revealed that average problem solving effort (in terms of number of anagrams solved) was greater in the improvement-mask condition than it was in either no-mask condition, confirming that sensitivity to the descending limb of the learning curve existed (see Figure 5). A second test of descending limb sensitivity was accomplished by comparing problem solving effort between the full-mask and mastery-mask conditions. If sensitivity to the descending limb did not exist, no problem solving differences should have been apparent between these two conditions. However, this was not the case. As predicted, a Tukey's HSD test revealed that problem solving effort was significantly greater in the full-mask condition than it was in the mastery-mask condition (see Figure 5), demonstrating that an accessible descending limb does serve a metacognitive function. Finally, as predicted, a Tukey's HSD test revealed that participants in the full-mask condition solved significantly more anagrams relative to all other groups.

Variability in choice of stopping value. We predicted that individual differences in choice of stopping value would be minimized when access to the learning curve is unobstructed. Thus, we predicted that among all experimental conditions, within-cell variance would be lowest in the no-mask conditions. We also hypothesized that in the full-mask condition, individual differences in the interpretation of the set-size signal would yield the greatest within-cell variance. Both predictions were strongly confirmed by means of a series of pairwise error variance comparisons (Howell, 1992, p. 187). These comparisons were computed by dividing the larger variance estimate by the smaller, with n - 1 degrees of freedom for each estimate. As anticipated, within-cell variance in the no-mask-long condition was significantly lower than in the full-mask condition, F(14, 14) = 7.01, p < .05, and the mastery-mask condition, F(13, 14) = 2.88, p < .05, and marginally lower than in the improvement-mask condition, F(13, 14) = 2.34, p < .08. Also as expected, we found that the full-mask condition generated more noise than any other experimental condition. As reported above, not only was within-cell variance in the full-mask condition greater than in either of the no-mask conditions, it was also significantly greater than in the improvement-mask condition, F(14, 13) = 3.03, p < .05, and marginally greater than in the mastery-mask condition, F(14, 13) = 2.43, p < .06. The larger cell size and equivalent within-cell variance in the no-mask-short condition, relative to the no-mask-long condition, permitted the inference that all statistically significant differences observed in the no-mask-long comparisons were shared or exceeded by the no-mask-short comparisons. More important, however, within-cell variances across the two no-mask conditions were found to be nearly identical, F(15, 19) = 1.08, ns. This lack of difference in variance between the two no-mask conditions, in light of the large difference in mean problem solving effort between them, strongly argues against an artifactual statistical explanation for these variance data. Thus, the low error variances observed in the two no-mask conditions appear to have been due to the presence of a well-understood metacognitive signal generated by the learning curve, rather than to a statistical artifact.

Footnote 5: The value of Tukey's HSD in the experiment was 4.49.
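The error variance comparison just described is simple enough to state in a few lines of code. The sketch below is our illustration (the data are simulated, not the study's): it divides the larger sample variance by the smaller and refers the ratio to an F distribution with n - 1 degrees of freedom per estimate, as in the Howell (1992) procedure.

import numpy as np
from scipy import stats

def variance_ratio(sample_a, sample_b):
    # Larger-over-smaller variance ratio, referred to an F distribution
    # with n - 1 degrees of freedom for each variance estimate.
    pairs = sorted([(np.var(s, ddof=1), len(s)) for s in (sample_a, sample_b)],
                   reverse=True)
    (v_big, n_big), (v_small, n_small) = pairs
    f_ratio = v_big / v_small
    p = stats.f.sf(f_ratio, n_big - 1, n_small - 1)   # one-tailed
    return f_ratio, p

rng = np.random.default_rng(0)
no_mask = rng.normal(10, 3, 15)      # tightly clustered stopping values
full_mask = rng.normal(25, 9, 15)    # widely dispersed stopping values
print("F = %.2f, p = %.4f" % variance_ratio(no_mask, full_mask))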

[Figure 5. Number of anagrams solved subsequent to the improvement-mastery juncture, by experimental condition. The original bar graph plotted means on a 0-20 scale for the no-mask-short, no-mask-long, improvement-mask, mastery-mask, and full-mask conditions.]

Discussion

By independently masking each of the two limbs of an engineered learning curve, the metacognitive functioning of the learning curve was made clearly evident. Behavioral regulation was influenced quite strongly by both the descending limb and the flat limb of the curve. When neither limb was masked, thus maximizing access to both limbs of the curve, problem solving efforts quickly tailed off once the flat limb was encountered. Although participants clearly demonstrated a sensitivity to the descending limb, metacognitive regulation was unaffected by the length of that limb: once they had encountered the flat limb, participants terminated their problem solving efforts at roughly the same point, regardless of the length of the descending limb. In addition, the low within-cell variances in the two no-mask conditions confirmed the hypothesis that individual differences in choice of stopping signal are minimized when learning curve access is possible. Apparently, the signal that is sent out by an unobstructed learning curve is responded to in a remarkably uniform manner, yielding relatively little in the way of individual variation in choice of stopping signal and stopping value. When both limbs were masked, as was the case in the full-mask condition, set size was presumed to be the guiding stopping signal, although the significantly greater within-cell variance in this condition suggests that, as a control cue, set size is quite susceptible to individual interpretation.

Experiment 4

A final experiment was conducted as a test of an alternative explanation for participants' use of set-size information under conditions of high intraset variability. Although the techniques used in the previous studies were designed to emphasize the arbitrary nature of the size of the problem set, it remains the case that the experimenter was the individual who selected the number of anagrams ultimately presented to participants.

Therefore, in spite of our attempts to downplay the role of the experimenter as the agent bearing primary responsibility for the size of the initial anagram set, it may be that participants were operating under the belief that the experimenter had privileged knowledge regarding the level of preparedness required for the ostensible upcoming exam and was communicating this knowledge through selection of the initial size of the study set. Thus, the observed set-size effect may have been produced, at least in part, by participants' belief in the guidance and wisdom of the experimenter.

We should make it clear that the determinants of the belief in set size as a metacognitive aid are far from obvious, and it may be that one's belief in the proximal agent (e.g., teacher, professor, experimenter) as a privileged knower plays a significant role in the belief in the diagnostic nature of the initial size of the problem set. However, we believe that the diagnostic power ascribed to problem set size is multiply determined and is likely to operate without the presence of such a proximal agent.

In the following experiment, the size of the initial anagram set was generated by means of a random procedure controlled and initiated by the participant; the experimenter played no role in determining the size of the anagram set. Participants determined the number of anagrams they were to work on by removing a slip of paper from a cardboard box. Participants were told that the box contained 100 slips numbered sequentially from 1 through 100. In actuality, half of the slips were numbered 25, and the other half were numbered 100. After "randomly" selecting the size of the problem set, participants were told to continue practicing anagrams until they felt that their practice efforts had left them prepared to take a test of anagram problem solving. Unlike in the previous experiments, intraset variability was not manipulated: all participants were given a set of four-, five-, and six-letter anagrams (the same set that was used in the variable condition of Experiment 1). Only set size was manipulated, with half of the participants receiving a 25-anagram set and the other half a 100-anagram set. We predicted a data pattern resembling that of the Experiment 1 high-variability condition, that is, participants in the 100-anagram set solving more anagrams than participants in the 25-anagram set.

Method

Participants. Nineteen undergraduates (10 men and 9 women) at the University of Texas at Austin participated in exchange for course credit.

Procedure. The procedures used in this experiment were identical to those used in Experiment 1, with the following exceptions. Participants were told that in order to simulate the arbitrary nature of many of the situations people find themselves in on a daily basis, "we are conducting an experiment in which the initial quantity of problems you will be given is either determined by the experimenter or by yourself using a random procedure. You are in the self-selection condition, and thus, you will determine the starting quantity of anagrams by randomly selecting a number from a box." Participants then drew a slip of paper from a small cardboard box. Unbeknownst to participants, half of the paper slips had the number 100 written on them and half had the number 25. Participants were informed at this point that the number they drew would be the number of anagrams that would be placed before them. Participants were then informed that there was no specific number of anagrams that they should do, nor was there a time limit on their task. Rather, they were told that they should work until they felt sufficiently prepared to do well on an upcoming exam testing anagram solving performance. As in Experiment 1, participants were told that there was a virtually unlimited quantity of supplemental anagrams in the next room, and that if they finished the anagrams sitting in front of them, more would be gladly provided.
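As a concrete illustration of this procedure, the toy sketch below (ours, not part of the original materials) reproduces the rigged draw:

import random

# The box ostensibly holds slips numbered 1-100, but every slip actually
# reads either 25 or 100.
box = [25] * 50 + [100] * 50
random.shuffle(box)
set_size = box.pop()           # the participant's "random" set size
print(f"Participant draws a slip: {set_size} anagrams")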

Results and Discussion

As predicted, participants in the 100-anagram set condition solved more anagrams than did participants in the 25-anagram set condition (28.80 vs. 20.22, respectively, or roughly 42% more). A one-way (100-anagram set vs. 25-anagram set) between-subjects ANOVA revealed this difference to be statistically significant, F(1, 17) = 16.05, p < .05, MSE = 6.19. This difference was not due to a ceiling effect, as attested to by the finding that only 1 of 9 participants in the 25-anagram set condition solved all 25 anagrams.

By removing the experimenter from the problem-set size selection procedure, and by allowing participants to use a pseudorandom procedure to determine problem-set size, set-size information was shown to continue to act as a guidepost for metacognitive judgments of exam preparedness. This finding argues against experimental demand as the sole explanation of the set-size effect. In addition, the number of anagrams solved in the 100-anagram set (M = 28.8) was roughly equivalent to the number solved in the 200-anagram set used in Experiment 1 (M = 30.2), supporting our earlier notion that participants are not computing proportions with any precision, but rather are coarsely generating judgments of preparedness based on a rough "dent in a pile" estimate. The question of sensitivity to set size clearly requires the type of parametric exploration that, given the lack of difference between the 100- and 200-anagram set size conditions, may not be justified (however, this comparison is between experiments and thus cannot be taken too seriously).
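The "dent in a pile" interpretation can be made concrete with a quick back-of-the-envelope computation (our illustration; the 200-item mean is carried over from Experiment 1):

# Mean anagrams solved by initial set size.
solved = {25: 20.22, 100: 28.80, 200: 30.2}
for set_size, mean_solved in solved.items():
    print(f"set of {set_size:>3}: {mean_solved:5.2f} solved "
          f"({mean_solved / set_size:.0%} of the set)")
# A fixed-proportion rule tuned to the 25-item set (about 81%) would
# predict roughly 81 anagrams solved in the 100-item set, far above the
# observed 28.8; effort tracks a rough absolute count instead.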

General Discussion

Access to, and subsequent use of, the learning curve in regulating preparation was shown to depend on the within-set variability (i.e., statistical noise) of the problem set. When the arrangement of problems within a problem set generated low noise, participants were clearly able to recognize and use their learning curves to regulate effort and judge preparedness. As a metacognitive cue, the learning curve generated a clear and uniform response in participants. As exemplified by the results from Experiment 3, when both the descending and flat limbs of the learning curve were accessible, there was remarkable uniformity in participants' regulatory behavior: problem solving efforts ceased soon after the flat limb of the learning curve was encountered. In addition, the point at which participants terminated their efforts was far less variable than the stopping point chosen under conditions in which access to one or both limbs of the learning curve was blocked. When intraset noise was high, learning curve access was presumably blocked; in this case, problem solving efforts and preparedness judgments were noticeably influenced by the size of the problem set.
As mentioned at the beginning of this article, these metacognitive effects were shown to depend on the overall characteristics of the problem set and were not noticeably affected by the individual problems composing the set. Thus, although the problem sets used in Experiments 2 and 3 differed markedly in composition from those used in Experiments 1 and 4, highly similar effects on judgment and behavior were observed across studies. Additionally, although measures of poststudy performance were not taken, the implications for performance are quite clear. If progress along the learning curve is used as a proxy for performance, then the utility of a given stopping signal or study strategy can be gauged. For example, the judgments of individuals working on a high-variability problem set would generally be expected to be less well calibrated to their performance than those of participants working on a low-variability problem set. Although this extension to performance is straightforward, the validity of such assertions must await empirical verification.

The notion of a descriptive mathematical performance function commonly referred to as the learning curve dates back at least to the turn of the century (e.g., Bryan & Harter, 1897). Furthermore, the processes by which learning occurs, as described by the shape of the learning curve, have been the subject of much discussion (e.g., Mazur & Hastie, 1978). However, to the best of our knowledge, there has been no research into the explicit metacognitive effects of the learning curve (cf. Brown, Campione, & Barclay, 1979; Kluwe & Friedrichsen, 1985; Paris, Wasik, & Turner, 1991; Pressley, Borkowski, & O'Sullivan, 1985). The series of experiments reported in this article provides the first evidence of the explicit use of the learning curve as a metacognitive tool and also documents the conditions that promote or inhibit access to the learning curve. As a caveat to this rather sweeping claim, we realize that the nature and complexity of a given performance function (e.g., Newell & Rosenbloom, 1981) likely also plays a significant role in determining the success that an individual has in recognizing and properly using the function. Although smooth, monotonic, negatively and positively accelerated functions are quite common, task differences (e.g., insight vs. analytic problems), as well as differences in participants' strategies and motivation levels, have generated a wide variety of learning curve shapes (e.g., Hull, 1952; Maier, 1931). The effect of learning curve characteristics on skill acquisition, however, is a separate topic, well beyond the scope of this article. Our demonstrations using simple, two-part curves signaling improvement and mastery clearly have just scratched the surface of a potentially rich area of research.

The importance of these findings may also be gauged against the context provided by the literatures on metacognitive skill acquisition and on human judgment and decision processes. With the exception of ill-defined problems such as writing (e.g., Bryson, Bereiter, Scardamalia, & Joram, 1991) and highly domain-specific complex problems such as chess (e.g., Chase & Simon, 1973) or computer programming (e.g., Kay, 1991), an overview of the literature on metacognition leaves one with the general impression of humans as reasonably adept at abandoning poor learning strategies in favor of superior ones as a function of age-related and experiential increases in general competence. This trend has been documented across most problem types commonly encountered within academia, including reading (e.g., Daneman, 1991; Paris et al., 1991), writing (e.g., Bryson et al., 1991; also Tierney & Shanahan, 1991), memory strategies (e.g., Brown et al., 1979; Pressley et al., 1985), and general problem solving strategies (e.g., Flavell, 1976; Kluwe, 1987; Simon & Simon, 1978). In general, executive control and regulation strategies have been shown to become more sophisticated, more appropriate, and hence more effective as children grow older (e.g., Kluwe, 1987) and as adults gain task familiarity (e.g., Josephs & Hahn, 1995; Maki & Serra, 1992; Weaver, 1990).

In contrast to this picture of human competence is the body of research documenting the multiple and various errors, biases, and shortcomings associated with human decision making (see, e.g., Kahneman, Slovic, & Tversky, 1982, for a review). This literature questions the neoclassical economic view of humans as generally rational, competent, and thoughtful information processors (for counterchallenges to the heuristics and biases approach, see, e.g., Cohen, 1981; Gigerenzer, 1991; Koehler, in press). We believe that the experimental paradigm used in this article does not fit neatly within either literature, but rather can be seen as a bridge between the two. On the one hand, the problem under scrutiny (use of an appropriate metacognitive stopping strategy in skill acquisition) clearly falls within the boundaries defined by the literature on metacognitive control and regulation. On the other hand, the documentation of a computationally simple judgmental shortcut (viz., the anchoring effects of set size) is not typical of metacognition research but rather represents a good fit with the literature on judgmental heuristics. Thus, we would argue that this research contributes to the judgment literature by demonstrating conditions under which a judgmental shortcut of dubious diagnostic value is used in the service of skill acquisition, and to the literature on metacognition by documenting a pair of metacognitive stopping strategies (learning curve and set-size anchoring effects) that heretofore have not been topics of empirical scrutiny.
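Although the article offers no formal model, the masking logic at the heart of these studies is easy to simulate. The sketch below is a hypothetical illustration (the curve parameters and noise levels are our own): it uses the power-law practice function discussed by Newell and Rosenbloom (1981), T = a * N^(-b), and shows how high intraset variability can swamp the trial-to-trial improvement that makes the curve visible to the learner.

import random

random.seed(2)

a, b = 100.0, 0.4                       # hypothetical power-law parameters
trials = range(1, 31)
for noise_sd in (2.0, 30.0):            # low- vs. high-variability problem set
    times = [a * n ** (-b) + random.gauss(0, noise_sd) for n in trials]
    improvements = sum(t2 < t1 for t1, t2 in zip(times, times[1:]))
    print(f"noise sd = {noise_sd:>4}: {improvements}/29 successive improvements")

# With low noise, most successive trials improve on the last, so the curve
# is visible; with high noise, the count hovers near chance and the solver
# has little evidence of learning to act on.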

References
Brown, A. L., Campione, J. C., & Barclay, C. R. (1979). Training self-checking routines for estimating test readiness: Generalization from list learning to prose recall. Child Development, 50, 501-512.
Bryan, W. L., & Harter, N. (1897). Studies in the physiology and psychology of the telegraphic language. Psychological Review, 4, 27-53.
Bryson, M., Bereiter, C., Scardamalia, M., & Joram, E. (1991). Going beyond the problem as given: Problem solving in expert and novice writers. In R. J. Sternberg & P. A. Frensch (Eds.), Complex problem solving: Principles and mechanisms (pp. 61-84). Hillsdale, NJ: Erlbaum.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.
Cohen, L. J. (1981). Can human irrationality be experimentally demonstrated? Behavioral and Brain Sciences, 4, 317-370.
Costermans, J., Lories, G., & Ansay, C. (1992). Confidence level and feeling of knowing in question answering: The weight of inferential processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 142-150.
Daneman, M. (1991). Individual differences in reading skills. In R. Barr, M. L. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), The handbook of reading research (Vol. 2, pp. 512-538). White Plains, NY: Longman.
Flavell, J. H. (1976). Metacognitive aspects of problem solving. In L. Resnick (Ed.), The nature of intelligence (pp. 231-235). Hillsdale, NJ: Erlbaum.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology, 2, 83-115.
Glenberg, A. M., Sanocki, T., Epstein, W., & Morris, C. (1987). Enhancing calibration of comprehension. Journal of Experimental Psychology: General, 116, 119-136.
Howell, D. C. (1992). Statistical methods for psychology. Belmont, CA: Wadsworth.
Hull, C. L. (1952). A behavior system. New Haven, CT: Yale University Press.
Josephs, R. A., Giesler, R. B., & Silvera, D. H. (1994). Judgment by quantity. Journal of Experimental Psychology: General, 123, 21-32.
Josephs, R. A., & Hahn, E. D. (1995). Bias and accuracy in estimates of task duration. Organizational Behavior and Human Decision Processes, 61, 202-213.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, England: Cambridge University Press.
Kay, D. S. (1991). Computer interaction: Debugging the problems. In R. J. Sternberg & P. A. Frensch (Eds.), Complex problem solving: Principles and mechanisms (pp. 317-340). Hillsdale, NJ: Erlbaum.
Kluwe, R. H. (1987). Executive decisions and regulation of problem solving behavior. In F. E. Weinert & R. H. Kluwe (Eds.), Metacognition, motivation, and understanding (pp. 31-64). Hillsdale, NJ: Erlbaum.
Kluwe, R. H., & Friedrichsen, G. (1985). Mechanisms of control and regulation in problem solving. In J. Kuhl & J. Beckmann (Eds.), Action control (pp. 183-218). Berlin, Germany: Springer.
Koehler, J. J. (in press). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences.
Lehman, D. R., Lempert, R. O., & Nisbett, R. E. (1988). The effects of graduate training on reasoning: Formal discipline and thinking about everyday-life events. American Psychologist, 43, 431-442.
Lerner, M. J. (1980). The belief in a just world: A fundamental delusion. New York: Plenum.
Maier, N. R. F. (1931). Reasoning in humans II: The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology, 12, 181-194.
Maki, R. H., & Berry, S. L. (1984). Metacomprehension of text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 663-679.
Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116-126.
Mazur, J. E., & Hastie, R. (1978). Learning as accumulation: A reexamination of the learning curve. Psychological Bulletin, 85, 1256-1274.
Mazzoni, G., & Cornoldi, C. (1993). Strategies in study time allocation: Why is study time sometimes not effective? Journal of Experimental Psychology: General, 122, 47-60.
Metcalfe, J. (1986). Feeling of knowing in memory and problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 288-294.
Metcalfe, J., & Wiebe, D. (1987). Intuition in insight and noninsight problem solving. Memory & Cognition, 15, 238-246.
Nelson, T. O. (1993). Judgments of learning and the allocation of study time. Journal of Experimental Psychology: General, 122, 269-273.
Nelson, T. O., Leonesio, R. J., Landwehr, R. S., & Narens, L. (1986). A comparison of three predictors of an individual's memory performance: The individual's feeling of knowing versus the normative feeling of knowing versus base-rate item difficulty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 279-287.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1-55). Hillsdale, NJ: Erlbaum.
Paris, S. G., Wasik, B. A., & Turner, J. C. (1991). The development of strategic readers. In R. Barr, M. L. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), The handbook of reading research (Vol. 2, pp. 609-640). White Plains, NY: Longman.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1990). The adaptive decision maker: Effort and accuracy in choice. In R. M. Hogarth (Ed.), Insights in decision making: A tribute to Hillel J. Einhorn (pp. 129-153). Chicago: University of Chicago Press.
Poulton, E. C. (1968). The new psychophysics: Six models for magnitude estimation. Psychological Bulletin, 69, 1-19.
Pressley, M., Borkowski, J. G., & O'Sullivan, J. (1985). Children's metamemory and the teaching of memory strategies. In D. L. Forrest-Pressley, G. E. MacKinnon, & T. G. Waller (Eds.), Metacognition, cognition, and human performance (Vol. 1, pp. 111-153). New York: Academic Press.
Simon, D. P., & Simon, H. A. (1978). Individual differences in solving physics problems. In R. S. Siegler (Ed.), Children's thinking: What develops? (pp. 325-348). Hillsdale, NJ: Erlbaum.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1982). Facts versus fears: Understanding perceived risk. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 463-489). Cambridge, England: Cambridge University Press.
Staszewski, J. (1988). Skilled memory and expert mental calculation. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. 71-128). Hillsdale, NJ: Erlbaum.
Tierney, R. J., & Shanahan, T. (1991). Research on the reading-writing relationship: Interactions, transactions, and outcomes. In R. Barr, M. L. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), The handbook of reading research (Vol. 2, pp. 246-280). White Plains, NY: Longman.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
Weaver, C. A. (1990). Constraining factors in calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 214-222.

Received August 16, 1993
Revision received March 2, 1995
Accepted March 6, 1995
