
Curriculum-Based Measurement in the Content Areas:

Vocabulary Matching as an Indicator of Progress in Social Studies Learning


Christine A. Espin, Jongho Shin, and Todd W. Busch

Abstract
The purpose of this study was to examine the reliability and validity of curriculum-based measures as indicators of growth in content-area learning. Participants were 58 students in 2 seventh-grade social studies classes. CBM measures were student- and administrator-read vocabulary-matching probes. Criterion measures were performance on a knowledge test, the social studies subtest of the Iowa Tests of Basic Skills (ITBS), and student grades. Both the student- and administrator-read measures reflected change in performance; however, only the student-read measure revealed interindividual differences in growth rates. Significant relations were found between the growth rates generated by the student-read vocabulary measure and student course grades, ITBS scores, and growth on the knowledge test. These results support the validity of a vocabulary-matching measure as an indicator of student learning in the content areas. The results are discussed in terms of the use of CBM as a system for monitoring performance and evaluating interventions for students with learning disabilities in content-area classrooms.

One of the most important yet most difficult components of education is the measurement of change. By measuring change in performance, teachers can reliably evaluate student learning and the effects of instructional interventions on that learning. Yet despite its importance, change measurement is rarely the focus of educational assessment, where the measurement of performance at a single point in time is the dominant approach. In few other areas of education is this emphasis more prevalent than in the field of learning disabilities (LD), where the identification of students for services is often based on the discrepancy between two single scores: an intelligence score and an achievement score.

The lack of attention given to change measurement in education has been due in part to the difficulties associated with measuring change in performance, including a lack of statistical methods for handling multiple data points (Willet, 1989) and a lack of assessment tools available for producing repeated measures within short time periods (Francis, Shaywitz, Stuebing, Shaywitz, & Fletcher, 1994). With regard to statistical methods, recent developments have opened up new possibilities for incorporating students' change in performance as part of educational assessment (see Bryk & Raudenbush, 1987, 1992). Francis et al. (1994) illustrated the use of these advanced statistical procedures in the area of LD. These authors and others (D. Fuchs, Fuchs, McMaster, & Al Otaiba, 2003; L. S. Fuchs & Fuchs, 1998) have proposed that change measurement be involved in defining and diagnosing LD as well as in determining students' responses to interventions. With regard to the availability of assessment tools, there exists a measurement system specifically designed to measure change in student performance by producing repeated measures within short time periods. This system of measurement, referred to as curriculum-based measurement (CBM), has a strong body of research to support its validity and reliability.

Curriculum-Based Measurement
Curriculum-based measurement is an ongoing data collection system that is designed to provide teachers with information on student progress and on the effects of instructional interventions on that progress. The measures developed for use as a part of CBM are simple, efficient, easy to understand, and inexpensive, and allow for repeated measurement of student performance over time (Deno, 1985). More than 25 years of research have supported the validity and reliability of CBM measures as indicators of performance for elementary school students in the basic skill areas of reading, mathematics, spelling, and written expression. Correlations between CBM indicators and a variety of criterion measures generally range from .60 to .90, and test-retest and alternate-form reliabilities are generally above .80 (see Marston, 1989, for a review). The treatment validity of CBM measures at the elementary school level has also been supported. When teachers use CBM measures to evaluate and modify their instruction, student achievement improves (L. S. Fuchs, Deno, & Mirkin, 1984; L. S. Fuchs, Fuchs, & Hamlett, 1989a, 1989b, 1989c; L. S. Fuchs, Fuchs, Hamlett, & Allinder, 1991; L. S. Fuchs, Fuchs, Hamlett, & Ferguson, 1992; L. S. Fuchs, Fuchs, Hamlett, & Stecker, 1990; Stecker & Fuchs, 2000; Wesson et al., 1988). Most recently, CBM has been combined with statistical techniques such as hierarchical linear modeling (HLM) to generate student growth curves and to use these growth curves to answer questions about the relation between student progress and instructional variables (Compton, 2000; Shin, Deno, & Espin, 2000).

The initial work in the area of CBM was conducted at the elementary school level; later, this work was extended to the secondary school level (see Espin & Tindal, 1998, for a review). With that extension came an interest in the development of CBM measures in content areas such as social studies and science.

CBM in the Content Areas


The initial research on the development of curriculum-based measurement in the content areas was conducted by Tindal and Nolet (see Nolet & Tindal, 1993, 1994, 1995; Tindal & Nolet, 1995, 1996). Tindal and Nolet identified the critical thinking skills (e.g., explanation of concepts, illustration of facts) needed to understand and use content-area information and created measures to represent these thinking skills. The measures were appropriate for determining student learning within a given unit of study, but they were less appropriate for showing growth across study units (see Tindal & Nolet, 1995).

Espin and Deno (1993a, 1993b, 1994-1995) and Espin and Foegen (1996) took a somewhat different approach and focused on the identification of a single indicator that would represent general performance in the content areas. The first step in this research was to establish the reliability and validity of a single indicator for predicting student performance on content-area tasks. Espin and Deno (1993b, 1994-1995) examined the validity of two measures as potential indicators of performance on content-area study tasks in English and science. Tenth-grade participants read aloud for 1 minute from English and science textbook passages. In addition, participants were given 10 minutes to complete a vocabulary-matching task with terms selected from each passage. The criterion task was a study task in which students searched through the text for answers to comprehension questions. Correlations between the predictor and criterion measures were in the low-moderate to moderate range (r = .37 to .44). Correlations were similar for vocabulary matching and for reading aloud from text, but in a regression analysis, vocabulary matching accounted for the largest proportion of variance in the criterion task, with reading aloud not adding to the variance.

In a follow-up study, Espin and Foegen (1996) compared the validity of three CBM measures for predicting student performance on three criterion tasks. The CBM measures were reading aloud from text, vocabulary matching, and maze selection. The maze selection measure was included in this study because it was known to be a good predictor of reading performance at the elementary school level (Espin, Deno, Maruyama, & Cohen, 1989; L. S. Fuchs, Fuchs, & Maxwell, 1988). The criterion measures in the study were representative of the tasks required of students in content-area classes and included comprehension, acquisition, and retention of content information. Participants in the study were 186 middle school students. The results revealed moderate to moderately strong correlations between the CBM and criterion tasks (r = .52 to .65). Once again, in a regression analysis, vocabulary matching accounted for the greatest proportion of variance in the criterion tasks, with neither of the other measures contributing substantially to the variance.

Although the results of Espin and Deno's (1993b, 1994-1995) and Espin and Foegen's (1996) studies suggested that vocabulary matching was a valid indicator of content-area performance, neither study had been conducted in an actual content-area classroom. In our research, we wished to extend this early work to examine vocabulary matching in a middle school social studies classroom. We conducted two related studies. In the first, we examined the technical adequacy of a vocabulary-matching measure as an indicator of performance in social studies; in the second, we examined the technical adequacy of the same vocabulary-matching measure as an indicator of progress in social studies. In this article, we report the results of the progress study. We begin, however, by summarizing the results of the performance study (see Espin, Busch, Shin, & Kruschwitz, 2001, for details).

Participants in the performance study were 58 seventh-grade students from two social studies classrooms. Based on the results of previous research, only the vocabulary-matching measure was included in this study. In order to examine the role of reading in the prediction of performance, two versions of the vocabulary-matching measure were compared: a student-read version, in which students read words and definitions to themselves, and an examiner-read version, in which the examiner read the words and definitions to the students. Criterion measures were a researcher-made knowledge test administered as a pretest and posttest, social studies grades, and scores on the social studies subtest of a standardized achievement test. The results of the performance study revealed that alternate-form reliabilities for the vocabulary-matching
measures ranged from .58 to .87, with a mean reliability of .70 for student- and administrator-read forms. Reliability was increased to .84 and .78 for student- and administrator-read probes, respectively, by combining scores across the two probes. Analysis with respect to the validity of the measures lent support to the use of both measures as indicators of student performance in social studies. Correlations between vocabulary matching and the knowledge and standardized achievement tests ranged from .56 to .84. Correlations with class grades were lower, ranging from .27 to .51, in part due to the restricted range of scores for course grades (most students earned grades of C to A); however, the correlation between the student-read probe and course grades was moderately strong (r = .51). Finally, although the sample of students with LD was small, performance on the vocabulary-matching probe differentiated students with and without LD.

Purpose and Research Questions

The results of Espin et al. (2001) confirmed the validity of vocabulary matching as an indicator of performance. This result is not surprising, given the literature on the importance of vocabulary knowledge for both reading comprehension and content-area learning (e.g., Baumann & Kameenui, 1991; Beck & McKeown, 1991; Blachowicz, 1991; Blachowicz & Fisher, 2000; Konopak, 1989; Nagy & Scott, 2000; Scruggs, Mastropieri, & Boon, 1998). However, the 2001 study did not address a key question: Would vocabulary matching prove to be a reliable and valid indicator of student progress? In other words, would the growth trajectories produced by repeated measurement on alternate forms of the vocabulary-matching probes adequately model student growth and learning in social studies? A review of the literature reveals that the answer to this question is not obvious; that is, despite the recognition of the importance of vocabulary for reading comprehension and content-area learning, the way in which vocabulary terms are learned and the relation between such learning and the comprehension and acquisition of text material is not clear (see Baumann & Kameenui, 1991; Beck & McKeown, 1991; Blachowicz & Fisher, 2000; Nagy & Scott, 2000). Thus, in this second study, we wished to explore the question of whether student performance on the vocabulary measures would change and whether this change would occur concomitantly with learning. We addressed the following two research questions in the study:

1. What is the validity of vocabulary matching as an indicator of progress (i.e., learning) in a social studies class?
2. Does the validity of vocabulary matching differ as a function of the administration format (i.e., student vs. administrator read)?

To address these questions, three issues were examined: (a) the sensitivity of the vocabulary-matching measures to improvement in student performance over time; (b) the sensitivity of the vocabulary-matching measures to interindividual differences in growth rates; and (c) the validity of the growth rates generated by the vocabulary-matching measures with respect to course grades, performance on a standardized achievement test, and improvement on a content knowledge test.

Method

Participants

Participants in this study were 58 seventh-grade students (32 boys and 26 girls; mean age = 13.6 years). These students had also participated in the earlier study on vocabulary matching as a performance measure (Espin et al., 2001). Students were recruited from two social studies classes in a suburban school in the Midwest. The majority of participants were European American (95%), with a small percentage of students who were African American or Asian American (5%); 28% of the school population received free or reduced-price lunches. Five of the participants were identified as having LD according to district standards (4 boys and 1 girl; mean age = 13.4 years). The identification standards for LD included a history of underachievement, a discrepancy between ability and achievement, and an information-processing deficit. All five students were receiving services in reading and written expression, and one was receiving additional services in mathematics. Mean percentile scores for the students with LD on the Iowa Tests of Basic Skills, Form K, Level 12 (ITBS; Hoover, Hieronymus, Frisbie, & Dunbar, 1993), were as follows: 30.4 (range = 3-46) on the Reading Vocabulary subtest, 26.8 (range = 2-49) on the Reading Total subtest, and 13.2 (range = 1-20) on the Social Studies subtest.

Procedure

During the winter and spring quarters, students were tested weekly with two types of vocabulary-matching probes: student read and administrator read. The student-read version of the probe consisted of 22 vocabulary terms, including two distractors, and 20 definitions. Terms and definitions were chosen at random from a master list of 146 terms created from the social studies textbook and the teacher's lecture notes. Definitions were modified if necessary so that each definition would have fewer than 15 words. Vocabulary terms appeared on the left side of the page and were arranged alphabetically to help students easily locate terms. Definitions appeared on the right side of the page. Students were given 5 minutes to read the terms and definitions and to match each term with its definition. The administrator-read version of the vocabulary probe was developed from the same set of terms and definitions as the student-read version.
On the administrator-read version, only the vocabulary terms were given. The test administrator read the definitions aloud, and the students identified which terms matched the definition being read. Definitions were read one at a time, with 15-second intervals between each item. Each probe lasted a total of 5 minutes. Vocabulary-matching probes were administered weekly by the third author. Students were given the two types of probes consecutively each week. To control for order effects, the order in which the probes were given was alternated each week. The number of correct matches on each probe was tallied and used in data analysis. In total, each individual completed 11 administrator-read and 11 student-read probes.

In addition to the vocabulary-matching probes, students were administered a knowledge test at pre- and posttest to measure the amount learned during the study. The knowledge test was composed of 36 questions in the areas of sociology, psychology, and geography. Questions were developed on the basis of textbook content, the teacher's lecture notes, and teacher-made worksheets and tests. The social studies teacher was asked to review all items on the knowledge test to ensure that the items matched the information presented to the students in class. Items were classified into two types of questions: applied (27 items) and factual (9 items). Applied questions were those in which students were asked to apply social concepts and principles to specific social events or phenomena. Factual questions were those in which students were asked to make simple one-to-one relations between concepts and events (see Table 1 for examples of these two types of questions). A heavier emphasis was placed on the applied questions to ensure that the relation between the vocabulary-matching tasks and the knowledge test would not be solely a function of the similarity in the task requirements.

TABLE 1
Examples of Applied and Factual Questions on the Knowledge Test

Applied: José comes from a working-class home. He married Judy, who is very wealthy, and moves into an upper-class neighborhood. José's change in status is an example of
  a. mobility   b. sanctions   c. mores   d. primary group

Factual: The process by which a member learns the rules of his or her group is called
  a. socialization   b. community   c. role play   d. mobility

Following the development of the knowledge test items, the items were given to a graduate student in special education, who was not involved in the study, and to the social studies teacher involved in the study. The graduate student and social studies teacher were asked to classify each item as either applied or factual. Interrater agreement between each rater and the third author was calculated by dividing agreements by agreements plus disagreements. Interrater agreement was .95 and .89 for the graduate student and the social studies teacher, respectively. Items that were not correctly classified by the graduate student or the social studies teacher were modified. Students were given the knowledge test at the beginning and end of the study. The number of correct answers was used for data analysis.

Students' social studies grades and their scores on the ITBS were also collected. Social studies grades represented students' mean grade in the class across three grading periods. Letter grades were assigned to each student. For our purposes, the letter grades were converted to numeric values on a 13-point scale, with 13 representing an A+, 12 an A, 11 an A-, 10 a B+, and so on. A grade of F was assigned a value of 0. If students failed a class, they were able to retake it in a 4-week makeup session. If students passed this makeup session, they were assigned a grade of P. These passing grades were assigned a value of 1. Course grades were based equally on homework, quizzes, unit tests, and current events reporting.

Scores on the ITBS were obtained from students' school records. Students completed the ITBS the spring prior to the beginning of the study. Form K, Level 12 of the ITBS was administered. The Social Studies subtest of the ITBS consists of 42 questions covering history, geography, economics, political science, sociology/anthropology, and related social sciences (e.g., ethics, human values). Standard scores were used for all analyses. The internal consistency of the ITBS, as reported in the technical manual, ranges from .61 to .93. Salvia and Ysseldyke (1998) reported that the items of the ITBS were reviewed for content fit and item bias by field experts and then tested on a large sample across the United States. The results of this testing were used for final sample selection.
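To make the probe-construction and scoring procedure above more concrete, the following sketch shows one way a weekly student-read probe could be assembled and scored. It is an illustration only: the random draw of 20 target terms plus 2 distractors, the alphabetized term column, and the tally of correct matches follow the description in this section, but the function names, the sample master-list entries, and the example response are all hypothetical rather than materials from the study.

```python
import random

# Hypothetical master list mapping terms to definitions; the study's list
# contained 146 terms drawn from the textbook and the teacher's lecture notes.
MASTER_LIST = {
    "mobility": "movement of a person from one social class to another",
    "socialization": "the process by which a member learns the rules of a group",
    "mores": "norms that carry strong moral weight within a group",
    # ... the remaining terms would be listed here
}

def build_probe(master, n_items=20, n_distractors=2, seed=None):
    """Randomly draw terms for one weekly probe.

    Returns (terms, answer_key): `terms` is the alphabetized column of
    terms (targets plus distractors) shown on the left of the page, and
    `answer_key` maps each target definition to its correct term.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(master), n_items + n_distractors)
    targets, distractors = chosen[:n_items], chosen[n_items:]
    answer_key = {master[t]: t for t in targets}
    return sorted(targets + distractors), answer_key

def score_probe(answer_key, student_answers):
    """Tally correct matches; `student_answers` maps definition -> chosen term."""
    return sum(student_answers.get(defn) == term for defn, term in answer_key.items())

# Example with the tiny illustrative list above (2 targets, 1 distractor).
terms, key = build_probe(MASTER_LIST, n_items=2, n_distractors=1, seed=1)
print(score_probe(key, dict(key)))  # a perfect response sheet scores 2
```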

Statistical Analysis
Hierarchical linear models (HLM) were used to address three issues with respect to the vocabulary-matching measures: (a) sensitivity to improvement of student performance over time, (b) sensitivity to interindividual differences in growth rates, and (c) validity of the growth rates produced by the measures with respect to the criterion measures. To address the first two issues, unconditional HLM models were employed to examine the sensitivity of the vocabulary probes for measuring student growth over time and for revealing interindividual differences in growth rates. In these analyses, the significance of the mean growth rates and growth parameter variances estimated by each type of vocabulary measure was statistically investigated. To address the third issue, course grades, scores on the Social Studies subtest of the ITBS, and performance change on the content knowledge test were used as Level 2 variables in HLM to examine the validity of the growth measures estimated on the vocabulary probes. In this analysis, the relations between the growth rates estimated on the vocabulary probes and the criterion measures were statistically examined.

Results

Sensitivity to Improvement Over Time and Individual Differences

The first step in our analysis was to determine whether the student- and administrator-read vocabulary-matching measures were sensitive to improvement over time, and whether they revealed interindividual differences in growth rates. Descriptive statistics for the repeated measures of students' performance on the student- and administrator-read probes are displayed in Table 2. Observed mean scores on both types of vocabulary probes increased over time. Moreover, interindividual differences in student performance increased over time on the student-read probes, as evidenced by the increase in standard deviations.

The improvement in performance scores and the interindividual differences in growth rates were further examined using hierarchical linear models (Bryk & Raudenbush, 1987, 1992). Specifically, the statistical significance of the mean growth rate and of the growth parameter variance estimated by each type of vocabulary measure was tested. The statistical test of the significance of the mean growth rate addressed the question of whether the growth rate for the entire group of students was statistically different from a null growth rate (i.e., a growth rate of zero). The statistical test of the significance of the growth parameter variance addressed the question of whether individual students differed in their rates of growth over time. We hypothesized that, as a group, the students would improve significantly in social studies knowledge over the school year due to the teacher's instruction. Moreover, we hypothesized that students would not share the same growth rates because of individual characteristics (e.g., intelligence, prior knowledge, and motivation to learn). To test these hypotheses, the following unconditional models were used in this study:

Yti = π0i + π1i(Week)ti + eti (within-individual model)

and

π0i = β00 + u0i, π1i = β10 + u1i (between-individual models),

where Yti is the observed score for student i at time t, π0i the intercept of the growth line for student i, π1i the weekly growth rate for student i, β00 the mean intercept for the entire group of students, β10 the mean growth rate for the entire group of students, eti the error related to student i at time t, and u0i and u1i the random errors related to the mean intercept and growth rate, respectively. In the analysis, the intercept was centered at the first occasion of data collection; therefore, it reflects individual differences in the students' vocabulary knowledge at the beginning of the study.

The statistical test of the significance of the mean growth rates revealed that the mean growth rates estimated
by both the student- and administrator-read vocabulary probes were statistically different from a null growth rate; that is, both probes were sensitive to significant improvement in students' performance over time (see Table 3). The mean growth rate estimated by the student-read vocabulary probe showed an increase of .65 correct matches per week, whereas the mean growth rate estimated by the administrator-read probe showed an increase of .22 correct matches per week.

The statistical test of the significance of the growth parameter variance revealed that the growth parameter variance estimated by the student-read vocabulary probe was statistically different from zero (see Table 4); that is, there were individual differences in growth rates on the student-read measure. In contrast, the growth parameter variance estimated by the administrator-read probe was not statistically different from zero, indicating that all students shared the same growth rate (i.e., the mean growth rate). Thus, the results of this analysis revealed that only the student-read vocabulary-matching probe was sensitive enough to reveal interindividual differences in growth rates among students. Given this finding, only the student-read probe was entered into subsequent analyses (see Bryk & Raudenbush, 1987, 1992).

Regarding performance at the beginning of the study, the mean intercepts were statistically significant for both types of probes (see Table 3). The mean intercept for the administrator-read probe, however, was slightly higher than that for the student-read probe. Both types of probes also reflected the existence of interindividual differences in vocabulary knowledge at the beginning of the study (see Table 4).
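For readers who wish to run this kind of unconditional growth analysis on their own progress-monitoring data, the model above can be approximated with standard mixed-effects software. The sketch below is a minimal illustration using Python's statsmodels rather than the HLM software used in the study; the long-format data file, its name, and the column names are assumptions for illustration only. The fixed effect for week estimates the mean growth rate (β10), and the random-effects covariance estimates the variance of the intercepts and growth rates across students.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long-format data: one row per student per weekly probe.
# Columns: student (id), week (0, 1, 2, ...), score (number of correct matches).
data = pd.read_csv("vocab_probes.csv")  # hypothetical file name

# Unconditional linear growth model:
#   Level 1: score_ti = pi0_i + pi1_i * week_ti + e_ti
#   Level 2: pi0_i = b00 + u0_i ;  pi1_i = b10 + u1_i
# Coding week so that 0 is the first occasion makes the intercept the
# estimated level of performance at the beginning of the study.
model = smf.mixedlm("score ~ week", data,
                    groups=data["student"],
                    re_formula="~week")      # random intercept and random slope
result = model.fit(reml=True)

print(result.summary())   # fixed effects: mean intercept and mean weekly growth
print(result.cov_re)      # variances of the random intercept and growth rate
```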

TABLE 2
Means and Standard Deviations for Student- and Administrator-Read Vocabulary-Matching Probes

                    Student-read        Administrator-read
Probe     n         M       SD          M        SD
1         53        5.23    3.41        8.41     3.83
2         54        6.02    3.48        8.75     4.33
3         53        7.11    5.35        6.25     3.61
4         44        6.00    4.09       10.80     4.52
5         48        7.04    4.13        8.62     4.23
6         53        9.83    5.24       10.64     4.86
7         49        9.35    4.99        8.62     4.87
8         50        8.84    5.26        8.88     4.51
9         50        9.78    5.14       10.24     4.41
10        52       14.10    5.38        9.47     3.80
11        51        9.71    5.67       10.88     4.20

Validity of Growth Rates

The validity of the growth rates estimated by the student-read vocabulary-matching measure was examined by investigating the relations between the growth rates generated by the vocabulary-matching measure and the residualized gain scores on the content knowledge test, course grades in social studies, and scores on the Social Studies subtest of the ITBS. Means and standard deviations for the criterion measures were as follows: knowledge pretest, M = 20.27, SD = 5.07; knowledge posttest, M = 24.86, SD = 5.62; ITBS, M = 218.72, SD = 32.27; social studies grades, M = 9.38, SD = 3.24. The three criterion measures were included separately in the between-individual model as a Level 2 predictor because the main interest of our analysis was in examining the direct relation between growth rates on the student-read vocabulary probe and each criterion measure, not the relative contribution of the criterion measures to predicting the students' growth rates. The three between-individual models used in the analysis were as follows:

π1i = β10 + β11(GainScore)i + u1i,
π1i = β10 + β11(CourseGrade)i + u1i, and
π1i = β10 + β11(ITBS)i + u1i,

where π1i is the linear growth rate for student i, β10 the mean growth rate for the entire group of students, β11 the regression coefficient showing the relation between growth rates on the student-read vocabulary probe and the corresponding criterion measure, and u1i the random error related to the mean growth rate.

Prior to examining the relations between growth rates and criterion measures, the reliability of the growth rate parameter was explored. This was done to ensure that the relations between growth rates and criterion measures could be examined reliably. The reliability of a growth parameter in HLM is defined as the proportion of the observed variance of the parameter that is attributable to true parameter variance. Low parameter reliability (e.g., less than .30) indicates that estimates of the growth parameter are unstable and that their relations to other variables cannot be examined in a dependable way.
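A sketch of how a residualized gain score and one of these between-individual models could be computed is given below. In mixed-model terms, entering a criterion as a Level 2 predictor of the growth rate corresponds to a week-by-criterion interaction with a random slope for week; the interaction coefficient plays the role of β11. The file names, column names, and the residualized-gain computation (residuals from regressing posttest on pretest, one common definition) are assumptions for illustration, not the study's actual procedure or data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-level criterion data (one row per student).
students = pd.read_csv("criteria.csv")   # columns: student, pretest, posttest, grade, itbs

# Residualized gain on the knowledge test: the part of the posttest score
# not predicted by the pretest score.
ols = smf.ols("posttest ~ pretest", students).fit()
students["gain_resid"] = ols.resid

# Merge the criterion onto the weekly probe data (long format, as before).
probes = pd.read_csv("vocab_probes.csv")          # columns: student, week, score
data = probes.merge(students[["student", "gain_resid"]], on="student")

# Conditional growth model: the week:gain_resid interaction estimates the
# relation between the criterion and the growth rate on the probes.
model = smf.mixedlm("score ~ week * gain_resid", data,
                    groups=data["student"], re_formula="~week")
print(model.fit(reml=True).summary())
```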

TABLE 3
Sensitivity of Student- and Administrator-Read Vocabulary Probes for Revealing Growth Over Time (Fixed-Effect Model)

Probe/effect                 Coefficient   Standard error      t        p
Student-read
  Intercept (β00)               5.16            .46          11.23     .00
  Mean growth (β10)              .65            .06          10.70     .00
Administrator-read
  Intercept (β00)               7.98            .46          17.24     .00
  Mean growth (β10)              .22            .04           5.86     .00

TABLE 4
Sensitivity of Student- and Administrator-Read Vocabulary Probes for Revealing Interindividual Differences in Growth Rates (Random-Effect Model)

Probe/effect                 Variance   Chi-square      p
Student-read
  Intercept                    9.07       222.01       .00
  Mean growth                   .12       133.06       .00
Administrator-read
  Intercept                    9.84       276.97       .00
  Mean growth                   .01        59.00       .37

Note. Chi-square df = 56, N = 57.


The reliability of the growth rate parameter was .52 in the null model, indicating that 52% of the total growth rate parameter variance estimated by the student-read vocabulary probe could be attributed to true parameter variance (see Bryk & Raudenbush, 1987, 1992). This result suggests that the relations between growth rates and criterion measures could be examined reliably in this study.

The results of the validity analyses revealed that the growth rates estimated by the student-read vocabulary probe were significantly related to residualized gain scores on the knowledge test, to students' course grades in social studies, and to ITBS Social Studies subtest scores (see Table 5). In other words, students who had larger gain scores on the knowledge test, higher course grades, and higher test scores on the ITBS also had higher growth rates on the student-read vocabulary probe. These results support the validity of the student-read vocabulary measure as an indicator of growth in learning.

TABLE 5
Relation Between Growth Rates on Student-Read Vocabulary Probes and Criterion Measures

Criterion measure              Coefficient   Standard error      t       p
Knowledge test gain score         .010            .005          2.00    .05
Course grades                     .053            .017          3.22    .00
ITBS Social Studies scores        .002            .001          2.34    .02

Note. ITBS = Iowa Tests of Basic Skills (Hoover, Hieronymus, Frisbie, & Dunbar, 1993). df = 53.

Discussion

The results of this study indicate that only the student-read version of the vocabulary-matching probe produced growth trajectories that were valid and reliable predictors of student performance in social studies. Both the student- and administrator-read versions of the vocabulary-matching probes produced significant group growth rates, although the student-read measure revealed greater growth over time. However, only the student-read vocabulary-matching measure was sensitive to interindividual differences in growth rates. Because we can assume that not all students participating in the study had identical growth rates, our findings imply that only the student-read version is sufficiently sensitive to growth over time.

Examination of Table 2 may help to explain the contrast between the two measures in terms of their sensitivity to interindividual differences in growth rates. As illustrated in Table 2, standard deviations for the student-read probes tended to increase gradually across the duration of the study, whereas standard deviations for the administrator-read probes did not. If the vocabulary-matching measures were sensitive to individual changes in performance, one would expect the standard deviations for the measures to increase over the course of the year, as some students learn more and others learn less. The restricted variability in scores for the administrator-read probes most likely served to artificially restrict the variability in the slopes for the administrator-read scores, leading to a lack of sensitivity to interindividual differences. In other words, reading the probes aloud to the students produced less individual variation in performance as the year progressed.

Conceptually, our results imply that reading is an important factor in the measurement of student performance and progress in the content areas. Based on previous research (Espin et al., 2001), we had expected no differences in the validity of the growth trajectories created by the two types of vocabulary-matching measures. However, the vocabulary-matching task that incorporated reading was more sensitive to overall learning than the measure that removed reading as a factor. Recall, however, that we conducted this study in the classroom of only one teacher. It is possible that reading may be a more important factor in this teacher's classroom than in other classrooms. It will be important in future research to replicate these findings across different teachers and students and to directly examine the role of reading.

Once we had established that the student-read measure was sensitive both to group growth and to individual differences in growth over time, we examined the reliability and validity of the growth trajectories created by the student-read measure. The results revealed that the growth trajectories created by the student-read measure were both reliable and valid. The growth trajectories were stable and proved to be significantly related to growth and performance on other criterion measures, including gain scores on the knowledge test, course grades, and scores on the ITBS. In other words, students who demonstrated greater growth on the student-read vocabulary-matching measure also showed more growth on the knowledge test, had higher course grades, and had higher scores on the ITBS. This is the pattern of relations we would expect if the vocabulary-matching measures were valid measures of performance and progress.

Examples of Progress Monitoring

Although our research supports the technical adequacy of vocabulary matching as an indicator of performance and progress in the content areas, it does not address the treatment validity (Messick, 1994) of the measure; that is, our research does not address the effect that the use of progress monitoring might have on teacher instruction and student performance. In this section, we illustrate the ways in which CBM measures could be used in the content areas to aid special education teachers in their decision making. Research is needed to address the effects of such implementation on teacher instruction and student performance, especially for students with LD, who spend a large portion of their school day in general education classes in the content areas (Lovitt, Plavins, & Cushing, 1999; Wagner, 1990).

At the beginning of the school year, the special and general education teachers would administer vocabulary-matching probes to all students in the classroom. These data would be used by the general education teacher to identify students who might experience difficulty in the class, and by the special education teacher to evaluate the appropriateness of the class for his or her students. Following this initial assessment, students who are identified as at risk for difficulty in class would be monitored by the special or general education teacher to evaluate student learning in the content class.

One way to evaluate the learning of students with disabilities would be to compare it to the learning of their peers without disabilities. For example, in Figures 1 through 3, the scores for three students with LD from our study are graphed with the mean score for all students without LD. Slope lines indicating rates of progress are displayed for each student and for the class mean. The data from our study indicate that Student 1 is performing successfully in this social studies class; as Figure 1 shows, the level and rate of performance for Student 1 are commensurate with those of nondisabled peers. Student 2 is also performing successfully; although Student 2's level of performance is below that of nondisabled peers, the rate of growth is equal to that of nondisabled peers (see Figure 2). Student 3, in contrast, is not performing successfully in this social studies class; both level and rate of performance for this student are substantially below those of the class mean and, for that matter, below those of other peers with LD. The graph in Figure 3 indicates a need for additional accommodations or modifications for Student 3. If such accommodations or modifications do not result in improved growth, an alternative placement could be considered.

FIGURE 1. Progress graph and trendline for student with LD: Level and growth rate of performance commensurate with that of peers.

FIGURE 2. Progress graph and trendline for student with LD: Level below that of peers, but growth rate of performance commensurate with that of peers.

FIGURE 3. Progress graph and trendline for student with LD: Level and growth rate of performance below that of peers.
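A minimal sketch of the peer-comparison logic behind Figures 1 through 3 follows, assuming nothing more than a least-squares trendline fit to each student's weekly scores. The function names, the example student's scores, and the cut-off used to flag a student are arbitrary illustrations, not rules or data from the study; the class-mean values merely approximate the student-read means in Table 2.

```python
import numpy as np

def trendline(weeks, scores):
    """Least-squares slope and intercept for a series of weekly probe scores."""
    slope, intercept = np.polyfit(weeks, scores, deg=1)
    return slope, intercept

def flag_student(weeks, student_scores, class_mean_scores, ratio=0.5):
    """Flag a student whose level and growth are both well below the class.

    `ratio` is an arbitrary illustrative cut-off (level and slope below half
    of the class values), not a criterion proposed in the study.
    """
    s_slope, _ = trendline(weeks, student_scores)
    c_slope, _ = trendline(weeks, class_mean_scores)
    low_level = np.mean(student_scores) < ratio * np.mean(class_mean_scores)
    low_growth = s_slope < ratio * c_slope
    return low_level and low_growth

weeks = np.arange(1, 12)  # 11 weekly probes
class_mean = np.array([5.2, 6.0, 7.1, 6.0, 7.0, 9.8, 9.4, 8.8, 9.8, 14.1, 9.7])
student = np.array([2, 2, 3, 2, 3, 3, 4, 3, 4, 5, 4])  # hypothetical student with LD
print(flag_student(weeks, student, class_mean))  # True -> consider modifications
```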


This example illustrates how content-area and special education teachers can use progress monitoring data to make instructional decisions regarding students with LD in content-area classes. Progress monitoring provides a data source different from, but complementary to, the typical evaluation based on course grades. Course grades often reflect factors other than learning, such as attendance, behavior, and homework completion (e.g., Miller, Leinhardt, & Zigmond, 1988), and the meaning of course grades is sometimes unclear, especially when there have been modifications in the grading system (Olson, 1989; Rojewski, Pollard, & Meers, 1991). Progress monitoring, on the other hand, focuses solely on learning and answers the question of whether students' good behavior, hard work, and homework completion, and teachers' accommodations and modifications, are having positive effects on student learning.

Conclusion
In conclusion, our results support the use of a student-read vocabulary-matching probe as an indicator of student learning in social studies. Taken in combination with the earlier results of Espin et al. (2001), our results indicate that a simple vocabulary-matching measure can be used as an indicator of student performance and progress over time in social studies. This measure can be administered to students in groups, takes only 5 minutes to administer, and can be scored relatively quickly.

On a more general level, our results provide further support for the use of CBM measures as measures of change. As indicated by Francis et al. (1994), L. S. Fuchs and Fuchs (1998), and D. Fuchs et al. (2003), such measures have potential for use in decision making for students with LD. That is, the measures can be used not only to determine to what extent students are discrepant from their peers at a single time point, but also to examine to what extent students are progressing relative to their peers. Students who are discrepant both in performance and in progress would be those most in need of intensive interventions.

Our study is only a first step in the research on the implementation of progress monitoring procedures in content-area classes. Several questions remain, including (a) Will special and content-area teachers be willing to implement and rely on progress monitoring data? (b) Can progress monitoring data serve as a conduit for communication and collaboration between general and special education teachers? and (c) Will the implementation of progress monitoring procedures result in improved achievement for students at risk and students with learning disabilities?
ABOUT THE AUTHORS

Christine A. Espin, PhD, is a professor in the Department of Educational Psychology at the University of Minnesota. Her research focuses on the development of progress-monitoring procedures in reading, written expression, and content-area learning for secondary school students with learning disabilities. Jongho Shin, PhD, is an assistant professor in the Department of Education at Seoul National University. His current research interests include reading comprehension, learning strategies, and motivation. Todd W. Busch, PhD, is an assistant professor of special populations at Minnesota State University, Mankato. His current interests include teacher training, progress monitoring for secondary-level students, and reading comprehension. Address: Jongho Shin, Department of Education, Seoul National University, Shinrim-Dong Kwanak-Gu, Seoul 151-748, Korea; e-mail: Jshin21@snu.ac.kr

AUTHORS' NOTES

1. The research reported here was funded in part by the Guy Bond Foundation, University of Minnesota.
2. We wish to thank the teachers, administrators, and students of the Maplewood schools for their participation in the study. We wish to thank Dana Frederick for assistance in data coding and Ron Kruschwitz for help with the data collection. Finally, we wish to acknowledge the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this article.

REFERENCES

Baumann, J. F., & Kameenui, E. J. (1991). Research on vocabulary instruction: Ode to Voltaire. In J. Flood, J. M. Jensen, D. Lapp, & J. R. Squire (Eds.), Handbook of research on teaching the English language arts (pp. 604-632). New York: Macmillan.
Beck, I., & McKeown, M. (1991). Conditions of vocabulary acquisition. In R. Barr, M. L. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (Vol. II, pp. 784-814). New York: Longman.
Blachowicz, C. L. Z. (1991). Vocabulary instruction in content classes for special needs learners: Why and how? Journal of Reading, Writing, and Learning Disabilities International, 7, 297-308.
Blachowicz, C. L. Z., & Fisher, P. (2000). Vocabulary instruction. In M. L. Kamil, P. B. Mosenthal, P. D. Pearson, & R. Barr (Eds.), Handbook of reading research (Vol. III, pp. 503-524). Mahwah, NJ: Erlbaum.
Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Compton, D. L. (2000). Modeling the growth of decoding skills in first-grade children. Scientific Studies of Reading, 4, 219-259.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.
Espin, C. A., Busch, T., Shin, J., & Kruschwitz, R. (2001). Curriculum-based measures in the content areas: Validity of vocabulary-matching measures as indicators of performance in social studies. Learning Disabilities Research & Practice, 16, 142-151.
Espin, C. A., & Deno, S. L. (1993a). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321-337.
Espin, C. A., & Deno, S. L. (1993b). Performance in reading from content-area text as an indicator of achievement. Remedial and Special Education, 14(6), 47-59.
Espin, C. A., & Deno, S. L. (1994-1995). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121-142.
Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989, April). The Basic Academic Skills Samples: An instrument for screening and identifying children at risk for failure in the regular education classroom. Paper presented at the American Educational Research Association Meeting, San Francisco, CA.
Espin, C. A., & Foegen, A. (1996). Validity of three general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497-514.
Espin, C. A., & Tindal, G. (1998). Curriculum-based measurement for secondary students. In M. R. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 214-253). New York: Guilford Press.
Francis, D. J., Shaywitz, S. E., Stuebing, K. K., Shaywitz, B. A., & Fletcher, J. M. (1994). Measurement of change: Assessing behavior over time and within developmental context. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 29-58). Baltimore: Brookes.
Fuchs, D., Fuchs, L. S., McMaster, K. N., & Al Otaiba, S. (2003). Identifying children at risk for reading failure: Curriculum-based measurement and the dual-discrepancy approach. In H. L. Swanson, K. R. Harris, & S. Graham (Eds.), Handbook of learning disabilities (pp. 431-449). New York: Guilford Press.
Fuchs, L. S., Deno, S. L., & Mirkin, P. (1984). Effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21, 449-460.
Fuchs, L. S., & Fuchs, D. (1998). Treatment validity: A unifying concept for reconceptualizing the identification of learning disabilities. Learning Disabilities Research & Practice, 13, 204-219.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989a). Effects of alternative goal structures within curriculum-based measurement. Exceptional Children, 55, 429-438.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989b). Effects of instrumental use of curriculum-based measurement to enhance instructional programs. Remedial and Special Education, 10(2), 43-52.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989c). Monitoring reading growth using student recalls: Effects of two teacher feedback systems. Journal of Educational Research, 83, 103-111.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Allinder, R. M. (1991). The contribution of skills analysis within curriculum-based measurement in spelling. Exceptional Children, 57, 443-452.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement, using a reading maze task. Exceptional Children, 58, 436-450.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1990). The role of skills analysis in curriculum-based measurement in math. School Psychology Review, 19, 6-22.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9(2), 20-28.
Hoover, H., Hieronymus, A., Frisbie, D., & Dunbar, S. (1993). Iowa tests of basic skills. Chicago: Riverside.
Konopak, B. C. (1989). Effects of inconsiderate text on eleventh graders' vocabulary learning. Reading Psychology, 10, 339-355.
Lovitt, T. C., Plavins, M., & Cushing, S. (1999). What do pupils with disabilities have to say about their experience in high school? Remedial and Special Education, 20, 67-76, 83.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford Press.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.
Miller, S. E., Leinhardt, G., & Zigmond, N. (1988). Influencing engagement through accommodations: An ethnographic study of at-risk students. American Educational Research Journal, 25, 465-487.
Nagy, W. E., & Scott, J. A. (2000). Vocabulary processes. In M. L. Kamil, P. B. Mosenthal, P. D. Pearson, & R. Barr (Eds.), Handbook of reading research (Vol. III). Mahwah, NJ: Erlbaum.
Nolet, V., & Tindal, G. (1993). Special education in content area classes: Development of a model and practical procedures. Remedial and Special Education, 14, 36-48.
Nolet, V., & Tindal, G. (1994). Instruction and learning in middle school science classes: Implications for students with learning disabilities. The Journal of Special Education, 28, 166-187.
Nolet, V., & Tindal, G. (1995). Essays as valid measures of learning in middle school science classes. Learning Disability Quarterly, 18, 311-324.
Olson, G. H. (1989). On the validity of performance grades: The relationship between teacher-assigned grades and standard measures of subject matter acquisition. (ERIC Document Reproduction Service No. ED 307 290)
Rojewski, J. W., Pollard, R. R., & Meers, G. D. (1991). Grading mainstreamed special needs students: Determining practices and attitudes of secondary vocational educators using a qualitative approach. Remedial and Special Education, 12(1), 7-15, 28.
Salvia, J., & Ysseldyke, J. (1998). Assessment (6th ed.). Boston: Houghton Mifflin.
Scruggs, T. E., Mastropieri, M. A., & Boon, R. (1998). Science education for students with disabilities: A review of recent research. Studies in Science Education, 32, 21-44.
Shin, J., Deno, S. L., & Espin, C. A. (2000). Technical adequacy of the maze task for curriculum-based measurement of reading growth. The Journal of Special Education, 34, 164-172.
Stecker, P. M., & Fuchs, L. S. (2000). Effecting superior achievement using curriculum-based measurement: The importance of individual progress monitoring. Learning Disabilities Research & Practice, 15, 128-134.
Tindal, G., & Nolet, V. (1995). Curriculum-based measurement in middle and high schools: Critical thinking skills in content areas. Focus on Exceptional Children, 27(7), 1-22.
Tindal, G., & Nolet, V. (1996). Serving students in middle school content classes: A heuristic study of critical variables linking instruction and assessment. The Journal of Special Education, 29, 414-432.
Wagner, M. (1990). The school programs and school performance of secondary students classified as learning disabled: Findings from the National Longitudinal Transition Study of Special Education Students. Paper presented at the annual meeting of the American Educational Research Association, Boston.
Wesson, C., Deno, S., Mirkin, P., Maruyama, G., Sevcik, B., Skiba, R., et al. (1988). A causal analysis of the relationships among on-going measurement and evaluation, the structure of instruction, and student achievement. Journal of Special Education, 22(3), 330-343.
Willet, J. B. (1989). Questions and answers in the measurement of change. Review of Research in Education, 15, 345-421.

