
The Challenges of Assessing Young Children Appropriately

LORRIE SHEPARD is a professor of education at the University of Colorado, Boulder.

She is past president of the National Council on Measurement in Education, past vice president of the American Educational Research Association, and a member of the National Academy of Education. She wishes to thank Sharon Lynn Kagan, M. Elizabeth Graue, and Scott F. Marion for their thoughtful suggestions on drafts of this article.

Proposals to "assess" young children are likely to be met with outrage or enthusiasm, depending on one's prior experience and one's image of the testing involved. Will an inappropriate paper-and-pencil test be used to keep some 5-year-olds out of school? Or will the assessment, implemented as an ordinary part of good instruction, help children learn?

A governor advocating a test for every preschooler in the nation may have in mind the charts depicting normal growth in the pediatrician's office. Why shouldn't parents have access to similar measures to monitor their child's cognitive and academic progress? Middle-class parents, sanguine about the use of test scores to make college-selection decisions, may be eager to have similar tests determine their child's entrance into preschool or kindergarten. Early childhood experts, however, are more likely to respond with alarm because they are more familiar with the complexities of defining and measuring development and learning in young children and because they are more aware of the widespread abuses of readiness testing that occurred in the 1980s.

Given a history of misuse, it is impossible to make positive recommendations about how assessments could be used to monitor the progress of individual children or to evaluate the quality of educational programs without offering assurances that the abuses will not recur. In what follows, I summarize the negative history of standardized testing of young children in order to highlight the transformation needed in both the substance and purposes of early childhood assessment. Then I explain from a measurement perspective how the features of an assessment must be tailored to match the purpose of the assessment. Finally, I describe differences in what assessments might look like when they are used for purposes of screening for handicapping conditions, supporting instruction, or monitoring state and national trends.

Note that I use the term test when referring to traditional, standardized developmental and pre-academic measures and the term assessment when referring to more developmentally appropriate procedures for observing and evaluating young children. This is a semantic trick that plays on the different connotations of the two terms. Technically, they mean the same thing. Tests, as defined by the Standards for Educational and Psychological Testing, have always included systematic observations of behavior, but our experience is with tests as more formal, one-right-answer instruments used to rank and sort individuals. As we shall see, assessments might be standardized, involve paper-and-pencil responses, and so on, but in contrast to traditional testing, "assessment" implies a substantive focus on student learning for the purpose of effective intervention. While test and assessment cannot be reliably distinguished technically, the difference between these two terms as they have grown up in common parlance is of symbolic importance. Using the term assessment presents an opportunity to step away from past practices and ask why we should try to measure what young children know and can do.
If there are legitimate purposes for gathering such data, then we can seek the appropriate content and form of assessment to align with those purposes.

Negative History of Testing Young Children

In order to understand the negative history of the standardized testing of young children in the past decade, we need to understand some larger shifts in curriculum and teaching practices. The distortion of the curriculum of the early grades during the 1980s is now a familiar and well-documented story. Indeed, negative effects persist in many school districts today. Although rarely the result of conscious policy decisions, a variety of indirect pressures - such as older kindergartners, extensive preschooling for children from affluent families, parental demands for the teaching of reading in kindergarten, and accountability testing in higher grades - produced a skill-driven kindergarten curriculum. Because what once were first-grade expectations were shoved down to kindergarten, these shifts in practice were referred to as the "escalation of curriculum" or "academic trickle-down."

The result of these changes was an aversive learning environment inconsistent with the learning needs of young children. Developmentally inappropriate instructional practices, characterized by long periods of seatwork, high levels of stress, and a plethora of fill-in-the-blank worksheets, placed many children at risk by setting standards for attention span, social maturity, and academic productivity that could not be met by many normal 5-year-olds.

Teachers and school administrators responded to the problem of a kindergarten environment that was increasingly hostile to young children with several ill-considered policies: raising the entrance age for school, instituting readiness screening to hold some children out of school for a year, increasing retentions in kindergarten, and creating two-year programs with an extra grade either before or after kindergarten. These policies and practices had a benign intent: to protect children from stress and school failure. However, they were ill-considered because they were implemented without contemplating the possibility of negative side effects and without awareness that retaining some children and excluding others only exacerbated the problems by creating an older and older population of kindergartners.1 The more reasonable corrective for a skill-driven curriculum at earlier and earlier ages would have been curriculum reform of the kind exemplified by the recommendations for developmentally appropriate practices issued by the National Association for the Education of Young Children (NAEYC), the nation's largest professional association of early childhood educators.2 The first response of many schools, however, was not to fix the problem of inappropriate curriculum but to exclude those children who could not keep up or who might be harmed.

Readiness testing was the chief means of implementing policies aimed at removing young children from inappropriate instructional programs. Thus the use of readiness testing increased dramatically during the 1980s and continues today in many school districts.3 Two different kinds of tests are used: developmental screening measures, originally intended as the first step in the evaluation of children for potential handicaps; and pre-academic skills tests, intended for use in planning classroom instruction.4 The technical and conceptual problems with these tests are numerous.5 Tests are being used for purposes for which they were never designed or validated. Waiting a year or being placed in a two-year program represents a dramatic disruption in a child's life, yet not one of the

existing readiness measures has sufficient reliability or predictive validity to warrant making such decisions. Developmental and pre-academic skills tests are based on outmoded theories of aptitude and learning that originated in the 1930s. The excessive use of these tests and the negative consequences of being judged unready focused a spotlight on the tests' substantive inadequacies. The widely used Gesell Test is made up of items from old I.Q. tests and is indistinguishable statistically from a measure of I.Q.; the same is true for developmental measures that are really short-form I.Q. tests. Assigning children to different instructional opportunities on the basis of such tests carries forward nativist assumptions popular in the 1930s and 1940s. At that time, it was believed that I.Q. tests could accurately measure innate ability, unconfounded by prior learning experiences. Because these measured "capacities" were thought to be fixed and unalterable, those who scored poorly were given low-level training consistent with their supposedly limited potential. Tests of academic content might have the promise of being more instructionally relevant than disguised I.Q. tests, but, as Anne Stallman and David Pearson have shown, the decomposed and decontextualized prereading skills measured by traditional readiness tests are not compatible with current research on early literacy.6

Readiness testing also raises serious equity concerns. Because all the readiness measures in use are influenced by past opportunity to learn, a disproportionate number of poor and minority children are identified as unready and are excluded from school when they most need it. Thus children without preschool experience and without extensive literacy experiences at home are sent back to the very environments that caused them to score poorly on readiness measures in the first place. Or, if poor and minority children who do not pass the readiness tests are admitted to the school but made to spend an extra year in kindergarten, they suffer disproportionately the stigma and negative effects of retention.

The last straw in this negative account of testing young children is the evidence that fallible tests are often followed by ineffective programs. A review of controlled studies has shown no academic benefits from retention in kindergarten or from extra-year programs, whether developmental kindergartens or transitional first grades. When extra-year children finally get to first grade, they do not do better on average than equally "unready" children who go directly on to first grade.7 However, a majority of children placed in these extra-year programs do experience some short- or long-term trauma, as reported by their parents.8 Contrary to popular belief that kindergarten children are "too young to notice" retention, most of them know that they are not making "normal" progress, and many continue to make reference to the decision years later: "If I hadn't spent an extra year in kindergarten, I would be in ____ grade now." In the face of such evidence, there is little wonder that many early childhood educators ask why we test young children at all.

Principles for Assessment and Testing

The NAEYC and the National Association of Early Childhood Specialists in State Departments of Education have played key roles in informing educators about the harm of developmentally inappropriate instructional practices and the misuse of tests.
In 1991 NAEYC published "Guidelines for Appropriate Curriculum Content and Assessment in Programs Serving Children Ages 3 Through 8."9 Although the detailed recommendations are too numerous to be repeated here, a guiding principle is that assessments should bring

about benefits for children, or data should not be collected at all. Specifically, assessments "should not be used to recommend that children stay out of a program, be retained in grade, or be assigned to a segregated group based on ability or developmental maturity."10 Instead, NAEYC acknowledges three legitimate purposes for assessment: 1) to plan instruction and communicate with parents, 2) to identify children with special needs, and 3) to evaluate programs. Although NAEYC used assessment in its "Guidelines," as I do, to avoid associations with inappropriate uses of tests, both the general principle and the specific guidelines are equally applicable to formal testing. In other words, tests should not be used if they do not bring about benefits for children. In what follows I summarize some additional principles that can ensure that assessments (and tests) are beneficial and not harmful. Then, in later sections, I consider each of NAEYC's recommended uses for assessment, including national, state, and local needs for program evaluation and accountability data.

I propose a second guiding principle for assessment that is consistent with the NAEYC perspective. The content of assessments should reflect and model progress toward important learning goals. Conceptions of what is important to learn should take into account both physical and social/emotional development as well as cognitive learning. For most assessment purposes in the cognitive domain, content should be congruent with subject matter in emergent literacy and numeracy. In the past, developmental measures were made as "curriculum free" or "culture free" as possible in an effort to tap biology and avoid the confounding effects of past opportunity to learn. Of course, this was an impossible task because a child's ability to "draw a triangle" or "point to the ball on top of the table" depends on prior experiences as well as on biological readiness. However, if the purpose of assessment is no longer to sort students into programs on the basis of a one-time measure of ability, then it is possible to have assessment content mirror what we want children to learn.

A third guiding principle can be inferred from several of the NAEYC guidelines. The methods of assessment must be appropriate to the development and experiences of young children. This means that, along with written products, observation, oral readings, and interviews should be used for purposes of assessment. Even for large-scale purposes, assessment should not be an artificial and decontextualized event; instead, the demands of data collection should be consistent with children's prior experiences in classrooms and at home. Assessment practices should recognize the diversity of learners and must be in accord with children's language development - both in English and in the native languages of those whose home language is not English.

A fourth guiding principle can be drawn from the psychometric literature on test validity. Assessments should be tailored to a specific purpose. Although not stated explicitly in the NAEYC document, this principle is implied by the recommendation of three sets of guidelines for three separate assessment purposes.

Matching the Why and How of Assessment

The reason for any assessment - i.e., how the assessment information will be used - affects the substance and form of the assessment in several ways. First, the degree of technical accuracy required depends on use.
For example, the identification of children for special education has critical implications for individuals. Failure to be identified could mean the denial of needed services, but being identified as in need of special services may also mean removal from normal classrooms (at least part of the time) and a

potentially stigmatizing label. A great deal is at stake in such assessment, so the multifaceted evaluation employed must have a high degree of reliability and validity. Ordinary classroom assessments also affect individual children, but the consequences of these decisions are not nearly so great. An inaccurate assessment on a given day may lead a teacher to make a poor grouping or instructional decision, but such an error can be corrected as more information becomes available about what an individual child "really knows."

Group assessment refers to uses, such as program evaluation or school accountability, in which the focus is on group performance rather than on individual scores. Although group assessments may need to meet very high standards for technical accuracy, because of the high stakes associated with the results, the individual scores that contribute to the group information do not have to be so reliable and do not have to be directly comparable, so long as individual results are not reported. When only group results are desired, it is possible to use the technical advantages of matrix sampling - a technique in which each participant takes only a small portion of the assessment - to provide a rich, in-depth assessment of the intended content domain without overburdening any of the children sampled (a brief code sketch of this idea follows this passage). When the "group" is very large, such as all the fourth-graders in a state or in the nation, then assessing a representative sample will produce essentially the same results for the group average as if every student had been assessed.

Purpose must also determine the content of assessment. When trying to diagnose potential learning handicaps, we still rely on aptitude-like measures designed to be as content-free as possible. We do so in order to avoid confusing lack of opportunity to learn with inability to learn. When the purpose of assessment is to measure actual learning, then content must naturally be tied to learning outcomes. However, even among achievement tests, there is considerable variability in the degree of alignment to a specific curriculum. Although to the lay person "math is math" and "reading is reading," measurement specialists are aware that tiny changes in test format can make a large difference in student performance. For example, a high proportion of students may be able to add numbers when they are presented in vertical format, but many will be unable to do the same problems presented horizontally. If manipulatives are used in some elementary classrooms but not in all, including the use of manipulatives in a mathematics assessment will disadvantage some children, while excluding their use will disadvantage others.

Assessments that are used to guide instruction in a given classroom should be integrally tied to the curriculum of that classroom. However, for large-scale assessments at the state and national level, the issues of curriculum match and the effect of assessment content on future instruction become much more problematic. For example, in a state with an agreed-upon curriculum, including geometry assessment in the early grades may be appropriate, but it would be problematic in states with strong local control of curriculum and so with much more curricular diversity.
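To make the matrix-sampling idea concrete, here is a minimal sketch in Python. Everything in it (the 40-item pool, the 5-item forms, the simulated responses) is a hypothetical illustration, not a procedure from this article; the point is only that each child answers a few items while the group statistic covers the whole domain.

```python
import random

# Hypothetical example: a 40-item pool split into eight 5-item forms.
# Each child takes one short form; together the class covers all items.
ITEM_POOL = [f"item_{i:02d}" for i in range(40)]
FORM_SIZE = 5

def make_forms(items, form_size):
    """Partition the item pool into disjoint short forms."""
    shuffled = items[:]           # copy so the pool itself is untouched
    random.shuffle(shuffled)
    return [shuffled[i:i + form_size]
            for i in range(0, len(shuffled), form_size)]

def assign_forms(children, forms):
    """Rotate forms across children so each form is used about equally."""
    return {child: forms[i % len(forms)] for i, child in enumerate(children)}

def group_proportion_correct(scores_by_child):
    """Pool all responses to estimate the group-level proportion correct.

    No child answers enough items for a reliable individual score, so
    only this group-level figure would ever be reported."""
    answered = [s for scores in scores_by_child.values() for s in scores]
    return sum(answered) / len(answered)

children = [f"child_{i:02d}" for i in range(24)]
forms = make_forms(ITEM_POOL, FORM_SIZE)
assignment = assign_forms(children, forms)
# Simulated 0/1 responses stand in for real item scores.
scores = {child: [random.choice([0, 1]) for _ in items]
          for child, items in assignment.items()}
print(f"{len(forms)} forms of {FORM_SIZE} items;",
      f"group proportion correct = {group_proportion_correct(scores):.2f}")
```

The design deliberately trades away interpretable individual scores for broad domain coverage, which is exactly why matrix-sampled results cannot be misused to place or label individual children.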
Large-scale assessments, such as the National Assessment of Educational Progress, must include instructionally relevant content, but they must do so without conforming too closely to any single curriculum. In the past, this requirement has led to the problem of

achievement tests that are limited to the "lowest common denominator." Should the instrument used for program evaluation include only the content that is common to all curricula? Or should it include everything that is in any program's goals? Although the common-core approach can lead to a narrowing of curriculum when assessment results are associated with high stakes, including everything can be equally troublesome if it leads to superficial teaching in pursuit of too many different goals.

Finally, the intended use of an assessment will determine the need for normative information or other means to support the interpretation of assessment results. Identifying children with special needs requires normative data to distinguish serious physical, emotional, or learning problems from the wide range of normal development. When reporting to parents, teachers also need some idea of what constitutes grade-level performance, but such "norms" can be in the form of benchmark performances - evidence that children are working at grade level - rather than statistical percentiles.

To prevent the abuses of the past, the purposes and substance of early childhood assessments must be transformed. Assessments should be conducted only if they serve a beneficial purpose: to gain services for children with special needs, to inform instruction by building on what students already know, to improve programs, or to provide evidence nationally or in the states about programmatic needs. The form, substance, and technical

features of assessment should be appropriate for the use intended for assessment data. Moreover, the methods of assessment must be compatible with the developmental level and experiences of young children. Below, I consider the implications of these principles for three different categories of assessment purposes.

Identifying Children with Special Needs

I discuss identification for special education first because this is the type of assessment that most resembles past uses of developmental screening measures. However, there is no need for wholesale administration of such tests to all incoming kindergartners. If we take the precepts of developmentally appropriate practices seriously, then at each age level a very broad range of abilities and performance levels is to be expected and tolerated. If potential handicaps are understood to be relatively rare and extreme, then it is not necessary to screen all children for "hidden" disabilities. By definition, serious learning problems should be apparent. Although it is possible to miss hearing or vision problems (at least mild ones) without systematic screening, referral for evaluation of a possible learning handicap should occur only when parents or teachers notice that a child is not progressing normally in comparison to age-appropriate expectations. In-depth assessments should then be conducted to verify the severity of the problem and to rule out a variety of other explanations for poor performance.

For this type of assessment, developmental measures, including I.Q. tests, continue to be useful. Clinicians attempt to make normative evaluations using relatively curriculum-free tasks, but today they are more likely to acknowledge the fallibility of such efforts. For

such difficult assessments, clinicians must have specialized training in both diagnostic assessment and child development. When identifying children with special needs, evaluators should use two general strategies in order to avoid confounding the ability to learn with past opportunity to learn. First, as recommended by the National Academy Panel on Selection and Placement of Students in Programs for the Mentally Retarded,11 a child's learning environment should be evaluated to rule out poor instruction as the possible cause of a child's lack of learning. Although seldom carried out in practice, this evaluation should include trying out other methods to support learning and possibly trying a different teacher before concluding that a child can't learn from ordinary classroom instruction. A second important strategy is to observe a child's functioning in multiple contexts. Often children who appear to be impaired in school function well at home or with peers. Observation outside of school is critical for children from diverse cultural backgrounds and for those whose home language is not English. The NAEYC stresses that "screening should never be used to identify second language learners as 'handicapped,' solely on the basis of their limited abilities in English."12

In-depth developmental assessments are needed to ensure that children with disabilities receive appropriate services. However, the diagnostic model of special education should not be generalized to a larger population of below-average learners, or the result will be the reinstitution of tracking. Elizabeth Graue and I analyzed recent efforts to create "at-risk" kindergartens and found that these practices are especially likely to occur when resources for extended-day programs are available only for the children most in need.13 The result of such programs is often to segregate children from low socioeconomic backgrounds into classrooms where time is spent drilling on low-level prereading skills like those found on readiness tests. The consequences of dumbed-down instruction in kindergarten are just as pernicious as the effects of tracking at higher grade levels, especially when the at-risk kindergarten group is kept together for first grade. If resources for extended-day kindergarten are scarce, one alternative would be to group children heterogeneously for half the day and then, for the other half, to provide extra enrichment activities for children with limited literacy experiences.

Classroom Assessments

Unlike traditional readiness tests that are intended to predict learning, classroom assessments should support instruction by modeling the dimensions of learning. Although we must allow considerable latitude for children to construct their own understandings, teachers must nonetheless have knowledge of normal development if they are to support children's extensions and next steps. Ordinary classroom tasks can then be used to assess a child's progress in relation to a developmental continuum. An example of a developmental continuum would be that of emergent writing, beginning with scribbles, then moving on to pictures and random letters, and then proceeding to some letter/word correspondences. These continua are not rigid, however, and several dimensions running in parallel may be necessary to describe growth in a single content area.
For example, a second dimension of early writing - a child's ability to invent increasingly elaborated stories when dictating to an adult - is not dependent on mastery of writing letters, just as listening comprehension, making predictions about books, and story retellings should be developed in parallel to, not after, mastery of letter sounds.
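As a concrete illustration of tracking such parallel continua, here is a brief Python sketch. The stage labels and the example child are hypothetical (they loosely follow the progressions named above), not an instrument from this article.

```python
# Two parallel developmental continua for early writing, ordered from
# earliest to most advanced. Stage names are illustrative only.
WRITING_STAGES = ["scribbles", "pictures", "random letters",
                  "some letter/word correspondences"]
DICTATION_STAGES = ["labels objects", "describes a single event",
                    "links several events", "elaborated story"]

def profile(writing_obs, dictation_obs):
    """Locate a child on each continuum independently.

    The two dimensions are deliberately separate: a child may be early
    in letter formation yet advanced in dictated storytelling."""
    return {
        "writing": (WRITING_STAGES.index(writing_obs), writing_obs),
        "dictated stories": (DICTATION_STAGES.index(dictation_obs), dictation_obs),
    }

# Example: random letters on paper, but rich oral storytelling.
print(profile("random letters", "elaborated story"))
# -> {'writing': (2, 'random letters'), 'dictated stories': (3, 'elaborated story')}
```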

Although there is a rich research literature documenting patterns of emergent literacy and numeracy, corresponding assessment materials are not so readily available. In the next few years, national interest in developing alternative, performance-based measures should generate more materials and resources. Specifically, new Chapter 1 legislation is likely to support the development of reading assessments that are more authentic and instructionally relevant. For example, classroom-embedded reading assessments were created from ordinary instructional materials by a group of third-grade teachers in conjunction with researchers at the Center for Research on Evaluation, Standards, and Student Testing.14 The teachers elected to focus on fluency and making meaning as reading goals; running records and story summaries were selected as the methods of assessment. But how should student progress be evaluated? In keeping with the idea of representing a continuum of proficiency, third-grade teachers took all the chapter books in their classrooms and sorted them into grade-level stacks: 1-1 (first grade, first semester), 1-2, 2-1, and so on up to fifth grade. Then they identified representative or marker books in each category to use for assessment (a sketch of this leveling scheme follows this passage). Once the books had been sorted by difficulty, it became possible to document that children were reading increasingly difficult texts with understanding. Photocopied pages from the marker books also helped parents see what teachers considered to be grade-level materials and provided them with concrete evidence of their child's progress. Given mandates for student-level reporting under Chapter 1, state departments of education or test publishers could help develop systems of this type with sufficient standardization to ensure comparability across districts.

In the meantime, classroom teachers - or preferably teams of teachers - are left to invent their own assessments for classroom use. In many schools, teachers are already working with portfolios and developing scoring criteria. The best procedure appears to be having grade-level teams and then cross-grade teams meet to discuss expectations and evaluation criteria. These conversations will be more productive if, for each dimension to be assessed, teachers collect student work and use marker papers to illustrate continua of performance. Several papers might be used at each stage to reflect the tremendous variety in children's responses, even when following the same general progression.

Benchmark papers can also be an effective means of communicating with parents. For example, imagine using sample papers from grades K-3 to illustrate expectations regarding "invented spelling." Invented spelling or "temporary spelling" is the source of a great deal of parental dissatisfaction with reform curricula. Yet most parents who attack invented spelling have never been given a rationale for its use. That is, no one has explained it in such a way that the explanation builds on the parents' own willingness to allow successive approximations in their child's early language development. They have never been shown a connection between writing expectations and grade-level spelling lists or been informed about differences in rules for first drafts and final drafts. Sample papers could be selected to illustrate the increasing mastery of grade-appropriate words, while allowing for misspellings of advanced words on first drafts.
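The marker-book ladder lends itself to a simple record-keeping structure. The sketch below is a hypothetical illustration in Python: the level labels follow the grade-semester scheme described above, but the placeholder titles and the lookup rule are assumptions, not the teachers' actual materials.

```python
# Hypothetical marker-book ladder; levels run easiest to hardest.
# "1-1" means first grade, first semester, as in the text.
MARKER_BOOKS = {
    "1-1": "Placeholder Title A",   # stand-ins, not the real selections
    "1-2": "Placeholder Title B",
    "2-1": "Placeholder Title C",
    "2-2": "Placeholder Title D",
    "3-1": "Placeholder Title E",
}
LEVELS = list(MARKER_BOOKS)  # dict preserves insertion order

def highest_level_read(levels_read_with_understanding):
    """Return the hardest marker level a child has read with understanding,
    as documented by running records and story summaries."""
    positions = [LEVELS.index(lvl) for lvl in levels_read_with_understanding]
    return LEVELS[max(positions)]

# Example record accumulated over a semester of classroom reading.
fall_record = ["1-1", "1-2", "2-1"]
level = highest_level_read(fall_record)
print(f"Reading at level {level}: {MARKER_BOOKS[level]}")
```

As with the photocopied marker-book pages, the value of such a record lies in giving parents concrete evidence of progress rather than an abstract score.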
Communicating criteria is helpful to parents, and, as we have seen in the literature on performance assessment, it also helps children to understand what is expected and to become better at assessing their own work.

Monitoring National and State Trends

In 1989, when the President and the nation's governors announced "readiness for school"

as the first education goal, many early childhood experts feared the creation of a national test for school entry. Indeed, given the negative history of readiness testing, the first thing the Goal 1 Technical Planning Subgroup did was to issue caveats about what an early childhood assessment must not be. It should not be a one-dimensional, reductionist measure of a child's knowledge and abilities; it should not be called a measure of "readiness" as if some children were not ready to learn; and it should not be used to "label, stigmatize, or classify any individual child or group of children."15 However, with this fearsome idea set aside, the Technical Planning Subgroup endorsed the idea of an early childhood assessment system that would periodically gather data on the condition of young children as they enter school. The purpose of the assessment would be to inform public policy and especially to help "in charting progress toward achievement of the National Education Goals, and for informing the development, expansion, and/or modification of policies and programs that affect young children and their families."16 Assuming that certain safeguards are built in, such data could be a powerful force in focusing national attention and resources on the needs of young children.

Unlike past testing practices aimed at evaluating individual children in comparison with normative expectations, a large-scale, nationally representative assessment would be used to monitor national trends. The purpose of such an assessment would be analogous to the use of the National Assessment of Educational Progress (NAEP) to measure major shifts in achievement patterns. For example, NAEP results have demonstrated gains in the achievement of black students in the South as a result of desegregation, and NAEP achievement measures showed gains during the 1980s in basic skills and declines in higher-order thinking skills and problem solving. Similar data are not now available for preschoolers or for children in the primary grades. If an early childhood assessment were conducted periodically, it would be possible to demonstrate the relationship between health services and early learning and to evaluate the impact of such programs as Head Start.

In keeping with the precept that methods of assessment should follow from the purpose of assessment, the Technical Planning Subgroup recommended that sampling of both children and assessment items be used to collect national data. Sampling would allow a broad assessment of a more multifaceted content domain and would preclude the misuse of individual scores to place or stigmatize individual children. A national early childhood assessment should also serve as a model of important content. As a means to shape public understanding of the full range of abilities and experiences that influence early learning and development, the Technical Planning Subgroup identified five dimensions to be assessed: 1) physical well-being and motor development, 2) social and emotional development, 3) approaches toward learning, 4) language usage, and 5) cognition and general knowledge.

Responding to the need for national data to document the condition of children as they enter school and to measure progress on Goal 1, the U.S. Department of Education has commissioned the Early Childhood Longitudinal Study: Kindergarten Cohort.
Beginning in the 1998-99 school year, a representative sample of 23,000 kindergarten students will be assessed and then followed through grade 5. The content of the assessments used will

correspond closely to the dimensions recommended by the Technical Planning Subgroup. In addition, data will be collected on each child's family, community, and school/program. Large-scale studies of this type serve both program evaluation purposes (How effective are preschool services for children?) and research purposes (What is the relationship between children's kindergarten experiences and their academic success throughout elementary school?).

National needs for early childhood data and local needs for program evaluation information are similar in some respects and dissimilar in others. Both uses require group data. However, a critical distinction that affects the methods of evaluation is whether or not local programs share a common curriculum. If local programs, such as all the kindergartens in a school district, have agreed on the same curriculum, it is possible to build program evaluation assessments from an aggregation of the measures used for classroom purposes. Note that the entire state of Kentucky is attempting to develop such a system by scoring classroom portfolios for state reporting. If programs being evaluated do not have the same specific curricula, as is the case with a national assessment and with some state assessments, then the assessment measures must represent broad learning goals fairly across all the curricula involved - a tall order, more easily said than done. For this reason, the Technical Planning Subgroup recommended that validity studies be built into the procedures for data collection. For example, pilot studies should verify that what children can do in one-on-one assessment settings is consistent with what they can do in their classrooms, and assessment methods should always allow children more than one way to show what they know.

Conclusion

In the past decade, testing of 4-, 5-, and 6-year-olds has been excessive and inappropriate. Under a variety of different names, leftover I.Q. tests have been used to track children into ineffective programs or to deny them school entry. Prereading tests held over from the 1930s have encouraged the teaching of decontextualized skills. In response, fearing that "assessment" is just a euphemism for more bad testing, many early childhood professionals have asked, Why test at all? Indeed, given a history of misuse, the burden of proof must rest with assessment advocates to demonstrate the usefulness of assessment and to ensure that abuses will not recur. Key principles that support responsible use of assessment information follow.

No testing of young children should occur unless it can be shown to lead to beneficial results. Methods of assessment, especially the language used, must be appropriate to the development and experiences of young children. Features of assessment - content, form, evidence of validity, and standards for interpretation - must be tailored to the specific purpose of an assessment. Identifying children for special education is a legitimate purpose for assessment and still requires the use of curriculum-free, aptitude-like measures and normative comparisons. However, handicapping conditions are rare; the diagnostic model used by special education should not be generalized to a larger population of below-average learners. For both classroom instructional purposes and purposes of public policy making, the content of assessments should embody the important dimensions of early learning and

development. The tasks and skills children are asked to perform should reflect and model progress toward important learning goals. In the past, local newspapers have published readiness checklists that suggested that children should stay home from kindergarten if they couldn't cut with scissors. In the future, national and local assessments should demonstrate the richness of what children do know and should foster instruction that builds on their strengths. Telling a story in conjunction with scribbles is a meaningful stage in literacy development. Reading a story in English and retelling it in Spanish is evidence of reading comprehension. Evidence of important learning in beginning mathematics should not be counting to 100 instead of to 10. It should be extending patterns; solving arithmetic problems with blocks and explaining how you got your answer; constructing graphs to show how many children come to school by bus, by walking, by car; and demonstrating understanding of patterns and quantities in a variety of ways.

In classrooms, we need new forms of assessment so that teachers can support children's physical, social, and cognitive development. And at the level of public policy, we need new forms of assessment so that programs will be judged on the basis of worthwhile educational goals.

1. Lorrie A. Shepard and Mary Lee Smith, "Escalating Academic Demand in Kindergarten: Counterproductive Policies," Elementary School Journal, vol. 89, 1988, pp. 135-45.
2. Sue Bredekamp, ed., Developmentally Appropriate Practice in Early Childhood Programs Serving Children from Birth Through Age 8, exp. ed. (Washington, D.C.: National Association for the Education of Young Children, 1987).
3. M. Therese Gnezda and Rosemary Bolig, A National Survey of Public School Testing of Pre-Kindergarten and Kindergarten Children (Washington, D.C.: National Forum on the Future of Children and Families, National Research Council, 1988).
4. Samuel J. Meisels, "Uses and Abuses of Developmental Screening and School Readiness Testing," Young Children, vol. 42, 1987, pp. 4-6, 68-73.
5. Lorrie A. Shepard and M. Elizabeth Graue, "The Morass of School Readiness Screening: Research on Test Use and Test Validity," in Bernard Spodek, ed., Handbook of Research on the Education of Young Children (New York: Macmillan, 1993), pp. 293-305.
6. Anne C. Stallman and P. David Pearson, "Formal Measures of Early Literacy," in Lesley Mandel Morrow and Jeffrey K. Smith, eds., Assessment for Instruction in Early Literacy (Englewood Cliffs, N.J.: Prentice-Hall, 1990), pp. 7-44.
7. Lorrie A. Shepard, "A Review of Research on Kindergarten Retention," in Lorrie A. Shepard and Mary Lee Smith, eds., Flunking Grades: Research and Policies on Retention (London: Falmer Press, 1989), pp. 64-78.
8. Lorrie A. Shepard and Mary Lee Smith, "Academic and Emotional Effects of Kindergarten Retention in One School District," in idem, pp. 79-107.
9. "Guidelines for Appropriate Curriculum Content and Assessment in Programs Serving Children Ages 3 Through 8," Young Children, vol. 46, 1991, pp. 21-38.
10. Ibid., p. 32.
11. Kirby A. Heller, Wayne H. Holtzman, and Samuel Messick, eds., Placing Children in Special Education (Washington, D.C.: National Academy Press, 1982).
12. "Guidelines," p. 33.

13. Shepard and Graue, op. cit.
14. The Center for Research on Evaluation, Standards, and Student Testing is located on the campuses of the University of California, Los Angeles, and the University of Colorado, Boulder.
15. Goal 1: Technical Planning Subgroup Report on School Readiness (Washington, D.C.: National Education Goals Panel, September 1991).
16. Ibid., p. 6.


Assessing Young Children


Marcy Guddemi, Ph.D.
Betsy J. Case, Ph.D.

February 2004
Copyright 2004 by Pearson Education, Inc. or its affiliate(s). All rights reserved. Pearson and the Pearson logo are trademarks of Pearson Education, Inc. or its affiliate(s).



(This updated version was originally published as: Guddemi, M. P. (2003). The important role of quality assessment in young children ages 3-8. In Wall, J. & Walz, G. (Eds.) (2003). Measuring up: Assessment issues for teachers, counselors, and administrators. Greensboro, NC: ERIC Counseling and Student Services Clearinghouse.)

Introduction
Today's educational climate of standards and accountability extends even to preschool programs (Bowman, Donovan, and Burns, 2001). The No Child Left Behind Act of 2001 (NCLB) mandates assessment and accountability at all levels of public school, even in early childhood, defined as birth through age 8 (NAEYC, 1987). Additionally, the current preschool initiative Good Start, Grow Smart requires a demonstration of positive child outcomes and ongoing assessment efforts. The initiative dramatically affects accountability measures for Head Start (Horn, 2003). In light of this background, it is critical to understand how both formal and informal assessments, when developmentally appropriate in design and purpose, are beneficial for early childhood. This age period is often broken into three groups for discussion: infants/toddlers (ages 0 through 2), preschoolers (ages 3

through 5), and primary children (kindergarten through grade 3). This report will focus on young children aged 3 through 8 years. It will examine the perspectives of various national organizations on the essential role of assessment and accountability during early childhood, and will also describe an appropriate assessment system for this age group.

The Challenge of Early Childhood Assessment


The assessment of young children is very different from the assessment of older children and adults in several ways. The greatest difference is in the way young children learn. They construct knowledge in experiential, interactive, concrete, and hands-on ways (Bredekamp and Rosegrant, 1992, 1995) rather than through abstract reasoning and paper-and-pencil activities alone. To learn, young children must touch and manipulate objects, build and create in many media, listen to and act out stories and everyday roles, talk and sing, and move and play in various ways and environments. Consequently, the expression of what young children know and can do would best be served in ways other than traditional paper-and-pencil assessments.

Assessment is also challenging during early childhood because a child's development is rapid, uneven, episodic, and highly influenced by the environment (Shepard, Kagan, and Wurtz, 1998). A developing child exhibits periods of both rapid growth and frequent rest. Children develop in four domains (physical, cognitive, social, and emotional) and not at the same pace through each. No two children are the same; each child has a unique rate of development. In addition, no two children have the same family, cultural, and experiential backgrounds. Clearly, these variables mean that a one-size-fits-all assessment will not meet the needs of most young children (Shepard, et al.).

Another assessment challenge for young children is that it takes time to administer assessments properly. Assessments primarily should be administered in a one-on-one setting to each child by his or her teacher. In addition, a child's attention span is often very short, and the assessment should therefore be administered in short segments over a period of a few days or even weeks. While early childhood educators demand developmentally appropriate assessments for children, they often complain about the time it takes to administer them and the resulting loss of instructional time in the classroom. However, when quality tests mirror quality instruction, assessment and teaching become almost seamless, complementing and informing one another (Neuman, Copple, and Bredekamp, 2000).

NAEYC Position Statement on Early Childhood Assessment (1987)


In the position statement Standardized Testing of Young Children 3 Through 8 Years of Age, the National Association for the Education of Young Children

(NAEYC) (1987) summarized a number of challenges faced when assessing young children. First, the NAEYC stressed the importance of quality instruments and emphasized that not all assessments are detrimental to young children. According to NAEYC, quality assessments are those that meet the guidelines for reliability and validity as established by the Standards for Educational and Psychological Testing (American Educational Research Association, 1999), are appropriate for the child's age and stage of development, and rely heavily on demonstration or expression of skills and knowledge. These assessments also should be individually administered to elicit the most accurate and useful information for the teacher.

The NAEYC position statement also emphasizes that administrators play an important role in using the information generated by assessments. When interpreting assessment results, administrators must be aware of and sensitive to each young child's unique rate of development. Decisions about a child's placement or special needs should never be based on a single test result. The

appropriate use of information from early childhood assessments is to guide instruction and to determine what the child is ready for next in terms of knowledge and skills. Administrators also use information from assessments and other sources to evaluate, strengthen, and monitor educational programs.

National Education Goals Panel on Early Childhood Assessment (1998)


Advice published in Principles and Recommendations for Early Childhood Assessments (Shepard, et al., 1998) by the National Education Goals Panel (NEGP), a government-appointed committee and extension of the Goals 2000 education movement, still has meaning today. According to the NEGP guidelines, assessments should:

• bring about benefits for children;
• be tailored to a specific purpose;
• be reliable, valid, and fair;
• reflect policies that acknowledge that the reliability and validity of assessments increase as children grow older;
• be age-appropriate in both content and methodology;
• be linguistically appropriate, because all assessments measure language; and
• value parents as an important source of assessment information.

In addition, the NEGP very clearly stated that assessments should be used for a specific purpose and that a single assessment more than likely could not serve multiple purposes. The purposes of assessments are to support learning, identify special needs, evaluate a program, monitor trends, and serve high-stakes

accountability requirements. The NEGP recommends that standardized assessments for high-stakes purposes not be administered until grade 3 and preferably not until grade 4 (Shepard, et al., 1998).

IRA / NAEYC Position Statement on Reading and Writing (1998)


In response to the nation's growing interest in and commitment to literacy, the International Reading Association (IRA) and the NAEYC jointly published the position statement Overview of Learning to Read and Write: Developmentally Appropriate Practices for Young Children (1998). Because these two organizations have at times been at odds with each other over what are appropriate instructional techniques for early childhood, this document is an especially significant agreement between the two groups concerning how children learn to read and write. The position statement explains to the early childhood and reading communities that developmentally appropriate means setting

achievable yet challenging goals. Furthermore, it emphasizes that: (1) the foundation of reading consists of basic skills which can (and should) be taught and (2) quality ongoing diagnostic assessment is essential in knowing how to help young children become good readers.

National Research Council (1999)


The National Research Council (NRC) is a national panel convened by the National Academy of Sciences to study the issue of literacy development. After an extensive and exhaustive review of literacy and reading research, the NRC published a sweeping report, Preventing Reading Difficulties in Young Children (Burns, Griffin, and Snow, 1999), which set forth guidelines and recommendations for literacy development and the role of assessment for young children. The report states that it is absolutely essential for teachers to know how to use ongoing in-class assessments and how to interpret norm-referenced and individually referenced assessment outcomes, including both formal and informal in-class assessments and progress-monitoring measures used by specialists (Burns, et al., p. 123).

According to the NRC report, high-quality assessments should be child-friendly, include developmentally appropriate activities, and mirror quality instruction. In addition, they should be individually and orally administered so as to provide immediate, diagnostic information to the teacher. The assessment program should be based on benchmarks or standards of achievement. Quality assessments benefit the classroom teacher in real ways by providing certainty about each child's initial and continued literacy levels. Quality assessments provide detailed diagnostic information to guide planning for instruction and monitoring of individual student progress over time.

NAEYC / NAECS / SDE Position Statement on Early Childhood (2003)

The most recently published position statement is Early Childhood Curriculum, Assessment, and Program Evaluation (National Association for the Education of Young Children and National Association of Early Childhood Specialists in State Departments of Education, 2003), which draws on all the prior works discussed above. It emphasizes linking assessment information to the family. It also points out the importance of professional development for teachers and parents in understanding and using assessment for: (1) making sound decisions about teaching and learning; (2) identifying significant concerns that may require focused intervention for individual children; and (3) helping programs improve their educational and developmental interventions (p. 3). The statement recommends [making] ethical, appropriate, valid, and reliable assessment a central part of all early childhood programs (p. 1).

A Quality Early Childhood Assessment and Accountability System


An assessment and accountability system for young children should incorporate the characteristics of quality discussed above. The following are examples of early childhood assessment tools, one or more of which could be included in a quality assessment system for young children. When used together in an assessment system, these tools will yield meaningful and useful information to teachers, parents, and administrators. (A brief sketch of how records from several of these tools might be organized follows the list.)

1. Observations and Checklists
A well-defined checklist with observation training is critical and essential for an assessment system. Observations of child behaviors and skills provide the teacher with a powerful measure of a child's abilities. For example, a teacher observation of a child retelling what happened last night at home, with a big smile and expressive language, is a truer measure of oral language skills than asking the child to retell a story in an unfamiliar setting.

2. Anecdotal Records
Anecdotal records are short, factual, narrative descriptions of child behaviors and skills over time. Anecdotal records should be as objective as possible and only a few sentences long. For example: "Gina, age 4.10, chose the library center today. She pretended to read Peter Rabbit to two doll babies and Jessica. She turned each page and recited with expression the memorized words on each page. She showed the picture at each page turn."

3. Running Records
Running records are similar to anecdotal records but are much longer. An observer objectively writes in a narrative format everything the child did and said for a specific time period, such as thirty minutes. Running records are especially helpful in analyzing social skill development or behavior concerns. Running records also can be narrowly focused on a subject area, such as a running record

that documents the accuracy and miscue strategies of a child reading a specific passage.

4. Portfolios
A portfolio is a flexible and adaptable collection over time of various concrete work samples showing many dimensions of the child's learning. This type of assessment tool is particularly ideal for use in the primary grades, when children are developing knowledge and skills in several subject areas at different rates. This type of assessment also focuses on the child's strengths and demonstrations of knowledge and skills.

5. Home Inventories
Parents may see behaviors and skills that children demonstrate only in the home setting. Home inventories collect valuable information through a survey or set of short, open-ended response items completed by the adult at the child's home.

6. Developmental Screenings
Developmental screenings are a short (15-20 minutes) set of age- and content-appropriate performance items based on a developmental continuum and linked to ages typical for the behavior. This type of assessment is helpful in identifying major developmental delays that indicate the need for a more thorough diagnostic assessment. Screening assessments should not necessarily screen out a child as not ready, but rather serve as a guide for instruction that reveals the subject areas for which the child is ready to begin learning. This type of assessment can also provide guidance for the program's needs.

7. Diagnostic Assessments
A diagnostic assessment identifies a range of strengths and weaknesses in the child and suggests specific remedial actions. Classroom diagnostic assessments are not direct measures of academic outcomes and should never be used for accountability purposes alone.

8. Standardized Assessments
Standardized assessments are typically administered in groups and provide normative and scalable data that can be aggregated and reported to administrators and policymakers. Standardized assessments are direct measures of children's outcomes and are administered under very stringent protocols. Standardized assessments are also used to monitor trends and for program evaluation and accountability. Typically, standardized assessments are paper/pencil-based and designed to capture only the child's response without administrator bias. Quality standardized tests are developed in accordance with guidelines in the Standards for Educational and Psychological Testing (AERA, 1999). For young children, standardized tests also should contain authentic content and mirror classroom instruction. They should incorporate an inviting use of color

and graphics and include manipulatives when appropriate. Screening and diagnostic assessments may also be standardized in the way they are administered. Because standardized assessments are not as accurate, valid, and reliable for young children as they are for older children, they should not be used solely to make high-stakes decisions until grade 3 and preferably not until grade 4 (Shepard et al., 1998).
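To show how records from several of these tools could live together in one system, here is a brief hypothetical sketch in Python. The record types, field names, and the six-month flagging rule are all illustrative assumptions, not part of any instrument or report cited above.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical record types for three of the tools described above.
@dataclass
class ChecklistEntry:
    skill: str           # e.g., "retells a familiar story in sequence"
    observed: bool
    observed_on: date

@dataclass
class AnecdotalRecord:
    observed_on: date
    note: str            # short, factual narrative, a few sentences at most

@dataclass
class ScreeningResult:
    item: str
    typical_age_months: int   # age at which the behavior typically appears
    passed: bool

@dataclass
class ChildAssessmentFile:
    name: str
    age_months: int
    checklist: list = field(default_factory=list)
    anecdotes: list = field(default_factory=list)
    screening: list = field(default_factory=list)

    def screening_flags(self, lag_months=6):
        """List screening items failed well past the typical age.

        A flag suggests follow-up diagnostic assessment, never a label;
        the six-month lag used here is an illustrative assumption."""
        return [r.item for r in self.screening
                if not r.passed
                and self.age_months - r.typical_age_months >= lag_months]

child = ChildAssessmentFile(name="Gina", age_months=58)
child.screening.append(ScreeningResult("hops on one foot", 48, passed=True))
child.screening.append(ScreeningResult("draws a person with 6 parts", 48, passed=False))
child.anecdotes.append(AnecdotalRecord(
    date(2004, 2, 12),
    "Chose the library center; 'read' Peter Rabbit aloud with expression."))
print(child.screening_flags())  # -> ['draws a person with 6 parts']
```

Keeping screening results next to observational evidence in one file reflects the point above that no single result should ever drive a placement decision.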

Pearson's Recommendations
Pearson recommends that the following guidelines be followed when testing children in preschool and the early grades:
- Administer tests in a one-on-one setting.
- Ensure the child knows the test administrator (preferably the teacher).
- If the child cannot be tested by someone familiar, the test administrator should use warm-up activities to build rapport with the child.
- Keep each testing session short.
- Reinforce the child throughout the testing session.

Conclusion
Quality formal and informal assessments are essential parts of a sound early childhood program and are mandated in federal programs such as Head Start and Reading First. Educators, administrators, and policymakers responsible for the education of young schoolchildren should not fear a carefully planned assessment program. Quality assessments have the following benefits:
- They give teachers valuable, individualized information about children's developing skills and knowledge.
- They lead teachers to select quality early childhood activities and instruction.
- They provide information that helps administrators strengthen existing programs and hold them accountable.
- Most of all, developmentally appropriate assessments benefit young children by helping teachers ensure that a young child's educational journey springs from a solid foundation of basic skills.

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bowman, B. T., Donovan, M. S., & Burns, M. S. (Eds.). (2001). Eager to learn: Educating our preschoolers. Washington, DC: National Academy Press.

Bredekamp, S., & Rosegrant, T. (Eds.). (1992). Reaching potentials: Appropriate curriculum and assessment for young children (Vol. 1). Washington, DC: National Association for the Education of Young Children.

Bredekamp, S., & Rosegrant, T. (Eds.). (1995). Reaching potentials: Transforming early childhood curriculum and assessment (Vol. 2). Washington, DC: National Association for the Education of Young Children.

Burns, S. M., Griffin, P., & Snow, C. E. (1999). Starting out right: A guide to promoting children's reading success. (Abbreviated National Research Council report Preventing Reading Difficulties in Young Children.) Washington, DC: National Academy Press.

Harp, B., & Brewer, A. (2000). Assessing reading and writing in the early years. In D. Strickland & L. Morrow (Eds.), Beginning reading and writing. New York: Teachers College Press.

Horn, W. F. (2003). Improving Head Start: A common cause. Head Start Bulletin, No. 76. Washington, DC: U.S. Department of Health and Human Services.

International Reading Association & National Association for the Education of Young Children. (1998). Learning to read and write: Developmentally appropriate practices for young children (Position statement). Retrieved from http://www.naeyc.org/resources/position_statements/psread0.htm

McAfee, O., & Leong, D. (1997). Assessing and guiding young children's development and learning. Boston: Allyn and Bacon.

Meisels, S. (1989). Developmental screening in early childhood: A guide. Washington, DC: NAEYC.

National Association for the Education of Young Children. (1987). Standardized testing of young children 3 through 8 years of age (Position statement). Washington, DC: NAEYC. Retrieved from http://www.naeyc.org/resources/position_statements/pstestin.htm

National Association for the Education of Young Children & National Association of Early Childhood Specialists in State Departments of Education. (1990). Guidelines for appropriate curriculum content and assessment in programs serving children ages 3 through 8 (Position statement). Washington, DC: NAEYC. Retrieved from http://www.naeyc.org/resources/position_statements/pscuras.htm

National Association for the Education of Young Children & National Association of Early Childhood Specialists in State Departments of Education. (2003). Early childhood curriculum, assessment and program evaluation (Position statement). Washington, DC: NAEYC. Retrieved from http://www.naeyc.org/resources/position_statements/pscape.pdf (available from the web only).

Neuman, S. B., Copple, C., & Bredekamp, S. (2000). Learning to read and write: Developmentally appropriate practices for young children. Washington, DC: National Association for the Education of Young Children.

Shepard, L. A., Kagan, S. L., & Wurtz, E. (Eds.). (1998). Principles and recommendations for early childhood assessments. Washington, DC: National Education Goals Panel.
Additional copies of this and related documents are available from: Pearson Inc., 19500 Bulverde Rd., San Antonio, TX 78259, 1-800-211-8378, 1-877-576-1816 (fax), http://www.pearsonassess.com



What Is Assessment?

Assessment is the systematic collection, review, and use of information about educational programs undertaken for the purpose of improving student learning and development (Palomba & Banta). Assessment is an ongoing process aimed at understanding and improving student learning. It involves making our expectations explicit and public; setting appropriate criteria and high standards for learning quality; systematically gathering, analyzing, and interpreting evidence to determine how well performance matches those expectations and standards; and using the resulting information to document, explain, and improve performance. When we do assessment, we essentially ask: What do the educational experiences of our students add up to? Can our students integrate learning from individual courses into a coherent whole? Do our students have the knowledge, skills, and values a graduate should possess? How can student learning be improved?

Common Assessment Terminology

ASSESSMENT OF STUDENT LEARNING
Assessment is a systematic process for gathering information about student learning; it answers the question, "How do we know what students are learning, and how well they are learning it?" As a process, it has five steps: 1) specify learning objectives; 2) select teaching and learning strategies; 3) gather data on student learning; 4) evaluate the data; and 5) make decisions and implement them.

ASSESSMENT, FORMATIVE
Formative assessment is the gathering of data on student learning during an instructional encounter. It helps the instructor to identify concepts or skills that students are not learning well, and to take steps to improve student learning while the course is still in progress. (Also called classroom assessment.)

ASSESSMENT, SUMMATIVE
Summative assessment is the gathering of data on student learning at the conclusion of a course, as a basis for judging student knowledge and skills. It helps the instructor to plan for the next offering of the course.

ASSESSMENT TOOLS
Assessment tools are the instruments used to gather data about student learning. Tools can be both quantitative and qualitative, and refer both to traditional paper-and-pencil tests and to alternative forms of assessment such as oral examinations, group problem-solving, performances and demonstrations, portfolios, peer observations, and others.

BENCHMARK
A benchmark is an example of student performance at a given level of competence. Examples of actual student work are used to illustrate different levels of competence on a performance scale. (Also called anchors or exemplars.)

CAPSTONE PROJECT
A capstone is a project planned and carried out by the student during the final semester as the culmination of the educational experience. These projects typically require higher-level thinking skills, problem-solving, creative thinking, and integration of learning from various sources.

COMPETENCY TEST
A test intended to establish that a student has met established minimum standards of skills and knowledge and is thus eligible for an acknowledgment of achievement such as graduation, certification, etc.

CRITERIA
Criteria are statements about the dimensions of competency that will be assessed; they specify important components of the desired knowledge or skill that the student should learn and be able to demonstrate. For example, for oral communication, one criterion could be maintaining eye contact with the audience.

COURSE-EMBEDDED ASSESSMENT
Data gathering about learning that occurs as part of the course, such as tests, papers, projects, or portfolios; as opposed to data gathering that occurs outside the course, e.g., student placement testing.

EVALUATION
A value judgment about the results of data collected on student learning. Evaluation of student learning requires that the instructor compare data collected on student performance to a predefined outcome expectation in order to determine what the student has learned and how well.

HOLISTIC ASSESSMENT
Making a judgment about a student's learning by using an overall appraisal of the student's entire performance, rather than by scoring or analyzing separate dimensions of the performance individually. Used in situations where the demonstration of learning is considered to be more than the sum of its parts, and so the complete final product or performance is evaluated as a whole. The instructor matches his or her overall impressions to pre-defined expectations for learning outcomes and makes a judgment.

INDICATOR
An indicator is a piece of information about the performance of a student: for example, a score on a test, the number of absences per semester, or the student's GPA in the major.

ITEM
An individual question or exercise in a test.

JOURNAL
A journal is a written record made by a student on a regular basis, for example, daily or weekly. It may also be called a log, notebook, diary, or progress sheet. It may be a collection of facts, an account of experiences, and/or reflective comments on facts or experiences. It may be kept on paper or by computer.

METHODS OF ASSESSMENT
Methods of assessment are the procedures used to gather data on student learning. These methods are selected in relation to the specified learning outcome to be assessed, the type of evidence of learning available, the type of performance to be observed, and the agreed-upon scoring procedures. Methods may involve paper-and-pencil tests; alternative methods are often referred to as performance-based, authentic, or complex-generated.

LEARNING OUTCOME
A broad educational goal that the student is expected to achieve by the end of the course, relative to some knowledge or skill. Outcomes may be broken down into smaller and more specific learning objectives.

OUTCOME-BASED EDUCATION
Teaching and learning strategies are selected which will assist the student to meet the stipulated learning outcomes at an acceptable level. Students must demonstrate an acceptable level of mastery of the expected outcomes in order to be awarded educational credit.

PERFORMANCE-BASED ASSESSMENT
Evidence of student achievement of the knowledge and skill components of a course is collected from students in the form of a performance or product. The process the student uses reveals as much about the student's understanding of the knowledge and ability to apply it as the final outcome. It is part of the teaching and learning process, a continuous interaction between instructor and student. It requires clear statements of expected learning outcomes and clearly articulated and communicated criteria and standards.

PORTFOLIO
A systematic and organized collection of a student's work that exhibits to others the direct evidence of a student's efforts, achievements, and progress over a period of time. Portfolios may include a variety of demonstrations of learning in the form of papers, projects, videos, web pages, CD-ROMs, journals, etc.

RUBRIC
A description of the standards that will be used to judge a student's work on each of the criteria or important dimensions of learning. It is a scoring guide that is used in subjective appraisals of student work. It makes explicit statements about the expected qualities of performance at each point on a scale or at each rank in an ordered scoring system, for example: excellent, good, fair, poor, unacceptable.

SELF-ASSESSMENT
A process in which a student engages in a systematic review of his or her own performance or learning, usually for the purpose of improving in the future. It may involve comparison with a standard or established criteria. Students learn to set goals and monitor their own progress toward goals.

STANDARDS
A description of the expected level of student performance on the important dimensions of the learning objectives specified for the course. The instructor develops the standards to describe the proficiency level that must be attained by each student. Each student's work is compared to the standard, rather than to the work of other students.
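Several of the entries above (CRITERIA, RUBRIC, STANDARDS) describe the same underlying structure: criteria crossed with performance levels, each cell holding a descriptor. The following is a minimal illustrative sketch of that structure in Python. Only the "eye contact" criterion (from the CRITERIA entry) and the five-level scale (from the RUBRIC entry) are drawn from the glossary; the level descriptors are invented for illustration.

```python
# A minimal, hypothetical sketch of a rubric as a data structure.
# The level descriptors below are invented for illustration; only the
# criterion name and the five-point scale come from the glossary entries.

SCALE = ["excellent", "good", "fair", "poor", "unacceptable"]

rubric = {
    "eye contact": {
        "excellent": "maintains eye contact with the whole audience throughout",
        "good": "maintains eye contact most of the time",
        "fair": "makes intermittent eye contact",
        "poor": "rarely looks at the audience",
        "unacceptable": "reads from notes with no eye contact",
    },
    # further criteria would follow the same criterion -> level -> descriptor pattern
}

def feedback_sheet(judgments):
    """Print one line per criterion: the level awarded and its descriptor."""
    for criterion, level in judgments.items():
        assert level in SCALE, f"unknown performance level: {level}"
        print(f"{criterion}: {level} ({rubric[criterion][level]})")

feedback_sheet({"eye contact": "good"})
```

Representing a rubric this way makes the expectation explicit at every point on the scale, which is exactly what the RUBRIC entry asks of a scoring guide.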

STANDARDIZED TEST
A standardized test is a measure of student learning (or other ability) that has been widely used with other students. Standardized scores (e.g., mean, standard deviation, percentiles) have been developed so that a student taking the test can compare his or her score to the historical data. These are also sometimes called achievement tests. Examples are the SAT, GRE, GMAT, LSAT, MCAT, etc.
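The standardized scores this entry mentions (mean, standard deviation, percentiles) can be made concrete with a small worked example. The sketch below assumes a normal score distribution with an invented mean of 500 and standard deviation of 100; these numbers are illustration-only and are not taken from any particular test.

```python
# Worked example: converting a raw score to a z-score and an approximate
# percentile, assuming normally distributed scores. The mean (500) and
# standard deviation (100) are invented, illustration-only values.
from statistics import NormalDist

mean, sd = 500, 100
raw_score = 620

z = (raw_score - mean) / sd             # distance from the mean in SD units
percentile = NormalDist().cdf(z) * 100  # share of test-takers scoring lower

print(f"z = {z:.2f}, percentile = {percentile:.0f}")  # z = 1.20, percentile = 88
```

The z-score expresses how far a score sits from the historical mean in standard-deviation units, and the percentile translates that into the comparison with other test-takers that the glossary entry describes.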



Assessing large classes

After a decade of rapid expansion in Australian higher education, student numbers have grown considerably in many courses and subjects, especially at the undergraduate level.

Larger class sizes pose significant teaching challenges, not least in the assessment of student learning. Perhaps most troubling, large classes may limit the amount of feedback provided to students. In response to the pressures and challenges of assessing larger groups of students, academic staff are:
- giving greater attention to communicating clear assessment criteria to students;
- developing and using marking guides for teaching and assessing teams;
- making increasing use of various forms of exemplars to guide student efforts as well as marking and grading, including the modelling of discipline-based thinking, writing and performance; and
- continuously refining and disseminating assessment policy and practice in relation to large student groups.

Workload is central to any decision about the assessment of large classes, for it is a serious issue for students and staff alike. Staff teaching large student groups invariably undertake an informal, qualitative weighing-up of the efficiency of assessment tasks vis-à-vis their educational effectiveness. There is little doubt that establishing an effective assessment program (developing criteria, guides, exemplars and models; discussing and refining them; and communicating them to students and other staff) will have an initial negative impact on the workload of staff with coordinating responsibilities. However, this preparatory work is likely to lead to three gains. The first is a reduction in the time required for marking, owing to a higher quality of student submissions. The second is a resolution of some of the issues that can arise when many staff are involved in marking and grading, through a streamlining of marking and grading practices. Finally, the availability of clear, transparent criteria and examples of work will contribute positively to the overall quality of teaching and learning.

Five assessment challenges created by large classes

The assessment of large student cohorts presents five distinct though interrelated challenges:
- Avoiding assessment that encourages shallow learning
- Providing high-quality, individual feedback
- Fairly assessing a diverse mix of students
- Managing the volume of marking and coordinating the staff involved in marking
- Avoiding plagiarism

In an effort to manage these challenges, academic staff have increasingly turned to group and on-line assessment. Carefully planned and managed group work does appear to help address many of the assessment challenges listed above. (Detailed information about creating effective group work and group assessment is in the section Assessing group activities.) Similarly, the use of appropriate on-line assessment can also help address some of the challenges of assessing large classes; for example, multiple-choice and/or short-answer questions that can be automatically marked can provide feedback to students that is otherwise not possible. On-line assessment is also likely to assist, to some extent, in managing a diverse mix of students and the time required for marking. However, on-line assessment may not necessarily avoid the problems of low-level learning or plagiarism. (A more extended discussion of these issues is in the section On-line assessment.) Ultimately, while group and on-line assessment have much to offer, neither is a panacea for all the issues inherent in assessing large classes.

1. Avoiding assessment that encourages shallow learning

There is little doubt that growing class sizes encourage academic staff to focus on time-efficient assessment techniques. One unwelcome consequence of this focus on efficiency is a tendency toward assessing learning at lower levels of intellectual complexity; that is, toward assessment tasks that merely reward superficial, shallow or reproductive approaches to learning and fail to direct students into the type of study that leads to the higher-order learning objectives of university education. Assessment methods demanding less complex analysis and synthesis than in the past, or demanding less rich forms of student response, may significantly diminish the quality of learning in higher education.

Attempts to assess large numbers of students in time-efficient ways may have resulted in approaches to assessment that are not educationally desirable. For example, in some disciplines there appears to be a growing reliance on exam-based assessment with large classes, with an increased use of multiple-choice and short-answer or tick-a-box questions. Of course, well-developed written examinations can provide a high level of validity and reliability in measuring some types of learning. However, academic staff need to judge the appropriate proportion of assessment that should be conducted through this method alone. The efficiencies of assessing learning through exams, particularly if the marking is routine or automated, are counterbalanced by the limitations of a single method of assessment, particularly one that might not encourage the development of the full range of higher-order cognitive skills. Even at their best, examinations used as a sole assessment method strike many students as impersonal, particularly in first year.

Another response to the pressures of larger classes, often in disciplines where examinations are less commonly used, is to lower the word-length requirements on written assignments. One staff member has commented that this tendency is a distinct disadvantage to students, especially those going on to write 100 000 word postgraduate theses.
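For concreteness, the kind of automated multiple-choice marking referred to above can be as simple as the following minimal sketch. All item content, option labels and feedback strings are invented for illustration; real on-line assessment platforms implement this in their own ways.

```python
# Minimal sketch of automatically marked multiple-choice items with
# immediate per-item feedback. All item content is invented.

ITEMS = [
    {"q": "Which kind of assessment takes place while a course is in progress?",
     "options": {"a": "summative", "b": "formative"},
     "answer": "b",
     "feedback": "Formative assessment gathers data during instruction, "
                 "so teaching can still be adjusted."},
]

def mark(responses):
    """Score a dict of {item index: chosen option} and print feedback."""
    correct = 0
    for i, item in enumerate(ITEMS):
        if responses.get(i) == item["answer"]:
            correct += 1
            print(f"Q{i + 1}: correct.")
        else:
            print(f"Q{i + 1}: incorrect. {item['feedback']}")
    return correct

score = mark({0: "a"})
print(f"{score} of {len(ITEMS)} correct")
```

The per-item feedback string is what makes such marking educationally useful rather than merely efficient; without it, automation delivers only a score.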

As with many complex issues, there are no simple answers to these and other challenges in assessing large classes. Awareness of the limitations, and of the possible negative consequences for the quality of student learning, of particular approaches to assessment tasks is crucial, as this awareness is likely to guide assessment-related decisions toward compromises that reflect both efficiency and educational effectiveness. The use of less frequent and, where possible, cumulative summative tasks, with more formative feedback that guides student efforts on the next task, might be useful in some circumstances.

2. Providing high-quality, individual feedback that guides student learning

Timely, individual feedback is central to guiding learning. But providing such feedback to hundreds of students simultaneously, within a timeframe that ensures the feedback can be incorporated into student learning, is a daunting prospect. Students appreciate detail in the feedback they receive, both to identify weaknesses and to understand how they might improve future efforts. The structure of the overall assessment regime is therefore critical. If feedback is given on an early assessment task but later tasks within the same subject offer little or no opportunity to incorporate learning from this feedback, students are likely to feel disadvantaged. Timing of feedback is also critical: there is little point, from a student's point of view, in receiving feedback at the end of a subject when there may be no opportunity to apply the improved understanding.

One approach to providing feedback for large student groups is to use on-line assessment item banks, with marking provided either automatically or by a graduate assistant or tutor. While this might be a time- and resource-efficient method, appropriate in some circumstances, it has one significant limitation in terms of feedback: under such an arrangement, teaching staff will receive little if any direct feedback themselves about students' levels of understanding. In addition, students often find automated or anonymous marking impersonal and prefer more personal interaction with their teachers, even if this interaction is limited to written communication in the form of comments and/or grades.

Notwithstanding these issues, the following suggestions might assist staff who teach large groups of students and are looking for ways to provide formative feedback:
- Assess early in the semester: this gives time for feedback and possible improvement.
- Provide students with marking criteria prior to their undertaking the assignment, to guide progress and help develop independent learning skills.
- Prepare a list of the most common or typical problems in assignment submissions and/or exam responses, along with explanations/model answers, and then:
  - publish a single sheet containing these on the subject homepage;
  - prepare and make available multiple copies of an audiotape detailing these;
  - provide brief, general feedback on these verbally to students as a group in lectures/tutorials.

- Use a standardised feedback sheet that incorporates the stated criteria.
- Where possible and appropriate, use on-line tutors.
- Use on-line discussion boards, with a framework and initial model for discussions, so students can assist each other with assignments; be clear about how collaboration, collusion and copying differ.
- Use on-line products that provide hints/help and feedback on student attempts at problem-solving, answering quiz questions and other assignment tasks.
- Use a website/subject homepage to provide basic information and answers to FAQs related to assessment.
- After using and marking multiple-choice tests, provide students with a written rationale and explanation for correct or high-scoring answers and/or resources for further reading.

3. Fairly assessing a diverse mix of students

Generally speaking, larger classes mean a more diverse and complex student mix. Diversity in educational background and ability is particularly significant in larger classes, partly because of the critical mass of differences. The issue of varying levels of student ability or readiness and that of marking workload in large classes are closely related. Sometimes large classes are used to teach service or compulsory subjects to students from a wide range of courses. In these situations, student diversity in backgrounds, prerequisite knowledge, expectations and level of interest in the subject matter can be profound. Some suggestions:
- Require first-year students to undertake a foundation unit (already compulsory in some universities) to develop the necessary academic/study skills and/or skills to successfully undertake assessment tasks.
- Early in the semester, briefly survey students about their prior knowledge and expectations to identify possible issues that may adversely affect assessment.
- Set an early hurdle task through which students at risk of failing written assessments are identified and offered assistance from the university learning support/development centre.
- Organise the provision of support tutorials or supplementary workshops for essay writing or other necessary assessment-related skills from the appropriate university service.
- Ensure the provision of English-language assistance from the appropriate university service for students who need such help.
- Where possible in assessment tasks (assignments or exams), ask students to consider how concepts relate to their discipline/vocational area (i.e. accept more than one right answer).
- Assign students to tutorials on the basis of their discipline/course, rather than randomly; the focus of these smaller classes is then more likely to be aligned with their interests.
- Ensure that tutorials follow lectures (rather than vice versa) and that assessment-related issues are discussed and addressed in detail in these smaller groups.
- Develop variations in the assessment tasks that target the discipline background of the different sub-groups of students.

4. Managing the volume of marking and coordinating the staff involved in marking

The time required for the sheer volume of marking for large student groups can be significant. However, some steps can be taken to optimise the use of staff time. As discussed in the section on the complex student mix, it is useful, where possible, to develop student skills and understanding related to the assessment requirements before students undertake assessment tasks, in order to lessen the marking workload associated with poor-quality submissions. Other strategies likely to be helpful include:
- providing clear marking criteria to students;
- making past exam papers and model answers readily available;
- providing exemplars of various levels of work ('below acceptable' through to 'High Distinction' or equivalent) to illustrate the differences for students;
- for written assessment (assignments or essay-based exams), providing modelling in, for example, critical analysis, essay writing and the use of appropriate style and format;
- directing all students to resources and support for academic/study skills (including printed and on-line resources, workshops and individual tuition) and articulating an expectation that students will use them.

Other strategies that might help optimise the task of marking include:
- on-line, computer-based or web-based exams or tests (see On-line assessment);
- developing joint assessment with another subject in the course, which may help to link concepts and develop coherence as well as lessen the load.

A common response to larger class sizes is the employment of sessional staff to assist with teaching and assessment. While at one level this trend might appear to resolve the issue of marking for academic staff with overall responsibility for subjects, it also brings a new set of issues associated with the coordination, training and support of a subject team. There are well-known problems associated with the use of teams of sessional staff, especially if they are inexperienced teachers, including disparate understandings of assessment requirements, differences in levels of marking experience, and a lack of consistency in marking and grading practices. Some of these problems can be reduced or eliminated through the following suggestions:
- Provide paid initial training in assessment for new staff.
- Provide paid professional development in the area of assessment for all staff.
- Provide consistent criteria to all staff involved in marking, and ensure the marking criteria are understood by all staff.
- Provide model answers, including examples of very good, moderate and poor assignments/exam answers.
- Provide marking guides.
- Ask all staff to use a standardised feedback sheet incorporating the stated criteria.
- Ensure avenues of clear communication between staff are in place.
- Provide assessment mentoring for inexperienced markers.
- Hold weekly paid meetings for sessional staff to discuss assessment-related issues.
- Make participation in assessment training, professional development and/or meetings a condition of employment for sessional staff, and pay them for attendance.
- Require sessional staff to attend 10-15 minutes of a lecture in which assignment requirements are discussed, so everyone hears the same information.
- Use moderation if necessary (a simple marker-agreement check that can support moderation is sketched after point 5 below).

5. Avoiding plagiarism

There is a general perception that the likelihood of plagiarism is exacerbated by large classes. If this is the case, one reason students may deliberately cheat in a large class is that they feel somewhat anonymous, lost in the crowd, and therefore believe they are less likely to be caught. Alternatively, if students in large classes plagiarise unintentionally, this might be the result of having limited or no opportunity to check referencing and/or collaboration conventions with a lecturer or tutor. A lack of clear understanding of assessment requirements is a particular issue for some first-year and many international students, for whom higher education referencing and collaboration rules are unfamiliar. The key to minimising plagiarism in large classes lies in the design of assessment tasks. For suggestions, see the section Minimising Plagiarism and the 36 strategies that can be considered.
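As flagged under point 4, the following is a minimal sketch of one way to check marking consistency between two markers before moderation. The grade lists are invented examples, and Cohen's kappa is a standard agreement statistic chosen here for illustration; it is not a method prescribed by this document.

```python
# Minimal sketch of a moderation check: how often do two markers agree,
# and how much better than chance (Cohen's kappa)? Grades are invented.
from collections import Counter

marker_a = ["P", "C", "D", "HD", "C", "P", "D", "C"]
marker_b = ["P", "C", "C", "HD", "C", "F", "D", "C"]

n = len(marker_a)
observed = sum(a == b for a, b in zip(marker_a, marker_b)) / n

# Chance agreement expected from each marker's own grade frequencies.
freq_a, freq_b = Counter(marker_a), Counter(marker_b)
expected = sum(freq_a[g] * freq_b[g] for g in set(marker_a) | set(marker_b)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement = {observed:.0%}, Cohen's kappa = {kappa:.2f}")
```

A kappa well below the raw agreement suggests markers are agreeing largely by chance, a signal that criteria and marking guides need further discussion before moderation of grades.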

Resources on Teaching Large Classes

An Australian Universities Teaching Committee project, 'Identifying and supporting effective methods of enhancing learning: teaching large classes', managed by the University of Queensland, has developed suggestions, help and resources related to the teaching of large classes, including assessment issues. See www.tedi.uq.edu.au/largeclasses, in particular the sections 'Teaching and Assessment in Large Classes' and 'Large Classes Across the Disciplines'.

This work is copyright. It may be reproduced in whole or in part for study or training purposes, subject to the inclusion of the source and no commercial usage or sale. Reproduction for purposes other than those indicated above requires written permission from the Commonwealth, available through AusInfo. Copyright 2002, AUTC.
