
2015 ISTEP+ Review

Findings and Recommendations on Testing Time

2/25/2015
Edward Roeber
William Auty

EXECUTIVE SUMMARY
The Indiana Department of Administration, on behalf of the Governor of Indiana, contracted with us to
investigate the issue of testing time for the 2015 ISTEP+ assessments in Indiana. Although limited time was
available for this review, we were able to make several short-term recommendations regarding how testing
time for the 2015 ISTEP+ could be reduced. The purpose of this review was not to determine the causes for
the proposed testing time, nor to affix blame. Time was too short and we were not sufficiently versed in the
history of the events to engage in such discussions. This review of short-term issues was conducted in two
days so that our recommendations would be timely given that testing was about to begin and any changes
would have to be communicated to school corporations quickly.

RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION


Recommendation 1: The Department should not release the open-ended (OE) items used in the 2015
ISTEP+ and the 2016 ISTEP+ programs. Instead, we recommend the release of example items that are highly
similar to the OE items. We recommend the OE item release policy be restored once the state has a sufficient
pool of items for use in the assessment in the future. It is our hope that this will be for the 2016 ISTEP+
program, but that decision should await analysis of the results of the 2015 administration to determine if the
item pool is large enough to build assessments for 2016, 2017 and beyond.
Recommendation 2: IDOE should administer some parts of the 2015 ISTEP+ to only a sample of students
being tested this year.
Recommendation 3: The Social Studies portion of the test should be suspended for one year.
Recommendation 4: IDOE should identify now which 2015 ISTEP+ mathematics and ELA test items best
align to the Indiana standards.
Recommendation 5: IDOE should identify the assessment design and anticipated testing time for the 2016
ISTEP+ program and release this information publicly this spring to demonstrate that the testing time issues
this year are a one-time event.
Recommendation 6: We recommend that vertical scaling items be removed from the 2015 online
assessments.
In addition to our recommendations for immediate implementation, we also made longer-term
recommendations regarding the operation of the ISTEP+ program by the Indiana Department of Education
with support and oversight from the Indiana State Board of Education. The focus of these recommendations is
to improve the ISTEP+ program in 2016 and beyond. These recommendations are based on best practices in
measurement and large-scale assessment. For each recommendation, we also provided the rationale and a
summary of the additional work needed to implement these recommendations.
We made recommendations in five areas:

Determining Test Length for 2016 and Beyond

Technical Assistance (Both TAC and Operational Support)

Test Blueprint as a Planning and Communication Document

Transition Planning

Improving Agency Communication

RECOMMENDATIONS FOR LONG-TERM ISTEP+ QUALITY AND EFFICIENCY


Recommendation 7: Based on the results of 2015 tests, IDOE should investigate the feasibility of shortening
the ISTEP+ tests in 2016 and beyond.
Recommendation 8: We recommend that Indiana establish a technical advisory committee that includes
individuals who have specific expertise to provide technical advice to the SBOE and IDOE. We also
recommend that IDOE establish a standing Indiana assessment advisory committee.
Recommendation 9: We recommend that IDOE develop test specifications and blueprint documents for the
2015 and 2016 versions of ISTEP+ as soon as possible.
Recommendation 10: We recommend that Indiana (the SBOE and IDOE) seek external assistance to guide
the transition of the ISTEP+ and other assessment components, should the state select new vendors for any of
its assessment components.
Recommendation 11: We recommend that the SBOE and IDOE review inter-agency communication, both at the state level and with local school corporations, and that both agencies commit to making improvements to ensure the best possible assessment system for students, educators, parents and citizens of Indiana.

SUMMARY
We commend the state for tackling this thorny issue and working together to resolve it. We believe that if
these recommendations are followed, testing time can be reduced to more manageable levels. We also believe
that implementing our long-term recommendations will improve the design and implementation of the
ISTEP+ program in the future. We remain willing to assist in and perhaps monitor efforts to implement these
recommendations.
Edward Roeber
William Auty

The Indiana Department of Administration employed us to conduct a review of the ISTEP+ program,
particularly the testing time issue. Our report is divided into four sections:

Recommendations for Immediate Implementation

Recommendations for Long-Term ISTEP+ Quality and Efficiency

Additional Observations

Appendices

RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION


The Indiana Department of Administration, on behalf of the Governor of Indiana, contracted with us to
investigate the issue of testing time for the 2015 ISTEP+ assessments in Indiana. Although limited time was
available for this review, we were able to make several short-term recommendations regarding how testing
time for the 2015 ISTEP+ could be reduced. The purpose of this review was not to determine the causes for
the proposed testing time, nor to affix blame. Time was too short and we were not sufficiently versed in the
history of the events to engage in such discussions. This review of short-term issues was conducted in two
days so that our recommendations would be timely given that testing was about to begin and any changes
would have to be communicated to school corporations quickly.
The Indiana Department of Education (IDOE) and its contractor, CTB-McGraw-Hill (CTB), have been
forthcoming and helpful in this review process. This has included providing considerable information and
virtual and in-person meetings.
We based our review on the four principles listed below. We came to several conclusions, made several recommendations and identified additional work that will be needed to implement these recommendations. This section provides a summary of the short-term review.

REVIEW PRINCIPLES
Four principles guided our review of the testing time issue in Indiana. These are:
1. The results of 2015 tests should be sufficiently reliable and valid to enable the intended purposes of the assessment program to be achieved. The assessment design and implementation should also meet state and Federal standards for assessment and accountability. Professional judgment is required to determine whether a test is sufficiently reliable or valid.
2. Changes made in the 2015 program should not unduly impact the 2016 program, since it is essential that this year's issues not continue next year and beyond.
3. Our recommendations should not be overly prescriptive. IDOE has the responsibility for and understanding of the details of ISTEP+ design and implementation. What we are proposing are parameters for how the testing time could be reduced. We expect that IDOE and its contractors will use these guidelines to effect the suggested changes.
4. We are willing to continue to assist the Department as it implements these recommendations.

FINDINGS
Although our time to reach conclusions and make recommendations was short, we were able to determine
several things. These are:

Testing times in excess of 12 hours were scheduled for the mathematics, English language arts,
science and social studies tests in ISTEP+.

We believe that it is unnecessary to require young children, indeed any students, to take an assessment of 12 hours in length.

The mathematics test contributes about 4 hours of this time and the ELA test contributes over 8 hours of testing time. Thus, we found the ELA test is the real issue, although we think steps should be taken to reduce the length of both the mathematics and ELA tests. Major contributors to the extra testing time in the English language arts assessment are the policies on the release of open-ended or constructed-response items.

States and the state assessment consortia across the country are adding significant testing times to
their programs, due to more comprehensive standards and the increased use of performance
assessments to better gauge student achievement. Even reduced ISTEP+ tests may be longer than
those used in the past.

The lack of previously pilot-tested or field-tested items requires the use of more items than normal this year, to make sure that there is a set of items for producing this year's test information.

We believe that the item-overage levels being used are not excessive (in the 50% range), given the
structure of the tests and the nature of the intended score reports.

We believe that there are ways the Department can reduce testing time and accomplish its
assessment purposes, as explained below.

RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION


Based on the principles cited above and recognizing the findings listed, we made the following
recommendations for actions to reduce testing time this year. These were presented to the State Board of
Education during their emergency meeting on February 13, 2015.
Recommendation 1: The Department should not release the open-ended (OE) items used in the 2015
ISTEP+ and the 2016 ISTEP+ programs. Instead, we recommend the release of example items that are highly
similar to the OE items. We recommend the OE item release policy be restored once the state has a sufficient
pool of items for use in the assessment in the future. It is our hope that this will be for the 2016 ISTEP+ program, but that decision should await analysis of the results of the 2015 administration to determine if the
item pool is large enough to build assessments for 2016, 2017 and beyond.
Rationale: Our preliminary review indicates that the greatest contributor to the increased testing time is the
need to operationally administer and release enough Part 1 OE items in 2015 as well as pilot items for
comparable 2016 tests. We recognize that educators in Indiana rely on released items to guide instruction.
We would therefore include the recommendation that high-quality example items be produced this spring
and released publicly when the results are reported. This policy should be used for 2015 and for the 2016
assessment, unless the item pool is robust enough to permit the release of OE items in 2016. This needs to be
determined after the 2015 OE items have been analyzed, making sure that there are enough items for use in
2016 as well as 2017.
Work to be done: IDOE and the State Board of Education (SBOE) should determine what changes would be
required to implement this recommendation. How many items are needed to build this year's final form and the comparable forms next year, if items that worked this year are reused?

Recommendation 2: IDOE should administer some parts of the 2015 ISTEP+ to only a sample of students
being tested this year.
Rationale: Another significant contributor to the test length is the requirement that all students take all
items (which is the major down-side of an operational field test), including items that will be used in future
testing. This is because none of the items in this year's assessment have been used previously, so the state
needs a pool of items from which to construct the final test for 2015 and to build the tests for 2016 and
beyond.
We recommend that items in test sessions be identified as core or sample items. All students would take
the core items and half would take each set of sample items. This is a standard testing method called matrix
sampling that has been used in Indiana in the past. Field testing of items usually can be done with many
fewer student responses.
The easiest place to implement this recommendation is the Part 1 OE assessments, where currently there are
two parallel forms in the testing for all students. By giving each student only one of these sets, an estimated 3 hours and 5 minutes of testing (two days/four test sessions) can be eliminated, reducing testing time for both
the mathematics and ELA assessments.
The easiest way to do this would be for IDOE to designate two comparable groups of school corporations or schools and then direct each group to take the appropriate sessions. The Department could use a list of all schools across the state, designating the odd-numbered schools on the list to take some sessions and the even-numbered schools to take the other sessions. This procedure will require detailed communications to the
school corporations as soon as possible.
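To illustrate the mechanics of such an assignment (our own minimal sketch in Python, not an IDOE procedure; the school codes are placeholders), alternating schools on a statewide list between two session groups could look like this:

    # Minimal sketch: alternate schools on a statewide list between the two
    # groups of Part 1 OE sessions. School codes below are placeholders.
    def assign_session_groups(school_ids):
        """Map each school to 'Group A' (odd position) or 'Group B' (even position)."""
        assignments = {}
        for position, school_id in enumerate(sorted(school_ids), start=1):
            assignments[school_id] = "Group A" if position % 2 == 1 else "Group B"
        return assignments

    if __name__ == "__main__":
        schools = ["0123", "0456", "0789", "1011"]  # placeholder school codes
        for school, group in sorted(assign_session_groups(schools).items()):
            print(school, group)

Each group would then be directed to administer only its assigned Part 1 sessions, while the core sessions would still be taken by all students.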
Work to be done: IDOE should designate the Part 1 OE assessment sessions for sampling, with half of the
schools administered half of the Part 1 sessions. The Indiana legislature and appropriate agencies should
change state policies/regulations on the release of all OE items for two years, unless the OE item pool is
large enough to release the 2016 OE items. IDOE should communicate to schools any needed changes in test
administration procedures as soon as possible.

Recommendation 3: The Social Studies portion of the test should be suspended for one year.
Rationale: Since these tests are not required by NCLB, nor apparently used in school accountability, this change will reduce testing time by 75 minutes (one day and two sessions) for students in grades 5 and 7. An option that the state might want to explore is to permit schools to use the social studies test on a voluntary basis, rather than suspending the assessment in its entirety. This would permit those who are very interested in the assessment to still use it. This is feasible because these items are contained in printed tests already
in the schools in Indiana.
Work to be done: IDOE and SBOE should determine what changes in policy or regulation are required to
implement this recommendation as soon as possible. If required, they should also seek legislative authority to
implement this recommendation.

Recommendation 4: IDOE should identify now which 2015 ISTEP+ mathematics and ELA test items best
align to the Indiana standards.
Rationale: The Department and its contractor CTB-McGraw-Hill should now identify the core mathematics
and English language arts assessment within the 2015 ISTEP+ tests. We recommend that the core set of items
be identified from those used in the 2015 program so that if they work, the state can be assured that the set of
items work together and are aligned to Indiana's standards, thus meeting one of the key Federal peer review
criteria. If any of these items do not work, then IDOE can replace these items from the overage that did work
well.
Work to be done: IDOE should be identifying the intended assessment now, both to guide the analysis of the field-tested items later this spring and to inform the likely test length for 2016 and
beyond. IDOE staff and its contractor should select the set of operational field test items for use as the actual
assessment, assuming that the items work. This can be used to assure that this assessment is aligned to the
Indiana standards.

Recommendation 5: IDOE should identify the assessment design and anticipated testing time for the 2016
ISTEP+ program and release the information publicly this spring to demonstrate that the testing time issues
this year are a one-time event.
Rationale: Because we hope that the testing time issue for the 2015 assessment is a one-time phenomenon, we
suggest IDOE verify this by producing an assessment design for the 2016 assessment program. In this design,
the use of core and matrix sampling by Part and Test Session should be illustrated, along with the number of
items of each type and the testing time by Part and Session. We think that IDOE should announce the
parameters for the 2016 program soon so as to assure local educators that 2015 is a one-time only event.
Work to be done: IDOE should describe the 2016 assessments so as to show the number of Parts, Sessions,
assessment items and testing times. IDOE should also indicate how matrix sampling will be implemented in
both the paper/pencil tests and the online assessments for 2016 and beyond. This information should be
released by this spring. This will serve to illustrate to educators, parents and other members of the public that
the testing time issue is a one-time event, limited to the 2015 assessment.

Recommendation 6: We recommend that vertical scaling items be removed from the 2015 online
assessments.
Rationale: There are other options for calculating growth in 2015 and the vertical scale could be constructed
in 2016. However, each student takes only 5 items for vertical scaling and the testing time would be reduced
by only a few minutes, which is not a significant reduction.
Work to be done: IDOE should determine what changes would be required to implement this
recommendation.


RECOMMENDATIONS FOR LONG-TERM ISTEP+ QUALITY AND EFFICIENCY


In addition to our recommendations for immediate implementation, we also make longer-term
recommendations regarding the operation of the ISTEP+ program by the Indiana Department of Education
with support and oversight from the Indiana State Board of Education. The focus of these recommendations is
to improve the ISTEP+ program in 2016 and beyond. These recommendations are based on best practices in
measurement and large-scale assessment. For each recommendation, we provide a rationale and a summary
of the additional work needed to implement these recommendations.
We identified recommendations in five areas:

Determining Test Length for 2016 and Beyond

External Assistance (TAC, Operational Support, Statewide Advice and Feedback)

Test Blueprint as a Planning and Communication Document

Transition Planning

Improving Agency Communication

DETERMINING TEST LENGTH FOR 2016 AND BEYOND


Recommendation 7: Based on the results of 2015 tests, IDOE should investigate the feasibility of shortening
the ISTEP+ tests in 2016 and beyond.
Rationale: Test length is a complicated issue that will inevitably be resolved as a compromise between
competing demands. Test reliability is directly related to test length: the longer a test, the more reliable it will
be. However, student learning is directly related to instructional time on task: the more instruction a student
receives, the more learning occurs. Testing and instruction come out of the same time in school. Therefore,
test designers must balance these and other interests.
ISTEP+ is newly revised for 2015. The performance of students on its new item types is not known at this
time. When those items have been calibrated and we know how long students spent responding to them this
year, it will be possible to predict the reliability of shorter versions of the test. If a shorter test can produce
sufficiently reliable results, testing time can be reduced. Many states administer shorter tests to students in
grades 3 and 4 than to students in upper grades. This approach may also work for Indiana.
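One standard tool for such a projection is the Spearman-Brown prophecy formula, which estimates the reliability of a test whose length is changed by a given factor. The sketch below is purely illustrative; the reliability and length values are placeholders, not ISTEP+ statistics.

    # Illustrative projection of the reliability of a shortened test using the
    # Spearman-Brown prophecy formula. Input values are placeholders.
    def spearman_brown(current_reliability, length_factor):
        """Projected reliability when test length is multiplied by length_factor
        (e.g., 0.75 models dropping a quarter of the items), assuming the
        remaining items behave like those removed."""
        return (length_factor * current_reliability) / (
            1 + (length_factor - 1) * current_reliability
        )

    if __name__ == "__main__":
        # Placeholder: a test with reliability 0.92 shortened to 75% of its length.
        print(round(spearman_brown(0.92, 0.75), 3))  # about 0.90

If the projected reliability of a shorter form remains acceptable for the intended uses of the scores, the corresponding items and testing time could be removed.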
When creating this assessment design, IDOE should consider strategies to trim assessment times even
further, looking especially at the ELA assessment, since this assessment is currently the longest assessment
component. The number of reading passages, items and writing prompts should be carefully determined and
a strong rationale for those numbers created. Since a separate reading score is not reported in Indiana, it may
be the case that the number of reading passages and items used to report by ELA standard can be reduced.

Attention should be paid to the mathematics test, since it is long as well. The goal should be to produce a solid
assessment of ELA and mathematics with the fewest items possible and with a number of embedded field test
items, each administered to small samples of students.
Another possibility is to develop computer adaptive tests (CAT). These are tests in which a computer is
programmed to customize the test by selecting items for each student based on their answers to previous questions. By selecting items that are optimally informative of the student's ability, the test can be shorter
than a test designed for all students in the state. However, CAT assessments require a larger item pool and
thus may not work for Indiana in the short term.
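As a rough illustration of the adaptive idea only (a sketch under a simple Rasch model with placeholder item difficulties, not the ISTEP+ item pool or any vendor's engine), a CAT repeatedly administers the unused item that is most informative at the student's current ability estimate:

    import math

    # Minimal sketch of adaptive item selection under a Rasch (1PL) model.
    # Item difficulties and simulated responses are placeholders.
    def prob_correct(theta, difficulty):
        """Probability of a correct response under the Rasch model."""
        return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

    def next_item(theta, remaining_difficulties):
        """Pick the unused item with the highest information at the current
        ability estimate (for Rasch items, the difficulty nearest theta)."""
        return min(remaining_difficulties, key=lambda b: abs(b - theta))

    def run_cat(item_bank, answer_fn, test_length=5):
        """Administer test_length items, nudging the ability estimate after each."""
        theta, remaining = 0.0, list(item_bank)
        for _ in range(test_length):
            b = next_item(theta, remaining)
            remaining.remove(b)
            correct = answer_fn(b)  # 1 if the student answers correctly, else 0
            # Crude update toward the evidence; a real CAT would use ML or EAP estimation.
            theta += 0.5 * (correct - prob_correct(theta, b))
        return theta

    if __name__ == "__main__":
        bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]      # placeholder difficulties
        simulated_student = lambda b: 1 if b <= 0.3 else 0  # placeholder responses
        print(round(run_cat(bank, simulated_student), 2))

An operational CAT would add a calibrated item pool, content constraints and a proper ability estimator, which is why the larger item pool noted above matters.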
Work to be done: IDOE must decide who can do the analysis. The current contractor, the new contractor
and an independent consultant are all possibilities. Whoever does the analyses, the results must be ready
quickly so that the Department has time to evaluate the advantages and disadvantages of a shortened test.
The review and decision-making process that begins after the psychometric analysis is complete should not
be short-changed. Test length is inevitably a compromise of conflicting interests, so we cannot expect the
technical analysis to answer the question fully. We suggest that a decision-making group be identified and
their work scheduled in advance such that they have time to deliberate and make decisions before 2016
forms are built.

EXTERNAL ASSISTANCE
Recommendation 8: We recommend that Indiana establish a technical advisory committee that includes
individuals who have specific expertise to provide technical advice to the SBOE and IDOE. We also
recommend that IDOE establish a standing Indiana assessment advisory committee.
Rationale: The field of large-scale assessment is advancing rapidly in response to increased demands on
assessment systems to support school and educator accountability as well as instruction of more rigorous
and comprehensive academic standards. Most states find it unrealistic to fund staff positions to obtain all the
requisite expertise. Most states obtain technical expertise through a Technical Advisory Committee (TAC).
Generally such committees are composed of members with a variety of backgrounds and experiences who are
chosen to support the specific assessments the state is administering or developing. An advantage of TACs
lies in their independence. They do not have the financial interest of a testing contractor and they should have
no political allegiance to any state agency. Therefore, their advice for resolving a testing issue or opinion on a
proposal would be more valued by both supporters and critics of the state's assessment system.
Another option is for the agency responsible for developing and administering the assessment to contract for
psychometric services on an as-needed basis. This makes sense during intense periods of development or
transition when day-to-day interactions with agency staff and the test contractor are required. The consultant(s) can represent the state's interests in ensuring the assessment is designed or administered as
correctly and efficiently as possible.
IDOE should also establish and maintain an assessment advisory committee comprised of representatives of
various educational and other organizations with a strong interest in ISTEP+. This includes representatives of
teachers, parents, administrators, policymakers, business leaders and others. The committee should serve to facilitate two-way communication between IDOE and the groups its members represent. IDOE should use the
group not only as a means of communicating with the parties with an interest in assessment, but also as a
sounding board for receiving feedback on new ideas and new designs for assessment. This group should
review any major proposed changes and provide its input to both IDOE and the State Board of Education as
the state considers proposed changes. This should be an official part of the charge to the group from IDOE,
with support of the SBOE.
Work to be done: We suggest that establishing a TAC be a priority. The first step in doing that is to
determine where responsibility for hiring and convening the TAC should lie. In Indiana, this would most likely
be the SBOE or IDOE. (A less common option is used in Kentucky, where the legislature convenes the TAC.) In
most states, the agency responsible for administering the assessment convenes the TAC. Since the SBOE has an oversight role with the Department, it might also work for the SBOE to convene the TAC.
Once that decision is made, the process of identifying desired areas of expertise and finding qualified TAC
members can begin. Since there are key decisions to be made regarding the design of the 2016 assessment
and the possible transition to a new contractor or contractors, we suggest an aggressive schedule that will
allow the first meeting of the new TAC to occur by late spring or early summer.
The TAC should be comprised of individuals with psychometric, statistical and practical assessment backgrounds and include one or more individuals with a background in working with students with disabilities and English
language learners. There are a number of persons who have focused on assisting state assessment programs
successfully carry out the technical work that underlies programs such as ISTEP+. These are individuals who
are or have been employed at the university level or who are or have been working in assessment-related
organizations.
The size of most state assessment TACs is 5 or 6 individuals. They typically meet 3 or 4 times per year, usually
for 1 to 2 days. Such a group might meet more often during times of new assessment design work and less
often when the program work is being conducted successfully.
TACs usually review assessment plans, the procedures used by states to implement these plans and the results of the work of each contractor. The goal is to provide an independent technical overview of the
work of the contractor(s) and the Department.

The assessment advisory committee should be comprised of individuals nominated by various education and
education-related organizations and individuals with a strong interest in assessment in Indiana. These may
include teachers, administrators (building and school corporation levels), school boards, subject-matter
organizations (mathematics, ELA, science and social studies), universities, parents, students and business
groups.

TEST BLUEPRINT AS A PLANNING AND COMMUNICATION DOCUMENT


Recommendation 9: We recommend that IDOE develop test specifications and blueprint documents for the
2015 and 2016 versions of ISTEP+ as soon as possible.
Rationale: A test blueprint is an essential part of test design. It can also be an effective tool for
communicating the intent, qualities and interpretations of a test to educators and to more general audiences.
Conversely, the lack of a test blueprint can lead contractors to rely on oral communication or scattered
documents to guide the design. Also, stakeholders who are not directly involved in the development can
become confused or suspicious about the test that is produced.
We've attached a comprehensive test specification and test blueprint document published by Oregon as an
example of what such a document could look like (Appendix A). Some information in that document is specific
to Oregon's adaptive online assessment, so would not apply to Indiana. However, other sections would be
helpful in avoiding the issues that arose around this year's testing. Note that the Introduction and Background
sections provide an overview of the assessment system. The Score Reporting Category section provides the
link between the content standards and the scores produced by the test. The largest section is the Content
Standards Map. Here, the specific content standards and strands that are assessed in each reporting category
are described in detail. Note the Boundaries of Assessable Content and Sample Items. This information
provides clear guidance to item writers and also communicates to teachers precisely how the content will be
assessed. The Test Blueprint section includes the weighting chart that is Indiana's current blueprint and there
is additional information about item specifications, content coverage as well as the Achievement Level
Descriptors, which, in Indiana, will describe how performance on the test relates to college and career
readiness.
Oregon's example is not the only model for a test blueprint, nor is it reasonable to create a document as
comprehensive as this one right away. It is provided as an example of how the test blueprint can serve as a
communication tool for a variety of audiences.
Work to be done: As a first step, the design of the 2015 operational test should be documented in as much
detail as possible. Using information provided by CTB, we have put together a table showing the number and
types of items by grade that will be used to report results this spring (Appendix B). The points generated by each item type and the total points are also included. This is important because the number of items alone
does not describe a test. Items vary in the number of points they generate for scoring. Simple multiple-choice
items that are scored as correct or incorrect generate one point. Complex items like the writing prompt
generate 10 points. It is the points that determine the weight of items in a test score. Therefore, the percent of
points in each reporting category is used to verify that the test covers the content as intended. (These
percentages do match the existing blueprint weighting charts.)
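As a simple arithmetic illustration of this point, using the grade-level ELA totals from the Appendix B table (item counts and points per item as shown in its column headings), the share of the total score contributed by each item type can be computed directly:

    # How points, not item counts, determine weight. Counts and points per item
    # follow the grade-level ELA totals in the Appendix B table.
    item_types = {
        # item type: (number of items, points per item)
        "CR":    (3, 2),
        "ER":    (1, 8),
        "WP":    (1, 10),
        "MC/TE": (40, 1),
        "TE":    (5, 2),
    }
    total_points = sum(n * pts for n, pts in item_types.values())  # 74 points
    for name, (n, pts) in item_types.items():
        share = 100 * n * pts / total_points
        print(f"{name}: {n} items, {n * pts} points, {share:.0f}% of the total score")

The single 10-point writing prompt, one item out of 50, carries as much weight as ten multiple-choice items, which is why the percent of points (rather than the percent of items) in each reporting category is the quantity to check against the blueprint.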
We suggest that IDOE develop a similar chart for the 2016 test as soon as possible. This chart will be important for the new contractor to use in constructing next year's tests. It can also help communicate to educators what to expect next year, particularly what will be different from this year's experience.
Testing time is not included in the charts, but such information should be added to the tables or listed in
accompanying tables.
When developing this chart, we also recommend that IDOE and its contractor carefully examine the number
of reading passages and items, as well as OE writing prompts used. This could serve to shorten the ELA
assessment and yet yield reliable score information on the ELA content standards. The Department should
also revisit the reporting categories and their weighting. Currently, there is a separate reporting category for
Reading Vocabulary that is weighted at 3% - 13%. If the goals of the assessment can be met by including the
vocabulary content in the other reporting categories, a highly reliable and significantly shorter test could be
produced.
We suggest that IDOE direct the new contractor to develop a more comprehensive test specifications and
blueprint document. The goal would be to have a preliminary version ready for distribution by late fall this
year so that it would supplement other Department communications about the 2016 ISTEP+.

TRANSITION PLANNING
Recommendation 10: We recommend that Indiana (the SBOE and IDOE) seek external assistance to guide
the transition of the ISTEP+ and other assessment components, should the state select new vendors for any of
its assessment components.
Rationale: The transition from one vendor for an assessment program component to another one is a
significant event in the operation of the assessment program. There are a myriad of details that the
current/outgoing contractor has been successfully handling, all of which need to be transferred to the
new/in-coming vendor. In addition, there are scoring routines, statistical analyses and various analysis and
reporting programs that the incoming vendor needs to replicate in order to provide seamless reporting of
current and prior assessment results at the student, classroom, school and school corporation levels. In addition, it is not uncommon for the outgoing vendor to slowly lose interest in the successful transition, since
they are not implementing the assessment program in the future.
Many state agency staff involved in these transition activities are not experienced in how to successfully transition their assessment program from the outgoing to the incoming contractors. Making matters worse, these staff are occupied more than full time with making sure that the current-year assessment activities proceed flawlessly and do not have the time to fully attend to the transition activities. The result is that necessary transition activities may not occur, may not occur as needed, or may not be carried out error-free. The net result is that the initial year of work of the new contractor may not be carried out on a timely basis, such that tests are not available when needed, testing may be delayed, analyses are not carried out accurately and test results may contain errors or not be produced when needed.
The solution for states with small assessment staffs that may not have the time or experience to transition assessment programs is for the agency to hire individuals or organizations experienced in successful assessment program transitions. These persons or organizations can be tasked with securing the needed information from the outgoing contractor, providing these resources to the incoming contractor and assuring that the new contractor successfully incorporates them into its operational assessment systems. The
transition specialists can also serve as shuttle diplomats between the two vendors to assure that
information is provided as needed by the new contractor(s).
Work to be done: Once Indiana has determined which contractor(s) it will use going forward (and resolved any disputes or challenges to these awards), the state should seriously consider who it will use as the transition specialist(s). While these persons or organizations will cost the state money, avoiding the typical transition issues can be priceless.

IMPROVING AGENCY COMMUNICATION


Recommendation 11: We recommend that the SBOE and IDOE review inter-agency communication, both at the state level and with local school corporations, and that both agencies commit to making improvements to ensure the best possible assessment system for students, educators, parents and citizens of Indiana.
Rationale: While we did not investigate the causes of the widespread concerns about test length, it is likely
that better communication would have reduced the problems. Effective communication is difficult in all
complex organizations. Effective leaders and project managers constantly strive to improve communication.
Therefore, any actions to avoid future problems with the state's assessment system should include efforts to
better coordinate agency planning, decision-making and implementation of those decisions.

Work to be done: It is always difficult to find time to review the communication effectiveness of public agencies. Nevertheless, the SBOE and IDOE should establish a process of review and implement
any improvements identified during that process. One way in which this could be accomplished is to share
accountability for advisory groups. For example, the IDOE could convene the TAC and both the IDOE and the
SBOE could identify issues or questions to be discussed. The in-state assessment advisory committee could be
managed by IDOE and the committee could be required to send members to report to the SBOE on a
scheduled or as-needed basis.
One way to enhance communication with the field, including school corporation boards, administrators and
other educators, is for the IDOE to communicate frequently with those affected by the assessment program.
The Michigan assessment staff, for example, publishes a twice-monthly electronic newsletter and sends it to
anyone who signs up for its listservs (assessment coordinators, principals, curriculum specialists, etc.).
Program changes, new procedures and updates on each assessment component are handled in this manner.
This is in addition to ongoing superintendents' letters, which serve as the official communication method
between MDE and its local school districts.
Teachers can be a challenging group to communicate with directly, if the state does not maintain the names
and physical or email addresses of teachers in the state. Thus, states need to rely on administrators to provide
information to their teachers. One way to facilitate communication in this instance is for the state to provide
information (e.g., a teacher newsletter) to administrators and for them to provide it to the teachers in their
districts.
Similarly, parents are an important and challenging audience to communicate with directly. As with the
suggestion for teachers above, IDOE might develop communication materials (e.g., flyers or newsletters) that
administrators and teachers can use to communicate with parents. IDOE's and the SBOE's communication
offices could consider developing a communication plan on assessment and then prepare the press releases
and the online communication pieces to be used as the communication plan is implemented.
It may be most helpful if the IDOE and SBOE staff work to create a comprehensive and coordinated
assessment communication plan. This plan could include a careful consideration of target audiences,
information needed by each audience, the mechanisms to be used to communicate with each group, the
resources (print, video, online, etc.) needed for each group and when the information needs to be provided.
Such a plan should address not only what each member of the target audience (e.g., a building-level administrator) needs to know and understand about assessment, but also the resources that he or she needs to communicate to others (e.g., teachers, parents and others in the school community). The latter resources
are important because they help the state use these individuals as secondary communicators, and make it
easier for them to pass along information without being an expert on assessment.

Note that the new ISTEP+ is being driven by the new Indiana Academic Standards. IDOE should be
coordinating explanations of the new assessments with ongoing support for teachers who are implementing
the Academic Standards.

ADDITIONAL OBSERVATIONS
In the course of our review of the ISTEP+ program testing time issue, other ideas, thoughts and concerns
occurred to us. We raise these with the intent that they allow the state to identify and avoid other issues that
may affect the program in the future.
1. The re-use of previously developed items, both those previously used and those developed but never used, apparently was not examined.
When an assessment program transitions to measuring a new or revised set of standards, it is customary to
consider whether some or all of the items that already exist, previously used or not, might be used to
measure at least parts of the new tests. Sometimes these existing items can continue to be used, while in
other instances, the change in standards is such that the items are not aligned to the new standards and are of
no use for the new assessment. It is advantageous to use such items because not only has the state already paid for their development, but the items are also of proven quality and should require less extensive field testing for
use in future versions of the ISTEP+ instruments. We were puzzled by the apparent decision not to at least
examine their suitability to measure the new Indiana standards.
Assessment staff could convene content specialists (university content specialists, curriculum specialists from
school corporations and classroom teachers) under secure conditions to review the item pools for possible
alignment to the new Indiana standards, either at the grade that the items were originally written to, or at a
higher or lower grade. Reviewers could identify items that match a standard strongly, as well as recommend
how the connection of an item to a standard could be strengthened. This may have permitted existing items
that Indiana already owns to be used in the new assessments, potentially reducing the amount of operational
field testing necessary.
We were not given evidence that such a review was done, but we believe that it should have been carried out. And if it hasn't been, it may still be a useful exercise as new test forms are created annually.

2. Pilot testing of items is one way to make sure that operational tests field-test items that are likely to work. However, some local educators don't like the disruption of state testing twice during the school year.
We heard the issue of pilot testing raised several times. Pilot testing, the trial of items with a small number of students, is useful to assure that the items to be field tested are likely to work. This can reduce either the
total number of items to be field tested, the number of field test items given to any one student, or both. It is
better to determine that an item doesn't work in a small scale pilot than in a large statewide field test.

Informal pilot testing might occur in a couple of ways. When new items are first created and especially when
new item formats are created, it can be helpful to administer the items to 20-30 students to see if they work.
It is possible to learn a lot from this sort of informal use of the items. After items have been written and
edited, they might next be tried out by approximately 100 students who are generally chosen to include high- and low-achieving students because of the perspectives that they bring to testing.
However, the use of students to pilot test in the fall is of concern to the school corporations. One way to lessen
the impact is to package items into smaller units that would be about the length of an ISTEP+ session. This
would limit the testing time per classroom or student to 30-40 minutes. The downside of this approach is that more schools will need to participate, but at least the participation of each school would be minimized.

3. The types of measures used in the ELA assessments can affect testing time.

We observed that the ELA assessment contains a number of open-ended items, as well as some clusters of
items that may affect testing time. In general, we support the use of authentically long reading texts with a
cluster of assessment items (for efficiency's sake), as well as the more involved processes of writing
assessment (e.g., read a passage, answer multiple-choice items, then answer open-ended items). These are
more complex item types, but they also mirror the types of language tasks that adults are asked to do. Thus,
we feel that they should remain, even though they contribute to a longer overall test. The results of the 2015
assessment should be carefully reviewed to determine the time it actually takes for students to complete
these new items and to evaluate if the information they provide is worth the time they require.
The greater the number of reading passages with attendant items, the longer the assessment. An essential
part of the assessment blueprint to be developed and disseminated to the field is a description of the number
of passages and items required to adequately cover the state's content standards. The rationale for these
numbers should be provided as well. This is particularly critical since Indiana does not intend to report a
reading score separate from ELA scores. Thus, the ELA assessment could be assembled using fewer reading
passages and yet yield reliable and valid ELA scores. This possibility should be investigated via the blueprint
that we are recommending. We have not seen such a blueprint or a rationale statement for the number of
passages and items anticipated in the final operational test. If it does not exist, it should be created soon.
Finally, these types of ELA measures can affect the overall level of performance. It is reasonable to speculate
that a more involved ELA assessment using open-ended items would result in somewhat lower performance at first. The actual difference will not be known until after standard setting, but teachers, parents and other
interested parties should be prepared for that possibility. Federal assessment regulations do not specify
details such as type of assessment prompts, their number or their length. Currently, it is the match between the wording of the standards and the format and content of the items measuring the standards that is
important.

4. Have adequate numbers of items been operationally field tested in 2015 to construct the operational 2015 and the operational 2016 tests?

The reason for this concern on our part is that the test to be used in 2016 will be constructed from items used
on the 2015 operational test that can be re-used, combined with 2015 operationally field-tested items that
appeared to work. However, this is an untested assumption, since the state won't know the survival rate of the operationally field-tested items until students' responses are analyzed. Thus, there is a risk that some operational field testing may still need to occur in 2016. This risk is most acute with the Part 1 OE items, since these are some of the items most likely not to work. It is not improbable, given that only two versions of the Part 1 OE assessments are being field-tested, that one or more item types may not work in either version. That adds
risk to not only the 2015 test administration, but to 2016 testing as well. This is why we recommended that
the OE items not be released for two years. While our preference is to resume the release of the Part 1 OE
items in 2016, we strongly believe that this should occur only if there are sufficient OE items to build the
2017 assessments, including items from 2015 as well as items field tested in 2016. We also recommended
that IDOE consider a shorter version of the test. It would be prudent to begin now to consider what would be
the minimum number of raw score points that would be required given the purposes of the test.

5. Meeting Federal testing requirements, including alignment, peer review, etc.

The Federal government does not dictate testing time minimums or maximums, how many test items are to
be used, or which item types are to be used or not used. Federal assessment requirements do not restrict
state decisions about test design at this level of detail.
USED does require that states' tests be technically sound and that the states provide evidence that the tests support their intended uses (are valid) and that they are sound measures (produce reliable information about students' achievement). States must demonstrate this through information submitted for peer review.
Currently, it is the match between the number of standards, the content and processes included in the
standards, the wording of the standards and the design of the test that is important.
This is why we included a longer-term recommendation for sound assessment blueprints for the 2015 and
the 2016 ISTEP+ assessments. We believe that such blueprints are the starting point for the demonstration of
the validity of the assessment instruments, and the documents we have found are not sufficient in scope and detail to be of use in defining the operational ISTEP+ assessments or in seeking USED approval of the ISTEP+
program.
If a core set of items has been identified, this core test could be used to determine its alignment to the Indiana
standards. The Webb alignment tool is often used to help the state determine this key aspect of peer review
requirements.
Ultimately, the U.S. Department of Education (USED) will agree or disagree as to the reliability and validity of
the assessments, as well as to the adequacy of the information provided. If USED believes that there are
deficiencies, it will permit the state time to correct those issues.

6. Setting college- and career-ready standards

This is an issue that a number of states have faced, since assuring readiness for college and careers is a
common goal nationally. One challenge in judging readiness for college and careers is the wide range of definitions of what constitutes readiness. For example, is college readiness defined as success in any post-secondary
institution, success in a community college, or just success in a highly selective four-year university? Is it just
grade-point average (GPA), or graduation in a reasonable number of years? Is career success the ability to
work well in entry-level jobs (many of which require relatively low skill levels) or the ability to take part in
and succeed in entry-level job training programs with advancement opportunities?
There are a variety of methods of organizing standard setting and obtaining useful cut scores. One way is to
use experts to judge the relationship between ISTEP+ tests and the skills needed to succeed in college and on
the job. The cut points should be set by panels of citizens who are familiar with K-12 education, as well as the
requirements for success in college and in a career. The other strategy is to seek external data sets and set
standards on the ISTEP+ tests at levels comparable to the external data. It is likely that the professional
judgment methods will be more useful for younger grade levels. External data such as NAEP or ACT results
are more relevant at the secondary level. The state should rely on the new testing vendor to propose a
suitable method and on outside technical advice to review and approve the method.

7. Transitioning the assessment programs to new vendor(s)

The transition of contract work from one vendor to another is a significant activity for an assessment
program. Transitions involve considerable risks, including late delivery of testing materials, errors in testing materials, errors in scoring and reporting, and late or inaccurate delivery of assessment results. For the
incoming contractor(s) to succeed, it will be necessary for the outgoing contractor to assist with the
transition. This is especially true in the case of ISTEP+ because the incumbent contractor has held the contract for so many years. The state must rely on the vendor to document scoring and reporting decisions,
for example, if IDOE does not have that documentation in hand already. There are a myriad of decisions that the new vendor needs to understand in order to succeed in implementing the ISTEP+ assessments.
The first step would be for the assessment staff to request that the outgoing contractor carefully and thoroughly document all decisions in its IT systems, test development procedures, statistical analyses and reporting metrics. The challenge here is that the outgoing contractor, having lost the contract, may devote fewer resources to helping the state successfully transition the work to new vendors.
A key activity in states that transition from one vendor to another is to assure that open-ended items are
scored in a comparable manner. The outgoing contractor should provide the complete scoring guides
(containing the scoring rubric(s) and examples of student work used in training, certification and validity
checks) to the incoming contractor. The incoming contractor should use these materials to train and qualify
raters for scoring the 2016 items. Before the new contractor scores 2016 student responses, it may be
advantageous to have the new vendor re-score a sample of 2015 student papers to verify that the new vendor
can score students' responses in a comparable manner.
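A minimal sketch of how such a rescoring check might be summarized (the scores are placeholders, and the exact-plus-adjacent agreement summary is a common convention rather than a stated ISTEP+ requirement):

    # Sketch of a rescoring comparability check: compare the 2015 scores of
    # record with the new vendor's rescores of the same papers (placeholders).
    def agreement_rates(scores_of_record, rescores):
        """Return (exact agreement, exact-or-adjacent agreement) as proportions."""
        pairs = list(zip(scores_of_record, rescores))
        exact = sum(1 for a, b in pairs if a == b) / len(pairs)
        adjacent = sum(1 for a, b in pairs if abs(a - b) <= 1) / len(pairs)
        return exact, adjacent

    if __name__ == "__main__":
        scores_2015 = [4, 3, 5, 2, 4, 3, 1, 4]  # placeholder scores of record
        rescored    = [4, 3, 4, 2, 5, 3, 1, 3]  # placeholder new-vendor rescores
        exact, adjacent = agreement_rates(scores_2015, rescored)
        print(f"exact: {exact:.0%}, exact or adjacent: {adjacent:.0%}")

Low agreement would signal that the new vendor's rater training or the inherited scoring guides need attention before 2016 operational scoring begins.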
Some states hire a consultant or an organization to assist in this effort to serve as a facilitator of the
conversations and work of the incoming and outgoing contractor. This can assist the assessment staff to
understand the information needed for the success of the new vendor and make requests to acquire the
information from the outgoing contractor. This transition consultant can also serve as a shuttle diplomat
between the outgoing and incoming vendors.
If the state chooses to award the work currently being done by one contractor to multiple different incoming
vendors, the use of a facilitator is even more important, since each new vendor will need assistance in
transition activities and the work of the different incoming contractors will need to be coordinated to provide
a coherent experience for local educators in Indiana. Both of us have first-hand experience with this type of transition work, so we know the level of work required to make the transition run smoothly.
Many states also contract with independent psychometric consultants or organizations to replicate the
statistical analyses of the test data, item calibrations, reporting and linking of forms within and across years.
While errors in these calculations are rare, they are very disruptive and undermine public confidence in the
system.
8. Level of staffing

We are concerned about the level of staffing of the assessment unit of IDOE. The number of assessment
components and their complexity drive this concern. We believe that the sponsoring agency (i.e., IDOE)
should be in charge of making the key decisions about the assessment program and the procedures used. This
requires the staff to do more than simply turn over the responsibility to the contractor. And, as suggested above, if the assessment contractors employed by the state change, that will mean significant transition activities will be added to IDOE's work.
To assure the successful transition discussed earlier, we believe that two or more additional assessment staff
FTE are needed to handle the transition. The number of staff members reflects the number of vendors that
will be used in the future and whether the current vendors are among them. New vendors require
substantially more staff work than would be the case if the state continued to use existing contractors. For
example, the Michigan Department of Education (MDE) has about 10 individuals who work in the assessment
administration unit alone. In Michigan, one person is assigned full-time to each of the states five assessment
programs, plus there is a program manager and support staff. Thus, two new staff members for the
assessment unit in Indiana is a conservative estimate of staffing levels necessary.
These assessment administration specialists would focus on documenting the assessment procedures of the
current contractor, transferring this information to the new contractor(s) and working with the new
contractor(s) to assure that they are using past procedures with their work going forward. We believe in the motto "trust, but verify." In order to do this, the assessment program needs to be staffed adequately, and the staff must take responsibility for making key assessment decisions.

SUMMARY
We commend the state for tackling this thorny issue and working together to resolve it. We believe that if
these recommendations are followed, testing time can be reduced to more manageable levels. We also believe
that implementing our long-term recommendations will improve the design and implementation of the
ISTEP+ program in the future. We remain willing to assist in and perhaps monitor efforts to implement these
recommendations.


APPENDIX A
The sample Test Blueprint from Oregon is a large document. It was sent as a separate file. The link to
Oregon's webpage is:
http://www.ode.state.or.us/wma/teachlearn/testing/dev/testspecs/asmtmatestspecsg5_2011-12.pdf

APPENDIX B

2015 Operational ELA Item Number and Type with Points Awarded by Reporting Category

Item types and points per item: Part 1 - CR (2 points), ER (8 points), WP (10 points); Part 2 - MC/TE (1 point), TE (2 points).

Totals at each grade, 3 through 8: Part 1 - 3 CR (6 points), 1 ER (8 points), 1 WP (10 points); Part 2 - 40 MC/TE (40 points), 5 TE (10 points); 50 items and 74 points in all.

Reporting categories and their approximate shares of the 74 points at each grade:
Reading: Literature and Reading: Nonfiction and Media Literacy - about 27% and 28%
Reading: Vocabulary - about 11%
Writing: Genres, Writing Process, Research Process - about 19%
Writing: Conventions of Standard English - about 15% (includes double-scored items)

The number of TE items and points per TE item may vary depending on field test results.

2015 Operational Math Item Number and Type with Points Awarded by Reporting Category

Item types and points per item: Part 1 - CR (4 points), ER (6 points); Part 2 - MC/TE (1 point), TE (2 points).

Totals at each grade, 3 through 8: Part 1 - 4 CR (16 points) and 1 ER (6 points); with the Part 2 MC/TE and TE items, each grade-level test contains 51 items and 77 points in all.

Reporting categories:
Grades 3-5 - Number Sense; Computation; Algebraic Thinking/Data Analysis; Geometry/Measurement; Mathematical Process
Grades 6-8 - Number Sense/Computation; Algebra/Functions; Geometry/Measurement; Data Analysis/Statistics or Data Analysis/Statistics/Probability (depending on grade); Mathematical Process

Mathematical Process accounts for 11 points (14%) at every grade and includes double-scored items; the remaining points are divided among the other four categories, with the shares varying by grade.

The number of TE items and points per TE item may vary depending on field test results.
