
Measurement and Evaluation Perspectives on Scaling Teacher Affect with Multiple Measures
Judy R. Wilkerson, Ph.D., Florida Gulf Coast University
The International Conference on Educational Measurement and Evaluation (ICEME 2012)
SEAMEO-INNOTECH, Manila, Philippines, August 9-11, 2012
(c) 2012. Judy R. Wilkerson, Ph.D. All rights reserved.

Study Overview
Teacher dispositions include the values, attitudes, and beliefs about children, subject matter, and the skills of teaching that cause teachers to act in positive or negative ways. While measurement and evaluation interact closely, they are typically reported separately. The Dispositions Assessment Aligned with Teacher Standards (DAATS) scale of commitment to teaching skills is reviewed using both measurement and evaluation professional standards. Evidence of validity and reliability (measurement) and evidence of utility and feasibility (evaluation) are presented.

Measurement vs. Evaluation vs. Assessment


Measurement. Stevens (1946) defined measurement as the assignment of numerals to objects or events according to some rule. Georg Rasch (1960) established the mathematical relationship between a person's ability and the difficulty of an item.
Evaluation. Stufflebeam (2001) defined evaluation as a study designed and conducted to assist some audience to assess an object's merit and worth.
Assessment. McMillan (2007) differentiated measurement from evaluation, defining assessment as a process with four sequential steps: purpose, measurement, evaluation, and use.
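The relationship Rasch established can be illustrated for a single dichotomous item. The sketch below is a minimal illustration, not the study's calibration code: it assumes ability and difficulty are both expressed in logits, and the function and parameter names are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    Both arguments are in logits. The names are illustrative only and do
    not come from the slides.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals difficulty, the model gives an even chance.
print(rasch_probability(0.0, 0.0))
# A person one logit above the item's difficulty is more likely to succeed.
print(rasch_probability(1.0, 0.0))
```

Only the difference between ability and difficulty enters the formula, which is what allows persons and items to be placed on one common scale.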

3 Sets of Standards and 1 Taxonomy


Content: U.S. national set of pre-service teaching standards developed by the Interstate New Teacher Assessment and Support Consortium (INTASC).
Measurement: The Standards for Educational and Psychological Testing.
Evaluation: The Program Evaluation Standards of the Joint Committee on Standards for Educational Evaluation.
Taxonomy: Krathwohl and Bloom Affective Taxonomy.


Krathwohl/Bloom Operationally Defined

Need for Multiple Measures


The Association for Supervision and Curriculum Development, in its position paper The Case for Multiple Measures (Fuller, Fitzgerald, & Lee, 2008): "Multiple measures are needed to address the full depth and breadth of our expectations for student learning" (p. 2).
Beyond the multiple-choice and short-answer items that are typical of current assessments, "other types of performance measures (essays, applied projects, portfolios, demonstrations, oral presentations, etc.) are needed to represent and guide students' progress" (Herman, Baker, & Linn, 2004, p. 2).


DAATS: Dispositions Assessment Aligned with Teacher Standards


1. Beliefs About Teaching Scale (BATS): a 60-item Thurstone agreement scale*
2. Experiences in Teaching Questionnaire (ETQ): 10 constructed response items about prior experiences*
3. Situational Reflection Assessment (SRA): 20 constructed response items, using picture prompts (Slitkin, 2007), in a thematic apperception format*
4. Classroom Dispositions Checklist (CDC): 50 paired statements of positive and negative behaviors
5. K-12 Dispositions Impact (KIDS): focus group with 10 clustered prompts measuring children's perceptions

Research Purposes
1. To determine if the 2008 results could be replicated and improved through enhanced scoring rubrics, a more diverse sample of students (undergraduate, master's level, and advanced graduate students), and better connectivity (higher completion rate for all three instruments).
2. To model and describe the integration of measurement and evaluation standards in the review and use of assessment instruments.


Significance
To illustrate the potential for using a mix of quantitative and qualitative analysis to identify high and low levels in a critical but often underassessed area (teacher dispositions), in support of the identification, celebration, and remediation of teachers and teacher candidates.

Research Questions
1. What are the psychometric qualities of three DAATS instruments when combined into a single decision-making measure?
2. To what extent do measurement and evaluation standards support the use of the DAATS battery?


Sample
3 instruments: BATS, ETQ, and SRA
190 students in two public universities in Florida:
92 undergraduates
49 master's level
10 alternative certification
19 advanced graduate (Ed.S. candidates)
3 other
17 with unknown student status

Rasch Measures
Instruments were calibrated using the Andrich rating scale model (Andrich, 1988) of IRT and Winsteps software, version 3.71 (Wright & Linacre, 1998; Linacre, 2011). Items were combined into a single scale that included both dichotomous items (BATS) and rating scale items (ETQ and SRA). A linear transformation of the traditional mean of zero and scale of one was used, providing a mean of 50 and a scale of 10 to facilitate use.
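The linear transformation described under Rasch Measures can be sketched as follows. This is an illustrative sketch rather than the Winsteps procedure itself: it assumes the rescaling is simply 50 plus 10 units per logit, and the function names and default arguments are hypothetical.

```python
def logit_to_scaled(logit: float, center: float = 50.0,
                    units_per_logit: float = 10.0) -> float:
    """Convert a Rasch measure in logits to the user-friendly 50/10 scale.

    Assumes 0 logits maps to `center` and each logit spans `units_per_logit`.
    """
    return center + units_per_logit * logit


def scaled_to_logit(score: float, center: float = 50.0,
                    units_per_logit: float = 10.0) -> float:
    """Invert the rescaling to recover the measure in logits."""
    return (score - center) / units_per_logit


# An item of average difficulty (0 logits) sits at the scale center.
print(logit_to_scaled(0.0))
# A scaled score can be converted back to logits for interpretation.
print(scaled_to_logit(58.0))
```

Because the transformation is linear, it changes only the reporting units; the ordering of persons and items and the distances between them are preserved.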


Selected Statistics
Means: 50 for items and 58 for persons.
Ranges: 11 to 84 for items and 43 to 83 for persons.
Standard deviations: almost two logits (18.4) for items and about one logit (10.9) for persons.
Mean fit statistics: near the expected values of 1.0 for mean squares and 0.0 for standardized z's.
Of the 70 items, only three exceeded the 1.5 outfit MNSQ expectation (the highest was 2.05), and none exceeded 1.5 in infit.
Cronbach's alpha (KR-20) is estimated at .96.
The person reliability is .87, with a separation of about three levels (2.67).
Item reliability and separation are .98 and 7.63, respectively.
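Reliability and separation in Rasch reporting are two expressions of the same information. The sketch below uses the standard conversion, reliability = G^2 / (1 + G^2), where G is the separation index. The function names are hypothetical, and the check against the reported figures is approximate because the slides give rounded values.

```python
import math


def reliability_from_separation(separation: float) -> float:
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    g_squared = separation * separation
    return g_squared / (1.0 + g_squared)


def separation_from_reliability(reliability: float) -> float:
    """Separation index implied by a reliability R: sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


# Separation values reported on the slide.
person_separation = 2.67
item_separation = 7.63

print(round(reliability_from_separation(person_separation), 3))
print(round(reliability_from_separation(item_separation), 3))
```

With the reported separations, the conversion gives roughly .88 for persons and .98 for items, in line with the reported reliabilities of .87 and .98 once rounding is taken into account.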

Category Structure
For all 6 categories, the MNSQs are near the expected 1.0 (range .83-1.02), with only 3 dropping below .9. Category probabilities are as expected.


Validity and Absence of Bias
All items map to the INTASC Principles (content validity), and those expected to be more difficult or easier met expectations.
Collaboration (#10) was most difficult. Diverse learners (#3) was easiest.
Faculty perceptions of individual students and DAATS results matched (construct validity).
Scores rise with degree level (another study by Quinn), indicating predictive validity.
No statistically significant differences across gender and ethnic categories.

Utility
Examined through review of four students whose scores matched expectations. Low scoring student should be counseled; high scoring student could be an effective leader.

Utility (cont.)

Feasibility
BATS is easy to administer and score; the other instruments require a greater time commitment. Rater reliability requires training, but FACETS results (next study) indicate that the training is working.


Conclusions: General
1. The INTASC Principles provide a useful construct definition that can be measured holistically and by Principle.
2. The Thurstone agree/disagree scale contributes to the identification of strongly and weakly committed teachers.
3. The Bloom and Krathwohl affective taxonomy works in assessment, yielding proficiency levels with a credible category structure.
4. Combining affective instruments using different methods into a single Rasch scale overcomes weaknesses inherent in the instrument types.
5. A well-designed measurement device leads to useful, feasible, and accurate evaluation decisions.
6. A qualitative analysis of individual constructed response items enhances Rasch score interpretations, making them more useful for evaluation at the individual and program levels.

Conclusions: Research Questions


1. Psychometric properties were supported:
Fit statistics, Cronbach's alpha, reliability and separation, category structure, among others.

2. Use of measurement and evaluation standards:
Validity: empirical and judgmental evidence (construct, content, predictive).
Reliability: see RQ 1.
Utility: high quality data for individual and program evaluation (see next study, too).
Feasibility: requires time commitment and rater training.


Teacher Dispositions: Moving from Assessment to Improvement
Deirdre S. Englehart, University of Central Florida
Heather L. Batchelder, University of Central Florida
Kelly L. Jennings, University of Central Florida
Judy R. Wilkerson, Ph.D., Florida Gulf Coast University
W. Steve Lang, University of South Florida St. Petersburg
David Quinn, Lee County Public Schools
The International Conference on Educational Measurement and Evaluation (ICEME 2012)
SEAMEO-INNOTECH, Manila, Philippines, August 9-11, 2012
(c) 2012. Judy R. Wilkerson, Ph.D. All rights reserved.

Study Overview
A mixed methods approach was used to assess the dispositions of 40 early childhood pre-service teachers using four instruments from the DAATS (Dispositions Assessments Aligned with Teacher Standards) battery:
BATS: Beliefs About Teaching Scale
ETQ: Experiential Teaching Questionnaire
SRA: Situational Reflection Assessment
CDC: Candidate Disposition Checklist
The study presents a quantitative and qualitative analysis of two case studies (one excellent and one needing improvement) and an analysis of four INTASC Principles targeted for program improvement.

Vignette: A Tale of 2 Teachers and 1 Child

Purposes
Determining the extent to which:
(1) quantitative and qualitative data about teacher candidates' dispositions converged with faculty perceptions, and
(2) the instruments and measures provided useful information for candidate counseling and program improvement efforts.


Persons 18 and 22

Person 18:
Faculty perceptions: Enthusiastic about teaching and high in the cognitive domain.
DAATS pinpointed specific, but limited, needs for improvement.

Person 22:
Faculty perceptions: Average student whose interactions with faculty and students have not always shown positive affect toward the teaching profession or people. Lacks enthusiasm for, and knowledge of, the essential skills to become a successful teacher.
Results from 4 instruments converged.
Needed remediation with Principles 3 and 9 (diverse learners and reflection/CI), with a specific focus on people-interaction issues.


Program Evaluation
Three INTASC Principles were targeted for program improvement: 1, 7, and 9.

Use of Data for Program Improvement: 9 Goals and 4 Strategies

Goals:
1. Take responsibility for continuous learning about content and skills
2. Make content meaningful to support learning
3. Use non-verbal behaviors, including demeanor, to motivate and teach
4. Work collegially with others to plan, counsel, advocate, and teach
5. Reflect and remain flexible
6. Adapt to evolving needs
7. Think positively about all (students, parents, colleagues, supervisors)
8. Remain open to feedback and input from all stakeholders
9. Report ethical violations of colleagues and friends

Strategies:
1. Lessons and activities in courses
2. Focused discussions with supervising teachers and univ. coordinators
3. Systematic faculty observations
4. Monthly discussions of progress


Conclusions
Research Purpose #1: Quantitative and qualitative data about teacher dispositions converged with faculty perceptions in the three cases analyzed. This supports the use of a mixed methods approach and provides judgmental evidence of construct validity.
Research Purpose #2: The instruments and measures provided useful information for candidate counseling and program improvement efforts. Faculty are now implementing minor and major remedial efforts based on data rather than intuition with these cases as well as others not described in this article. The study supported the utility of the instruments.

Recommendations for Future Research


1. What is the relationship between teacher affect and teacher effectiveness?
2. How do teachers with negative predispositions about children and teaching impact children's motivation and achievement?
3. Does faculty counseling improve teachers' dispositions when initial measures are low?
4. Do subsequent cohorts of teacher candidates have score increases after program improvement efforts are implemented?