
In this module, we will consider performance assessment.

Like objective testing, performance assessment has some strengths and some weaknesses. Both measurement approaches should be in a classroom teacher's set of measurement tools, available to be used in the right situation.

Performance assessment is one type of assessment in which students are involved in activities where they demonstrate skills and/or create products.

Performance assessment differs from traditional assessment in the degree to which the assessment task matches the behavior domain about which you want to make inferences.

Performance assessment is a very good way to directly measure learning.

This type of assessment is also called "alternative assessment" or "authentic assessment."

Recall module 5, where we compared different types of objective items. Now we will compare objective tests to performance assessments.

Objective tests have several advantages over performance assessment.

For instance, objective tests offer quick and objective scoring. They can sample a large amount of content and can measure learning outcomes from the knowledge level to the evaluation level. Objective items are also amenable to item analysis.

In contrast, performance assessment has some advantages over objective tests.

Performance assessment can provide a direct measure of student learning. It can assess the process of doing as well as the final product, and it can also measure greater depth of understanding.

There are also some disadvantages to both objective tests and performance assessment.

It can be time-consuming to write good objective test items. Without a well-designed test blueprint, an objective test may overemphasize the knowledge level. Objective items may have more than one defensible answer, and the objective-test format may be unfamiliar to some students.

Oftentimes performance assessment has very few items with high task-specificity and, as a result, the assessment results will have low generalizability. Performance assessment often covers narrow domains, so choosing appropriate domains is crucial. The scoring of performance assessment is subjective in nature, which may decrease the consistency (or reliability) of the scores.

Now we focus on developing a performance assessment. Generally speaking, there are four steps for developing a performance assessment.

Step 1 is to "decide what to test."
Step 2 is to "design the assessment context."
Step 3 is to "create scoring rubrics."
Step 4 is to "specify constraints."

The first step of developing a performance assessment is "deciding what to test."

The easy way to do this is to create a list of the instructional objectives that you would like to assess. This step is similar to developing a test plan for objective tests. Once you complete step 1, you will have identified the important knowledge, skills, and habits of mind that will be the focus of the performance assessment.

In addition to objectives in the cognitive domain, instructional objectives for the affective and social domains should also be taken into consideration.

To determine which objectives to include from the cognitive domain, find out whether anything is missing from your traditional tests or whether there are skills that require students to acquire, organize, and use information, such as investigating and problem solving.

Here are some examples of instructional objectives in the cognitive domain for performance assessment.

"Draw a physical map of North America from memory and locate 10 cities."

"Construct an electrical circuit using wires, a switch, a bulb, resistors, and a battery."

"Describe two alternative ways to solve a mathematics word problem."

"Program a calculator to solve an equation with one unknown."

Objectives for the affective and social domains can include habits of mind, such as constructive criticism, respect for reason, and appreciation, and social skills, such as cooperation, sharing, and negotiation.

Examples of items for the affective and social domains could be:

• Willingness to modify explanations

• Cooperating in answering questions and solving problems, or working together to pool ideas, explanations, and solutions

• Appreciating that mathematics is a discipline that helps solve real-world problems

• Recognizing that there is more than one way to solve a problem.

Step 2 is designing the assessment context, which means to "create a task for learners to demonstrate their knowledge, skills, or attitudes."

It may be as straightforward as asking students to complete an art project or write an essay on their favorite hobby.

It is important that the tasks you create focus on "real-world" issues, concepts, or problems. Questions you can ask include:

"What does the doing of (art, music, design) look like to professionals in the real world?"

"How can their real-world tasks be adapted to the school setting?"

Regarding the tasks in a performance assessment, the following things should be noted.

Make sure that the requirements for task mastery are clear without revealing the solution. For instance, learners should be able to tell when they are finished.

Choose a specific activity from which generalizations can be made about knowledge and skills. The task should be complex enough to elicit a wide range of behavior within a narrow skill domain.

Tasks should also be complex enough to allow for multi-modal assessment, such as observations, oral reports, journals, exhibits, and so on.

Tasks should yield multiple solutions, each with costs and benefits, and should call for judgment and interpretation.

Tasks should require mental effort and self-regulated learning.

Tasks in a performance assessment should require "persistence and determination" as well as "the use of cognitive strategies," rather than depending on coaching.

When performance tasks have been developed, the following criteria can be used to evaluate them.

Generalizability: Can performance on the task be generalized to comparable tasks?

Authenticity: How authentic is the task? In other words, is the task similar to a real-world activity?

Multiple foci: Have you included multiple foci? That is, does the task measure multiple outcomes?

Teachability: How teachable is the content? Is it likely that students will be proficient after instruction?

Fairness: Is the performance task fair and unbiased to every student, or does it favor students of high socioeconomic status?

Feasibility: How feasible is the task? Does the school have the space and equipment? Do students have enough time to complete it, and how much will it cost?

Scorability: Is the performance task scorable? Can it be evaluated reliably and accurately?

Step 3 of developing a performance assessment is "creating scoring rubrics."

When creating rubrics, do not limit scoring criteria to those that are easiest to measure.

Instead, carefully construct detailed scoring systems that help you minimize the arbitrariness of your judgments. A scoring rubric holds learners to high standards of achievement.

It is important to know that rubrics should be developed for a variety of accomplishments. In general, performance assessment involves the following types of accomplishments: products, cognitive processes, and observable performances.

Products for performance assessment could be essays, graphs, movies, or websites.

Cognitive processes could be skills in acquiring, organizing, or using information.

Observable performances could be dancing, dissecting frogs, or following recipes.

The second crucial consideration in developing rubrics is to choose a scoring system appropriate to the task you want to measure.

There are three types of rubrics to use: checklists, rating scales, and holistic scoring.

See your text for more information:
- pg 172-178 (8th ed)
- pg 195-202 (9th ed)

Checklists contain a list of behaviors, traits, or characteristics that can be scored as either present or absent.

They are best suited for tasks that can be broken down into clearly defined, specific actions.

When using a checklist, you should provide for cases in which there was no opportunity to observe a specific element. In such cases, a value of +1 represents the element being present, 0 represents no opportunity to observe, and -1 represents the element being absent.

Typically, an element being present is marked as "1" or "yes" and an element not being present is marked as "0" or "no."
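
As an illustration only (this sketch is not part of the module or the text), a checklist score can be tallied in a few lines of Python; the element names and the frog-dissection task below are hypothetical.

def score_checklist(observations):
    # Sum checklist marks: +1 = present, 0 = no opportunity to observe, -1 = absent.
    allowed = {1, 0, -1}
    if any(mark not in allowed for mark in observations.values()):
        raise ValueError("Each element must be marked +1, 0, or -1.")
    return sum(observations.values())

# Hypothetical frog-dissection checklist for one student.
frog_dissection = {
    "follows safety procedures": 1,   # present
    "identifies major organs": 1,     # present
    "labels diagram correctly": -1,   # absent
    "cleans up work area": 0,         # no opportunity to observe
}

print(score_checklist(frog_dissection))  # prints 1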

See your text for more information:
- fig 8.5 & 8.6 (8th ed)
- fig 9.5 & 9.6 (9th ed)

Rating scales are typically used for more complex behaviors for which yes/no judgments are not enough.

The use of rating scales usually involves assigning numbers to performance categories.

Most numerical rating scales use an analytic scoring technique called "primary trait scoring." Primary trait scoring requires the test developer to first identify the most important traits and then assign numbers to represent degrees of performance. This helps the scorer focus on the important criteria.
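
As another illustration only (again, not from the module or the text), primary trait scoring can be sketched as a short Python routine; the traits and the 1-to-5 scale below are hypothetical.

TRAITS = ["organization", "use of evidence", "mechanics"]  # the most important traits
SCALE = range(1, 6)                                        # 1 = poor ... 5 = excellent

def rate_performance(ratings):
    # Confirm that every trait received a rating on the scale, then total the ratings.
    for trait in TRAITS:
        if ratings.get(trait) not in SCALE:
            raise ValueError(f"'{trait}' needs a rating between 1 and 5.")
    return sum(ratings[trait] for trait in TRAITS)

essay = {"organization": 4, "use of evidence": 5, "mechanics": 3}
print(rate_performance(essay), "out of", len(TRAITS) * max(SCALE))  # prints 12 out of 15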

See your text for more information:
- fig 8.7 & 8.8 (8th ed)
- fig 9.7 & 9.8 (9th ed)

Holistic scoring is used when the rater is more interested in estimating the overall quality of the performance.

It is typically used with essays, term papers, and dance or musical performances.

It is important to have a model for each category to ensure similar quality within each category.

See your text for more information:
- fig 8.9 (8th ed)
- fig 9.9 (9th ed)

Each of the three scoring systems has its particular strengths and weaknesses.

This table summarizes the comparison of checklists, rating scales, and holistic scoring in terms of ease of construction, scoring efficiency, reliability, defensibility, and quality of feedback.

Checklists have the highest levels of reliability, defensibility, and feedback, while holistic scoring is the easiest to construct and has high scoring efficiency.

Rating scales receive a moderate rating on all facets of the comparison.

Checklists, rating scales, and holistic judgments can be combined to determine a total assessment, and this strategy should be used if a variety of traits are being assessed.
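
As an illustration only (the point values and weights below are hypothetical, not something the text prescribes), one simple way to combine the three systems is to convert each component to a proportion, weight it, and sum.

def combined_total(checklist_pts, checklist_max,
                   rating_pts, rating_max,
                   holistic_level, holistic_max,
                   weights=(0.3, 0.4, 0.3)):
    # Convert each component to a 0-1 proportion, weight it, and return a 0-100 total.
    parts = (
        checklist_pts / checklist_max,
        rating_pts / rating_max,
        holistic_level / holistic_max,
    )
    return round(100 * sum(w * p for w, p in zip(weights, parts)), 1)

# Example: 8 of 10 checklist elements, 12 of 15 rating-scale points, holistic level 3 of 4.
print(combined_total(8, 10, 12, 15, 3, 4))  # prints 78.5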

See your text for more information:
- fig 8.9 (8th ed)
- fig 9.9 (9th ed)

Three sources of error may occur in a scoring system: the scoring instrument, the procedure, and the teacher.

Common flaws in scoring instruments include a lack of descriptive rigor and ambiguity, which can lead to unreliability.

Having too many grading criteria for a task or having too many students to rate can cause procedural flaws.

Teachers can also be a source of scoring error, and there are multiple types of teacher bias. Generosity error is when a teacher grades too leniently. Severity error is when a teacher grades too harshly. Central-tendency error is when a teacher grades all students about the same. The halo effect in grading is when the teacher's attitude toward a student influences the score the student receives.

Step 4 of developing a performance assessment is specifying constraints.

Outside the classroom, professionals have constraints on their performance, such as deadlines, limited office space, and outmoded equipment. In the same way, teachers need to decide which conditions to impose on a performance task. Among the most typical test constraints are:

• Time: How much time are students allowed to prepare, rethink, and finish a performance task?
• Reference material: Are students allowed to have reference materials?
• Other people: Are they allowed to consult with other people?
• Equipment: Can students use computers or calculators to help them solve problems?
• Prior knowledge of the task: How much information do students receive about what will be tested? Do they receive the information in advance?
• Scoring criteria: Do students know the standards (or criteria) for their performance task in advance?

See your text for more information:
- pg 178 (8th ed)
- pg 201 (9th ed)

To help decide what to do about these constraints, ask yourself the following questions:

What constraints authentically replicate the real-world situation?

What constraints bring out the best performance in novices?

What are authentic limits to place on the use of time, help from others, reference materials, and so on?

As with objective tests, to determine the quality of a performance assessment, validity and reliability need to be considered.

These are the two most critical criteria of test quality.

Validity is the extent to which the test actually measures what it's supposed to measure. To help ensure validity, teachers should go over all the elements in any performance assessment to look for possible problems with validity.

The following things should be considered:

Be attentive to issues of task-specificity and domain sampling.

Recognize potential subjectivity in raters.

Inform students of the performance criteria.

Avoid common errors, such as "failure to use the entire rating scale," "reliance on mental record-keeping," and "influence of prior perception of the student."

Reliability of a test refers to the consistency or stability of the test scores. Ideally, students should get the same score regardless of who the rater is.

Reliability of a performance assessment is more challenging to achieve than reliability of an objective test. One choice that teachers can make to help increase the reliability of a performance assessment is to use several performance tasks that are relatively small in scope, rather than only one large task.

Other ways to increase the reliability of assessments are as follows:

When possible, obtain multiple observations.

When possible, use multiple raters.

Use smaller tasks.

Be explicit about the assessment purpose and state the performance criteria and rating categories clearly.
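
As a final illustration (a hypothetical sketch rather than material from the module), using multiple raters can be as simple as averaging each student's scores across raters and spot-checking how far apart the raters are; the student names and scores below are invented.

from statistics import mean

scores_by_rater = {
    "rater_1": {"Ana": 16, "Ben": 12, "Chris": 18},
    "rater_2": {"Ana": 14, "Ben": 13, "Chris": 17},
}

def averaged_scores(scores):
    # Average each student's score across all raters.
    students = next(iter(scores.values())).keys()
    return {s: mean(r[s] for r in scores.values()) for s in students}

def mean_rater_gap(scores):
    # Average absolute difference between two raters: a crude agreement check,
    # not a formal reliability coefficient.
    r1, r2 = scores.values()
    return mean(abs(r1[s] - r2[s]) for s in r1)

print(averaged_scores(scores_by_rater))           # {'Ana': 15, 'Ben': 12.5, 'Chris': 17.5}
print(round(mean_rater_gap(scores_by_rater), 2))  # 1.33, i.e. raters differ by about one point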

