
Eberly Center for Teaching Excellence | 5000 Forbes Avenue, Cyert Hall 125 | Pittsburgh, Pennsylvania 15213

www.cmu.edu/teaching | 412.268.2896
Guidelines for Teaching Portfolios
CREATING A COMMUNITY OF EDUCATORS
CONTENTS

What is a Teaching Portfolio?
Why do a Teaching Portfolio?
What goes in a Teaching Portfolio?
    Reflective statement
    Statement of teaching responsibilities
    Supporting documents
    Statement of teaching goals for the future
What kinds of supporting documents should I include?
    Your own teaching
    Your students
    Your colleagues, department, and institution
    Your discipline or teaching in general
What kind of support is available to me as I compile my Teaching Portfolio?
Examples of documents for the Teaching Portfolio
Guidelines for Teaching Portfolios
What is a Teaching Portfolio?
The teaching portfolio is "a description of a professor's major strengths and teaching achievements. It describes documents and materials which collectively suggest the scope and quality of a professor's teaching performance" (Seldin 1997). Because teaching efforts and accomplishments are difficult to capture solely by means of numerical summaries, the concept of a portfolio becomes useful in this arena. A teaching portfolio goes beyond a list to include elaborations and reflections. In this sense it is similar to an artist's portfolio, because it showcases your best work and conveys a sense of yourself and your vision.
Why do a Teaching Portfolio?
The goal for a teaching portfolio is twofold. At its core, compiling a teaching portfolio is essentially a reflective activity. The very process of writing down your teaching philosophy and corroborating it with the appropriate evidence automatically causes you to reflect on your teaching. It can help clarify your goals, underscore your development as an educator, and highlight areas for further growth. However, portfolios are also used by the administration for the assessment of your teaching performance. Therefore, you are writing both for yourself and for an external audience, for the purposes of self-reflection and summative evaluation.
What goes in a Teaching Portfolio?
Teaching portfolios generally exhibit a similar structure, but the individual documents included can vary significantly. You will want to play to your strengths as you showcase the breadth of your accomplishments, so you should think about what they are and how best to represent them. The structure of a teaching portfolio consists of four broad categories:

{ Reflective statement } Also called the teaching or education philosophy, this document frames the whole portfolio. It communicates who you are as a teacher: how you define education, how you conceptualize your role and the learner's role in and out of the classroom, and how you translate your beliefs into action (typically 1 or 2 pages, but you should check with your Department Head).
{ Statement of teaching responsibilities } This document should communicate at a glance the breadth of the instructor's teaching activities. It should be presented in a tabular format, possibly supported by a narrative in the appendix. The table should include every teaching assignment over the past 3 years, including:

Official teaching assignments
Other teaching activities (e.g., Andrew's Leap)
Supervisory activities (e.g., independent studies, SURG grants)

For each teaching assignment, the information that is typically included consists of:

type of course (e.g., studio, lab, large lecture)
number of students enrolled
breakdown of majors/non-majors or undergraduate/graduate if relevant
numerical summary of FCE ratings
whether the course has been developed or substantially revamped by the instructor

The other two categories should include information appropriate to each activity. The tables from your annual reports are a good place to start in creating this document. In addition, check with your Department Head for departmental guidelines (up to 3 pages).
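
For illustration, one row of such a table might look like the sketch below (the course, term, enrollment, and ratings are invented for this example; FCE refers to the Faculty Course Evaluations):

Course (type)                       | Term      | Enrolled | Majors/Non-majors | FCE mean | Developed/Revamped
80-100 What Philosophy Is (lecture) | Fall 2004 | 95       | 20/75             | 4.3/5.0  | Revamped by instructor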
{ Supporting documents } This section provides evidence for the claims in the philosophy and is really the portfolio's body. This is the section most similar to the artist's portfolio. For guidelines on how to organize the documents in this category, see "What kinds of supporting documents should I include?" below.
{ Statement of teaching goals for the future } The portfolio should be forward-looking, with ideas for future action fueled by reflections on the past and the present. Just as a researcher needs to have a research agenda, an educator should be able to articulate how she wants her teaching to grow. For instance, you might talk about working on a specific challenge in the classroom, or the wish to create a summer program to reach out to talented high-school students, and so on. Some instructors prefer to integrate this statement into the teaching philosophy. In this case, the goals should be clearly highlighted within that document (1 page).
What kinds of supporting documents should I include?

Just as with an artist's portfolio, the aim is not to be comprehensive. A novice instructor might not have copious material to showcase, whereas somebody who's been teaching for a long time might choose to be more selective. Some people might opt to showcase the most significant contributions through the years; others might wish to emphasize their most recent work. Others still might choose to emphasize development, starting from a challenge in their own teaching, the steps they took to overcome it, and finally evidence of having successfully navigated the challenge. Whatever the direction chosen, the portfolio is a dynamic document and it will evolve over the course of your career. Ultimately, it should demonstrate commitment to and/or impact on:
1. Your own teaching. This section aims to demonstrate that the instructor is engaged in an ongoing reflection about her teaching.

2. Your students. There is no teaching without learning. The best teachers inspire, motivate, and create an environment where learning can occur. This section aims to demonstrate all this.

3. Your colleagues, department, and institution. Teaching takes place in a broader context than the classroom. Therefore, it should be informed by and inform the department's curricular goals and the institution's pedagogical mission. This section aims to demonstrate all this.

4. Your discipline or teaching in general. The best teaching is also informed by a broader scholarship, whether disciplinary (e.g., Science Education), cross-disciplinary (e.g., First Year Programs), or specific pedagogies (e.g., Service Learning). This section aims to demonstrate your participation in those dialogues.
In all these dimensions there is a trajectory from commitment to impact, and instructors will be at different points along the four continua depending on where they are in their career. For instance, beginning instructors (years 1 to 3) would be expected to show commitment to reflecting on their own teaching, to their students, and to learning about educational goals and practices of the institution and the discipline. At a later stage (years 4 to 6), the process of reflection should result in concrete changes implemented in the course, such as the development of new materials, or the integration of new technologies and pedagogies. Instructors should have some preliminary measures of the impact of such changes. Advanced instructors should be able to document a demonstrated impact on student learning and motivation, as well as broader participation in pedagogical activities outside their own classroom, possibly up to the national level (e.g., taking active roles in the education section of their disciplinary association). Some exceptional instructors might also be able to demonstrate recognition of their impact. This could take the form of awards, internal or national, being invited to speak on teaching topics, and so on. The figure below synthesizes this model.

[Figure: the four dimensions (one's own teaching; students; colleagues, department, and institution; discipline or teaching in general), each shown as a continuum from Commitment to Impact.]
Different types of evidence are appropriate at each stage. For instance, a record of attendance at teaching development seminars might be enough to demonstrate commitment to your own teaching, but in order to demonstrate impact you will need to articulate how the knowledge from those seminars translates into pedagogical practices in your courses. The type of evidence presented can vary even within the same stage, depending on what is appropriate to the type of course and discipline. For instance, if you are trying to document student learning and performance, you might have indirect measures (e.g., the words of an external observer saying that the students appeared to be learning) and more direct ones (e.g., samples of student work, or pre-post tests in appropriate domains).

So how do you select documents for the portfolio? As a general rule, as you gather more evidence of impact, this evidence gradually replaces the evidence of commitment. Likewise, more direct measures should be preferred to indirect ones when available and appropriate. Following is a list of suggested indicators for the four dimensions. Sometimes the way to document those indicators is self-evident (e.g., students' scores on tests); for cases where it is not immediately clear how to document them, possible evidence follows in parentheses.
1. Your Own Teaching

Participation in an association concerned with the improvement of teaching and learning (e.g., AAC&U, AERA, SENCER, Education section of disciplinary association)
Taking/auditing courses relevant to teaching (e.g., cognitive psychology, group dynamics, public speaking, cultural diversity)
Seeking feedback from colleagues or Eberly Center (statement from colleagues or Eberly Center records)
Observing other classes
Evidence of reflection on your own teaching (course logs)
Participation in seminars and workshops to improve teaching (records from Eberly Center or other relevant organization)
2. Your Students

a. Student Learning and Performance

Students' scores on tests, like pre-post tests
Students' laboratory workbooks and logs
Student essays, creative work, and projects or fieldwork reports
Demonstration of change in student performance (e.g., successive drafts in light of grading criteria or rubrics)
Research on the impact of changes to the course (reports or published papers)
Instructional and assessment materials developed (e.g., handouts, projects, grading rubrics, pre-post tests)
Student ratings of own learning (from FCEs)
Student comments from FCEs related to their own learning
Unsolicited letters from students about their own learning
Interview data collected from students, such as SGIDs or focus groups (made available to the instructor confidentially by the Eberly Center; her choice whether to submit them or not)
Statements from colleagues who have observed the instructor teach or who have observed the quality of student work
Statements from teaching consultants who observed the instructor teaching (made available to the instructor confidentially by the Eberly Center; her choice whether to submit them or not)
Statements on the preparedness of the instructor's students by instructors who teach courses downstream
Letters from alumni about impact
Reports on students' preparedness by their employers
b. Student Inspiration/Motivation

Record of students who elect another course with the same instructor (e.g., letters from students)
Evidence of the effect of courses on student major/career choices (from student letters)
Letters from alumni about impact
Student ratings of satisfaction with course/instructor (from FCEs)
Student comments from FCEs
Unsolicited letters from students
Interview data collected from students, such as SGIDs or focus groups (made available to the instructor confidentially by the Eberly Center; her choice whether to submit them or not)
Statements from colleagues who have observed the instructor teach
Statements from teaching consultants who have observed the instructor's teaching (made available to the instructor confidentially by the Eberly Center; her choice whether to submit them or not)
Early evaluations and evidence of responsiveness to student feedback
Evidence of effective supervision of honors, master's, or PhD theses, independent studies, 5th year scholars, SURG grants (from student letters)
Information on instructor availability to students, above and beyond office hours, in person or online (self-report, or from student letters and comments on FCEs)
3. Your Colleagues/Department/Institution

Internal education awards
Letters from the Dean praising the instructor's teaching
Adoption of the instructor's innovation by the broader curriculum (e.g., template syllabus for all sections of an intro course, labs developed by the instructor still used in the course when other people teach it)
Instructional or assessment materials developed (e.g., syllabi, educational software)
Developing new courses or revamping old ones (self-report, syllabi)
Developing course portfolios to facilitate the task of future instructors teaching the course
Evidence of effectiveness of help given to colleagues or graduate students on teaching improvement, informally or through Eberly Center or other workshops (from faculty and graduate student letters, Eberly Center database)
Serving on educational or curriculum development/revision committees
Organizing or leading teaching development opportunities, such as TA training or orientation, seminars in instruction, teaching teas (self-report, letters from participants)
Informal help given to colleagues or graduate students on teaching improvement (self-report of time and task involved, colleague and graduate student letters)
Evidence of success of internship, co-op, or summer academy programs set up by the instructor
Setting up an internship program, co-op, or summer academy (self-report or program documentation)
4. Your Discipline or Teaching in General

Awards from external institutions
Invitations to teach for outside agencies
Instructional or assessment materials (e.g., textbooks and instructors' manuals)
Publications on teaching, such as research articles or op-ed columns (published articles, internal reports for work not yet accepted for publication, or invitations to contribute for work not done yet)
Other kinds of invitations based on one's reputation as a teacher (e.g., a magazine or radio interview)
Records of adoption of one's own textbook by other people
Editing or contributing to a professional journal on teaching one's subject
Evidence of success of internship, co-op, or summer program with external students (number of applicants, student evaluation of satisfaction, future course selection if tracked)
What kind of support is available to me as I compile my Teaching Portfolio?

Check with your Department Head for departmental or college guidelines; some departments have written documents. Your mentors might be another possible source of support. Finally, the Eberly Center is available for consultations as well.
Examples of documents for the Teaching Portfolio

{ Reflective statements/teaching philosophies }
The following pages include three examples of reflective statements, selected from both tenure and teaching tracks. They are from multiple disciplines and all look different, but they all convey a sense of the author's teaching philosophy.
Steven Rudich, Computer Science
This statement is very concise, but it clearly communicates what the instructor values in teaching, and how he translates his values into classroom pedagogy.

Pamela Lewis, Heinz School
This statement blends text and diagrams to provide a sense of the evolution of the instructor's self-concept through the years. It also communicates that she is familiar with some of the teaching literature. This statement includes the statement of teaching goals as well.

Peggy Knapp, English
This statement demonstrates development through a different model: an initial philosophy (dated 1999) plus an addendum to reflect her recent accomplishments (dated 2003).
{ Supporting documents }
The following documents highlight some points across the continuum from commitment to impact. The list is not comprehensive, but is meant to get you started thinking about the possibilities. Not all documents have actually appeared in a packet yet, but all could be included. Again, the selection draws from both tenure- and teaching-track faculty.

Instructional and assessment materials developed

Handout: "How to Write a Philosophy Paper" by Maralee Harrell, Philosophy
This handout reveals an understanding of the level of writing and argumentation skills first-year students possess, and a commitment to proactively helping them develop further.

Grading rubric by Shelley Evenson, Design and HCII
The rubric clarifies expectations and grading criteria for the students in order to facilitate their study and their learning.

"Grading According to a Rubric," Maralee Harrell, Philosophy (published in Teaching Philosophy)
Publishing a paper on your own grading rubric is evidence of impact on the disciplinary discourse.
Research on the impact of changes to the course

"Using Argument Diagrams to Improve Critical Thinking Skills in Introductory Philosophy," Maralee Harrell, Philosophy, technical report
This report documents the positive impact of a specific innovation the instructor introduced in her class by means of a statistical analysis of student performance data.

"Implementing a Computerized Tutor in a Statistical Reasoning Course: Getting the Big Picture," Oded Meyer, Statistics, paper presented at the American Educational Research Association conference
In this case, the claim of impact is validated by acceptance for presentation, and thus by the educational/scientific community.
Adoption of instructor's innovation by broader curriculum

"A Successful Peer Writing Assistant Program," Bonnie Youngs and Anne Green, Modern Languages, paper published in the Foreign Language Annals
This article documents the key features of a program the two faculty members designed, which has been very successful and is now used by instructors in all seven languages taught in the Modern Languages department.

Developing new courses or revamping old ones

"Tutoring for Community Outreach: A Course Model for Language Learning and Bridge Building between Universities and Public Schools," Susan Polanski, Modern Languages, paper published in the Foreign Language Annals
This article describes a new course marrying disciplinary content and service-learning pedagogy.
Organizing or leading teaching development opportunities

"Have your cake and eat it too," lecture on how to give lectures by Steven Rudich, Computer Science
This seminar, given to faculty and graduate students in Computer Science, documents the instructor's commitment to the department. Evaluations of the seminar (not shown here) serve as evidence of how well the lecture was received by participants.

TA Handbook 2005-2006, Maralee Harrell, Philosophy
The handbook documents the instructor's effort to clarify roles and responsibilities for TAs in the department, as well as to equip them with some valuable pedagogical knowledge.
Setting up internships, co-ops, summer academies, etc.

Andrew's Leap, Steven Rudich, Computer Science
This is a summer program for talented high-school students to explore the frontiers of computer science. The materials presented document the instructor's commitment to outreach in computer science education. Records of advanced studies of participants (not shown here) give evidence of impact on students' motivation and learning.

Allegheny Intermediate Unit Outreach Program, Scott Matthews, Engineering and Public Policy
This is a campus-based outreach program for high-school students to familiarize them with environmental principles and generate interest in related careers. The materials presented document some of the impact in terms of students' increased environmental awareness.
Publications on teaching

"A Statics Concept Inventory: Development and Psychometric Analysis," Paul Steif et al., Mechanical Engineering
This paper, published in the Journal of Engineering Education, documents a test developed by the author and used at many universities to record conceptual progress in the learning of statics.

"Humorous Engineering 101," Larry Cartwright, Civil and Environmental Engineering (award-winning paper at the ASEE conference 2001)
This paper is evidence not only of national impact but also of national recognition.

"What We Know about Learning," Herb Simon, Psychology (1997 Frontiers in Education conference keynote address)
This paper too is evidence of national impact and recognition, across fields.

Other kinds of invitations based on one's reputation as a teacher

"Laura Lee: Bridging the Gap between School and Practice," in Direct Connection, volume 3, issue 2, December 2000
This interview demonstrates the instructor's influence on architecture education nationally.
How to Write a Philosophy Paper
By Mara Harrell

"How do I know what I think until I see what I say?" (E. M. Forster)

Writing is important, especially to a liberal arts education, because in many ways the act of writing is the embodiment of the process of critical thinking. Consider what Douglas Soccio [1] has said about the subject:

"Critical thinking is the conscious, deliberate, rational assessment of claims according to clearly defined standards of proof" (Soccio 2001, 39).

"One of the signs of good critical thinking is the willingness to accept the best evidence, even when it requires modifying or rejecting a cherished belief or highly desired conclusion" (Soccio 2001, 41).

[1] Soccio, Douglas J. (2001). How to Get the Most Out of Philosophy, 4th Edition. Belmont, CA: Wadsworth.

Just as we could say that philosophy is the pursuit of wisdom (for philosophers are, literally,
lovers of wisdom), we can say that philosophy is a deep and comprehensive inquiry into human
experience. In order to perform this inquiry, philosophy relies on knowledge from other
disciplines, but also may go further in its investigation by analyzing the assumptions made by
other disciplines, like what it means to be human, what it means to have knowledge, etc.

As such, philosophical inquiry is a special brand of critical thinking. As philosophers we are
intensely interested in what we should believe, what we should do, and how we are to know what
we should believe and what we should do. The process of creating, analyzing, and evaluating
arguments helps us to determine what claims are most likely to be true, and thus helps us achieve
our philosophical goals.

The goal of this handout is to help you to become comfortable writing philosophy papers. Many
of the ideas, directives, and admonitions are idiosyncratically my own, and may not be shared by
all philosophy teachers. However, I do think that much of it will be useful to you not only in my
class, but in other philosophy courses you may take, and indeed in other college courses you may
take.

I. Vocabulary and Logic

In philosophy, we take precise definitions of words very seriously. Many philosophers have
spent a good chunk of their lives arguing about what a particular word or phrase means. In this
spirit, I am going to introduce several technical terms that have particular meaning in logical
philosophical discourse. These words may have different colloquial uses, so be conscientious
about using them properly in your philosophy papers.

Statement: A statement is a sentence that can either be true or false. Example: "It will rain tomorrow." We say that this sentence has a truth-value (either true or false), and perhaps we can even know what the truth-value is. Not all sentences in English are statements: questions, commands, and propositions, for example, are not. Some non-statement sentences can, however, be transformed into statements with some re-wording. Example: "Be a doctor!" can be usefully transformed into "You should be a doctor" if the context permits.

Conditional Statement: A conditional statement is also a sentence that can be either true or false. They are special, though, because they occur so often in arguments. A conditional statement has two parts: the antecedent and the consequent. A conditional statement generally has the form of an "if..., then..." statement: "If [the antecedent], then [the consequent]." Consider the following example:
If you earned an A on the final paper, then you earn an A in the class.
The entire statement is the conditional, "you earned an A on the final paper" is the antecedent, and "you earn an A in the class" is the consequent.

Argument: An argument is a set of statements, one of which is the conclusion, and the others are premises. The premises provide support for the conclusion. In other words, the conclusion is asserted to be true on the basis of the premises.
Example: Premise: Either it will rain tomorrow, or it will be sunny tomorrow.
Premise: It will not rain tomorrow.
Conclusion: It will be sunny tomorrow.
An argument can be good or bad based on (1) how well the premises support the conclusion, and (2) whether the premises are actually true.

Validity: A valid argument is one in which it is not possible for the conclusion to be false if the
premises are true. This is a very bold statement, not about what is actually the case, but about
what could possibly be the case. It is helpful, when considering validity, to consider the notion of
possible worlds. I can imagine a possible world in which grass is blue, and I can imagine a
possible world in which trees are blue. But consider the following argument:
Premise: If grass is blue, then trees are blue.
Premise: The grass is blue.
Conclusion: Trees are blue.
There is no possible world in which the premises are true, but the conclusion false. Thus, this is a
valid argument.

Conversely, an invalid argument is one in which it is possible for the premises to be true and the
conclusion false. Consider the following argument:
Premise: If you earned an A on the final paper, then you earn an A in the class.
Premise: You earn an A in the class.
Conclusion: You earned an A on the final paper.
It is possible for the premises to be true, but the conclusion false, if there are other ways to get an
A in the class, and you achieved one of them. For example, it may also be a policy in the class
that if your homework average is an A, then you earn an A in the class. In this case you can earn
an A in the class without earning an A on the final paper.
On the other hand, the following argument is valid:
Premise: If you earned an A on the final paper, then you earn an A in the class.
Premise: You earned an A on the final paper.
Conclusion: You earn an A in the class.
Here, if the premises are true, then the conclusion must be true.

*It is important to note that, given our definitions, a statement cannot be valid or invalid, and an
argument can neither be true nor false.

Soundness: A sound argument is a valid argument in which all the premises are actually true in
our world. This means that any argument that is either invalid, or valid with at least one false
premise, is unsound. Consider this example again:
Premise: If grass is blue, then trees are blue.
Premise: The grass is blue.
Conclusion: Trees are blue.
This is a valid but unsound argument because at least one of the premises is not actually true.
Consider again the invalid argument from above:
Premise: If you earned an A on the final paper, then you earn an A in the class.
Premise: You earn an A in the class.
Conclusion: You earned an A on the final paper.
This argument is unsound because it is invalid, regardless of whether the premises are actually
true.

Formal Fallacy: An invalid argument may be a bad argument. Consider again the following
argument:
Premise: If you earned an A on the final paper, then you earn an A in the class.
Premise: You earn an A in the class.
Conclusion: You earned an A on the final paper.
This is considered to be a bad argument because it exemplifies the formal fallacy of Affirming the Consequent. Now consider a similar argument:
Premise: If you earned an A on the final paper, then you earn an A in the class.
Premise: You did not earn an A on the final paper.
Conclusion: You did not earn an A in the class.
This is also an invalid argument, because it is possible that the premises could be true while the
conclusion false. Recall the example above: it may also be a policy in the class that if your
homework average is an A, then you earn an A in the class. In this case you can earn an A in the
class without earning an A on the final paper. This is also considered to be a bad argument because it exemplifies the formal fallacy of Denying the Antecedent.
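
For reference, the conditional argument forms discussed in this section can be summarized schematically, with P standing for the antecedent and Q for the consequent. (The Latin names are standard logic labels; the handout's examples illustrate the forms without naming the valid ones.)

Modus ponens (valid):                If P, then Q.  P.      Therefore, Q.
Modus tollens (valid):               If P, then Q.  Not Q.  Therefore, not P.
Affirming the consequent (invalid):  If P, then Q.  Q.      Therefore, P.
Denying the antecedent (invalid):    If P, then Q.  Not P.  Therefore, not Q.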

Strength: An invalid argument may be a good argument. Consider the following argument:
Premise: 90% of Americans are afraid of snakes.
Premise: Jane is an American.
Conclusion: Jane is afraid of snakes.
This argument is invalid because it is certainly possible that Jane is part of the 10% of Americans
who are not afraid of snakes. Thus, it is possible for the premises to be true and the conclusion to
be false. However, it is unlikely that Jane is a part of the 10% rather than the 90%, so it is
unlikely that the conclusion would be false if the premises are true.
A strong argument, then, is an invalid argument in which it is likely that the conclusion is true, given that the premises are true. Unlike validity, strength can come in degrees. Consider a
similar argument:
Premise: 99% of Americans are afraid of snakes.
Premise: Jane is an American.
Conclusion: Jane is afraid of snakes.
Here, it is even more likely that the conclusion is true given that the premises are true. And since
it is more likely, we say that this argument is stronger than the first, although they are both
considered strong.

Conversely, a weak argument is an invalid argument in which it is not likely that the conclusion
is true, given the truth of the premises. Consider the following argument:
Premise: 30% of Americans speak French.
Premise: Jane is an American.
Conclusion: Jane speaks French.
While it is possible that Jane is part of the 30% of Americans who speak French, it is more likely
that she is part of the 70% who do not. Thus, this is a weak argument.
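
One way to make "likely" precise (an editorial gloss, not part of the handout's definitions) is as the conditional probability of the conclusion given the premises, assuming Jane is otherwise representative of Americans. On the handout's own numbers:

Pr(Jane is afraid of snakes | Jane is an American) = 0.90  (strong)
Pr(Jane speaks French | Jane is an American) = 0.30  (weak)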

Strong arguments are not always those with premises that assert percentages. Many (if not all)
scientific laws are actually the conclusions of strong arguments, the premises of which are
assertions about what we have experienced so far, and the conclusions of which are assertions
about what we will continue to experience in the future. These are often referred to as arguments
by induction.

In addition, many arguments by analogy are strong arguments. This type of argument is one we
come across quite frequently in philosophy. The premises generally are (1) that two situations
are analogous (or alike in some important respects), and (2) that certain things are true of one
situation. The conclusion is then that those same things will be true of the second situation. The
strength of arguments by analogy depends on how good the analogy is for the purposes of the argument.

Cogency: Just as we can evaluate valid arguments in terms of the actual truth or falsity of their
premises, we can evaluate invalid arguments. A cogent argument is a strong argument in which
all the premises are actually true in our world. This means that any argument that is either weak,
or strong with at least one false premise, is uncogent. Consider the following argument:
Premise: All swans observed so far are white.
Conclusion: All swans are white.
This is quite a strong argument, but, unfortunately, black swans have now been observed in
Australia. Thus the premise is false, and the argument is uncogent.

Informal Fallacy: Not all valid arguments are good arguments. Those with clearly false
premises are bad, of course. However, there are some particular arguments that seem to have a
valid form and possibly true premises, but, upon examination of the content of the argument, clearly exemplify errors in reasoning.

Straw Man: This is an argument attacking another person's argument. Often, when debating a point with another, it is wise to reconstruct the other person's argument in your own words. This ensures that you understand the argument, and can help you evaluate it. We often do this in class with the readings for the day; after all, you cannot criticize someone's argument until you understand exactly what it is. The straw man fallacy occurs when you misrepresent the other person's argument, or a statement made by the other person, in a way that makes it much easier to criticize his or her argument.

Argument against the person (Ad hominem fallacy): This is also an argument against the conclusion of another person's argument. The conclusion of the argument is that the other person is wrong, but the premises of the argument, instead of giving good reasons not to believe the claim itself, attack the other person's character, intelligence, or motivation. In general, statements about a person have nothing to do with whether that person's argument is valid, sound, strong, etc.

Appeal to the People: This is an argument in which the basis for the conclusion is the desire or need to be accepted or valued by a larger group. Unless there are specific reasons given to the contrary (as in scientific inquiry, for example), the fact that some group of people believe a statement (or supposedly believe a statement) is not an indication that the statement is true.

Appeal to Ignorance: This is an argument in which the basis for believing the truth of a statement
is that its falsity has not been proven. Unfortunately, the converse argument seems just as
persuasive: believing a statement is false merely because its truth has not been proven. Again,
unless there are specific reasons given to the contrary, lack of proof for (or against) a statement
does not constitute proof that the statement is false (or true).

Equivocation: This fallacy is made when an argument has a valid form, but in fact two or more
meanings of a single word or phrase are used when validity depends on using just one meaning.

Begging the Question: This fallacy is also made when an argument has a valid form, but in fact
the conclusion is merely a restatement of one or all of the premises. That is, begging the question
occurs when one is assuming (using as a premise) that which is to be proven (the conclusion).

False Dilemma: This is another fallacy made when the argument has a valid form, but one of the
premises is unfairly narrowing the possibilities to be considered. It usually occurs when someone
asserts a dilemma (either A or B, with the implicit assumption that nothing other than A or B
could be the case), but the fallacy can also occur for an assertion of three, four or more choices
(as in: either A, B, or C, with the implicit assumption that nothing other than A, B, or C could be
the case).

Appeal to Unreliable Authority: Generally, an appeal to authority can make a very strong (in the
technical sense) argument for a particular conclusion. However, this is only true when the
authority appealed to is actually a reliable source of information about the subject. This fallacy
occurs when the source cited as a reason to believe a claim is either (1) not actually a reliable
authority, (2) a reliable authority on some subjects, but not the one in question, or (3) is asserting
a claim that is quite controversial even among reliable authorities in the field.

False Cause: This fallacy occurs when the claim is made that one possible cause of a
phenomenon is a (or the) cause of another phenomenon without evidence excluding other
possible causes. This fallacy comes in two common forms. The first is post hoc ergo propter hoc
(after this, therefore because of this). This fallacy occurs when A is asserted to be the cause of
B solely because A occurred before B. The second form is the slippery slope. This fallacy occurs
when A is asserted to inevitably cause B, through some sort of chain reaction, despite lack of
evidence that each link in the chain will inevitably lead to the next link.



II. Structure

The structure of your paper is very important. I should be able to follow your line of reasoning from the first sentence to the last without getting lost, confused, or sidetracked. Wonderful ideas aren't worth much if it is not clear how they all fit together.

Argument. In its most basic form, your paper should be an extended argument, the conclusion of
which is a statement you are asserting to be true. An extended argument is one that incorporates
one or more sub-arguments into the main argument. For example, your main argument may have
four premises which lead to your conclusion. If, however, your premises are slightly
controversial, not obviously true, or not common knowledge, you should provide sub-arguments
to support your premises. Moreover, the premises which support these sub-conclusions may need
arguments of their own. Thus, an extended argument is a series of embedded arguments,
ultimately leading to your main conclusion.

Thesis. The thesis is a statement that is the conclusion of the main argument of your paper.
Every paper should have a thesis. Your thesis should be clearly stated at the beginning of your
paper in the introduction.

Introduction: The introduction to your paper has at least two purposes. First, you need to tell
your reader what your conclusion will be. This is your thesis statement. Second, you need to tell
your reader roughly how you will get to this conclusion. This is not a detailed outline of your paper, but rather a concise summary of the steps in your argument.

Body of the paper: The body of the paper comprises the argument for your thesis. Each
paragraph should be a step in the process of supporting your final claim. This process has several
parts:
(1) A demonstration of how the truth of a set of premises makes the truth of the conclusion (or sub-conclusion) either necessary or probable. This step may be skipped if the argument or sub-argument follows a known valid form.
(2) Assertions of factual claims, obtained from a variety of sources, that are used as premises in an argument or sub-argument.
(3) Considerations of counter-arguments. This follows the recognition that there may be
people who would disagree with you about either (1) or (2). You should consider the
objections they might raise, and respond.

Conclusion. Contrary to what you may have learned in high school, your conclusion should not
be a restatement of your introduction. Rather, the conclusion is a handy place to tie up loose
ends. Here is where, for example, you can consider objections to your argument to which you do
not have the space or expertise to respond. This is also a good place to briefly consider the
implications of the acceptance of your conclusion. Or this may be a good place to explain what
further work may need to be done in this area.


III. Content

The content of your paper will mostly be dependent on the structure, for the content is the filling-
in of the outline dictated by the structure. However, the particular kinds of arguments and
evidence you use will depend on the topic you are considering, and the particular claims you are
making.

Arguments. While all philosophy papers should advance an argument, there are many different
types of arguments that one can make. Thus, writing in philosophy can take many forms:
Advancing your own original argument
Exploring the consequences of a particular hypothesis or position
Reconstructing someone else's position, theory, or argument
Evaluating someone else's argument
Some combination of the above

Advancing your own original argument is the basic task of providing a set of premises and
demonstrating how these premises lead to your conclusion. The premises can be asserted on the
basis of a number of different kinds of sources that constitute evidence that they are true.

Exploring the consequences of a particular hypothesis or position is the task of assuming that the
hypothesis, or the set of statements comprising the position, is true, and showing what
conclusions can be drawn from them, either alone, or in addition to some other set of premises
you assert.

Reconstructing someone else's position, theory, or argument is the task of putting the other person's position, theory, or argument into your own words with as little excess verbiage as possible. In so doing, you should always abide by the principles of fairness and charity. The principle of fairness says that you should always paraphrase someone else in a way that is as close to his or her intentions as possible. This prevents the possibility of committing the Straw Man fallacy. The principle of charity says that, when you may be confused about the author's intentions, you should always interpret him or her in the best possible light. This means, for an argument for example, interpreting the premises and conclusion in such a way as to make the argument valid instead of invalid, or strong instead of weak. This kind of reconstruction is itself a kind of argument because you will need textual support to provide the evidence that your reconstruction is as fair as possible.

Evaluating someone else's argument consists, generally, in showing that his or her argument is good or bad. The goodness of an argument may depend on many things, but as a rule, sound arguments and cogent arguments are considered good, while valid but unsound, strong but uncogent, and weak arguments are all considered bad. In any case, you need to provide your own argument to show that the other person's argument fits some established criteria of what constitutes a good or bad argument. In general, if you disagree with the conclusion of someone else's argument, you have two options for evaluating it negatively: you can either show that the argument form is bad (i.e., show that it is a weak argument), or you can show that the premises are false (or you can do both).

Evidence. One thing that differentiates philosophy papers from other kinds of papers (English
papers, scientific papers) is the wide variety of the kinds of evidence that are routinely used and
accepted to support various different kinds of claims:
Derivations from first philosophical principles (e.g., "I think, therefore I am")
Scientific or factual claims
Logical principles
Appropriate personal experience
Linguistic analysis (definitions, colloquial uses of terms and phrases)
Expert testimony
Textual evidence

The kinds of evidence you use to support your thesis may vary according to the assignment and
the topic, but some basic guidelines should be followed.

Relevance. The evidence you use should be directly and obviously related to the claim it is
supposed to support. If it is either indirectly or not obviously related, you need to provide
additional arguments why this evidence does indeed support your conclusion. We have already
seen some fallacies you may commit if you use irrelevant evidence, like Ad hominem, appeal to
the people, or appeal to ignorance.

Reasonableness. The evidence you use to support your claim should be such that any reasonable
person would accept it, based not on the information you have, but on the information you have
given in your paper. This means that, in general, if your evidence is controversial or not
obviously true given the information you have offered, you will have to provide additional
reasons to accept it.

Sufficiency. The evidence you use to support your claim must be sufficient to support it. That is,
you must have enough relevant, reasonable evidence to persuade your reader. This means that,
for example, you may not make sweeping generalizations about what all philosophers have been
concerned about throughout the course of history if you have only read a handful of philosophy
texts.

Definitions. As I said above, philosophers are generally very concerned with definitions, and
especially concerned with being clear and consistent. This means that if you are going to use or
consider important philosophical terms or concepts, you need to define them as clearly as
possible. However, a conventional dictionary is usually not up to the task of providing adequate
definitions of technical philosophical terms. There are many reasons for this:
(1) A conventional dictionary is usually concerned with giving colloquial uses of words, not technical definitions that may vary by field or author (consider the difference between the colloquial use of the word "force" and the technical definition this word has in physics).
(2) Conventional dictionaries do not generally contain philosophical phrases as single
entries, and the meaning of a phrase often cannot be captured by a conjunction of the
meanings of the individual words.
(3) Philosophers writing in different settings may use words and phrases in slightly different
ways. In addition, philosophers often spend good portions of articles, or entire articles, on
determining the precise definition of a term or phrase, exactly because there may be no
generally accepted definition of it. In this case, a dictionary will be of no use at all.

Citations: Different sources may be used for different purposes, but all should be documented
clearly and correctly. Many of the kinds of evidence you will use to support a claim involve
sources that you will need to document.
Scientific or factual claims: unless you are stating a fact that is obviously common knowledge and generally uncontroversial (e.g., the earth is round), you will need to cite the sources of your scientific and/or factual claims (like how many people visit the Pittsburgh Zoo every year, or what a panda's natural habitat is).
Expert testimony: if you are going to state that an expert in the field has made some claim, you need to cite the source in which the claim was advanced, be it a scholarly article, a newspaper story, or personal communication.
Textual evidence: in making a case for a particular interpretation of an author's words or reconstruction of an author's argument, you will need to cite textual evidence to show that you are following the principles of fairness and charity.

IV. General Miscellaneous Advice

The thesis should be something sufficiently interesting, about which reasonable people could differ, and which can be defended. "In this paper, I will show that two plus two is four" is obviously true and not worth writing about. "In this paper, I will prove that Colorado Springs has a tropical climate" is impossible to prove. Don't set your sights either too high or too low.
Not everything you assume to be true is actually self-evidently true. A lot of what happens in philosophy is learning about one's own assumptions and how to defend those, or to get rid of the ones that are indefensible. You might think it's impossible for anyone to disagree with the statement that "There are no true liberals left in American politics; even Clinton and Gore are basically Republicans," but not everyone will be like-minded.
If, halfway through, you can't figure out what the sentence you're writing has to do with the thesis, stop and try to figure out how it links. If it doesn't at all, it probably doesn't need to be there. If it does but you need to explain why, add the explanation.
Transitions are important; they lead the reader through your argument. Transitional words, though, should only be used for actual transitions. Obvious problems are using words like "therefore" and "thus" followed by a statement which has nothing to do with what came before it.
Texts have to be interpreted, which means you need to quote passages to support your interpretation. The following are not appropriate uses of text as they stand. (Nor are the citations appropriate.)
o "The Bible says be nice to people."
o "Kierkegaard says that Abraham is important (Kierkegaard)."
o "Nietzsche says Kierkegaard is tiresome (Nietzsche, 20-150)."
o "Marx loves capitalism. As he says, it involves 'amazing wealth' (Marx, 241)."
o "Plato's dialogue is about love (class notes)."
o "Plato believes in a theory of forms (Plato Introduction, 2)."
o "Descartes believes in dualism (www.randomhack.com/descartes/bs.html)."
Evidence for a claim can be of varying types. A lot of the work in evaluating arguments rests in deciding how to evaluate evidence.
o A reference to an authority is not necessarily the best evidence. One must consider the authority's credentials, and whether there is reasonable disagreement among experts. Arguments from authority are not valid, but they can be good arguments nonetheless.
o Ad hominems are logical fallacies: "Marx said it, and he's stupid, therefore it's false" is not a good argument. There are many other logical fallacies you should avoid as well; you should become familiar with them.
o "I think" is not evidence in itself. "I think it is going to rain tomorrow" is not a better reason to bring out an umbrella than "it is going to rain tomorrow." Generally, since it is your paper, I will assume that statements you assert are things you think are true.
Avoid useless generalities: "In today's modern complex society," "throughout time people have struggled with political questions," etc. They waste space and make you sound like someone who doesn't have enough to say. You're also probably not qualified to say things like "philosophers have always struggled with the question of...": how many philosophers have you read?
Avoid making absolutist, overly general claims. Any argument whose conclusion is "therefore Plato is totally wrong" is probably a bad argument. Remember that most issues worth talking about are not simple.
Most issues worth talking about also aren't cut and dried: the other side probably does have something worth taking into consideration. The best reasoning is that which understands and anticipates opposing arguments, and takes them into account.
With respect to vocabulary, there are a number of points:
o Keep your writing simple, but not simplistic. Only use big words and jargon if you must; otherwise use simple, easy-to-understand language.
o Don't use a variety of different words to refer to the same idea or problem just because you don't want to sound repetitive. You should worry more about precision and less about beauty.
o Philosophers often give words precise, technical meanings that are not used in that precise or technical way in ordinary conversation. You should know if a word is being used in this way in anything you read; and you should only use these words in your writing if you fully understand them and can use them correctly.
o In addition, you may want to give a term a precise meaning that is different from its colloquial meaning in your paper. It is fine to do this as long as you carefully explain the definition you're using, and use the term consistently with this definition throughout your paper.
o Some philosophers also have a habit of giving the same technical word a slightly different meaning than others do. Thus, while there is no need to explain terms like "valid" or "necessary" in your paper, you do need to explain what you mean by "physicalism" or "sense data."
Always follow the guidelines for quotations and citations.
A coherent argument is difficult to sustain over many pages. I recommend having someone
else read your paper for clarity, cohesiveness, etc., before you hand it in.
Proofreading your paper is not the same as running spell-check. Spell-check won't catch common mistakes like:
o Using "their" for "there" or vice versa
o Using "its" for "it's" or vice versa
o Writing "based off" when the appropriate phrase is "based on"
o Writing "centered around" when the appropriate phrase is "centered on"
o Mismatching the subject of the sentence with the possessives: "One should do their own work" should be "One should do his or her own work."
o Misspelling an author's name
Do not use rhetorical questions or slang in your writing. They diminish the clarity of your
paper by making the reader guess at your meaning.


Using Argument Diagrams to Improve
Critical Thinking Skills
in Introductory Philosophy

Maralee Harrell

Carnegie Mellon University
Department of Philosophy
135 Baker Hall
Pittsburgh, PA, 15213
mharrell@cmu.edu
(412) 268-8152


Abstract
In an experiment involving 139 students in an introductory philosophy course we tested whether
students were improving their ability to think critically about arguments and whether using
argument diagramming as an analysis aid contributed to this improvement. We determined that
the students did develop this skill over the course of the semester. We also determined that the
students in one section of the course gained significantly more than the students in the other
sections, and that this was due almost entirely to their ability to use argument diagrams. We
conclude that learning how to construct argument diagrams significantly improves a student's ability to analyze, comprehend, and evaluate arguments.

Using Argument Diagrams to Improve Critical Thinking Skills
in Introductory Philosophy

In the introductory philosophy class at Carnegie Mellon University (80-100 What Philosophy Is)
one important learning goal is the development of general critical thinking skills. Even though
there are a few generally accepted measures of these skills (e.g. the California Critical Thinking
Skills Test and the Watson Glaser Critical Thinking Appraisal, but see also Halpern, 1989 and
Paul, Binker, Jensen, & Kreklau, 1990), there is surprisingly little research on the sophistication
of, or on effective methods for improving, the critical thinking skills of college students. The
research that has been done shows that the population in general has very poor skills (Perkins,
Allen, & Hafner, 1983; Kuhn, 1991; Means & Voss, 1996), and that very few courses actually improve these skills (Annis & Annis, 1979; Pascarella, 1989; Stenning, Cox, & Oberlander,
1995).
Critical thinking involves the ability to analyze, understand, and evaluate an argument.
Our first hypothesis was that students improve on these tasks after taking the introductory
philosophy course. However, we wanted to determine not only whether they improved, but how
much improvement could be attributed to alternative teaching methods.
One candidate method is the use of argument diagrams as an aid to overall argument
comprehension, since we believe that they significantly facilitate understanding, analysis, and
evaluation. An argument is a series of statements in which one is the conclusion, and the others
are premises supporting this conclusion; and an argument diagram is a visual representation of
these statements and the inferential connections between them.
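Although the diagrams in this study were constructed by hand, the definition just given translates directly into a simple data structure. The following minimal sketch is ours, written in Python purely for illustration; none of these names come from the paper:

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ArgumentDiagram:
    # Statements are the nodes; 'supports' records the inferential connections
    # as (premise_id, conclusion_id) pairs.
    statements: Dict[int, str]
    supports: List[Tuple[int, int]] = field(default_factory=list)

    def premises_of(self, statement_id: int) -> List[int]:
        """Ids of the statements offered in direct support of a statement."""
        return [p for (p, c) in self.supports if c == statement_id]

# Two premises combining to support one conclusion:
diagram = ArgumentDiagram(
    statements={1: "All men are mortal", 2: "Socrates is a man", 3: "Socrates is mortal"},
    supports=[(1, 3), (2, 3)],
)
print(diagram.premises_of(3))  # -> [1, 2]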
For example, at the end of the Meno, Plato (1976) argues through the character of Socrates
that virtue is a gift from the gods (89d-100b). While the English translations of Plato's works are
among the more readable philosophical texts, the text contains many more sentences than just the
propositions that are part of the argument, and, because prose necessarily proceeds linearly, it
obscures the inferential structure of the argument. Thus anyone who wishes to understand and
evaluate the argument may reasonably be confused. If, on the other hand, we extract just the
statements Plato uses to support his conclusion and visually represent the connections between
these statements (as shown in Figure 1), the structure of the argument is immediately clear, as are
the places where we may critique or applaud it.
Recent interest in argument visualization (particularly computer-supported argument
visualization) has shown that the use of software programs specifically designed to help students
construct argument diagrams can significantly improve students' critical thinking abilities over
the course of a semester-long college-level course (Kirschner et al., 2003; Twardy, 2004; van
Gelder, 2001, 2003). But, of course, one need not have computer software to construct an
argument diagram; one needs only a pencil and paper. However, to our knowledge there has
been no research done to determine whether it is the mere ability to construct argument
diagrams, or the aid of a computer platform and tutor (or possibly both), that is the crucial factor.
Our second hypothesis is that the crucial factor in the improvement of critical thinking
skills is the ability to construct argument diagrams. This hypothesis posits that students who
construct correct diagrams during argument analysis tasks should perform better on these tasks
than students who do not.


FIGURE 1 An argument diagram representing one of the arguments in Plato's Meno.

We typically teach several sections of Carnegie Mellon University's introduction to
philosophy course (80-100 What Philosophy Is) each semester, with a different instructor for
each section. While the general curriculum of the course is set, each instructor is given a great
deal of freedom in executing this curriculum. For example, each section is a topics-based course
in which epistemology, metaphysics, and ethics are introduced with both historical and
contemporary primary-source readings. Each instructor, however, chooses a text, the order of the
topics, and the assignments for his or her section. The students who take this course are a mix of
class years and majors from all over the University.
In the Spring of 2004, students in Section 1 were explicitly taught how to construct
argument diagrams to represent a selection of text. In contrast, students in Sections 2, 3, and 4
were not explicitly taught the use of argument diagrams, but rather, if they were taught to
analyze arguments at all, were taught to use more traditional kinds of representations (e.g., lists
of statements).
In this study, we test the first hypothesis by comparing the pretest and posttest scores of
all the students in 80-100 in the Spring semester of 2004. We test the second hypothesis in three
ways: (1) by comparing the pretest and posttest scores of students in Section 1 to students in
Sections 2, 3, and 4, (2) by comparing the pretest and posttest scores of students who constructed
correct argument diagrams on the posttest to those students who did not, and (3) by comparing
total scores on individual questions on the posttest of students who constructed the correct
argument diagrams for that question to those students who did not.


[Figure 1, not reproducible here, arranges the following statements as nodes: "Virtue cannot be
taught"; "Something can be taught if and only if it is knowledge"; "There are no teachers of
virtue"; "Something can be taught if and only if it has teachers"; "Virtue is not knowledge";
"Virtue is either knowledge or true belief"; "Virtue guides correct actions"; "Only knowledge
and true belief guide correct actions"; "Virtue is a true belief"; "True belief is a gift from the
gods"; and, as the final conclusion, "Virtue is a gift from the gods".]
Method
Participants
139 students (46 women, 93 men) across the four sections of introductory philosophy (80-100
What Philosophy Is) at Carnegie Mellon University in the Spring of 2004 were studied. Each section of
the course had a different instructor and teaching assistant, and the students chose their section.
There were 35 students (13 women, 22 men) in Section 1, 37 students (18 women, 19 men) in
Section 2, 32 students (10 women, 22 men) in Section 3, and 35 students (5 women, 30 men) in
Section 4. The students in Section 1 were taught the use of argument diagrams to analyze the
arguments in the course reading, while the students in the other three sections were taught more
traditional methods of analyzing arguments.

Materials
Prior to the semester, the four instructors of 80-100 in the Spring of 2004 met to determine the
learning goals of this course, and designed an exam to test the students on relevant skills. The
identified skills were to be able to, when reading an argument, (i) identify the conclusion and the
premises; (ii) determine how the premises are supposed to support the conclusion; and (iii)
evaluate the argument based on the truth of the premises and how well they support the
conclusion.
We used this exam as the pretest (given in Appendix A) and created a companion
posttest (given in Appendix B). For each question on the pretest, there was a structurally
(nearly) identical question with different content on the posttest. The tests each consisted of 6
questions, each of which asked the student to analyze a short argument. In questions 1 and 2, the
student was only asked to state the conclusion (thesis) of the argument. Questions 3-6 each had
five parts: (a) state the conclusion (thesis) of the argument; (b) state the premises (reasons) of the
argument; (c) indicate (via multiple choice) how the premises are related; (d) provide a visual,
graphical, schematic, or outlined representation of the argument; and (e) decide whether the
argument is good or bad, and explain this decision.

Procedure
Each of the four sections of 80-100 was a Monday/Wednesday/Friday class. The pretest was
given to all students during the second day of class. The students in sections 2 and 3 were given
the posttest on the last day of classes, while the students in sections 1 and 4 were given the
posttest as one part of their final exam, during exam week.


Results and Discussion
Test Coding
Pre- and posttests were paired by student; single-test students were excluded from the sample,
so that there were 139 pairs of tests in the study. Tests which did not have pairs were used for
coder calibration, prior to the coding of the 139 pairs of tests.
Two graduate students independently coded all 278 tests (139 pairs). Each pre-/posttest
pair was assigned a unique ID, and the original tests were photocopied (twice, once for each
coder) with the identifying information replaced by the ID. We had an initial coder-calibration
session in which the author and the two coders coded several of the unpaired tests, discussed our
codes, and came to a consensus about each code. After this, each coder was given the two keys
(one for the pretest and one for the posttest) and the tests to be coded in a unique random order.
The codes assigned to each question (or part of a question, except for part (d)) were
binary: a code of 1 for a correct answer, and a code of 0 for an incorrect answer. Part (e) of each
question was coded as correct if the student gave as reasons claims about the truth of
the premises and/or the support of the premises for the conclusion. For part (d) of each question,
answers were coded according to the type of representation used: correct argument diagram;
incorrect or incomplete argument diagram; list; translation into logical symbols, like a proof;
Venn diagram; concept map; schematic (e.g., P1 + P2 / Conclusion (C)); other; or blank.
To determine inter-coder reliability, the Percentage Agreement (PA), Cohen's Kappa (κ),
and Krippendorff's Alpha (α) were calculated for each test (given in Table 1).

TABLE 1
Inter-coder Reliability: Percentage Agreement (PA),
Cohen's Kappa (κ), and Krippendorff's Alpha (α) for each test

            PA    κ    α
Pretest    .85  .68  .68
Posttest   .85  .55  .54
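For reference, Cohen's Kappa corrects the raw percentage agreement for the agreement expected by chance; in standard notation (ours, not the paper's):

\[ \kappa = \frac{P_o - P_e}{1 - P_e} \]

where P_o is the observed proportion of agreement between the coders and P_e is the proportion of agreement expected by chance. Krippendorff's Alpha embodies the same idea, defined as one minus the ratio of observed to expected disagreement.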

The inter-coder reliability was fairly good; however, upon closer examination it was
determined that one coder had systematically higher standards than the other on the
questions for which the scoring was open to some interpretation (questions 1 and 2, and parts
(a), (b), and (e) of questions 3-6). Specifically, on the pretest, out of 385 question-parts on which
the coders differed, 292 (75%) were cases in which Coder 1 coded the answer as correct while
Coder 2 coded it as incorrect; and on the posttest, out of 371 question-parts on which
the coders differed, 333 (90%) were such cases. In light of this, the codes from the two coders on these
questions were averaged, allowing for a more nuanced scoring of each question than either coder
alone could give.
Since we were interested in how the use of argument diagramming aided the student in
answering each part of each question correctly, the codes a student received for part (d) of
questions 3-6 were preliminarily set aside, while the sum of the codes received on questions
1 and 2, as well as on parts (a), (b), (c), and (e) of questions 3-6, determined the raw score a student
received on the test.
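Spelling out the arithmetic this implies (a detail the text leaves implicit): questions 1 and 2 contribute one code each, and parts (a), (b), (c), and (e) of questions 3-6 contribute one code each, so

\[ \text{maximum raw score} = 2 + 4 \times 4 = 18, \qquad \text{fractional score} = \frac{\text{raw score}}{18}. \]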

TABLE 2
The variables and their descriptions recorded for each student
Variable Name Variable Description
Pre Fractional score on the pretest
Post Fractional score on the posttest
A* Averaged score (or code) on the pretest for question *
B* Averaged score (or code) on the posttest for question *
Section Enrolled section
Sex Student's sex
Honors Enrollment in the honors course
Grade Final grade in the course
Year Year in school

The primary variables of interest were the fractional pretest and posttest scores (the raw
score converted into a percentage), and the individual average scores for each question on the
pretest and the posttest. In addition, the following data were recorded for each student: the
section the student was enrolled in, the student's final grade in the course, the student's year in
school, the student's sex, and whether the student had taken the concurrent honors course
associated with the introductory course. Table 2 gives summary descriptions of these variables.

Average Gain from Pretest to Posttest for All Students
The first hypothesis was that the students critical thinking skills improved over the course of the
semester. This hypothesis was tested by determining whether the average gain of the students
from pretest to posttest was significantly positive. The straight gain, however, may not be fully
informative if many students had fractional scores of close to 1 on the pretest. Thus, the
hypothesis was also tested by determining the standardized gain: each students gain as a fraction
of what that student could have possibly gained. The mean scores on the pretest and the posttest,
as well as the mean gain and standardized gain for the whole population of students is given in
Table 3.

TABLE 3
Mean fractional score (standard deviation) for the pretest and the posttest,
mean gain (standard deviation), and mean standardized gain (standard deviation)
                    Pre          Post         Gain         GainSt
Whole Population    0.59 (0.14)  0.78 (0.12)  0.19 (0.01)  0.43 (0.03)
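In symbols, the standardized gain described above is (our notation, not the paper's):

\[ \mathrm{GainSt} = \frac{\mathrm{Post} - \mathrm{Pre}}{1 - \mathrm{Pre}} \]

For example, a student who scores 0.59 on the pretest and 0.78 on the posttest has a gain of 0.19 but a standardized gain of 0.19 / 0.41 ≈ 0.46.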

The difference in the means of the pretest and posttest scores was significant (paired t-test;
p < .001). In addition, the mean gain was significantly different from zero (one-sample t-test;
p < .001), as was the mean standardized gain (one-sample t-test; p < .001). From these results we
can see that our first hypothesis is confirmed: overall, the students did have significant gains and
standardized gains from pretest to posttest.
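These tests are standard; purely as an illustration (the paper does not say what software was used), they could be reproduced in Python with SciPy, given arrays pre and post of the 139 paired fractional scores. The data below are placeholders:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.uniform(0.3, 0.9, size=139)                          # placeholder pretest scores
post = np.clip(pre + rng.normal(0.19, 0.10, size=139), 0, 1)   # placeholder posttest scores

gain = post - pre
gain_st = gain / (1 - pre)                  # standardized gain

print(stats.ttest_rel(post, pre))           # paired t-test on pre/post means
print(stats.ttest_1samp(gain, 0))           # is the mean gain different from zero?
print(stats.ttest_1samp(gain_st, 0))        # is the mean standardized gain different from zero?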

Comparison of Gains of Students by Section and by Argument Diagram Use
Our second hypothesis was that the students who were able to construct correct argument
diagrams would gain the most from pretest to posttest. Since the use of argument diagrams was
only explicitly taught in Section 1, we first tested this hypothesis by determining whether the
average gain of the students in Section 1 was significantly different from the average gain of the
students in each of the other sections. Again, though, the straight gain may not be fully
informative if the mean on the pretest was not the same for each section, and if many students
had fractional scores close to 1 on the pretest. Thus, we also tested this hypothesis using the
standardized gain. The mean scores on the pretest and the posttest, as well as the mean gain and
standardized gain for the sub-populations of students in each section, are given in Table 4.

TABLE 4
Mean fractional score (standard deviation) for the pretest and the posttest,
mean gain (standard deviation), and mean standardized gain (standard deviation)
Pre Post Gain GainSt.
Section 1 0.64 (0.14) 0.85 (0.10) 0.21 (0.02) 0.51 (0.07)
Section 2 0.53 (0.16) 0.70 (0.14) 0.17 (0.03) 0.32 (0.05)
Section 3 0.58 (0.14) 0.79 (0.08) 0.21 (0.02) 0.48 (0.04)
Section 4 0.63 (0.10) 0.80 (0.09) 0.17 (0.02) 0.42 (0.05)

Since there was such variability in the scores on the pretest among the different sections,
we ran an ANCOVA on each of the variables Post, Gain, and GainSt, with the variable Pre
used as the covariate. This analysis indicates that the differences in the pretest scores were
significant for predicting the posttest scores (df = 1, F = 24.36, p < .001), the gain (df = 1, F =
125.50, p < .001), and the standardized gain (df = 1, F = 29.14, p < .001). In addition, this
analysis indicates that, even accounting for differences in pretest score, the differences in the
posttest scores among the sections were significant (df = 3, F = 8.71, p < .001), as were the
differences in the gains (df = 3, F = 8.71, p < .001) and the standardized gains (df = 3, F = 6.84, p
< .001).
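The paper does not name its statistics package; as one way to make the model concrete, the same ANCOVA structure can be expressed in Python with statsmodels. The DataFrame layout and values here are hypothetical:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per student; in the study there were 139 rows.
df = pd.DataFrame({
    "Pre":     [0.50, 0.60, 0.70, 0.55, 0.65, 0.60, 0.50, 0.70],
    "Post":    [0.70, 0.80, 0.90, 0.70, 0.85, 0.75, 0.65, 0.90],
    "Section": [1, 1, 2, 2, 3, 3, 4, 4],
})

# ANCOVA: Section as the factor of interest, Pre as the covariate.
model = ols("Post ~ Pre + C(Section)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))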
This ANCOVA shows that a student's section is a significant predictor of posttest score,
gain, and standardized gain, but it does not tell us how the sections differ. The hypothesis is that
the posttest score, gain, and standardized gain for students in Section 1 are significantly higher than for
all the other sections. Thus, we did a planned comparison of the variables Post, Gain, and GainSt
for Section 1 against the other sections combined, again using the variable Pre as a covariate. This
analysis again indicates that the differences in the pretest scores were significant for predicting the
posttest scores (df = 1, F = 32.28, p < .001), the gain (df = 1, F = 107.37, p < .001), and the
standardized gain (df = 1, F = 21.42, p < .001). In addition, this analysis indicates that, even
accounting for differences in pretest score, the differences in the posttest scores between Section
1 and the other sections were significant (df = 1, F = 11.89, p = .001), as were the differences in
the gains (df = 1, F = 11.89, p = .001) and the standardized gains (df = 1, F = 8.07, p = .005),
with the average posttest score, gain, and standardized gain being higher in Section 1 than in the
other three sections.
Although these differences between sections (at least with standardized gain scores)
obtained, they do not provide a direct test of whether students who (regardless of section)
constructed correct argument diagrams have better skills. The explanation is that, although the
students in Section 1 were the only students to be explicitly taught how to construct argument
diagrams, a substantial number of students from other sections constructed correct argument
diagrams on their posttests. In addition, a substantial number of the students in Section 1
constructed incorrect argument diagrams on their posttests. Thus, to test whether it was actually
the construction of these diagrams that contributed to the higher scores of the students in Section
1, or whether it was the other teaching methods of the instructor for Section 1, we introduced a
new variable into our model.
Recall that the type of answer given on part (d) of questions 3-6 was recorded
from each test. From these data, a new variable, PostAD (value = 0, 1, 2, 3, or 4), was defined
to indicate how many correct argument diagrams a student constructed on the posttest.
The second hypothesis implies that the number of correct argument diagrams a student
constructed on the posttest is correlated with the student's pretest score, posttest score, gain, and
standardized gain. Since there were very few students who constructed exactly 2 correct
argument diagrams on the posttest, and still fewer who constructed exactly 4, we grouped the
students by whether they had constructed no correct argument diagrams (PostAD = 0), few
correct argument diagrams (PostAD = 1 or 2), or many correct argument diagrams (PostAD = 3
or 4) on the posttest. The results are given in Table 5.
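For concreteness, the grouping just described can be written in pandas; the column and label names below are ours, not the study's:

import pandas as pd

df = pd.DataFrame({"PostAD": [0, 1, 2, 3, 4, 0, 3, 1]})   # placeholder values
df["ADGroup"] = pd.cut(
    df["PostAD"],
    bins=[-1, 0, 2, 4],                                   # 0 | 1-2 | 3-4
    labels=["No Correct", "Few Correct", "Many Correct"],
)
print(df.groupby("ADGroup", observed=True).size())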






TABLE 5
Mean fractional score (standard deviation) for the pretest and the posttest,
mean gain (standard deviation), and mean standardized gain (standard deviation)
Pre Post Gain GainSt.
No Correct 0.56 (0.16) 0.74 (0.12) 0.18 (0.02) 0.39 (0.03)
Few Correct 0.57 (0.13) 0.75 (0.12) 0.17 (0.02) 0.37 (0.04)
Many Correct 0.66 (0.13) 0.88 (0.06) 0.22 (0.02) 0.56 (0.06)

Again, since there was such variability in the scores on the pretest among the different
sections, we ran an ANCOVA on each of the variables Post, Gain, and GainSt, with the
variable Pre used as the covariate. This analysis indicates that the differences in the pretest scores
were significant for predicting the posttest scores (df = 1, F = 24.68, p < .001), the gain (df = 1, F
= 132.81, p < .001), and the standardized gain (df = 1, F = 30.97, p < .001). This analysis also
indicates that, even accounting for differences in pretest score, the differences among the
students who constructed no, few, or many correct argument diagrams on the posttest are
significant (df = 2, F = 14.66, p < .001), as are the differences in gains (df = 2, F = 14.66, p <
.001) and standardized gains (df = 2, F = 11.78, p < .001).
This analysis shows that whether a student constructed no, few, or many correct
argument diagrams is a significant predictor of posttest score, gain, and standardized gain, but it
does not tell us how the groups differ. The hypothesis is that the posttest score, gain, and
standardized gain for students who constructed many diagrams are significantly different from
those of both of the other groups. Thus, we did a planned comparison of the variables Post, Gain, and
GainSt for the Many Correct group against the other two groups combined, again using the
variable Pre as a covariate. This analysis again indicates that the differences in the pretest scores
were significant for predicting the posttest scores (df = 1, F = 23.67, p < .001), the gain (df = 1, F
= 132.00, p < .001), and the standardized gain (df = 1, F = 31.29, p < .001). In addition, this
analysis indicates that, even accounting for differences in pretest score, the differences in the
posttest scores between students who constructed many correct argument diagrams and the other
groups were significant (df = 1, F = 28.13, p < .001), as were the differences in the gains (df = 1,
F = 28.13, p < .001) and the standardized gains (df = 1, F = 22.27, p < .001), with the average
posttest score, gain, and standardized gain being higher for those who constructed many correct
argument diagrams than for those who did not.
These results show that the students who mastered the use of argument diagrams (those
who constructed 3 or 4 correct argument diagrams) had the highest posttest scores and gained
the most as a fraction of the gain that was possible. Interestingly, those students who constructed
few correct argument diagrams were roughly equal on all measures to those who constructed no
correct argument diagrams. This may be explained by the fact that nearly all (85%) of the
students who constructed few correct argument diagrams, and all (100%) of the students who
constructed no correct argument diagrams, were enrolled in the sections in which constructing
argument diagrams was not explicitly taught; thus the majority of the students who constructed
few correct argument diagrams may have done so by accident. This suggests future work to
determine how much the mere ability to construct argument diagrams aids critical thinking,
compared to that ability combined with explicit instruction on how to read, interpret, and use
argument diagrams.
Prediction of Score on Individual Questions
The hypothesis that students who constructed correct argument diagrams improved their critical
thinking skills the most was also tested on an even finer-grained scale, by looking at the effect of
(a) constructing the correct argument diagram for a particular question on the posttest on (b) the
student's ability to answer the other parts of that question correctly. The hypothesis posits that
the score a student received on each part of each question, as well as whether the student
answered all the parts of each question correctly, is positively correlated with whether the student
constructed the correct argument diagram for that question.
To test this, a new set of variables was defined for each of questions 3-6, with
value 1 if the student constructed the correct argument diagram on part (d) of the question, and 0
if the student constructed an incorrect argument diagram or no argument diagram at all. In
addition, another new set of variables was defined for each of questions 3-6, with value 1 if
the student received codes of 1 for every part (a, b, c, and e), and 0 otherwise. The
histograms showing the correlations between constructing the correct argument diagram and
correctly answering all parts of each question are given in Figure 2.
[Figure 2, not reproducible here: paired bars for Questions 3-6 comparing Pr(Correct Answer |
Argument Diagram Present) with Pr(Correct Answer | Argument Diagram Absent) on a 0-1 scale,
under the panel title "Completely Correct Answer to Question Given Presence/Absence of
Correct Argument Diagram."]

FIGURE 2 Histograms comparing the frequency of students who answered all parts of each question
correctly given that they constructed the correct argument diagram for that question to the frequency of
students who answered all parts of each question correctly given that they did not construct the correct
argument diagram for that question.
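Each pair of bars in Figure 2 is a pair of conditional proportions; assuming per-question boolean columns (the names below are invented for illustration), they could be computed as follows:

import pandas as pd

# One row per student; booleans for question 3 (hypothetical column names).
df = pd.DataFrame({
    "q3_diagram_correct":   [True, True, False, False, True, False],
    "q3_all_parts_correct": [True, True, False, True, False, False],
})

# Pr(all parts correct | diagram present) vs. Pr(all parts correct | diagram absent)
print(df.groupby("q3_diagram_correct")["q3_all_parts_correct"].mean())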

We can see from the histograms that, on each question, those students who constructed
the correct argument diagram were more likely (in some cases considerably more likely) to
answer all the other parts of the question correctly than those who did not construct the correct
argument diagram. Thus, these results further confirm our hypothesis: students who learned to
construct argument diagrams were better able to answer questions that required particular critical
thinking abilities than those who did not.

Prediction of Posttest Score, Gain, and Standardized Gain
While the results of the above sections seem to confirm our hypothesis that students who
constructed correct argument diagrams improved their critical thinking skills more than those
who did not, it is possible that many causes besides gaining diagramming skills
contributed to the students' improvement. In particular, since the students in Section 1 were the
only ones explicitly taught the use of argument diagrams, and all of the students were able to
choose their section, it is possible that the use of argument diagrams was correlated with the
instructor's teaching ability, the students' year in school, etc.
To test the hypothesis that constructing correct argument diagrams was the only factor in
improving students' critical thinking skills, we first considered how well we could predict the
improvement based on the variables we had collected. We defined new variables for each section
(Section 1, Section 2, Section 3, Section 4), each with value 1 if the student was enrolled in
that section, and 0 if not. We performed three linear regressions (one for the
posttest fractional score, a second for the gain, and a third for the standardized gain) using the
pretest fractional score, Section 1, Section 2, Section 3, Sex, Honors, Grade, and Year as
regressors. (Section 4 was omitted as a baseline.) The results of these regressions are shown in
Table 7.
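As a sketch of this regression setup (again in Python with statsmodels, which is our choice for illustration, not the paper's), the section indicators can be built and Section 4 dropped as the baseline; the remaining regressors (Sex, Honors, Grade, Year) are omitted here for brevity:

import pandas as pd
import statsmodels.api as sm

# Hypothetical per-student data.
df = pd.DataFrame({
    "Pre":     [0.50, 0.60, 0.70, 0.55, 0.65, 0.60, 0.50, 0.70],
    "Post":    [0.70, 0.80, 0.90, 0.70, 0.85, 0.75, 0.65, 0.90],
    "Section": [1, 1, 2, 2, 3, 3, 4, 4],
})

# 0/1 indicator per section; drop Section_4 so it serves as the baseline.
dummies = pd.get_dummies(df["Section"], prefix="Section").drop(columns="Section_4")
X = sm.add_constant(pd.concat([df[["Pre"]], dummies], axis=1).astype(float))
print(sm.OLS(df["Post"], X).fit().params)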

TABLE 7
Prediction of posttest, gain, and standardized gain: coefficient (SE coefficient)
Post Gain GainSt
Constant 0.671 (0.052)*** 0.671 (0.052)*** 1.171 (0.149)***
Pre 0.265 (0.066)*** -0.735 (0.066)*** -1.044 (0.189)***
Section 1 0.065 (0.025)** 0.065 (0.025)** 0.133 (0.071)
Section 2 -0.075 (0.026)** -0.075 (0.026)** -0.210 (0.074)**
Section 3 0.015 (0.025) 0.015 (0.025) 0.024 (0.070)
Sex -0.016 (0.019) -0.016 (0.019) -0.039 (0.054)
Honors 0.016 (0.027) 0.016 (0.027) 0.025 (0.076)
Grade -0.022 (0.013) -0.022 (0.013) -0.057 (0.038)
Year -0.004 (0.009) -0.004 (0.009) -0.004 (0.025)

Note: *p < .05, **p < .01, ***p < .001

Next, we performed three more linear regressions (again on the posttest fractional score,
the gain, and the standardized gain), this time using PostAD as a regressor in addition to the
pretest fractional score, Section 1, Section 2, Section 3, Sex, Honors, Grade, and Year. (Section 4
was again omitted as a baseline.) The results are shown in Table 8.
These results show that in both cases a student's pretest score was a highly significant
predictor of the posttest score, gain, and standardized gain. The coefficient of the pretest was
positive when predicting the posttest, as expected; if all the students' scores generally improve
from the pretest to the posttest, we expect the students who scored higher on the pretest to score
higher on the posttest.




TABLE 8
Prediction of posttest, gain, and standardized gain: coefficient (SE coefficient)
Post Gain GainSt
Constant 0.672 (0.051)*** 0.672 (0.051)*** 1.174 (0.146)***
Pre 0.223 (0.066)*** -0.777 (0.066)*** -1.046 (0.189)***
Section 1 -0.013 (0.035) -0.013 (0.035) -0.058 (0.102)
Section 2 -0.082 (0.025)** -0.082 (0.025)** -0.228 (0.073)**
Section 3 -0.030 (0.028) -0.030 (0.028) -0.086 (0.081)
Sex -0.007 (0.019) -0.007 (0.019) -0.015 (0.054)
Honors 0.0004 (0.0264) 0.0004 (0.0264) -0.016 (0.076)
Grade -0.020 (0.013) -0.020 (0.013) -0.052 (0.037)
Year -0.0003 (0.0106) -0.0003 (0.0106) 0.004 (0.025)
PostAD 0.032 (0.011)** 0.032 (0.011)** 0.079 (0.031)*

Note: *p < .05, **p < .01, ***p < .001

We also see that in both cases the coefficient of the pretest was negative when predicting
gain and standardized gain. In fact, since the score on the pretest is a part of the value of the gain
and standardized gain, it is interesting that the coefficient for the pretest was significant at all.
However, a regression run on a model that predicts gain and standardized gain based on all the
above variables except the pretest shows that none of the variables are significant. We believe
that this can be explained by the fact that scores on the pretest were not evenly distributed
across the sections, as we can see from Table 4. Thus, there seems to be a correlation
between which section a student enrolled in and his or her score on the pretest. So, a plausible
explanation for the negative coefficient when predicting gain is that the students who scored the
lowest on the pretest gained the most, and this is to be expected, at least because there is more
room for them to improve. In addition, a plausible explanation for the negative coefficient when
predicting standardized gain is that, since the grade a student received on the posttest counted as
a part of his or her grade in the course, the students who scored lowest on the pretest had
more incentive to improve, and thus, as a percentage of what they could have gained, gained
more than the students who scored highest on the pretest. Thus, since we also conclude that
there is a correlation between the section the student enrolled in and the score on the posttest,
gain, and standardized gain (see below), there are many contributing factors to a student's gain
(the score on the pretest being one) which may be roughly offset if all the relevant variables are
not examined.
These results also show that the variables Sex, Honors, Grade, and Year were not
significant in either case in predicting a student's posttest score, gain, or standardized gain. In
addition, in both cases, the variable Section 3 was not significant as a predictor, which means
that the students in Section 3 were not significantly different from the students in Section 4,
which was taken as the baseline.
In sum, ignoring the variables that were not significant in either table, the two regression
equations for each predicted variable can be represented as follows (coefficients, with standard
errors and p-values beneath):

Posttest.
y = 0.671 + 0.265 Pre + 0.065 Section1 - 0.075 Section2
    (0.052)  (0.066)     (0.025)          (0.026)
    p < .001 p < .001    p = .010         p = .005

y = 0.672 + 0.223 Pre - 0.013 Section1 - 0.082 Section2 + 0.032 PostAD
    (0.051)  (0.066)     (0.035)          (0.025)          (0.011)
    p < .001 p = .001    p = .722         p = .002          p = .003

Gain.
y = 0.671 - 0.735 Pre + 0.065 Section1 - 0.075 Section2
    (0.052)  (0.066)     (0.025)          (0.026)
    p < .001 p < .001    p = .010         p = .005

y = 0.672 - 0.777 Pre - 0.013 Section1 - 0.082 Section2 + 0.032 PostAD
    (0.051)  (0.066)     (0.035)          (0.025)          (0.011)
    p < .001 p < .001    p = .722         p = .002          p = .003

Standardized Gain.
y = 1.171 - 1.044 Pre + 0.133 Section1 - 0.210 Section2
    (0.149)  (0.189)     (0.071)          (0.074)
    p < .001 p < .001    p = .062         p = .005

y = 1.174 - 1.046 Pre - 0.058 Section1 - 0.228 Section2 + 0.079 PostAD
    (0.146)  (0.189)     (0.102)          (0.073)          (0.031)
    p < .001 p = .001    p = .573         p = .002          p = .012

Here we can clearly see that, before we introduced the variable PostAD, the coefficient
for Section 1 was significantly positive for predicting a student's posttest score and gain, and
nearly significant (p = .062) for predicting a student's standardized gain, while the coefficient
for Section 2 was significantly negative for predicting a student's posttest score, gain, and
standardized gain.
After we introduce the variable PostAD, however, the variable Section 1 is no longer
significant as a predictor; that is, when controlling for how many correct argument diagrams a
student constructed, the students in Section 1 were not significantly different from the students in
Sections 3 and 4. Interestingly, though, the coefficient for Section 2 was still significantly
negative for predicting a student's posttest score, gain, and standardized gain, implying that, even
controlling for how many correct argument diagrams a student constructed, the students in
Section 2 did worse than the students in the other sections. We do not currently have an explanation
for this result.
In addition to Section 1 no longer being a predictor, the coefficient for PostAD is
significantly positive for predicting a student's posttest score, gain, and standardized gain. This
implies that in fact the only measured factor that contributed to a student's gain from pretest to
posttest was his or her ability to construct correct argument diagrams on the posttest.
Thus the instructor for Section 1 was not a contributing factor to the posttest score, gain,
or standardized gain. Rather, the only factor that does contribute, aside from pretest score, is
whether the student constructed correct argument diagrams on the posttest. In other words,
regardless of the instructor and the student's personal history, the more correct argument
diagrams a student constructed on the posttest, the more he or she gained from the pretest to the
posttest. A stronger version of our second hypothesis, then, is confirmed: constructing correct
argument diagrams not only positively contributes to the improvement of argument analysis, but
also overrides differences in instruction and personal history.

General Discussion
One set of skills we would like our students to acquire by the end of our introductory philosophy
class can be loosely labeled the ability to analyze an argument. This set of skills includes the
ability to read a selection of prose, determine which statement is the conclusion and which
statements are the premises, determine how the premises are supposed to support the conclusion,
and evaluate the argument based on the truth of the premises and the quality of their support.
One purpose of argument diagrams is to aid students in each of these tasks. An argument
diagram is a visualization of an argument that makes explicit which statement is the conclusion
and which statements are the premises, as well as the inferential connections between the
premises and the conclusion. Since an argument diagram contains only statements and inferential
connections, it is clear which are the premises and which is the conclusion and how they are
connected, and there is little ambiguity in deciding on what bases to evaluate the argument.
Since the scores on part (a) of each question were high on the pretest, and even higher on
the posttest, it seems that the students taking What Philosophy Is at Carnegie Mellon University
are already good at picking out the conclusion of an argument, even before taking this class. It
also seems, though, that these students in general are not as able, before taking this class, to pick
out the statements that serve to support this conclusion, to recognize how the statements provide
this support, and to decide whether the support is good.
While on average all of the students in each of the sections improved their abilities on
these tasks over the course of the semester, the most dramatic improvements were made by the
students who demonstrated their ability to construct argument diagrams. Constructing the correct
argument diagram was highly correlated in general with correctly picking out the premises,
deciding how these premises are related to each other and the conclusion, and choosing the
grounds on which to evaluate the argument.
It also seems that access to a computer program that aids in the construction of an
argument diagram (e.g., Reason!Able, Argutect, Inspiration) may not be nearly as important as
a basic understanding of argument diagramming itself. The students who learned explicitly in
class how to construct argument diagrams were all in Section 1; these students saw examples of
argument diagrams drawn by hand by the instructor in class, and they constructed
argument diagrams by hand for homework assignments. While it may be the case that access to
specific computer software enhances the ability to create argument diagrams, the results here
clearly show that such access is not necessary for improving some basic critical thinking skills.
Interestingly, an analysis of the individual questions on the pretest yielded qualitatively
similar results with respect to the value of being able to construct argument diagrams.
We conclude that taking Carnegie Mellon University's introductory philosophy course
helps students develop certain critical thinking skills. We also conclude that learning how to
construct argument diagrams significantly raises a student's ability to analyze, comprehend, and
evaluate arguments.

Educational Importance
Many, if not most, undergraduate students never take a critical thinking course in their time in
college. There may be several reasons for this: the classes are too hard to get into, the classes are
not required, the classes do not exist, etc. It is difficult to understand, though, why any of these
would be the case, since the development of critical thinking skills is a part of the educational
objectives of most universities and colleges, and since the possession of these skills is one of the
most sought-after qualities in a job candidate in many fields.
Perhaps, though, both the colleges and the employers believe that the ability to reason well is
the kind of skill that is taught not intensively in any one course, but rather across the curriculum,
in a way that would ensure that students acquire these skills no matter what major they choose.
The research seems to show, however, that this is not the case: on tests of general critical
thinking skills, students average a gain of less than one standard deviation during their entire
time in college, and most of this gain comes in the first year.
In fact, these are among the reasons we give to prospective majors for joining the
philosophy department. We can cite statistics about which majors generally do better on the
LSAT and GRE; but what we have not been able to do in the past is show evidence that our
classes improve critical thinking skills.
What this study shows is that students do substantially improve their critical thinking
skills if they are taught how to construct argument diagrams to aid in the understanding and
evaluation of arguments. Although we studied only the effect of the use of argument diagrams in
an introductory philosophy course, we see no reason why this skill could not be used in courses
in other disciplines. The creation of one's own arguments, as well as the analysis of others'
arguments, occurs in nearly every discipline, from Philosophy and Logic to English and History
to Mathematics and Engineering. We believe that the use of argument diagrams would be helpful
in any of these areas, both in developing general critical thinking skills and in developing
discipline-specific analytic abilities. We hope to perform more studies in the future to test these
conjectures.

Future Work
This study raises as many questions as it answers. While it is clear that the ability to construct
argument diagrams significantly improves a student's critical thinking skills along the
dimensions tested, it would be interesting to consider whether there are other skills that may
usefully be labeled "critical thinking" that this ability may help to improve.
In addition, the arguments we used in testing our students were necessarily short and
relatively simple. We would like to know what the effect of knowing how to construct an
argument diagram would be on a student's ability to analyze longer and more complex
arguments. We suspect that the longer and more complex the argument, the more argument
diagramming would help.
It also seems to be the case that it is difficult for students to reason well about arguments
in which they have a passionate belief in the truth or falsity of the conclusion (for religious,
social, or any number of other reasons). We would like to know whether the ability to construct
argument diagrams aids reasoning about these kinds of arguments, and whether the effect is
more or less dramatic than the aid this ability offers to reasoning about less personal subjects.
In our classes at Carnegie Mellon University, we use argument diagramming not only to
analyze the arguments of the philosophers we study, but also to aid the students in writing
their own essays. We believe that, for the same reasons that constructing these diagrams helps
students visually represent, and thus better understand, the structure of the arguments they read, it
would help the students understand, evaluate, and modify the structure of the arguments in their
own essays. We would like to know whether the ability to construct argument diagrams actually
does aid students' essay writing in these ways.
Lastly, unlike the relatively solitary activities in which students engage in our philosophy
courses (like doing homework and writing essays), there are many venues in and out of the
classroom in which students may analyze and evaluate arguments in a group
setting. These may include anything from classroom discussion of a particular author or topic to
group deliberations about for whom to vote or what public policy to implement. In any of these
situations it seems that it would be advantageous for all members of the group to be able to
visually represent the structure of the arguments being considered. We would like to know
whether knowing how to construct argument diagrams would aid groups in these situations.

Acknowledgements
I would like to thank Ryan Muldoon and Jim Soto for their work on coding the pretests and
posttests; I would also like to thank Michele DiPietro, Marsha Lovett, Richard Scheines, and
Teddy Seidenfeld for their help and advice with the data analysis; and I am deeply indebted to
David Danks, Marsha Lovett, and Richard Scheines for detailed comments on many drafts.

References
Annis, D., & Annis, L. (1979) Does philosophy improve critical thinking? Teaching Philosophy,
3, 145-152.
Halpern, D.F. (1989). Thought and knowledge: An introduction to critical thinking. Hillsdale,
NJ: L. Erlbaum Associates.
Kirschner, P.A., Shum, S.J.B., & Carr, C.S. (Eds.). (2003). Visualizing argumentation: Software
tools for collaborative and educational sense-making. New York: Springer.
Kuhn, D. (1991). The skills of argument. Cambridge: Cambridge University Press.
Means, M.L., & Voss, J.F. (1996). Who reasons well? Two studies of informal reasoning among
children of different grade, ability, and knowledge levels. Cognition and Instruction, 14,
139-178.
Pascarella, E. (1989). The development of critical thinking: Does college make a difference?
Journal of College Student Development, 30, 19-26.
Paul, R., Binker., A., Jensen, K., & Kreklau, H. (1990). Critical thinking handbook: A guide for
remodeling lesson plans in language arts, social studies and science. Rohnert Park, CA:
Foundation for Critical Thinking.
Perkins, D.N., Allen, R., & Hafner, J. (1983). Difficulties in everyday reasoning. In W. Maxwell
& J. Bruner (Eds.), Thinking: The expanding frontier (pp. 177-189). Philadelphia: The
Franklin Institute Press.
Plato. (1976). Meno. Translated by G.M.A. Grube. Indianapolis: Hackett.
Stenning, K., Cox, R., & Oberlander, J. (1995). Contrasting the cognitive effects of graphical and
sentential logic teaching: reasoning, representation and individual differences. Language
and Cognitive Processes, 10, 333-354.
Twardy, C.R. (2004). Argument maps improve critical thinking. Teaching Philosophy, 27, 95-
116.
van Gelder, T. (2001). How to improve critical thinking using educational technology. In G.
Kennedy, M. Keppell, C. McNaught, & T. Petrovic (Eds.), Meeting at the crossroads:
Proceedings of the 18th annual conference of the Australasian Society for Computers in
Learning in Tertiary Education (pp. 539-548). Melbourne: Biomedical Multimedia Unit,
The University of Melbourne.
van Gelder, T. (2003). Enhancing deliberation through computer supported visualization. In P.A.
Kirschner, S.J.B. Shum, & C.S. Carr (Eds.), Visualizing argumentation: Software tools
for collaborative and educational sense-making (pp. 97-115). New York: Springer.






Appendix A
80-100 Spring 2004 Pre-Test
A. Identify the conclusion (thesis) in the following arguments. Restate the conclusion in the
space provided below.
1. Campaign reform is needed because many contributions to political campaigns are morally
equivalent to bribes.
Conclusion:
2. In order for something to move, it must go from a place where it is to a place where it is not.
However, since a thing is always where it is and is never where it is not, motion must not be
possible.
Conclusion:

B. Consider the arguments on the following pages. For each argument:
(a) Identify the conclusion (thesis) of the argument.
(b) Identify the premises (reasons) given to support the conclusion. Restate the premises in the
space provided below.
(c) Indicate how the premises are related. In particular, indicate whether they
(A) are each separate reasons to believe the conclusion,
(B) must be combined in order to provide support for the conclusion, or
(C) are related in a chain, with one premise being a reason to believe another.
(d) If you are able, provide a visual, graphical, schematic, or outlined representation of the
argument.
(e) State whether it is a good argument, and explain why it is either good or bad. If it is a bad
argument, state what needs to be changed to make it good.

3. America must reform its sagging educational system, assuming that Americans are unwilling
to become a second rate force in the world economy. But I hope and trust that Americans are
unwilling to accept second-rate status in the international economic scene. Accordingly, America
must reform its sagging educational system.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?

4. The dinosaurs could not have been cold-blooded reptiles. For, unlike modern reptiles and
more like warm-blooded birds and mammals, some dinosaurs roamed the continental interiors in
large migratory herds. In addition, the large carnivorous dinosaurs would have been too active
and mobile had they been cold-blooded reptiles. As is indicated by the estimated predator-to-
prey ratios, they also would have consumed too much for their body weight had they been cold-
blooded animals.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?
5. Either Boris drowned in the lake or he drowned in the ocean. But Boris has saltwater in his
lungs, and if he has saltwater in his lungs, then he did not drown in the lake. So, Boris did not
drown in the lake; he drowned in the ocean.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?

6. Despite the fact that contraception is regarded as a blessing by most Americans, using
contraceptives is immoral. For whatever is unnatural is immoral since God created and controls
nature. And contraception is unnatural because it interferes with nature.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?

Appendix B
80-100 Spring 2004 Final Exam
A. Identify the conclusion (thesis) in the following arguments. Restate the conclusion in the
space provided below.
1. In spite of the fact that electrons are physical entities, they cannot be seen. For electrons are
too small to deflect photons (light particles).
Conclusion:

2. Since major historical events cannot be repeated, historians are not scientists. After all, the
scientific method necessarily involves events (called experiments) that can be repeated.
Conclusion:

B. Consider the arguments on the following pages. For each argument:
(a) Identify the conclusion (thesis) of the argument.
(b) Identify the premises (reasons) given to support the conclusion. Restate the premises in the
space provided below.
(c) Indicate how the premises are related. In particular, indicate whether they
(A) are each separate reasons to believe the conclusion,
(B) must be combined in order to provide support for the conclusion, or
(C) are related in a chain, with one premise being a reason to believe another.
(d) Provide a visual, graphical, schematic, or outlined representation of the argument (for
example, an argument diagram).
(e) State whether it is a good argument, and explain why it is either good or bad. If it is a bad
argument, state what needs to be changed to make it good.

3. If species were natural kinds, then the binomials and other expressions that are used to refer to
particular species could be eliminated in favor of predicates. However, the binomials and other
expressions that are used to refer to particular species cannot be eliminated in favor of predicates.
It follows that species are not natural kinds.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?

4. Although Americans like to think they have interfered with other countries only to defend the
downtrodden and helpless, there are undeniably aggressive episodes in American history. For
example, the United States took Texas from Mexico by force. The United States seized Hawaii,
Puerto Rico, and Guam. And in the first third of the 20th century, the United States intervened
militarily in all of the following countries without being invited to do so: Cuba, Nicaragua,
Guatemala, the Dominican Republic, Haiti, and Honduras.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?
5. Either humans evolved from matter or humans have souls. Humans did evolve from matter, so
humans do not have souls. But there is life after death only if humans have souls. Therefore,
there is no life after death.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?

6. Of course, of all the various kinds of artists, the fiction writer is most deviled by the public.
Painters and musicians are protected somewhat, since they don't deal with what everyone knows
about, but the fiction writer writes about life, and so anyone living considers himself an authority
on it.
(a) Conclusion:
(b) Premises:
(c) Relationship of the premises. Circle one: (A) (B) (C)
(d) Visual, graphical, schematic, or outlined representation of the argument:
(e) Good or bad argument? Why?




IMPLEMENTING A COMPUTERIZED TUTOR IN A STATISTICAL
REASONING COURSE: GETTING THE BIG PICTURE
Oded Meyer and Marsha Lovett
Carnegie Mellon University, USA
Many schools, like Carnegie Mellon University, are now teaching introductory statistical
reasoning courses in a way that emphasizes conceptual understanding of the basic ideas of data
analysis. There are several challenges in teaching such a course; foremost among them is the
difficulty of conveying a sense of the "Big Picture." This paper describes a computerized learning
tool that we have developed to help overcome this obstacle. This tool is a cognitive tutor in which
students solve data-analysis problems and receive individually tailored feedback. We discuss our
cognitive tutor's use in the course and its measured effectiveness in a controlled experiment.
INTRODUCTION
Teaching Statistical Reasoning at Carnegie Mellon University
Introductory statistics courses are being taught to a large and broad audience of students
(Gordon, 1995). In his 1998 presidential address, David Moore estimated that "hundreds of
thousands" of students pass through a first course in statistics in the US each year (Moore, 1998).
At Carnegie Mellon University (CMU), one such course is "Introduction to Statistical Reasoning"
(36-201), which is a required course for all Humanities and Social Sciences students, as well as
some other majors. Roughly 450 students take this course each year (Fall: 240; Spring: 200;
Summer: 10), the vast majority of them freshmen. For some of these students this course is
"terminal" and is their only formal exposure to statistics. Others, depending on their major, go
on to take additional upper-level courses. The course emphasizes conceptual and critical
understanding of statistics and utilizes statistics software (MINITAB) to minimize computational
mechanics. The main goal of the course is to help students "get" what statistics is all about, in
other words, that students will see the "Big Picture" of statistics. By "Big Picture" we mean
understanding the process of: (1) producing data (sampling data from a population and
considering study design issues), (2) conducting exploratory analysis on the collected data, and
(3) making inferences from the sample back to the population of interest. The course curriculum
is organized into three roughly equal parts corresponding to the above.
Teaching Statistical Reasoning: Challenges
Many challenges arise when teaching an introductory statistical reasoning course.
Freshmen tend to approach material with an authoritative view of knowledge recognizing only
right or wrong answers. It is therefore very hard for the students to really "get" statistics, which
embraces ideas such as variability and uncertainty. Another challenge is to overcome students'
prior beliefs about statistics (Gal & Ginsburg, 1994). Many students' conception is that statistics
is a boring, "plug-in numbers to a formula'' kind of subject, and that it has no relevance to their
life, studies, or future plans. Some students see statistics as another mathematics course, and
project their math-phobia onto it. Our biggest challenge, though, is to convey to students a sense
of the "Big Picture" as frequently as possible without them getting lost in the details.
How does our course currently try to convey the Big Picture?
Our statistical reasoning course has three components: lectures, labs and homework. The
course meets twice a week for 50-minute lectures (200-240 students). Lectures typically cover a
specific topic that can be viewed as a point along the path of the course, plus some "ε-
neighborhood" around this point, consisting of a short review of the previous lecture and a peek
into what is coming ahead. There is little opportunity for conveying a bigger picture than that,
except at the beginning of the course and in the transitions from one part of the course to the next.
The students split into smaller groups for weekly computer labs. Students work in pairs
and go through a paper-based exercise in a guided environment, in the sense that TAs are
available to answer questions and are required to check students' answers to a pre-selected subset
of questions. The labs allow for a somewhat bigger picture, but still mainly focus on providing
practice on the current week's material. Moreover, in labs, students do not get to make choices
regarding which analysis to use, since the relevant MINITAB instructions are provided in the
exercise itself. In addition, there is a weekly homework assignment which, again, relates almost
exclusively to the current week's topics, and gives the needed practice but very little chance for
synthesis.
HOW COGNITIVE TUTORS CAN HELP
The above discussion suggests that we need a "tool" that will engage students with
statistical problems covering more than the current week's topic. Ideally, this tool would
encourage students to use a variety of skills (e.g., from considering the study design to selecting
appropriate analyses to drawing conclusions) and to apply these skills in the context of real-world
data sets. We propose that cognitive tutors offer one way to do just that. The name "cognitive
tutor" refers to a computerized learning environment whose design is based on cognitive
principles and whose interaction with students is based on that of a (human) tutor: making
comments when the student errs, answering questions about what to do next, and maintaining a
low profile when the student is performing well.
What is a Cognitive Tutor?
A cognitive tutor is a computer system that has both (a) a problem-solving engine that
gives it the capacity to generate step-by-step solutions and (b) an enriched interface that allows
students to communicate their own step-by-step solutions. These two components enable the
system to track students' problem-solving processes at a detailed level and offer individually
tailored feedback and hints. That is, the student takes a step by interacting with the computer
interface, and the problem-solving engine judges the appropriateness of that step in the current
situation, responding (if necessary) based on its knowledge of what step it would have taken and
why. In addition, by collecting a database of information on individual students' performance, the
system can make inferences about students' states of knowledge and suggest additional exercises
that could remediate any apparent gap(s).
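As a rough sketch of this mechanism (with hypothetical rules and messages, not SmartLab's actual code), the short Python loop below shows how a model-tracing tutor can judge each student step against the steps its own engine would take and respond accordingly:

    # Minimal model-tracing sketch (hypothetical; not SmartLab's implementation).
    # The "problem-solving engine" is reduced to a table mapping each solution
    # state to acceptable next steps, each paired with a hint.
    ACCEPTABLE_STEPS = {
        "start": {"identify_variables": "First, list the variables in the problem."},
        "identify_variables": {"classify_variables": "Decide whether each variable is quantitative or categorical."},
        "classify_variables": {"choose_display": "Pick a display that fits the variable types."},
    }

    def check_step(state, student_step):
        # Behave like a human tutor: stay quiet when the step is acceptable,
        # comment (with a hint) when it is not.
        acceptable = ACCEPTABLE_STEPS.get(state, {})
        if student_step in acceptable:
            return "ok", student_step
        hint = next(iter(acceptable.values()), "Review the problem statement.")
        return "feedback", "That step does not fit here. Hint: " + hint

    print(check_step("start", "choose_display"))  # feedback with a tailored hint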
Past research has shown that cognitive tutors in various domains, including algebra,
geometry, and computer programming, have been effective. Both randomized experiments and
field studies have shown that students working with a cognitive tutor learn more efficiently and/or
show better scores on posttests, including standardized tests such as the SAT (e.g., Anderson,
Conrad, & Corbett, 1989; Anderson, Corbett, Koedinger & Pelletier, 1995; Koedinger, Anderson,
Hadley & Mark, 1997). There is a body of theory on which the cognitive tutor methodology is
based. Because of space constraints here, however, suffice it to say that cognitive tutors work by
allowing students to solve problems on their own and to receive help when needed to avoid
getting lost or confused. This help, which comes from the structure of the tutor interface and the
hints and feedback from the problem-solving engine, can be thought of as a mental scaffolding that supports students' knowledge as it is constructed through practice, just as a physical scaffolding supports a building as it is erected. This metaphor additionally suggests that, ideally, a cognitive tutor should include mechanisms for reducing the scaffolding when appropriate, so that students, just like the finished building, can stand on their own.
Building a Cognitive Tutor for Data Analysis
Before building a cognitive tutor for data analysis, it made sense to us to look for data
that could help shape such an endeavor. There is much evidence that students have difficulty
applying statistical concepts, in part because of competing prior conceptions (Shaughnessy, 1992;
Garfield & Ahlgren, 1988). We added to this body of empirical work by further investigating
where (and hopefully why) these difficulties arise in the context of solving data-analysis problems
(see Lovett, 2001, for more details). We found that, even among students who had completed the
above course with a grade of B or better:
- Students have difficulty choosing appropriate graphical displays and statistical tools (e.g., many of their analyses were inappropriate or at best not directly relevant to the question).
- Students often fail to interpret their results with respect to the question of interest (e.g., students would say they were finished with the problems almost immediately after producing a display or statistic).
- Generally described, students do not take a systematic approach to solving these problems (e.g., their behavior appeared instead to be driven by the menu options of the statistics package or by a random process-of-elimination strategy).
These results indicated that a cognitive tutor for data analysis could be quite beneficial if it helped
students choose appropriate analyses, remember to draw conclusions from results (beyond just
restating them), and use a conceptual structure to approach these problems.
Therefore, we designed a cognitive tutor called SmartLab (written in Java) to address
these points. Most importantly, we tried to do so in a way that would facilitate students' learning
of the "Big Picture". First, we designed SmartLab to highlight the structure that is common across
data-analysis problems. This common structure consists of all the elements of the "Big Picture"
and is repeated for each problem, regardless of its details. Second, many of the "headings" in this
structure represent steps that were previously "hidden" from students' point of view in that they
were covert, planning steps (e.g., identifying relevant variables and classifying them as to type).
By revealing these steps, SmartLab both provides scaffolding to students in the steps that precede
selecting an appropriate analysis, and it makes that planning process open to feedback (so
students can see from SmartLab's feedback where they are going wrong before they get too far
down an erroneous path). Third, the provision of hints and feedback in SmartLab applies to all the
steps of problem solving, so students can get quicker diagnosis of their errors than they would
otherwise (e.g., using paper handouts in labs or on homeworks). Fourth, SmartLab was designed
so that the scaffolding we offer to beginning students can gradually be faded away. We
accomplish this in several stages, first by substituting fill-in-the-blanks for pull-down selectors
and later by removing "hidden skill" headings from the structure. One additional point about
SmartLab is that when students finish solving a problem, they have produced a well organized,
printable report of their work.
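A minimal sketch of this kind of fading (the stages and prompts here are hypothetical, not SmartLab's actual interface) might look as follows:

    # Hypothetical sketch of fading scaffolding in stages.
    # Stage 0: pull-down menu of options; stage 1: free fill-in-the-blank;
    # stage 2: the covert-planning ("hidden skill") heading is removed entirely.
    def render_step(heading, options, stage):
        if stage == 0:
            return heading + ": choose one of " + ", ".join(options)
        if stage == 1:
            return heading + ": ____________"
        return None  # stage 2: the step is no longer prompted at all

    for stage in range(3):
        print(stage, render_step("Classify each variable",
                                 ["quantitative", "categorical"], stage))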
USING AND TESTING OUR COGNITIVE TUTOR FOR DATA ANALYSIS
We have used SmartLab in two different venues to test its effectiveness, namely, in the
classroom and in an experiment. We will discuss the results of each briefly below.
Use in the Classroom
We have used SmartLab over the past three semesters in several of the labs instead of the paper-version exercise, and asked students as well as TAs to provide technical and conceptual feedback on their experiences. Based on this valuable input we have made a series of technical and pedagogical refinements to our tool, and we are currently working on including the use of SmartLab in homework assignments (a web-based JavaScript version). It should be noted that SmartLab has had an extremely positive impact on the interaction between TAs and students in the
lab. In the paper-version exercises TAs were required to check the students' answers to the more
technical questions, making sure that students were on the right track. Using SmartLab, students
get tailored feedback from the system for those questions, and thus the interaction between the
TAs and the students can be shifted to the interpretational questions. In this new role, TAs are
more engaged with the students and can get a better feel for their understanding of the material.
Use in an Experiment
To evaluate SmartLab in a more controlled environment, we designed an experiment in
which people could get a fairly intensive statistics experience in a relatively short amount of time.
Participants without any prior (formal) statistics training were recruited for pay. They were asked
to attend five sessions, for two to three hours per session, and were assigned to work with
SmartLab when solving problems. (This experiment also involved a comparison with a variant of
SmartLab, which is outside the scope of this paper.) For sessions 1-4, the participants watched
videotaped lectures and worked on sequences of problems. In addition, at the beginning of session
1 and during session 5, participants completed several paper-and-pencil tests. Moreover, during
session 5, participants worked on additional, open-ended data-analysis problems on the computer
with minimal scaffolding.
One paper-and-pencil assessment was a multiple-choice test covering the basic skills and
concepts of exploratory data analysis, including questions on identifying study designs, selecting
appropriate analyses, and drawing conclusions from the results. Participants' scores increased by
23%, a significant improvement, t(19) = 5.877, p < .001. Another paper-and-pencil assessment
asked students to read through short, data-analysis situations and classify them into groups (on
whatever basis they felt reasonable). By analyzing participants' categories, we found that there
was a significant pre-post shift in the way participants classified the problems: before the
experiment, they tended to base their classifications on the subject matter of the problems and,
after the experiment, they tended to base their classifications on the appropriate exploratory
analysis, t(19) = 4.11, p < .001.
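For readers who want to run the same kind of pre/post comparison, a paired t-test can be computed as in the sketch below; the scores shown are made-up placeholders, not the study's data:

    # Paired (pre/post) t-test of the sort reported above.
    # The scores below are illustrative placeholders, NOT the experiment's data.
    from scipy import stats

    pre  = [12, 15,  9, 14, 11, 13, 10, 16, 12, 14]
    post = [16, 18, 13, 17, 15, 16, 14, 19, 15, 17]

    t_stat, p_value = stats.ttest_rel(post, pre)
    print("t(%d) = %.3f, p = %.4f" % (len(pre) - 1, t_stat, p_value))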
We also looked at participants' performance on the open-ended quiz problems. An
interesting comparison here is that, after using SmartLab, participants in the experiment made
only 0.73 errors on average per opportunity to select an appropriate analysis, whereas in Lovett
(2001) students who had taken an entire semester's course made more than 9 errors per selection
opportunity (with the appropriate analysis being, on average, the 6th selected).
CONCLUSIONS AND FUTURE WORK
This paper discusses the design and implementation of a cognitive tutor for data analysis
called SmartLab. Each time students use SmartLab, they get exposed to the Big Picture of data
analysis and see that the same process applies across all problems. In particular, SmartLab puts an
emphasis on helping students learn how to choose the appropriate analysis and requires them to
draw conclusions in context. We found that it works well in the computer lab sessions of our
course. By adding this tool, we find that TAs can be released from attending to the details of students' solutions and can then focus on the deeper issues. Moreover, a controlled experiment showed that, even over a short period of use, this tool led to significant improvements in students'
approach to exploratory data analysis. In particular, results suggested that after the experiment,
when participants encountered a new problem, they were thinking about it in terms of the
appropriate analysis instead of in terms of the subject matter.
Our future plans include extending the use of SmartLab in the classroom (to more labs
throughout the semester and to homeworks). We also will conduct more experiments that
compare SmartLab with control conditions representative of current typical lab experience (e.g.,
paper versions). Also, because our focus thus far has been to refine SmartLab pedagogically, we
have plans to collect additional data to help assess the usability of this tool in terms of human-
computer interaction and to make refinements accordingly.
REFERENCES
Anderson, J.R., Conrad, F.G., & Corbett, A.T. (1989). Skill acquisition and the LISP tutor. Cognitive
Science, 13, 467-505.
Anderson, J.R., Corbett, A.T., Koedinger, K.R., & Pelletier, R. (1995). Cognitive tutors: lessons learned.
Journal of the Learning Sciences, 4, 167-207.
Gal, I., & Ginsburg, L. (1994). The role of beliefs and attitudes in learning statistics: Towards an
assessment framework. Journal of Statistics Education, 2(2).
Garfield, J. & Ahlgren, A. (1988). Difficulties in learning basic concepts in statistics: Implications for
research. Journal for Research in Mathematics Education, 19, 44-63.
Gordon, S. (1995). A theoretical approach to understanding learners of statistics. Journal of Statistics
Education, 3(3).
Koedinger, K.R., Anderson, J.R., Hadley, W.H., & Mark, M.A. (1997). Intelligent tutoring goes to school
in the big city. International Journal of Artificial Intelligence in Education, 8, 30-43.
Lovett, M. C. (2001). A collaborative convergence on studying reasoning processes: A case study in
statistics. In S. Carver and D. Klahr (Eds.), Cognition and instruction: twenty-five years of progress
(pp. 347-384). Mahwah, NJ: Erlbaum.
Moore, D.S. (1998). Statistics among the liberal arts. Journal of the American Statistical Association, 93,
1253-1259.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D.A. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465-494). New York:
Macmillan.
Have Your Cake And Eat It Too!
Steven Rudich
Computer Science Department
Carnegie Mellon University
July 1, 2003
Anne Willan's Look & Cook:
The ideal technical talk has all the delectable qualities of Look & Cook.
Methodically Accessible
Masterfully Ambitious
Sweetly Seductive
Have Your Cake And Eat It Too!
1) Accessible: A cookbook kids can use
2) Ambitious: Recipes as impressive as any in the Cordon Bleu cooking manual
How was this book made?
1) Ambitious: All the great recipes and techniques were collected and evaluated. Experts masterfully distilled the material, without tossing out any key idea.
How was this book made?
1) Ambitious.
2) Accessible: Every step was articulated and illustrated. Empirical feedback was used to refine the steps. The gold standard: Could a child use this?
The result:
1) Ambitious: All the key ideas and examples are presented. Give the aspiring cook the tools to do anything.
2) Accessible: A child can understand it.
Three Principles
Ambition
Accessibility
Seduction
AMBITION:
Distill *all* the key ideas.
ACCESSIBILITY:
Eliminate barriers to
understanding.
SEDUCTION:
Inspire the desire to know.
Ambition:
All the key ideas.
Accessibility:
Smooth, simple, lucid steps.
Seduction:
Inspire the desire to know.
Technical Lecturing/Teaching
Have Your Cake And Eat It Too!
My Personal Mission:
To be the Anne Willan of technical teaching.
The Usual Story
1) Accessibility at the price of content: Simple, kid-friendly cookbooks make mud pies.
2) Content at the price of accessibility: The hundreds of intimidating tomes of cooking.
The Usual Story
1) Accessibility at the price of content: Dumbed-down technical curriculum.
2) Content at the price of accessibility: Incomprehensible math course.
University courses frequently give up accessibility.
Intellectual Fantasy:
Brain To Brain Transfer
More!
Harder!
Faster!
Platonic Fantasy:
A Talk Is An Aesthetic Object
Perfect Truth Is Beauty.
Human Reality
When's lunch?
What's a DAG?
blah, blah, blah
The human brain has a limited supply of cognitive energy.
The Audience's Brains Are Depleted By The Accumulation Of Demands:
Visual Processing.
Auditory Processing.
Concentration.
Calculation.
Short Term Memory.
Recall From Long Term Memory.
Stress Of Unresolved Concerns.
Etc . . .
Duh.
Work hard
to be easy.
How do we concretely apply these
principles in designing a talk?
Ambition
Accessibility Seduction
MINIMIZE INFORMATION PER SLIDE
EMPATHY: Simplify Visual Processing
Accessibility:
Use kid-friendly representations.
[Diagram: a finite automaton whose states and a/b-labeled edges are drawn with kid-friendly pictures.]
Spell out the exact correspondence
between arguments in different
representations.
Accessibility:
Use multiple representations.
1 + 2 + 3 + . . . + (n-1) + n = S
n + (n-1) + (n-2) + . . . + 2 + 1 = S
(n+1) + (n+1) + (n+1) + . . . + (n+1) + (n+1) = 2S
n(n+1) = 2S, so S = n(n+1)/2
Let's restate this argument using a geometric representation.
[Diagram: the algebraic argument redrawn as dots. S white dots are stacked in rows of 1, 2, . . ., n; an equal triangle of yellow dots completes an n-by-(n+1) grid. There are n(n+1) dots in the grid, so 2S = n(n+1).]
I own 3 beanies and 2 ties.
How many different ways can
I dress up in a beanie and a
tie?
[Diagram: a choice tree. The root branches to beanies b1, b2, b3, and each beanie branches to ties t1, t2, giving six paths in all.]
There is a correspondence between paths in a choice tree and the cross terms of the product of polynomials!
(b1 + b2 + b3)(t1 + t2) = b1t1 + b1t2 + b2t1 + b2t2 + b3t1 + b3t2
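The correspondence is easy to check mechanically; the following Python sketch (an illustration, not part of the original talk) enumerates the paths of the choice tree and produces exactly the six cross terms:

    # Each path in the choice tree picks one beanie, then one tie; each cross
    # term of (b1 + b2 + b3)(t1 + t2) makes exactly the same pair of choices.
    from itertools import product

    beanies = ["b1", "b2", "b3"]
    ties = ["t1", "t2"]
    paths = [b + t for b, t in product(beanies, ties)]
    print(paths)       # ['b1t1', 'b1t2', 'b2t1', 'b2t2', 'b3t1', 'b3t2']
    print(len(paths))  # 6 = 3 * 2 outfits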
Informal before formal.
This does not mean being vague
or ambiguous!
Accessibility:
Qualitative before quantitative.
SEDUCTION:
First impressions count.
The initial minute of a talk should ENGAGE THE AUDIENCE.
The initial minutes are worth an hour of prep.
Use A Hook: a fair question, a story, an intriguing claim, something odd . . .
Let me teach you a programming language so simple that you can learn it in less than a minute.
Meet ABA the Automaton!
[Diagram: the ABA finite automaton, with states joined by edges labeled a and b.]
Input String -> Result
aba -> Accept
aabb -> Reject
aabba -> Accept
(empty string) -> Accept
The Simplest Interesting Machine:
Finite State Machine, or Finite Automaton
A finite alphabet: Σ
A finite set of states: Q = {q0, q1, q2, . . ., qk}
A start state: q0
A set of accepting states: F = {qi1, qi2, . . ., qir}, a subset of Q
State transition instructions: δ: Q × Σ → Q, written δ(qi, a) = qj
[Diagram: a state-transition table and an example transition from state qi to state qj on input a.]
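To make the definition concrete, here is a minimal finite automaton in Python (an illustration only; the transition table below recognizes strings with an even number of a's, not Rudich's ABA machine, whose diagram did not survive extraction):

    # A finite automaton as data plus a short run loop (illustrative sketch).
    # This DELTA recognizes strings over {a, b} with an even number of a's.
    START = "q0"
    ACCEPTING = {"q0"}
    DELTA = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q0", ("q1", "b"): "q1"}

    def accepts(string):
        state = START
        for symbol in string:
            state = DELTA[(state, symbol)]  # delta(qi, a) = qj
        return state in ACCEPTING

    for s in ["", "a", "ab", "aab"]:
        print(repr(s), "Accept" if accepts(s) else "Reject")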
They really don't know!
Accessibility:
Tell them what is at issue.
Machines That Can't Count
CS 15-251, Lecture 15
[Diagram: the ABA automaton again.]
a^n b^n is not regular. No machine has enough states to keep track of the number of a's it might encounter.
That is a fairly weak argument. Consider the following example:
L = strings where the number of occurrences of the pattern ab is equal to the number of occurrences of the pattern ba.
"Can't be regular. No machine has enough states to keep track of the number of occurrences of ab."
Remember ABA?
[Diagram: the ABA automaton.]
ABA accepts only the strings with an equal number of ab's and ba's!
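That a small machine can track this quantity is less surprising once you notice how constrained it is; the brute-force sketch below (an illustration, not from the talk) confirms that over all strings up to length 10 the counts of ab and ba never differ by more than one, a bounded quantity, which is exactly what a finite-state machine can track:

    # Brute-force check: for every string of a's and b's up to length 10, the
    # counts of "ab" and "ba" differ by at most 1. (Illustrative sketch.)
    from itertools import product

    max_diff = 0
    for n in range(11):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            max_diff = max(max_diff, abs(s.count("ab") - s.count("ba")))
    print("largest |#ab - #ba| seen:", max_diff)  # prints 1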
Professional Strength Proof
Theorem: a^n b^n is not regular.
Proof: Assume that it is. Then there exists a machine M with k states that accepts it.
For each 0 <= i <= k, let S_i be the state M is in after reading a^i.
Since there are k+1 such states among only k possibilities, there exist i, j <= k with S_i = S_j but i != j.
Then M will do the same thing on a^i b^i and on a^j b^i.
But a valid M must accept a^i b^i and reject a^j b^i. Contradiction.
Question:
How can one teach concepts like theorem, proof, conjecture, and independence?
How can one teach them to a five year old?
Richard Kaye, "Minesweeper is NP-complete," 2000.
MINESWEEPER
[Sequence of board screenshots: beginning at "Mines Left: 10", each slide uncovers more squares, and the revealed numbers force logical deductions about where the mines must be. By "Mines Left: 3" the board reaches a position in which the remaining mines cannot be deduced from the available evidence.]
"Dad, it's INDIAN-PENDANT like you said." (Isaac Rue Rudich)
[Board screenshot: the undecidable squares marked with question marks.]
Oh! No! What do we do when it is independent?
Don't be silly, daddy, just guess.
Why Is Minesweeper So Effective?
It gets its hooks into you. It motivates you to perform the task well.
When you use bad reasoning and blow up, it shows you a concrete refutation: where the mines were located.
It has an easy to understand interface.
I aspire to teach mathematics with the same compelling clarity as Minesweeper. Hence, I use similar methods.
Conclusion
I wish all technical talks were as substantial, clear, and compelling as Look & Cook.
Philosophy Department Teaching Assistant Handbook
Welcome to the Department of Philosophy
Teaching assistants (TAs) and graders in the Carnegie Mellon Philosophy Department are an integral
part of undergraduate education. In this role, you will assist the faculty in the instruction, advising, and
evaluation of students. In addition, being a TA or grader is an important component of your development
as a teacher, and thus, an important part of your preparation for a career in academia. This handbook is
designed to provide you with some of the information you will need to meet this challenge.
1. Teaching Assignments
1.1 Regular Academic Year. For any given semester, e.g., Fall 2004, we must decide on what courses
we will actually offer fairly early in the prior one, e.g., Spring 2004. Existing students register online for
those courses (in April for the fall and in November for the Spring), and we receive tentative enrollment
figures soon thereafter. Based on these enrollment figures and the number of available graders/TAs, we
decide the list of courses that will get graders and TAs. We will then send out this list (usually in late
November for the spring or in June for the fall), and inquire of the students which courses they would like
to be assigned to, and of the instructors which students they would like to have as TAs or graders. Soon
thereafter we will circulate a proposal for TA/Grader assignments. It is important to be clear that
TA/Grader duties are a paid job in service of providing excellent undergraduate courses. The criteria for
making assignments, in order, are:
1. Optimizing the educational experience of the undergraduates.
2. Preparing our graduate students for teaching.
3. Optimizing the preferences of the students and professors.
After a few rounds of swapping people around to accommodate preferences while ensuring the best
educational outcome, we will settle on an assignment for the upcoming semester. After this assignment is
made, you should communicate directly with the instructor for whom you are grading or TAing.
Our introductory Philosophy course, 80-100, is somewhat special. We typically teach 4 to 5 separate
sections of 44 students each, where each section has its own 50-minute lecture on Mondays and
Wednesdays, and splits up into two smaller 50-minute recitation sections on Friday. The lecturing is
usually handled by regular faculty, visitors, or senior graduate students who have been a TA for the
course several times and have taught it during the summer; and the TAing by Ph.D. students or Masters
students who have a good background in general philosophy. Mara Harrell usually coordinates all
sections so they are all roughly covering the same body of material.
There are other courses besides 80-100 for which we occasionally will allow a graduate student full
responsibility for the course. For example, we have often had graduate students teach 80-110, the Nature
of Mathematical Reasoning. Lecture assignments are made on the basis of intellectual maturity,
dedication to teaching, and seniority. Clearly, teaching your own course is an important piece of a
teaching portfolio, and the department strongly encourages you to gain the experience necessary to be
allowed full responsibility for one or more courses.
1.2 Summer Teaching. Carnegie Mellon offers two summer sessions of classes. The first usually begins the Monday after commencement (around May 17th) and goes through the end of June, and the second begins with July and proceeds through the first week in August. The Philosophy department usually offers a half-dozen 100- or 200-level courses during each session, and these courses usually enroll between 2 and 15 students, some of whom are in advanced High School programs. They meet every day of the week,
and are thus compact and quite intense. These courses are taught only by graduate students, by default
only those who have graded or been a TA for that course at least twice previously. If you believe you can
teach the course effectively having not been a grader or TA twice previously, you may make a case in
writing to us and, if we are willing, do so nevertheless.
We decide on our offerings in April, and at the same time send a list of these courses to you and ask if
you would like to teach one of them and in which session. We make assignments by the end of April.
Summer classes are not guaranteed to even take place unless they enroll at least 3 people.
2. Teaching Portfolios
2.1 Documentation by the department. Beginning in AY 2003-4, we will keep teaching portfolios on all
graduate students. We will record the courses graded, TAed, or lectured, evaluations by students if they
are available, comments from instructors, and we will keep a record of any extra efforts made to improve
teaching, e.g., seminars attended at the Eberly Center for Teaching Excellence, or teaching awards, etc.
In addition to the portfolios kept by the Philosophy Department, the Eberly Center offers a documentation
of Teaching Development program. More information is available on the web:
http://www.cmu.edu/teaching/support/index.html.
Especially for those interested in getting a job at an institution where teaching is a high priority, this sort of
systematic record of teaching experience is very important.
2.2 Documentation for dossier. A concise documentation of teaching activity should be included in every student's dossier.
This documentation should include:
- A statement of teaching philosophy and interests
- Summaries of teaching evaluations
- At least one letter of recommendation from a faculty member pertaining directly to teaching
- Documentation from the Eberly Center (if applicable)
- A section on your CV of courses you have taught, as well as courses you are prepared to teach
2.3 Preparation for interviews. In an interview for an academic position, candidates are often asked about courses they would like to teach, as well as courses they are prepared to teach, should they be hired. In order to be able to speak informatively about these courses, we recommend that students develop the following:
- Course descriptions and rationales
- Sample syllabi that include texts, lecture/discussion topics, and homework
- A concise statement about how your teaching philosophy is reflected in your design and implementation of each course
3. Teaching Responsibilities
The duty of a TA or Grader is to assist the professor in teaching a particular course. The nature of the
assistance required varies from course to course and from professor to professor, of course, but there are
some responsibilities upon which there is general agreement.
Graders. In general, graders do not interact with students in a classroom setting. Rather, as the title
suggests, they attend classes, grade all of the assignments, maintain grade records, and hold office
hours for students who may need help. Occasionally, they may be asked to hold review sessions, paper-
writing sessions, or something similar.
TAs. In addition to attending classes, grading assignments, maintaining grade records, and holding office
hours, TAs run recitation sections for the course. Currently, we only have TAs for 80-100, and the format
is as follows: there are lectures on Monday and Wednesday, given to the entire class (about 44 students)
by the professor, and there are two recitation sections (about 22 students each), run by the TA on
Fridays. In general, the TAs are expected to direct a discussion about the material in the course during
these recitations, and are not expected to introduce the students to new material. More detail about the
TA's interaction with students is given in section 5 below. In addition to the responsibilities mentioned
above, TAs may also be asked to hold review sessions, paper-writing sessions, or something similar.
Many professors either create their own websites for their courses, or use the course management
software called Blackboard. If either is the case, you should familiarize yourself with what the students will
be seeing, and how the site will be maintained.
The expectations for both TAs and Graders will be unique to the course and the professor, and so TAs
and Graders are encouraged to discuss specifics with the professor before the course begins. More detail
about this discussion is given below.
4. TA-Faculty Relationships
Your job as a TA is to assist the professor in teaching a particular course, even though the particular form
this assistance takes may differ in practice from course to course and from professor to professor. Thus, it
is a good idea to make an appointment with the professor as soon as you know what your assignment for
the semester will be. Before the course begins, you should have a clear idea of what will be expected of
you. The professor has a great deal of latitude in the duties which can be delegated to a TA or grader
(although he or she retains ultimate responsibility for the course and grades). You are paid to work 12
hours per week as a grader, or 15 as a TA, and you should discuss with the professor your duties with
this time commitment in mind. Among the issues you need to clarify with your professor are the following.
1. What are some of the main course goals?
Are some of these goals more important than others for the work I will do with the students?
2. What will my responsibilities be?
a. Grading homework? Tests? Papers? A final exam? Other assignments? Will I do all the
grading myself?
b. Attending lectures?
c. Attending weekly meetings?
d. Drafting or revising grading keys?
e. Providing written feedback?
f. Reporting common student errors or difficulties?
g. Preparing quizzes, handouts, assignments, exam questions?
h. Holding regular office hours?
i. Conducting review sessions?
j. Giving guest lectures?
k. Maintaining grade records?
l. Giving a percentage of the final grade based on activities in section meetings?
m. Recording attendance?
n. Proctoring exams?
o. Maintaining or creating online resources for students?
3. What do you expect the students to know or be able to do from prior courses?
If you expect wide variation in students' backgrounds, is there anything specific I should do in response (e.g., offer tutoring, conduct review sessions, find extra challenge assignments)?
4. As a TA, how much will I interact with students?
Will students be expected to attend section meetings, participate actively in discussion, seek
help with assignments out of class, or attend help sessions? If section meetings are optional, how
can students be encouraged to attend?
5. As a TA or grader, how often will I meet with you to discuss the course?
If we do not have regular meetings, should I contact you regarding problem students and/or
assignments? If there are multiple TAs, will we all meet to discuss how to coordinate activities?
6. What are the criteria for grading in this course, and how can I be sure my grading is calibrated to
your standards? Specifically,
a. Will we go over any of the grading of assignments together, or will you check my grading
of a sample of assignments?
b. Will you provide a grading key or rubric for assignments? If not, should I/may I use my
own?
c. How tough/easy a grader should I be?
d. Do you have a desired distribution for grades for each assignment and/or the overall
grade for the course?
e. How is partial credit awarded?
f. How will the final grades be determined?
7. If there are multiple TAs, how will we coordinate activities? In particular,
a. How will we divide the grading to ensure parity, consistency, etc.?
b. How will we formulate a common answer key or rubric for assignments?
c. How do we make sure we are calibrated with each other?
8. About what policies, if any, do I have authority to make decisions and for what issues do you want
me to refer questions to you?
a. Requests for redoing assignments
b. Requests for re-grading
c. Granting extensions
d. Accepting late assignments
e. Giving make-up assignments
f. Responding to suspected cheating or plagiarism
g. Helping a student find additional assistance for personal or academic problems
9. Is there a syllabus for this class? Do you have due dates for all of the assignments and/or tests
set before the class begins? If not, how much advance warning will the students (and I) receive
about assignments and/or tests?
10. What are the books and/or other materials for this course? Will I receive a desk copy of the text?
Will I receive handouts before or at the same time as the students? Will non-text materials be
available online? Should non-text materials be made available to students who do not come to
class the day they are handed out?
11. What are your policies for this class? Specifically,
a. Will you allow students into the course beyond the cap? If so, how many?
b. Is there a policy for late assignments?
c. Is there a policy for class/recitation section attendance?
d. Is there a policy for how assignments are turned in? For example, can they be turned in
via email or the digital drop box, or must they be turned in as a hard copy? If they can
be turned in via email, to whom will they be sent?
e. Is there a policy for collaboration on assignments?
f. Is there a policy for redoing assignments?
g. If a student wants an extension, will you send them to me first, or take care of it yourself?
If you take care of it yourself, how will I be notified of the outcome?
h. If a student wants a re-grade, will you send them to me first, or take care of it yourself? If
you take care of it yourself, how will I be notified of the outcome?
12. How much flexibility do I have in how I fulfill my responsibilities? Specifically,
a. What aspects of my teaching are important to maintain consistency across sections or to
fulfill specific course objectives?
b. How quickly do you expect me to grade the assignments?
c. How detailed should the comments I give on assignments be?
d. If I am maintaining grade records, is there a specific format I should use?
13. In what ways will my work be evaluated?
a. Faculty review of graded exams or papers
b. Classroom visits and feedback
c. Videotaping and review
d. Early or midterm course evaluations
e. End-of-course student evaluations via Faculty Course Evaluations (FCEs)
f. End-of-course student evaluations specific to TA responsibilities
5. TA-Student Interaction
As a graduate student TA, you occupy a distinctive position in the University hierarchy. You are neither
solely a student, nor solely a teacher, but a little bit of each. Since it was probably not so long ago that
you were an undergraduate student yourself, you are in a good position to empathize with student
problems, and build a good rapport with them.
You will be teaching primarily in sections in which the smaller and more personal setting gives the
students the opportunity to ask questions about the course, discuss the content of the course, and learn
how to express themselves verbally and in writing. There is no recipe for teaching sections successfully,
and each TA must develop his or her own style. Nonetheless, we can offer some specific suggestions.
Planning. In most cases, the section should not be another lecture, but rather a setting for the students to
interact with each other and you while discussing material presented by the professor. However, this does
not mean that the section should merely be a question-and-answer session. Even though you won't be
giving a formal lecture, you should be prepared with some sort of lesson plan formulated with an eye
towards the goals of the course.
Office Hours. The TA's office hours are an important extension of the classroom that can help personalize a student's educational experience. You should discuss with the professor how many office hours you
should hold per week. Generally, it is not required for students to attend office hours, so you might think
of ways to encourage them to do so.
Discussion. There are several strategies that facilitate a good discussion. First, it is important to think
about using the classroom space wisely. For example, having the students sit in a circle may help
differentiate this setting from the lecture setting, as well as encouraging the students to talk to each other
and not just to you. Second, learning the students' names quickly helps establish rapport. The easiest
way to learn student names is to take digital pictures (we will provide a camera), and attach names to the
faces. Third, on the first day of class, set some ground rules for discussion, preferably with the help of
the students. Finally, it is important to strike a balance between encouraging students to contribute and
providing corrective feedback.
Suggestions for rewarding student contributions:
- Talk directly and explicitly to the student who contributes.
- Put student comments on the chalk/whiteboard.
- Make eye contact and use the student's name.
- Listen carefully, ask follow-up questions, and ask for paraphrasing.
- Ask the student to restate complex or inaudible comments for the whole class, or do so yourself when necessary.
- Point out specifically what you thought was valuable in the contribution.
- If you see potential in a comment, ask the student for elaboration, application, or continuation of the point.
- Incorporate student points in later material.
- Invite other students to add their reactions to build further on the original point.
- Comment on the thinking process the student has used, as well as the point made.
- If a comment is unclear or confused, help the student express the original intent.
- Use non-verbal messages to reward students for the act of participating, regardless of the substance.
Suggestions for providing corrective feedback without discouraging students:
- Be clear about the difference between what is incorrect and what you as an individual disagree with.
- Before you disagree with or correct a student, restate the point to test your understanding.
- Admit your ignorance. If you don't know something, say so. Refer the student to other sources or offer to get the information.
- When you criticize a comment, ask for reactions. This keeps a dialogue going and makes students less likely to withdraw.
- Be specific in both positive and negative comments.
- When making criticisms, explain your reasons.
- Encourage students to respond to each other's ideas.
- Be sensitive to student pride and fears. In putting forward an idea, a student is also putting self-esteem on the line.
- Avoid any tone of condescension. A student who is working on an idea, however elementary, deserves respect.
- Recognize the realities of a high-pressure, competitive campus. All students have to worry about grades.
- Leave your ego outside the classroom. Do not try to look good at the expense of a student.
Classroom Management. Establishing rapport with students is important for facilitating the process of
education, and it is often easier for you to do this than professors because you are closer in age and
experience to the students. As such, you are often considered more of an experienced peer than an
authority figure, making your job both easier and more difficult. You need to find the level of rapport that is
the most comfortable and workable for you.
One of the most serious problems that can arise is a challenge to your authority, either in or out of
class. Challenges in class might take the form of a student making demands or trying to intimidate you in
front of the class. The key here is to defuse the situation by offering to deal with the issue after class or in
office hours. Challenges outside of class might take the form of an accusation that you are not qualified to
teach and/or evaluate students. Perhaps the best response to this is to refer the student to the professor.
6. Academic Honesty
Either the professor or the TA should make clear to the students what the expected method is for citing
sources in any type of assignment. In addition, students (especially first-years) may need assistance
deciding where citations are appropriate or necessary. Often, cases that seem like plagiarism turn out to
be the result of confusion.
However, a TA or grader who suspects deliberate cheating or plagiarism by students should
discuss it with the faculty member in charge of the course. Since the faculty member is ultimately
responsible for student grades, he or she is also responsible for initiating action concerning violations of
standards of academic honesty.
The following is a departmental statement on plagiarism that should be made available to every student.
The straightforward disclosure of the sources used in completing course work is essential to the integrity of the educational process. In that way one acknowledges the ideas of others and helps to highlight what is distinctive of one's own contribution to a topic. It also enables instructors to be more effective teachers by providing an accurate sense of the student's grasp of course material.
Students are expected to use proper methods for citing sources; such methods can be found in
style guides like the Chicago Manual of Style, or the most recent MLA Handbook. In general, an
acceptable method of citation provides enough information to allow a reader to track down the original
sources. You should consult your professor if you have any questions about which method to use, or
which kinds of collaboration or assistance to disclose.
Failure to acknowledge the ideas of others is a serious violation of intellectual integrity and
community standards. It is the individual student's responsibility to be aware of university policies on academic integrity, including the policies on cheating and plagiarism. These are available online at: http://www.cmu.edu/policies/documents/Cheating.html and in the section on University Policies in
the most recent edition of The Word: Undergraduate Student Handbook.
Students who cheat or plagiarize face serious sanctions at both the course level and the university level. At the course level, faculty at Carnegie Mellon University have significant discretion to
determine the sanctions that are appropriate to individual cases of cheating and plagiarism. Within the
Philosophy Department, it is customary to give plagiarized assignments a failing grade and, where
appropriate, to fail students for the course. Additionally, a letter is sent to the Dean of Students indicating
that the student in question has submitted plagiarized material and received a course-level sanction.
Plagiarism is also a violation of the community standards of Carnegie Mellon University. As such,
allegations of plagiarism may be brought before a University Academic Review Board, which will determine whether a violation of community standards has taken place and levy additional sanctions if appropriate. Although this body also has significant discretion over the sanctions that it levies, plagiarism
can result in academic probation, suspension, and even expulsion.
7. Sexual Harassment
Carnegie Mellon University is committed to maintaining a work environment free from sexual harassment.
The policy on and definition of sexual harassment can be found online at:
http://www.cmu.edu/policies/documents/SexHarass.html and in the section on University Policies in
the most recent edition of The Word: Undergraduate Student Handbook.
8. More Information
More information, advice, and discussion about good teaching practices can be found at the Eberly
Center for Teaching Excellence, located in 120 Cyert Hall, or on the web:
http://www.cmu.edu/teaching/eberlycenter/index.html. Information about incorporating technology into your teaching can be found at the Office of Technology for Education, located in 101 Cyert Hall, or on the web: http://www.cmu.edu/teaching/ote/index.html.
Motivating Environmental
Systems and Life Cycle Thinking
for High School Students
H. Scott Matthews, Troy Hawkins,
Chris Hendrickson, Joe Marriott,
Aurora Sharrard
AEESP - July 26, 2005
Acknowledgments
NSF (Grant #0328870, Tracking key metals in
product life cycles)
Judy Hallinen (CMU Center for School Outreach)
Shahzeen Attari, Joule Bergerson, Gyorgyi Cicas,
Cortney Higgins, Paulina Jaramillo, David Lieberman,
Joe Marriott, Deanna Matthews, Bill Morrow, Aurora
Sharrard, Karina Tipton, Chris Weber
AIU Overview
Allegheny Intermediate Unit (AIU) supports
and assists 42 local (small) public school
districts to provide educational opportunities
Collects resources and centrally coordinates
special programs
Do programs they otherwise could not do
Campus outreach coordinator wanted a program
High School Apprenticeship Program
One AIU effort is a pull out
program for high school students
Hosted by local professional
groups or colleges
Allows students to interact with
professionals in a field that interests
them
Selection based on student essays
and teacher recommendations
High-Level Goals
Systems/Life Cycle Thinking
Learn how to teach this to high school students
Motivate and Inspire Environmental Careers
Inform Civil/Environmental Engineering
Discipline
Pre and post assessments of students
Planned and delivered by CMU graduate and
undergraduate students (faculty assist)
Goals
Communicate to students:
Why we care about green design
The role of a green design
engineer
Students would participate in an
engaged and positive learning
experience
Students gain an understanding of the complicated trade-offs
involved in environmental decision-making
Students learn to think systematically
Develop ability to identify connections between various
stakeholders
Practice thinking in terms of life cycle, triple bottom line, and
sustainability
Green Design Apprenticeship Program
4 day-long sessions over academic year
15 students from 15 different schools selected from over 30
applicants
Interact with graduate students and faculty through research-
related activities
Variety of green design topics:
Sept. 20: Product life cycle
assessment
Oct. 11: Energy and transportation
Nov. 15: Green buildings and
sustainability indicators
Jan. 24: Brownfield redevelopment
Pre-Assessment Survey
Interested in engineering,
environment
Very traditional view of
civil/environmental
engineering (remediation)
Interested in what students
(inc. graduate) do
Wanted to improve problem solving
Specifically environmental decisions
Basically no concept of sustainability
Sample Schedule
9:35 Initial conceptions questions
9:50 Introduction to Green Design
10:10 Life Cycle Assessment Introduction
Product life cycle mapping example
10:35 Product life cycle mapping exercise (5 min.
intro, 5 min. at each of 5 stations)
11:00 Extended research and mapping and small
group discussions with CEE professors
12:30 Presentations of various life cycle maps
1:00 Wrap-up discussion
1:45 Revised conceptions assessment
Targeted Learning Exercises
Drawing life cycle supply chain diagrams of
products
Understanding Interconnectedness
Price is Right game
Pricing of water: bottled, POTW, for agriculture
Benefits of high-efficiency appliances
Counting mobility: cars, bikes, buses, walkers
Estimating energy use
Life Cycle Visualization
Various beverage packaging
options
Use what you know to create a
simple process map for your
beverage container
Think about:
Emissions to Air, Water, Land
Greenhouse gases
Materials used
Where energy is used
Chemical reactions
Recycled material use
So which is better?
Functional units, tradeoffs, boundaries
Survey Results / Comments
Enjoyed interaction with faculty and students
Even simple products have complex supply
chains and life cycles
A lot of energy and resources go into making
and distribution of Coca-Cola products.
Often what seems to be green is not when other
aspects are considered.
One small choice has a profound impact.
Long-term Vision
Expanded scope next year (more students,
sessions)
Weeklong summer program for HS students
and teachers
Improvement of EIO-LCA website for younger
audience
A STATICS CONCEPT INVENTORY: DEVELOPMENT AND PSYCHOMETRIC ANALYSIS
Paul S. Steif, Carnegie Mellon University
John A. Dantzler, Censeo Research Services
ABSTRACT
Quantification of conceptual understanding of students in
Statics has been undertaken. Drawing on a prior study
identifying the fundamental concepts and typical student errors
in Statics, multiple choice questions have been devised to probe
students' ability to use concepts in isolation. This paper
describes a testing instrument comprising such questions, as
well as psychometric analyses of test results of 245 students at
five universities.
INTRODUCTION
It is increasingly appreciated that learning is tied to effective
assessment: monitoring student progress and feeding that
information back to students [1]. There are many aspects of
learning that can be assessed. However, if we seek to empower
students to transfer the knowledge gained to new situations,
then a deep understanding must be developed [2]. In many
engineering science courses, deep understanding is usually
associated with understanding of concepts. Thus, some effort
must be devoted to identifying core concepts and then to
devising means of gauging students' understanding of those
concepts. This paper describes efforts to measure student
understanding of concepts in Statics, and to judge the
effectiveness of the resulting testing instrument.
Engineering Statics is a subject that is extremely worthy of this
heightened level of attention. Statics is a pivotal course in
several engineering disciplines, preparing students for a number
of follow-on courses, such as dynamics, mechanics of
materials, and, ultimately, design. Instructors of these follow-
on courses, as well as instructors of engineering design, often
feel that student understanding of Statics is a major impediment
to their success in these courses. At the same time, instructors
are seeking to improve instruction in Statics. Judging such
instructional innovations should, at least in part, be based on
their ability to advance student understanding as captured by
clear, agreed upon measures. Thus assessment of conceptual
understanding can help instructors to gauge the effectiveness of
new teaching methods and approaches.

In the case of Newtonian mechanics, there have been efforts by
the physics education community [3,4] to identify its basic
concepts and associated misconceptions. These have led to
the development of instruments for measuring conceptual
understanding in physics [5]. With the Force Concept Inventory
(of Newtonian mechanics) as a model, there also have been
recent efforts in the engineering education community to
develop concept inventories for a variety of engineering
subjects [6], including preliminary efforts in Statics [7,8].

Little work has been devoted to identifying student
misconceptions in Statics specifically. This paper draws upon
the first author's recent effort to establish a conceptual
framework for Statics [9]. Four basic concept clusters were
proposed. The most common student errors, identified through
collection and analysis of student work, were explained on the
basis of inadequacies in student understanding of the concept
clusters.
This paper will show how understanding of many of these
concepts can be gauged through multiple choice questions [10].
Results of administering a test composed of such questions to
245 students at five universities will be presented. Conclusions
regarding the psychometric effectiveness of the test and
implications for Statics instruction will be drawn.

CONCEPTS OF STATICS

One class of Statics problems that is directly relevant to
engineering systems involves the analysis of multiple,
connected bodies. The conceptual framework described by
Steif [9] was devised with this class of problems in mind, and it
consists of four clusters of concepts as follows:

C1. Forces are always in equal and opposite pairs acting
between bodies, which are usually in contact.

C2. Distinctions must be drawn between a force, a moment
due to a force about a point, and a couple. Two
combinations of forces and couples are statically
equivalent to one another if they have the same net
force and moment.

C3. The possibilities of forces between bodies that are
connected to, or contact, one another can be reduced
by virtue of the bodies themselves, the geometry of the
connection and/or assumptions on friction.

C4. Equilibrium conditions always pertain to the external
forces acting directly on a chosen body, and a body is
in equilibrium if the summation of forces on it is zero
and the summation of moments on it is zero.

Solving problems in Statics involves reasoning about physical
systems, translating the interactions between parts of systems
into the symbols and variables of Statics, and then deriving
meaningful relations between the variables based on the
principle of equilibrium. The concepts just described pertain
primarily to the modeling steps of translating features of the
system into symbols and variables. There are clearly also
important skills associated with carrying out mathematical
operations, such as resolving or combining forces and finding
moments due to forces. There are also less acknowledged skills
that involve reasoning about physical systems: recognizing the
distinct parts making up a mechanical system and discerning
how they are connected to one another. Thus, understanding
of the concepts outlined above is critical to problem solving in
Statics, but additional skills are relevant as well.

CONCEPTUAL ERRORS IN STATICS

Certain types of errors that students make in solving Statics
problems recur with great frequency. Based on observations of
students' work, and the sharing of experiences among
instructors at various institutions, these errors have been
identified and organized into categories [9]. Expressed
succinctly, these errors are:

1. Failure to be clear as to which body is being considered for
equilibrium

2. Failure to take advantage of the options of treating a
collection of parts as a single body, dismembering a system
into individual parts, or dividing a part into two

3. Leaving a force off the free body diagram (FBD) when it
should be acting

4. Drawing a force as acting on the body in the FBD, even
though that force is exerted by a part which is also included
in the FBD

5. Drawing a force as acting on the body of the FBD, even
though that force does not act directly on the body

6. Failing to account for the mutual (equal and opposite)
nature of forces between connected bodies that are
separated for analysis

7. Ignoring a couple that could act between two bodies or
falsely presuming its presence

8. Not allowing for the full range of possible forces between
connected bodies, or not sufficiently restricting the possible
forces

9. Presuming a friction force is at the slipping limit (μN), even
though equilibrium is maintained with a friction force of
lesser magnitude

10. Failure to impose balance of forces in all directions and
moments about all axes

11. Having a couple contribute to a force summation or
improperly accounting for a couple in a moment summation


GAUGING CONCEPTUAL UNDERSTANDING WITH
MULTIPLE CHOICE QUESTIONS

To identify specific conceptual lapses, we have devised
questions that focus rather narrowly on particular concepts in
isolation. In each question students select from five choices;
this allows for simple quantification of performance. In
addition, the wrong answers to questions are specifically
chosen to reflect known conceptual errors exhibited by
students, such as those outlined above. As these questions are
intended to detect errors reflecting incorrect concepts, rather
than errors in mathematical analysis, most questions do not
involve computation. For those questions that involve
computation, the computations are extremely simple. Since
each wrong answer represents a correct computation based on
an incorrect conception, and the computations themselves are
trivial, such questions allow us to detect conceptual
misunderstanding. Such a question is shown below as the
friction sample.

The five classes of questions are as follows:

Free body diagrams
These questions capture a combination of concept cluster C1 on
the inter-body nature of forces and the first half of cluster C4,
namely, that equilibrium always pertains to a body. In these
questions, students must think about the forces that act on
subsets of a system. There are no complications associated
with the direction of forces, and any use of equilibrium is trivial
(for example, summation of forces in a single direction). Errors
3, 4, 5, and to a lesser extent 1 are at issue.

Static equivalence of combinations of forces and couples
In these questions, which capture concept cluster C2, students
must be able to determine whether one combination of forces
and couples can be replaced with another combination and still
maintain equilibrium. There is no issue of what forces and
couples are actually exerted by contacting bodies, but only the
equivalence between sets of vectors. However, the calculations
are trivial; thus, the focus is on understanding the distinctions
between force, moment, and couple and their inter-relations.
Errors 10 and 11 are at issue.

Type and direction of loads at connections (including
different situations of roller, pin in slot, general pin joint,
and pin joint on a two-force member)
These questions capture one aspect of concept cluster C3,
namely the simplifications in the forces between connected
bodies when the usual assumption of negligible friction is
made. Students must recognize the implications of the joint
regarding direction of force, and not be swayed by directions of
applied forces or orientation of members. Errors 7 and 8 are at
issue.

Limit on the friction force and its trade-off with
equilibrium conditions
These questions capture a second aspect of concept cluster C3,
namely reasoning about the forces between stationary
contacting bodies when the force at which slip occurs is
described by Coulomb friction. Error 9 is at issue.

Equilibrium conditions
These questions capture the second portion of concept cluster
C4, namely the necessity for both forces and moments acting
on a body to sum to zero. Errors 10 and 11 are at issue.



SAMPLE CONCEPTUAL QUESTIONS

Here we show samples of questions devised to test conceptual
understanding of free body diagrams, static equivalence, forces
at connections, friction forces and equilibrium.


Sample question on free body diagrams

Consider the configuration shown. A free body diagram is to
be constructed which includes blocks 2 and 3 and the cord
connecting them.

Which is the correct free body diagram?

Figure 1. Example of concept question addressing free body
diagrams.

Notice that besides the correct response (d), the other
responses capture various types of generic errors. Options (a)
and (b) are two versions of the error of including a force
which does not act on the body being isolated. The operating
force is the tension of the attached cable (T_C, T_E, or T_F). By
contrast, the weight, which is subtracted in case (a) and added
in (b), is a force between the earth and a block not in the
diagram, and should not be included. Choosing option (e)
probably signals a false equivalence between the rope tension
and the weight W_5, stemming from failure to include the
weight W_4, which is also supported by the tension in cable E.
Option (c), which is widely chosen, reflects the failure to
reject forces which act between bodies, both of which are
included in the free body diagram.

From previous experience with variants on such questions,
when one option had a force obviously missing, students
apparently recognized it was wrong, perhaps by comparison
with the alternatives, and rarely chose it. Also, to address the
issue of equal and opposite forces between contacting bodies
(Newton's 3rd law), questions were tried which required free
body diagrams of two connected subsystems; those options
which failed to satisfy Newton's 3rd law apparently were
rarely chosen. Thus, while students often make those mistakes
in solving problems, we were not successful in devising
multiple choice questions that ferreted out those
misconceptions. New versions of such conceptual tests,
currently being contemplated, which do not provide the crutch
of elimination by comparison, may be able to uncover such
misconceptions.

Sample question on static equivalence

One couple of magnitude 20 N-cm keeps the member in
equilibrium while it is subjected to other forces acting in the
plane at various points (shown at the left). The four dots
denote equally spaced points along the member.

Assuming that the same forces are applied at the left, what
load(s) could replace the 20 N-cm couple and still maintain
equilibrium?

Figure 2. Example of concept question addressing static
equivalence.
Option (a) captures the misconception that moving a couple
changes the moment that it exerts. Option (b) captures the
misconception that a force can be equivalent to a couple, if it
provides the right moment about a point. Option (c) makes the
impossible presence of a force apparently more palatable by
including a couple as well (one that again produces the correct
moment about the original point). Option (d), while
appropriately leading to zero net force, produces the wrong net
moment (the distance between the forces is ignored). The
correct option (e) is made slightly more difficult to choose
because the pair of forces is not centered about the point where
the original couple is applied, although this makes no
difference statically.

Sample question on simplification of forces between connected
bodies

The mechanism is acted upon by the force shown acting at 10°.
It drives the vertical ram which punches the sheet. The
coefficient of friction between the rollers and the ram is 0.6.

What is the direction of the force exerted by the slot on the pin
of interest? (All pins are idealized as frictionless.)

Figure 3. Example of concept question addressing
simplification of forces between contacting bodies.

This problem is based on the recognition that the force
associated with the frictionless pin contacting the surface of the
slot must act perpendicularly to the slot, irrespective of any
other forces acting or of the orientation of any members.
Option (a) tempts students with the incorrect possibility that the
force acts parallel to the member on which it acts (a common
assumption, perhaps tied to indiscriminate application of the
result for a two-force body). Option (b) is correct. Option (c) is
based on the misconception that the direction of the applied
force dictates the direction of the force that the slotted member
exerts in turn on the pin. Option (d) falsely takes the force to
be parallel to the slot (parallel to the relative motion between
the slot and the pin). Option (e) has the force acting in the
correct direction, but features a couple as well. This option
may be tempting since the applied force produces a moment
about the pin in the indicated direction; it draws on the
misconception that a moment due to a force is tantamount to a
couple being exerted on the pin.

Sample question on trade-off between equilibrium and the
upper limit on the friction force

Two blocks are stacked on a table. The friction coefficient
between the blocks and between the lower block and the table is
0.2. (Take this to be both the static and kinetic coefficient of
friction.) Then a horizontal 10 N* force is applied to the
lower block. (N* denotes newtons.)

What is the horizontal component of the force exerted by the
table on the lower block?

(a) 4 N* (b) 6 N* (c) 8 N* (d) 10 N* (e) 18 N*

Figure 4. Example of concept question addressing the trade-off
between equilibrium and the limit of friction.

In such problems, one needs to be cognizant both of the forces
that would be necessary to maintain equilibrium and of the
limits that friction might place on the magnitude of the forces.
Students have a tendency to make two errors: either to presume
the tangential force is automatically μN (N is the normal force),
or that the tangential force is the difference between the driving
force and what friction (μN) takes away. We do not know the
origin of the second misconception, although it may be tied to
the idea that friction is not the force between bodies, but is due
to the roughness. In any event, option (a) would be arrived at if
the tangential force is the difference and if N is falsely taken to
be only 30 N*. Option (b) falsely takes the tangential force to
be μN, and moreover takes N wrongly to be 30 N*. Option (c)
presumes the tangential force is the difference, but with N
correctly set to 90 N*. Option (d) is correct, as it balances the
10 N* and satisfies the friction condition (tangential force <
μN). Option (e) takes the tangential force to be μN, but at least with
N correctly set to 90 N*.
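
A quick arithmetic check, a sketch only (it assumes block weights of
60 N* and 30 N*, so that N = 90 N* under the lower block, consistent
with the values quoted above), shows how each answer option arises:

    # Python sketch: how each option in Figure 4 arises
    mu = 0.2
    applied = 10   # applied horizontal force, in N*
    N_wrong = 30   # normal force wrongly taken as the upper block only
    N_right = 90   # normal force correctly including both blocks

    print(applied - mu * N_wrong)  # (a) 4:  difference rule, N wrong
    print(mu * N_wrong)            # (b) 6:  mu*N rule, N wrong
    print(applied - mu * N_right)  # (c) -8: difference rule, N right (magnitude 8)
    print(applied)                 # (d) 10: correct; 10 < mu*N = 18
    print(mu * N_right)            # (e) 18: mu*N rule, N right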

Sample question on equilibrium

The bar is maintained in equilibrium by a hand gripping the
right end (which is not shown). A positive upward force is
applied to the left end. Neglect the force of gravity.

Which of the following could represent the load(s) exerted by
the hand?

Figure 5. Example of concept question addressing equilibrium
conditions.

In maintaining the bar in equilibrium, the hand gripping the
right end must keep both the summation of moments and the
summation of forces in all directions equal to zero. Option (a),
while balancing forces, does not balance moments. Option (b)
does not balance forces, though it may seem to balance
moments (in fact it does not). Option (c) seems to balance
moments (say, about the lower right corner) by introducing the
horizontal force; however, now horizontal forces do not balance
(and moments cannot be balanced about all points). Option (d)
is correct. Option (e), like option (b), fails to balance the
vertical force.

STATICS CONCEPT INVENTORY

A test comprising a set of 27 questions was devised with the
following numbers of questions: free body diagrams (5), static
equivalence (3), force at connections (12), friction limit (3),
equilibrium (4). The group of questions addressing forces at
connections comprises four groups of three questions each;
these questions touch on forces on rollers, forces between pins
and slotted members, forces on two-force members and forces
at general pin joints. Thus, the inventory assesses a total of
eight concepts. This test is referred to here as the Statics
Concept Inventory or just the inventory. Questions within a
given category often have wrong answers that share common
misconceptions. Thus, ultimately, one might conclude not only
that a student has trouble with free body diagrams, but also that
the misconceptions or errors tend to be consistently of one type.
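
Scoring by concept can be sketched as follows; the grouping of item
numbers into the eight subscales shown here is purely illustrative
(the paper specifies only the number of questions per category, not
which items they are):

    # Python sketch of subscale scoring for the 27-item inventory
    SUBSCALES = {
        "free body diagrams": range(0, 5),    # 5 questions
        "static equivalence": range(5, 8),    # 3 questions
        "rollers":            range(8, 11),   # 12 connection questions,
        "pin in slot":        range(11, 14),  # in four groups of three
        "two-force members":  range(14, 17),
        "general pin joints": range(17, 20),
        "friction limit":     range(20, 23),  # 3 questions
        "equilibrium":        range(23, 27),  # 4 questions
    }

    def subscale_scores(correct):
        # correct: list of 27 booleans, one per item (True = answered correctly)
        return {name: sum(correct[i] for i in items)
                for name, items in SUBSCALES.items()}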

The inventory was taken using pencil and paper by students in
the mechanical engineering department at Carnegie Mellon
University at the start and end of Statics. These students took
the test individually during a 50 minute class period, and did
not collaborate. (The CMU students had had a 3-week exposure
to Statics in a freshman mechanical engineering course.) The
inventory was also taken by students at the end of Statics at
four other universities. (These universities ranged from a local
commuter school to an elite research university.) These
students were asked by their instructors to take the test by
computer outside of class in a time period of approximately one
week. (Students downloaded a pdf file of the test, and then
entered answers either by filling in a prepared Excel
spreadsheet or preparing an email in a specified format, which
were then sent electronically to Carnegie Mellon for
processing.) Thus, it was impossible to monitor the time
students spent taking the test, or whether they had help while
doing the test. Students received credit for taking the test,
although not for their particular scores.

PSYCHOMETRICS

The first set of statistics pertains to the results for 245 students
from five universities taking the test at the end of a statics
course. In the combined sample, 77% (n=189) were male and
23% (n=56) were female. The racial/ethnic breakdown was
80% (n=196) white, 2% (n=5) black or African American, 13%
(n=32) Asian, 2% (n=4) were Hispanic, and 3% (n=8) were
from other racial/ethnic backgrounds. The mean score for the
overall sample was 15.71 with a standard deviation of 6.56. An
analysis of variance using gender and race/ethnicity as factors
was performed to examine differences between total test score
means. There was no significant interaction effect between
race and gender [F(4,235)=.417, p=.796], nor were significant
main effects of gender [F(1,235)=.426, p=.515] or
race/ethnicity [F(4,235)=1.578, p=.181] observed. The lack of
statistical significance indicates that the group means on the
total test score are statistically indistinguishable.
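
For readers who wish to replicate this analysis, the two-way
ANOVA can be computed as in the following sketch (assuming pandas
and statsmodels; the data frame and its values are placeholders,
not the study data):

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Placeholder data: one row per student
    df = pd.DataFrame({
        "score":  [18, 12, 21, 9, 15, 22, 14, 17, 20, 11, 16, 19],
        "gender": ["M", "F"] * 6,
        "race":   ["White"] * 6 + ["Asian"] * 6,
    })

    # Main effects of gender and race/ethnicity plus their interaction
    model = ols("score ~ C(gender) * C(race)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # F and p for each effect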

Table 1: Demographic breakdown of respondents.

Gender/Ethnicity   Number   Mean    SD
Male                  189   16.37   6.70
Female                 56   13.50   5.55

White                 196   15.64   6.53
Black                   5   16.20   6.57
Asian                  32   17.50   5.89
Hispanic                4   16.00   9.63
Other                   8    9.88   6.20

Total                 245   15.71   6.56

Table 2: ANOVA table for gender and race/ethnicity of
respondents.

Source           df        MS        F       p
Gender            1    17.493    0.426    .515
Race              4    64.835    1.578    .181
Gender X Race     4    17.143    0.417    .796
Error           235    41.087


ITEM ANALYSIS

Another important measure is the difficulty of various
questions; this is captured by the difficulty index, which is
merely the fraction of students who answer the given question
correctly [12]. Thus, higher values of the difficulty index
correspond to easier questions. In Figure 6, we display the
difficulty of questions, which ranged from a low of 0.31 to a
high of 0.85. Ideally, the test should be such that significant
gains over the semester can be observed and that students
with significantly different levels of conceptual
understanding can be distinguished.


Figure 6. Difficulty (left, lighter) and Discrimination (right,
darker) indices for items on SCI.


The discrimination index, which can range from -1 to +1,
captures how well students whose overall scores put them in
the top third of the sample performed on any particular question
in comparison with students in the bottom third [12]. The
closer the value is to 1, the more the test distinguishes between
students, and this is desirable. The discrimination index is also
shown in Figure 6 for each question. The discrimination index
ranged from a low of 0.26 to a high of 0.84. In general, one
seeks to have the discrimination index for all questions exceed
0.3. With one exception all items had discrimination indices
above 0.3. A very low discrimination index may be indicative
of a poorly worded question.
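
Both indices are simple to compute from a matrix of scored
responses; a minimal sketch (with randomly generated placeholder
data in place of the real responses) follows:

    import numpy as np

    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(245, 27))  # 1 = correct (placeholder)

    # Difficulty index: fraction of students answering each item correctly
    difficulty = responses.mean(axis=0)

    # Discrimination index: top third minus bottom third (by total score)
    order = np.argsort(responses.sum(axis=1))
    third = len(order) // 3
    discrimination = (responses[order[-third:]].mean(axis=0)
                      - responses[order[:third]].mean(axis=0))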


RELIABILITY AND VALIDITY

A test or instrument is evaluated based on two major factors,
reliability and validity. Reliability refers to the consistency of
test scores over repeated administrations. An estimate of
reliability helps to determine how much of the variability in a
test is due to measurement error. Validity, on the other hand,
refers to whether or not the instrument measures what it was
intended to measure. While reliability is rather straightforward
to assess, validity is somewhat more difficult. There are three
main types of validity: content validity, construct validity, and
criterion-related validity. Assessing more levels of validation
helps to establish greater evidence of validity.

Reliability
Reliability can be measured using Cronbach's alpha reliability
coefficient, which is a rating of reliability on a scale from 0 to
1.0. A high coefficient is evidence of good test reliability.
Cronbach's alpha for the inventory was
found to be 0.89, which is strong evidence of reliability.
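
Cronbach's alpha is computed from the item variances and the
variance of the total score; a minimal sketch for a 0/1 response
matrix (rows = students, columns = items) follows:

    import numpy as np

    def cronbach_alpha(responses):
        k = responses.shape[1]                          # number of items
        item_vars = responses.var(axis=0, ddof=1)       # per-item variances
        total_var = responses.sum(axis=1).var(ddof=1)   # variance of totals
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)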

Content validity
Content validity refers to the ability of the test items to
represent the domain of interest. Each of the questions in the
inventory focuses on a major conceptual task faced in Statics,
and the distractors (wrong answers) were devised to single out
distinct errors made by students which could have a conceptual
basis. These errors, which were organized into categories [9],
were arrived at through several means, including the experience
of the first author as an instructor and those of colleagues at
two universities, also long-time instructors of Statics. Errors
were also based on extensive analysis of written solutions to
Statics problems requiring the use of multiple Statics concepts.
Examples of errors from those solutions were reported earlier
[9], along with the above concept organization. Some of those
solutions were from students just beginning a Statics course
(who had some prior experience with Statics in a freshman
engineering course). Solutions to a second set of problems that
were analyzed were from students who had completed a Statics
course, and therefore displayed conceptual errors which persist
after a full semester of instruction in Statics. The analysis of
this latter set of problems was conducted by the first author as
part of a comparison between performance on an earlier version
of this inventory and other measures of performance in Statics
[11]. Similar types of errors were observed in these
comparison problems as are addressed in the inventory.

Criterion-Related Validity
Criterion-related validity refers to the level of agreement
between the test score and an external performance measure.
Predictive validity is a type of criterion-related validity that
refers to the ability of test scores to be predictive of future
criterion measures.

The underlying theoretical construct of this instrument is
"Statics conceptual knowledge." If the test does indeed
measure Statics conceptual knowledge, then it should correlate
well with an independent measure of statics knowledge. While
there is no other measure specifically of conceptual knowledge
in Statics (which is why this effort has been undertaken), an
understanding of concepts should be at least partially predictive
of overall success in a Statics course. To that end, we
compared grades of CMU students in the Statics course with
the performance on the inventory. The results are shown in
Table 3.

A Spearman's rho correlation coefficient computed
between inventory score and course grade indicates a high level
of association between the two (ρ = -.547, n=105, p<.001). The
rho statistic is a measure of association between -1.0 and 1.0,
with results closer to the absolute value of 1 denoting a stronger
relationship between variables. Course grade is an ordinal
variable defined as follows: A=1, B=2, and C=3. The negative
association (ρ = -.547) indicates that as the inventory total
score increases, the Course Grade variable decreases. In other
words, as inventory score goes up, it is more likely that grades
are As instead of Cs. The strong and statistically significant
measure of association is evidence that the total score of the
inventory is a valid measure of Statics conceptual knowledge.
More recent comparisons at other universities, to be reported
separately, also show significant positive correlations between
inventory scores and performance on exams in Statics classes.
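
The correlation itself is straightforward to reproduce; a sketch
(assuming scipy, with placeholder values rather than the study
data):

    from scipy.stats import spearmanr

    scores = [25, 23, 20, 19, 17, 15, 13, 22]  # inventory totals (placeholder)
    grades = [1, 1, 2, 2, 3, 3, 3, 1]          # ordinal coding: A=1, B=2, C=3

    rho, p = spearmanr(scores, grades)
    print(rho, p)  # negative rho: higher scores go with better (lower-coded) grades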

Construct Validity
Construct validity refers to how well items measure the
underlying theoretical construct of the instrument. Do items
factor together in a logical, clean manner that is predicted in the
theory underlying the development of the test?

To answer this question, a confirmatory factor analysis (CFA)
was computed using LISREL 8.54. CFA allows the instrument
developer to describe a factor structure and then test the model
against the actual data. In the case of the inventory, an eight-
factor model was tested. The eight factors, or subscales, are
Free Body Diagrams, Static Equivalence, Rollers, Two-Force
Members, Pin-in-Slot Joint, General Pin Joint, Friction, and
Equilibrium.
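
LISREL is commercial software; for readers without access to it, a
broadly analogous model can be specified in open-source tools. The
sketch below assumes the Python semopy package and its lavaan-style
model syntax; the item-to-factor assignments and the data file are
illustrative placeholders, not the study's specification:

    import pandas as pd
    import semopy

    # Each factor is measured by its items (assignments are illustrative)
    desc = """
    FBD    =~ q1 + q2 + q3 + q4 + q5
    Equiv  =~ q6 + q7 + q8
    Roller =~ q9 + q10 + q11
    """  # ... the remaining five factors would be defined the same way

    data = pd.read_csv("responses.csv")  # hypothetical 0/1 item data
    model = semopy.Model(desc)
    model.fit(data)
    print(semopy.calc_stats(model))      # fit indices such as chi2, CFI, RMSEA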


Table 3: Statics course grade compared with total score on the
Statics Inventory.

Inventory Score    Grade A    Grade B    Grade C    Total
10 1 1
11 1 1
12 1 1
13 2 1 3
14 1 1
15 1 3 2 5
16 1 2 3
17 1 3 3 7
18 4 1 5
19 2 4 2 8
20 5 6 3 14
21 2 6 8
22 7 5 12
23 6 4 1 11
24 4 2 1 7
25 9 1 10
26 5 3 8
Total 42 44 19 105


Model fit determines how well the model fits the observed data
and can be assessed in a number of ways [13]. For the
inventory CFA, model fit was assessed using the chi-square
statistic, the Goodness of Fit Index (GFI), the Comparative Fit
Index (CFI), and the root mean square error of approximation
(RMSEA). Traditionally, model fit is assessed by comparing
the actual data matrix to the reproduced model data matrix
using the chi-square statistic. If the chi-square statistic is
significant, then the model and actual data are significantly
different from one another and it can be said that the model
does not fit the data well. A non-significant chi-square statistic
may indicate a good fitting model; however, since the chi-
square statistic is strongly affected by sample size, other indices
of model fit are used to assess fit. GFI is a measure of the
relative amount of the observed variances and covariances
accounted for by the current model. A value of .90 or greater
indicates that the model fits the data well. CFI measures
improvement of the fit of the model in comparison to a null
model. A CFI value of .90 or greater indicates a well-defined
model. The RMSEA value should be less than .05 for a close
model fit, and between .05 and .08 for an acceptable fit.
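
These cutoffs can be collected into a small helper for reading off
the verdicts; this is only a restatement of the rules of thumb
above:

    def assess_fit(gfi, cfi, rmsea):
        return {
            "GFI acceptable": gfi >= 0.90,
            "CFI acceptable": cfi >= 0.90,
            "RMSEA": ("close" if rmsea < 0.05 else
                      "acceptable" if rmsea <= 0.08 else
                      "poor"),
        }

    print(assess_fit(gfi=0.90, cfi=0.91, rmsea=0.067))
    # {'GFI acceptable': True, 'CFI acceptable': True, 'RMSEA': 'acceptable'}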

The inventory model of eight factors had a chi-square value of
314.82 (df=296, p=0.22), a GFI value of .90, a CFI value of
.91, and an RMSEA value of .067. The chi-square value
indicates that the reproduced matrix does not deviate
significantly from the original matrix. The GFI and CFI values
are at the bottom of the acceptable ranges, and the RMSEA
value is in the acceptable range. While these values all indicate
that the theoretical factor structure is acceptable, there does
seem to be room for improvement. Further analysis of
individual items should lead to a better factor structure;
however, the current factor structure as analyzed by
confirmatory factor analysis is acceptable.

PRE-POST CHANGES IN PERFORMANCE

Key statistics from the pre- and post-tests at Carnegie Mellon
University are given in Table 4.

Table 4: Pre- and post test scores on Statics Concept Inventory
at Carnegie Mellon University.
Test    N     Mean    S.D.   Max.   Min.   Median
Pre     127   10.6    4.1    22     2      11
Post    105   20.34   3.5    26     10     20

In addition to the means, the fraction of correct answers
increased from the pre-test to the post-test for every question in
the inventory. One can see that performance on this test clearly
changed over the course of the semester. (A paired t-test, which
assesses whether the difference between pre- and post-test
scores could be attributed to random variation, yielded a value
of t=-23.886 for 104 degrees of freedom, or p<.001. In other
words, the probability is less than 0.001 that the difference in
the pre- and post-test scores could be attributed to random
variation.) Thus, this test does appear to measure an ability
which can change markedly with a semester studying Statics,
affirming its use as a tool to capture gains in conceptual
understanding.
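
The paired comparison can be reproduced with standard tools; a
sketch (assuming scipy, with simulated placeholder scores in place
of the real data):

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(1)
    pre = rng.normal(10.6, 4.1, size=105).clip(0, 27)   # placeholder pre scores
    post = rng.normal(20.3, 3.5, size=105).clip(0, 27)  # placeholder post scores

    t, p = ttest_rel(pre, post)  # paired test, df = 105 - 1 = 104
    print(t, p)                  # large negative t: post scores are higher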

INFERENCES REGARDING CONCEPTS THAT
STUDENTS FIND PARTICULARLY CHALLENGING

By studying the wrong answers that were chosen for various
questions, we can learn about common misconceptions that
persist even after instruction. Here we only consider the
implications of the fraction of total students choosing each
answer.

As pointed out earlier, even after instruction, many students
answered questions addressing the limits of friction incorrectly.
Although all wrong answers were chosen (in significant
numbers by the lower third of scorers), the most commonly
chosen answer corresponded to the assumption that the friction
force equals μN, with N computed correctly (option (e) in
Figure 4). In fact, the lower third of scorers chose this option
more often than the correct answer for two out of three friction
problems.

Questions addressing static equivalence (e.g., Figure 2 above)
were also found to be answered incorrectly by many students.
All wrong answers save one were chosen by many students. In
two out of three such problems, students chose three wrong
answers more often than the correct answer. Both major types
of errors, having the net force inconsistent with the original
and having the net moment inconsistent with the original,
were made by many students. Likewise, in questions
addressing equilibrium (e.g., Figure 5 above), one could see
many examples in which answers indicated a neglect of force
equilibrium and many other examples of answers indicating a
neglect of moment equilibrium.

In the questions addressing free body diagrams (e.g., Figure 1
above), by far the most commonly chosen incorrect answers
were those with an internal force inappropriately placed in the
diagram. Option (c) in Figure 1 is an example of an internal
force.

In the remaining 12 questions addressing various connections,
for example rollers and slotted members (Figure 3 above), one
finds a variety of wrong answers. However, it is quite common
for students to presume that a force acts in directions that are
parallel to or perpendicular to one of the members, even if that
has nothing to do with the actual direction. Students also often
tend to make a choice which is consistent with the force
apparently balancing an applied force, again even if the force
cannot locally act in that direction.

For each of the question types in the inventory, one can
envision more detailed analyses that focus on patterns of errors
of individual students. As an example, in equilibrium
questions, some students might tend to choose answers in
which force equilibrium is violated, while others choose
answers in which only moment equilibrium is violated. There
are such distinct types
of errors in all questions. Such fine-grained information on
student thinking, if made available to instructor and student,
could lead to more targeted remedial instruction. Such analyses
will be undertaken and reported in the future.

POTENTIAL USES OF CONCEPT INVENTORIES

Concept inventories may be used to improve student learning in
many ways, a few of which are pointed out here. When
administered at the end of a course, the inventory can provide
the instructor with feedback on those concepts that may need
more attention in the future. Or, since most of the concepts
may have already been covered by, say, two-thirds of the way through the
course, the test could be administered at that point. If the
results could be analyzed rapidly and provide diagnoses as to
conceptual lapses, then remedial exercises might be tailored to
address these lapses. An inventory could also be used at the
start of a follow-on course (e.g., dynamics or mechanics of
materials), to provide instructors with a picture of the starting
knowledge of their students. Finally, the questions themselves
might stimulate ideas for instruction that is more conceptually
based or might suggest in-class assessment exercises.

SUMMARY AND CONCLUSIONS

This paper has presented a new instrument for assessing
understanding of concepts of Statics. The Statics Concept
Inventory features multiple choice questions that reflect a
conceptual framework articulated previously; wrong-answer
options (distractors) reflect misconceptions and errors
commonly observed in students' work. The test comprises 27
such questions, addressing free body diagrams, static
equivalence, equilibrium, forces at connections and friction.
This test was administered to students before and after a Statics
course at Carnegie Mellon University, and to students at the
end of Statics courses at four other universities. Psychometrics
based on the sample of 245 test-takers indicated that the
inventory offers reliable and valid measures of conceptual
knowledge in Statics. On the basis of this test, one can infer
which concepts students in general tend to have the most
difficulties with, as well as the misconceptions that appear to be
most prevalent. Larger numbers of students have now taken
the Statics Concept Inventory. Instructors who are interested in
having their students take the test can contact the first author.
ACKNOWLEDGMENTS

The authors are very grateful to Andy Ruina for lengthy
comments on the concept questions through various revisions,
and to Anna Dollár and Marina Pantazidou for discussions of
the concepts of Statics.
REFERENCES

1. P. Black and D. Wiliam, Assessment and Classroom
Learning, Assessment in Education, Vol. 5(1), 1998, pp.
7-73.
2. National Research Council, 1999, How people learn:
Brain, mind, experience and school, Committee on
Developments in the Science of Learning, Bransford, J.D.,
Brown, A.L., Cocking, R.R. (Eds.), Washington, D.C.,
National Academy Press
3. Halloun, I.A. and D. Hestenes, The Initial Knowledge
State of College Physics Students, Am. J. Phys., Vol. 53,
1985, p. 1043.
4. Halloun, I.A. and D. Hestenes, Common Sense Concepts
about Motion, Am. J. Phys., Vol. 53, 1985, p. 1056.
5. Hestenes, D., Wells, M. and Swackhamer, G., Force
Concept Inventory, The Physics Teacher, Vol. 30, 1992,
p. 141.
6. D. Evans, C. Midkiff, R. Miller, J. Morgan, S. Krause, J.
Martin, B. Notaros, D. Rancor, and K. Wage, Tools for
Assessing Conceptual Understanding in the Engineering
Sciences, Proceedings of the 2002 FIE Conference,
Boston, MA.
7. Danielson, S., and Mehta, S., Statics Concept Questions
for Enhancing Learning, 2000 Annual Conference
Proceedings, American Society for Engineering Education,
June 18-21, St. Louis, MO. New York: American Society
for Engineering Education, 2000.
8. Mehta, S., and Danielson, S. Math-Statics Baseline
(MSB) Test: Phase I, 2002 Annual Conference
Proceedings, American Society for Engineering Education,
June 16-19, Montreal, Canada. New York: American
Society for Engineering Education, 2002.
9. Steif, P.S., An Articulation of the Concepts and Skills
which Underlie Engineering Statics, 34th ASEE/IEEE
Frontiers in Education Conference, Savannah, GA,
October 20-23, 2004.
10. Steif, P.S., Initial Data from a Statics Concepts
Inventory, Proceedings of the 2004 American Society for
Engineering Education Conference and Exposition, Salt
Lake City, UT, June 2004.
11. P.S. Steif, Comparison Between Performance On A
Concept Inventory And Solving Of Multifaceted Problems,
33rd ASEE/IEEE Frontiers in Education Conference,
Boulder, CO, November 5-8, 2003.
12. Crocker, L. and Algina, J., 1986, Introduction to Classical
and Modern Test Theory, Harcourt Brace Jovanovich, New
York.
13. Schumacker, R.E. & Lomax, R.G., 1996, A Beginner's
Guide to Structural Equation Modeling, Lawrence
Erlbaum Associates, Mahwah, N.J.
