TABLE OF CONTENTS

UNIT-1: INTRODUCTION
1.1 EVALUATION, ASSESSMENT, MEASUREMENT AND TEST
1.2 THE PURPOSE OF TESTING
1.3 GENERAL PRINCIPLES OF ASSESSMENT
1.4 TYPES OF EVALUATION PROCEDURES
1.5 NORM-REFERENCED AND CRITERION-REFERENCED TESTS
1.6 EDUCATIONAL ASSESSMENT

UNIT-2: JUDGING THE QUALITY OF THE TEST
2.1 VALIDITY, METHODS OF DETERMINING VALIDITY
2.2 FACTORS AFFECTING VALIDITY
2.3 RELIABILITY, AND METHODS OF DETERMINING RELIABILITY
2.4 FACTORS AFFECTING RELIABILITY
2.5 PRACTICALITY

UNIT-3: APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)
3.1 THE VALUE OF ITEM ANALYSIS
3.2 THE PROCEDURE/PURPOSE OF ITEM ANALYSIS
3.2 MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS
3.3 ITEM DIFFICULTY
3.4 THE INDEX OF DISCRIMINATION

UNIT-4: INTERPRETING THE TEST SCORES
4.1 THE PERCENTAGE CORRECT SCORE
4.2 THE PERCENTILE RANKS
4.3 STANDARD SCORES
4.4 PROFILE

UNIT-5: EVALUATING PRODUCTS, PROCEDURES & PERFORMANCE
5.1 EVALUATING THEMES AND TERM PAPERS
5.2 EVALUATING GROUP WORK & PERFORMANCE
5.3 EVALUATING DEMONSTRATIONS
5.4 EVALUATION OF PHYSICAL MOVEMENTS AND MOTOR SKILLS
5.5 EVALUATING ORAL PERFORMANCE

UNIT-6: PORTFOLIOS
6.1 PURPOSE OF PORTFOLIOS
6.3 GUIDELINES AND STUDENTS' ROLE IN SELECTION OF PORTFOLIO ENTRIES AND SELF-EVALUATION
6.4 USING PORTFOLIOS IN INSTRUCTION AND COMMUNICATION
6.5 POTENTIAL STRENGTHS AND WEAKNESSES OF PORTFOLIOS
6.6 EVALUATION OF PORTFOLIOS

UNIT-7: BASIC CONCEPTS OF INFERENTIAL STATISTICS
7.1 CONCEPT & PURPOSE OF INFERENTIAL STATISTICS
7.2 SAMPLING ERROR
7.3 NULL HYPOTHESIS
7.4 TESTS OF SIGNIFICANCE
7.5 LEVELS OF SIGNIFICANCE
7.6 TYPE-I AND TYPE-II ERRORS
7.7 DEGREES OF FREEDOM

UNIT-8: SELECTED TESTS OF SIGNIFICANCE
8.1 T-TEST
8.2 CHI-SQUARE (χ²)
8.3 REGRESSION
FOREWORD
Knowledge is the main distinctive characteristic of human beings; it is by virtue of knowledge, by the grace of Allah, that man was selected as vice-regent of Allah Almighty. Man is superior to other living beings because he has the capability and potentiality to understand as well as to reason about consequences. Knowledge is obtained through the continuous process of education, a process that is usually lifelong.
It is also a fact that education is a bilateral and participatory activity. It cannot be accomplished without its two partners, the teacher and the student. The activity requires a transmitter and a receiver; if either of them is missing, the exercise remains incomplete.
Comparing the two, however, the teacher appears superior to his pupils, as he is the organiser and director of the teaching-learning process. That is why, since time immemorial, the search for good teachers has always been in progress, and it still goes on. No doubt some good teachers exist, but they are far from countless. There is a need to produce a countless number of genuine educators and prospective educators to contribute in this regard.
With this objective in view, the people at the helm of affairs are trying their best to bring desirable changes to the education system, the teacher education curriculum and teacher training programmes. The best teacher, once only a dream, is about to become a reality; if the course outlines and syllabi are properly dispensed, it is hoped that the required cadre of teachers will become available. Future educators and teachers need to be well equipped in all skills, not confining themselves to academic learning while ignoring ICT, current affairs and contemporary issues.

With these objectives in view, improvements in the system are being carried out to achieve the goals. The new curriculum, on which this book is based, is the result of long deliberations and brainstorming undertaken by senior educators. It is now up to the implementers and the students to benefit from it in the best possible way.
The book is now in your hands, and it is not claimed to be the final word; there is always room for improvement. The author would be highly obliged for any comments or recommendations conveyed to make it better still.
ACKNOWLEDGEMENT

Author
UNIT-1:
INTRODUCTION
Feedback Loop
The model shows a fifth component, the Feedback Loop that
can be used by the teacher as both a management and a diagnostic
procedure. If the results of evaluation indicate that sufficient learning has
occurred, the loop takes the teacher back to the Instructional Objectives
component, and each successive component, so that plans for beginning
the next instructional unit can be developed. (New objectives are needed, entering behavior is different, and methods will need to be reconsidered.)
But when evaluation results are not so positive, the Feedback Loop is a
mechanism for identifying possible explanations. (Note the arrows that
return to each component.) Were the objectives too vaguely specified?
Did students lack essential prerequisite skills or knowledge? Was the film
or text relatively ineffective? Was there insufficient practice opportunity?
Such questions need to be asked and frequently are. However, questions
need to be asked about the effectiveness of the performance assessment
procedures also, perhaps more frequently than they are. Were the test
questions appropriate? Were enough observations made? Were directions
clear to students? The Feedback Loop returns to the Performance
Assessment component to indicate that we must review and assess the
quality of our evaluation procedures after the fact, to determine the appropriateness of the procedures and the accuracy of the information.
Unless the tools of evaluation are developed with care, inadequate
learning may go undetected or complete learning may be misinterpreted
as deficient.
In sum, good teaching requires planning for and using good evaluation tools. Furthermore, evaluation does not take place in a vacuum.
The BTM shows that other components of the teaching process provide
cues about what to evaluate, when to evaluate, and how to evaluate. Our
purpose is to identify such cues and to take advantage of them in building tests and other assessment devices that measure achievement as precisely as possible.
(B) An assessment decision maker is concerned about all aspects of the educational endeavour. The key point to consider and keep in mind is that evaluation involves appraisal of particular goals or purposes. Useful information may be obtained for evaluation purposes by both formal and informal means, and should include information collected during instruction as well as end-of-course data. According to Ahmann and Glock (1985), school administrators, guidance personnel, classroom teachers, and individual students require information that will allow them to make informed and appropriate decisions regarding their respective educational activities. Ideally, they should be aware of all the alternatives open to them, the possible outcomes of each alternative, and the advantages and disadvantages of the respective outcomes. Educational and psychological measurement can help individuals with these matters.
(C) Tyler (1966), Airasian and Madaus (1972), Gronlund (1976), and Thorndike and Hagen (1977) rightly observe that the data secured through testing procedures may have uses such as those given below:
First, measurement data may be employed in the placement of students in one or another instructional programme. Usually pupils take a pretest to determine whether they have mastered the skills that are prerequisite to admittance to a particular course or instructional sequence. For instance, foreign language and mathematics programmes
are usually arranged in some hierarchical order so that achievement at
each level of learning depends on mastery of the preceding level.
The student is led from the entering position in the hierarchy to the terminating phase via intermediate steps. Based upon the information provided by a pretest, a student can be placed:
(1) At the most appropriate point in the instructional sequence,
(2) In a programme with a particular instructional strategy, or
(3) With an appropriate teacher.
Second, measurement data can be used in formative evaluation.
Tests are administered to students to monitor their success and to provide
them with relevant feedback. The information is employed less to grade a student than to make instruction responsive to the student's strengths and weaknesses as identified by the measurement device. Mastery learning procedures emphasize the use of formative tests to provide detailed information about each student's grasp of a unit's objectives.
Third, measurement data has a place in diagnostic evaluation.
Diagnostic testing takes over where formative testing leaves off. When a student fails to respond to the feedback-corrective activities associated with formative testing, a more detailed search for the source of the learning difficulty is indicated. Remediation is only possible when the teacher understands the basis of a student's problem and then designs instruction to address the need.
Fourth, measurement data may be used for summative purposes. Such testing is employed to certify or grade students at the completion of a course or unit of instruction. Often the result is 'final' and follows the student throughout his or her academic career (as in the case of college and university transcripts). It is this aspect of evaluation that some educators find particularly objectionable.
Fifth, measurement data are used by employers and educational institutions in making selection decisions. Many jobs and slots in educational programmes are limited in number, and there are more applicants than positions. In order to identify the most promising candidates, standardized tests may be administered to the applicants. The information provided by the tests presumably increases the accuracy and objectivity of administrators' decisions. College Board examinations are used by many universities in admitting students; graduate and professional schools likewise employ data from standardized testing programmes to make their entrance decisions.
Sixth, school officials use measurement data in making curricular decisions, in order to evaluate existing programmes and to decide among instructional alternatives. School administrators need to assess their students' current levels of performance and the strengths and weaknesses of existing programmes in the light of such evidence.
Seventh, measurement data find a place in personal decision-making. Individuals confront a variety of choices at any number of points in their lives. Should they attend college or pursue some other type of post-high school training? What kind of job seems most suited to their
needs? What sort of training programme should they enter? Measures of
interest, temperament, and ability can give individuals insights that can
prove helpful in the decision-making process.
Types of Assessment
Tests and other assessment procedures can be classified in terms of their functional role in classroom instruction. One such classification system follows the sequence in which assessment procedures are likely to be used in the classroom. These categories classify the assessment of student performance in the following manner:
1. Placement assessment
To determine student performance at the beginning of
instruction.
2. Formative assessment
To monitor learning progress during instruction.
3. Diagnostic assessment
To diagnose learning difficulties during instruction.
4. Summative assessment
To assess achievement at the end of instruction.
Although a single instrument may sometimes be useful for more than one purpose (e.g., for both formative and summative assessment purposes), each of these types of classroom assessment typically requires instruments specifically designed for the intended use.
All these types of assessment are discussed below in detail.
Placement Assessment
This is also called Need Analysis Assessment. Placement
assessment is concerned with the student's entry performance and
typically focuses on questions such as the following: (1) Does the student possess the knowledge and skills needed to begin the planned
instruction? For example, is a student's reading comprehension at a level
that allows him or her to do the expected independent reading for a unit
in history, or does the beginning algebra student have a sufficient
command of essential arithmetic concepts? (2) To what extent has the student already developed the understanding and skills that are the goals of the planned instruction? Sufficient levels of comprehension and proficiency might indicate the desirability of skipping certain units or of being placed in a more advanced course. (3) To what extent do the
student's interests, work habits, and personality characteristics indicate
that one mode of instruction might be better than another (e.g., group
instruction versus independent study)? Answers to questions like these
require the use of a variety of techniques: records of past achievement,
pretests on course objectives, self-report inventories, observational
techniques, and so on. The goal of placement assessment is to determine
for each student the position in the instructional sequence and the mode
of instruction that is most beneficial.
Formative Assessment
According to Gronlund (1990):
Formative assessment of work is used while the work is in the process of being carried out, so that the assessment affects the development of the work.
Formative Assessment is a part of the instructional process.
When incorporated into classroom practice, it provides the information
needed to adjust teaching and learning while they are happening. In this
sense, formative assessment informs both teachers and students about
student understanding at a point when timely adjustments can be made.
These adjustments help to ensure that students achieve targeted standards-
based learning goals within a set time frame. Although formative
assessment strategies appear in a variety of formats, there are some
distinct ways to distinguish them from summative assessments.
Formative assessment is used to monitor learning progress
during instruction; its purpose is to provide continuous feedback to both student and teacher concerning learning successes and failures.
Feedback to students provides reinforcement of successful learning and
identifies the specific learning errors and misconceptions that need
correction. Feedback to the teacher provides information for modifying
instruction and for prescribing group and individual work. Formative
assessment depends heavily on specially prepared tests and assessments
for each segment of instruction (e.g., unit, chapter). Tests and other types of assessment tasks used for formative assessment are most frequently teacher-made, but customized tests from publishers of textbooks and other instructional materials can also serve this function. Observational
techniques are, of course, also useful in monitoring student progress and
identifying learning errors. Because formative assessment is directed
toward improving learning and instruction, the results typically are not
used for assigning course grades.
Diagnostic Assessment
According to Gronlund (1990):
Diagnostic assessment is concerned with those educational problems which remain unsolved even after the corrective prescriptions of formative assessment.
Diagnostic assessment is a highly specialized procedure. It is
concerned with the persistent or recurring learning difficulties that are
left unresolved by the standard corrective prescriptions of formative
assessment. If a student continues to experience failure in reading,
mathematics, or other subjects, despite the use of prescribed alternative
methods of instruction, then a more detailed diagnosis is indicated. To
use a medical analogy, formative assessment provides first-aid treatment
for simple learning problems and diagnostic assessment searches for the
underlying causes of problems that do not respond to first-aid treatment.
Thus, diagnostic assessment is much more comprehensive and detailed.
It involves the use of specially prepared diagnostic tests as well as
various observational techniques. Serious learning disabilities also are
likely to require the services of educational, psychological, and medical
specialists, and given the appropriate diagnosis, the development of an
individualized education plan (IEP) for the student. The aim of diagnostic
assessment is to determine the causes of persistent learning problems and
to formulate a plan for remedial action.
Summative Assessment
The assessment that is carried out at the end of a piece of work is called
summative assessment.
Summative assessment typically comes at the end of a course
(or unit) of instruction. It is designed to determine the extent to which the
instructional goals have been achieved and is used primarily for
assigning course grades or for certifying student mastery of the intended
learning outcomes. The techniques used in summative assessment are
determined by the instructional goals, but they typically include teacher-made achievement tests, ratings on various types of performance (e.g., laboratory work, oral reports), and assessments of products (e.g., themes, drawings, research reports). These various sources of information about student achievement may be systematically collected into a portfolio of
work that may be used to summarize or showcase the student's
accomplishments and progress. Although the main purpose of summative
assessment is grading, or the certification of student achievement, it also
provides information for judging the appropriateness of the course
objectives and the effectiveness of the instruction.
1.1.3 Measurement
Meaning &Definition of Measurement
Literally the verb measure means to find or determine the 'size',
`quantity' or 'quality' of anything. According to Chambers Dictionary the
term 'measure' means `to find out the size or amount of something'.
"Measurement" in the International Dictionary of Education (by G Terry
Page & J.B. Thomas) means "the act of finding the dimension of any
object and the quantity found by such an act.
The Oxford Advanced Learner's Dictionary defines 'measurement' as 'the standard or system used in stating the size, quantity or degree of something.' It is the way of assessing something
quantitatively. It answers the question "How much?" In other words we
can say that measurement is the quantitative aspect of evaluation. With
the help of measurement we can easily describe students' achievement by
telling their scores. These definitions show that 'measurement' is the
quantitative assessment of something. Now let's see how the term is
defined specifically in education. L. R. Gay (1985) defines measurement as "a process of quantifying the degree to which someone or something possesses a given trait, i.e. quality, characteristic or feature."
Educational Measurement
(The concept of measurement in education)
In Education, the term 'measurement' is used in its specific
meanings. It is the quantitative assessment of the performance of a
student, teacher, curriculum or an educational program. We can say that
the quantitative score used for educational evaluation is called
measurement. The term is used for the data collected about student or
teacher performance by using a measuring instrument in a given learning
situation. It shows the exact quantity or degree of the performance, traits
or character of the person or thing being measured. For example, instead of saying that Hamid is underweight for his age and height, we can say that Hamid is 18 years old, 5' 8" tall, and weighs only 85 pounds. Similarly, instead of saying that Hamid is more intelligent than Zahid, we can say that Hamid has a measured IQ of 125 and Zahid has a measured IQ of 88. In each of the above cases, the numerical statement is
more precise, more objective and less open to interpretation than the
corresponding verbal statement.
Steps of measurement
There are two steps in the process of measurement. The first step is to devise a set of operations to isolate the attribute and make it apparent to us. Just as a standard is used for judging the durability of a thing, in the same way educators and psychologists use various methods for testing the behaviour or performance of a student. For this purpose they often use the Stanford-Binet tests or other tests that include operations for eliciting behaviour that we take to be indicative of intelligence.
The second step in measurement is to express the results of the
operations established in the first step in numerical or quantitative terms.
This involves an answer to the question: how many, or how much? Just as the millimetre is used as a unit for indicating the thickness of a thing, in the same way educators and psychologists use numerical units for gauging intelligence, emotional maturity and other attributes. Thus each step in measurement rests on human-fashioned definitions. In the first
step, we define the attribute that interests us. In the second step, we
define the set of operations that will allow us to identify the attribute, and
express the result of our operations.
Difference between Evaluation and Measurement
Some people use 'evaluation' and 'measurement' in the same
meaning. Both the terms are used for the process of assessing the
performance of the student and collecting information about an
educational objective. Both tell how effective the school programme has
been and refer to the collection of information, appraisal of students, and
assessment of programme. Some recognize that measurement is one of
the essential components of evaluation. But there is difference between
the two terms. Roughly speaking, `measurement' is the quantitative
assessment whereas 'evaluation' is the quantitative as well as qualitative
assessment of the performance of a student or an educational objective.
Measurement is a limited process used for the assessment of limited and
specific educational objectives. On the other hand, evaluation is much
more comprehensive term used for all kinds of educational objectives.
Moreover, evaluation is the continuous inspection of all
available information concerning the student, teacher, educational
programme and the teaching- learning process to ascertain the degree of
change in students and form valid judgements about the students and the
effectiveness of the programme. On the other hand 'measurement' is the
collection of data about the performance of a student, teacher or
curriculum etc.
However, 'evaluation' and 'measurement' are closely related; we cannot separate one from the other. Both are used for assessing the effectiveness of a programme or for the appraisal of students. Measurement collects data directly from the objects of concern, the students; other information is collected from students by non-testing procedures. The information provided by testing and non-testing procedures is best thought of as the material to be used in the evaluation process.
The Importance of Measurement in Education
Measurement plays a very important role in the teaching-learning
process. Without measurement we cannot assess the effectiveness of an
educational programme, the school or its personnel. For effective
teaching, it is necessary for the teacher to be aware of the strengths and
weaknesses of his teaching method. Similarly, for an effective learning, it
is necessary for the student to be aware of the possible outcomes of all
the alternatives. He should also be informed about the advantages and
disadvantages of the respective outcomes. All this is impossible without
measurement. Without measurement, how can a teacher be aware of his method of teaching, or how can a student be informed about the outcomes of the alternatives? Without measurement, evaluation is impossible and
without evaluation we cannot get knowledge of the effectiveness of an
educational programme. Measurement tells us about the characteristics of
students, their progress in studies and their achievements in various
subjects. It also tells us how much, or to what extent, the instructional objectives of the school and the individual classroom teacher are being achieved. Measurement serves as a guideline for students to develop
their educational and vocational plans for the future. With the help of
measurement, information is gathered about school programmes,
policies, and objectives. This information is conveyed to parents and
other members of the community. Similarly, measurement data are used by employers and educational institutions in making selection decisions. With the help of standardized tests, administrators collect information about every applicant. The information provided by the tests increases the accuracy and objectivity of administrators and decision makers. In this way measurement data are also employed by school officials
in making curricular decisions.
In short, measurement occupies the central place in the process
of teaching and learning. It is the only means through which educational conditions can be improved.
The Function of Measurement and Evaluation
'Measurement' and 'evaluation' are interdependent. We cannot separate one from the other, just as we cannot separate the two sides of a coin. Evaluation is the qualitative aspect of anything, which is
based on the quantitative value (measurement) of that thing. Without
measurement we cannot make an exact evaluation of a thing.
In this respect evaluation and measurement perform the same
functions in education.
Cronbach, in his book "Essentials of Psychological Testing", has discussed the following functions of measurement and evaluation.
(1) Effectiveness of Educational Programme
In education, the concerned people and personnel must be aware
of the effectiveness of an educational programme. This is possible only
by making an evaluation of that programme. By evaluation a teacher is
able to know to what extent his method of teaching is effective. He is also able to know to what extent the laboratory equipment is effective. This will enable him to improve his method of teaching and make the learning process effective.
(2) Prediction
After evaluation it is possible to predict the performance of students in the future. Through evaluation we come to know students' aptitudes and interests, with the help of which we can guide them to seek admission to institutions suited to those aptitudes and interests. So, on the basis of evaluation we can plan for the future.
(3) Selection
Measurement and evaluation are used in the selection of suitable persons for different jobs in government as well as semi-government departments.
(4) Classification
Evaluation is helpful in the classification in all educational
institutions. At the end of every year, some tests are given to students to
check their ability and make classification on the basis of results obtained
from these tests.
Another educational psychologist, Camp, adds that evaluation plays an important function in turning maladjusted students into useful members of society by identifying their interests and attitudes.
Students suffering from inferiority complex can also be treated after their
proper evaluation.
In short, evaluation and measurement have important functions
in education. They serve as guidelines for students, teachers, counsellors and administrators.
1.1.4 Test
Measurement and evaluation are the two processes that are used
to collect information about the strengths and weaknesses of an
educational programme or the performance of a student, teacher or other
personnel. But these processes need some instruments for their
operations. Such instruments are called tests. So, the instruments that are
used to measure the sample of students' behaviour under specific
conditions are called tests. In other words we can say that:
"A test is a systematic procedure for measuring a sample of students'
behaviour under specific conditions."
Some other definitions of test are given below:
1. A procedure for critical evaluation; a means of determining the
presence, quality, or truth of something.
2. A series of questions, problems, or physical responses designed
to determine knowledge, intelligence, or ability.
3. The means by which the presence, quality, or genuineness of
anything is determined: (e.g. a test of a new product.)
4. The trial of the quality of something: (e.g. to put to the test.)
5. A particular process or method for trying or assessing.
6. A set of problems, questions, etc., for evaluating abilities or
performance.
A test consists of a number of questions to be answered, a series
of problems to be solved, or a set of tasks to be performed by the
examinees. The questions might ask the examinees to define a word, to
do arithmetic computations, or to give some information. The questions,
problems and tasks are called test items.
Difference between Test, Measurement and Evaluation:
William Wiersma and Stephen G. Jurs (1990) in their book
"Educational Measurement and Testing" remarks that the terms of
Testing, measurement, assessment and evaluation are used with similar
meanings but they are not synonymous though they are related with each
other. They define these terms as follows:-
Test: "(It) has a narrower meaning than either measurement or
assessment. Test commonly refers to a set of items or questions under
specific conditions. When a test is given, measurement takes place;
however, all measurement is not necessarily testing".
Measurement: "For all practical purposes assessment and measurement
can be considered synonymous. When assessment is taking place,
information or data are being collected and measurement is being
conducted".
Evaluation: "Evaluation is a process that includes measurement and
possibly testing but it also contains the notion of a value judgment. If a
teacher administers a test to a class and computes the percentages of correct responses, measurement and testing have taken place. The scores must then be interpreted, which may mean converting them to values like A's, B's, C's and so on, or judging them to be excellent, good, fair or poor. This
process is evaluation because the value judgments are being made".
Another distinction is given by Norman E. Gronlund (1985)
who defines these terms as follows in the book "Measurement and
Evaluation in Teaching".
Test: "An instrument or systematic procedure for measuring a sample of
behaviour. (Answers the question "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?")
Measurement: "The process of obtaining numerical description of the
degree to which an individual possesses a particular characteristic.
(Answers the question "How much?").
Evaluation: "The systematic process of collecting, (Classroom)
analyzing and interpreting information to determine the extent to which
pupils are achieving instructional objectives. (Answers the question
"How good").
Similarly Anthony J. Nitko (1983) in his book "Educational
Tests and Measurement" makes the distinction between Test,
Measurement and Evaluation in the following words:
Tests: "Tests are systematic procedures for observing persons and
describing them with either a numerical scale or a category system. Thus
tests may give either qualitative or quantitative information".
Measurement: "Measurement is a procedure for assigning numbers to
specified attributes or characteristics of persons in a manner that
maintains the real world relationships among persons with regard to what
is being measured".
Evaluation: "Evaluation involves judging the value or worth of a pupil
of an instructional method or of an educational program. Such
judgements may or may not be based on information obtained from
tests".
Robert L. Ebel and David A. Frisbie (1986) in their book
"Essentials of Educational Measurement" rightly observe.
"All tests are a subset of the quantitative tools or techniques that
are classified as measurements. And all measurement techniques are a
subset of the quantitative and qualitative techniques used in evaluation."
Table showing the relationship between Test, Measurement and Evaluation:

Test: An instrument or systematic procedure for measuring a sample of behaviour. It answers the question "How well does the individual perform, as compared to others?" It is a means of collecting information, and its objective is to find out the facts pertaining to some aspect. A test is only an instrument used to obtain data.

Measurement: The process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. It answers the question "How much?" It gives a numerical value to some trait, and its objective is to present information objectively. Measurement quantifies data and is an essential part of evaluation.

Evaluation: A systematic process of collecting and interpreting information in order to make judgements and decisions. It answers the question "How good?" It involves both quantitative and qualitative description together with decision-making, and its objective is to make decisions about aspects of the educational process. Evaluation depends upon measurement and testing.
Types of Tests

(A) Intelligence Tests
Intelligence tests express a student's mental ability as an Intelligence Quotient (I.Q.), calculated as:

I.Q. = (M.A. / C.A.) × 100

Where:
I.Q. = Intelligence Quotient
M.A. = Mental Age
C.A. = Chronological Age (Physical Age)
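As a quick arithmetic check of the ratio-IQ formula above, the short Python sketch below computes an IQ from a mental age and a chronological age; the function name and the sample ages are illustrative only, not taken from the text.

def intelligence_quotient(mental_age, chronological_age):
    """Compute I.Q. = (M.A. / C.A.) x 100, as in the formula above."""
    if chronological_age <= 0:
        raise ValueError("Chronological age must be positive")
    return (mental_age / chronological_age) * 100

# Example: a 10-year-old child whose test performance matches a mental age of 12
print(intelligence_quotient(mental_age=12, chronological_age=10))  # 120.0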
(B) Personality Tests
Tests used for the assessment of personality of a student are
called personality tests. They measure the typical performance of a
student i.e. what a student will do. They are universally administered
almost all over the world in various fields, vocations, institutions, and for
the selection of recruits. In Pakistan, too, personality tests are used for
job selection and for the selection of army recruits like ISSB
examinations. Personality tests include attitude tests, interest tests,
adjustment and temperament tests, character tests, and tests of other
motivational and interpersonal characteristics.
Uses of Tests
Tests play an important role in the teaching-learning process. Without tests we cannot evaluate or assess a student's or a teacher's performance, nor can we collect information about the effectiveness of an educational programme. That is why tests are very
important in education. They motivate students for learning. They serve a
number of purposes in a variety of educational activities. The following
are the different uses of tests;
1. Uses of tests in teaching process
With the help of the results obtained from tests, a teacher can easily collect information about the aptitude, intelligence, interests, attitudes and overall performance of the students. He comes to know the strengths and weaknesses of his teaching method. It becomes easy for the teacher to grade students in a subject. Test results also enable him to predict the future success of a student in a subject.
2. Uses of tests in learning process
The student is the centre of interest in the teaching-learning process. All kinds of educational activities are performed for the sake of the student. That is why the use and importance of tests in the process of learning is greater than in any other activity. Tests help students in knowing their strengths and weaknesses in a subject. The results obtained from these tests serve as a guideline for students. They motivate students to study.
3. Uses of Tests in Guidance
Tests show the overall performance of the students; therefore, they enable the examiner to know how to guide students in their educational and vocational choices. Tests also make parents aware of the aptitudes of their children so that they can make a plan for their proper guidance. The result of a test in itself serves as a guideline for the student.
4. Uses of Tests in Administration
The results obtained from tests provide the administrators of the department with useful information. In the light of these results, they can easily decide how to promote students, how to admit them, and how to modify the school objectives, instructional methods and curricula. They can then easily decide how to make the teaching-learning process effective.
5. Uses of Tests in Research
The data collected from tests are used as powerful tools in research and experimentation in the classroom. Research workers use these data in their genetic or case-study research.
In short, tests are used in almost all educational activities. They
are the real tools with the help of which information about teachers,
students, curricula, etc. is gathered, and in the light of this information the teaching-learning process is improved.
(Diagram: Student Evaluation and Programme Evaluation)
Only some areas readily lend themselves to the listing of specific objectives from which tests can be built, and this may be a constraining element for the teacher.
1.6 EDUCATIONAL ASSESSMENT:
"Educational assessment can be defined as the process of documenting knowledge, skills, attitudes and beliefs."
Or
"The process of collecting, synthesizing and interpreting information to aid in decision-making."
General Principles of Assessment:
Following are the main principles of assessment.
1. Clearly specifying what is to be assessed has priority in the assessment process.
2. An assessment procedure should be selected because of its
relevance to the characteristics or performance to be measured.
3. Comprehensive assessment requires a variety of procedures.
4. Proper use of assessment procedures requires an awareness of
their limitations.
5. Assessment is a means to an end, not an end in itself.
Clearly specify what is to be assessed:
General statements from content standards or from course objectives can be a helpful starting point, but in most cases teachers need to be more specific for the assessment process to be effective. Thus specification of the characteristics to be measured should precede the selection or development of assessment procedures: specify the intended learning goals before selecting the assessment procedure to use.
Example:
A content standard in the field of physics might specify that students should understand the key ideas documented in the field of physics. Assessment might then take the form of:
1. Multiple-choice questions
2. Short-answer questions
3. Essay questions
4. Numerical questions
To establish assessment priorities for such a standard, the teacher needs to answer questions such as the following:
Q1. What idea?
Q2. What document?
Q3. What concepts of physics?
The general statement in the standard does not answer such questions, but they must be answered, either explicitly or implicitly, in order to develop assessments.
Assessment must be relevant to the performance to be measured:
Assessment procedures are frequently selected on the basis of their objectivity, accuracy or convenience. Although these criteria are important, they are secondary to the main criterion: relevance to the performance to be measured.
Examples:
If the teacher's goal is that students should learn writing skills such as creative writing, composition and sentence structure, then multiple-choice items would be a poor choice of assessment; the teacher should instead include story writing, essays, summaries and similar tasks to assess and improve the writing skills of the child.
“Close match between the intended learning goals and type of
assessment is must”.
Comprehensive assessment requires a variety of procedure:
A variety of procedures is required to assess a person's knowledge of anything. The characteristics to be assessed also play a vital role in the choice of procedure. Some of the procedures are given below:
Multiple choice
Short answer
Essay test
Written projects
Observational technique
Definition: Test percentile scores are just one type of test scores you will
find on your child's testing reports. Many test reports include several
types of scores. Percentile scores are almost always reported on major achievement tests that are taken by your child's entire class. Percentile
will also be found on individual diagnostic test reports. Understanding
test percentile scores is important for you to make decisions about your
child's special education program.
Test percentile scores are commonly reported on most standardized assessments a child takes in school. Percentile literally means per hundred. Percentile scores on teacher-made tests and homework
assignments are developed by dividing the student's raw score on her
work by the total number of points possible. Converting decimal scores
to percentiles is easy. The number is converted by moving the decimal
point two places to the right and adding a percent sign. A score of .98
would equal 98%.
Test percentiles on a commercially produced, norm-referenced or
standardized test, are calculated in much the same way, although
the calculations are typically included in test manuals or calculated with
scoring software.
If a student scores at the 75th percentile on a norm-referenced test, it can be said that she has scored at least as well as, or better than, 75 percent of students her age from the normative sample of the test. Several other types of standard scores may also appear on test reports.
Percentile rank
The percentile rank of a score is the percentage of scores in its frequency
distribution that are the same or lower than it. For example, a test score
that is greater than or equal to 75% of the scores of people taking the test
is said to be at the 75th percentile rank.
Percentile ranks are commonly used to clarify the interpretation of scores
on standardized tests. In test theory, the percentile rank of a raw score is interpreted as the percentage of examinees in the norm group who scored at or below the score of interest.
Percentile ranks (PRs) are uniform (rectangular) in distribution, while normal curve equivalents (NCEs) are normally distributed (bell-shaped).
Percentile ranks are not on an equal-interval scale; that is, the difference
between any two scores is not the same between any other two scores
whose difference in percentile ranks is the same. For example, 50 _ 25 =
25 is not the same distance as 60 _ 35 = 25 because of the bell-curve
shape of the distribution. Some percentile ranks are closer to some than
others. Percentile rank 30 is closer on the bell curve to 40 than it is to 20.
The mathematical formula is:

PR = (c + 0.5 × f) / N × 100

where c is the count of all scores less than the score of interest, f is the frequency of the score of interest, and N is the number of examinees in the sample. If the distribution is normally distributed, the percentile rank can be inferred from the standard score.
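The formula above translates directly into a few lines of code. The following Python sketch is a minimal illustration; the function name and the sample score list are hypothetical, not taken from the text.

def percentile_rank(scores, score_of_interest):
    """PR = (c + 0.5 * f) / N * 100, using the definitions above.

    c = count of scores below the score of interest
    f = frequency of the score of interest
    N = total number of examinees
    """
    n = len(scores)
    c = sum(1 for s in scores if s < score_of_interest)
    f = scores.count(score_of_interest)
    return (c + 0.5 * f) / n * 100

scores = [55, 60, 60, 65, 70, 70, 70, 75, 80, 90]
print(percentile_rank(scores, 70))  # 55.0 (4 scores below + half of the 3 ties, out of 10)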
8. Improper arrangement:
Test items should be arranged in order of difficulty, with the easiest items first. Placing difficult items early may cause pupils to spend too much time on them.
9. Identifiable pattern of answers:
Placing correct answers in some systematic pattern will enable pupils to guess the answers more easily.
Methods of Determining Validity:
There are several methods of determining the validity of measuring instruments; the main ones are the following.
1. Content Validity:
Content validity is evaluated by showing how well the content
of the test samples the class of situations. It is especially
important in the case of achievement and proficiency measures.
It is also known as “face validity”.
2. Concurrent Validity:
It is evaluated by showing how well test scores correspond to already accepted measures of performance or status obtained at the same time (a correlation sketch follows this list). For example, we may give a social studies class a test on knowledge of basic concepts in social studies and at the same time obtain from its teacher a report on these abilities as far as the pupils in the class are concerned. If the relationship between the test scores and the teacher's report of abilities is high, the test will have high concurrent validity.
3. Predictive Validity:
It is evaluated by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time, as when the tester wants to estimate how well a student may do in college courses on the basis of how well he has done on tests he took in secondary school.
4. Construct Validity:
It is evaluated by investigating what psychological qualities a
test measures. It is ordinarily used when the tester has no
definitive criterion measure of what he is concerned with and
hence must use indirect measures. This type of validity is
usually involved in such tests as those of study habits,
appreciations, understanding and interpretation of data.
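Both concurrent and predictive validity, as described above, come down to correlating test scores with a criterion measure (a teacher's report, later grades, and so on). The following Python sketch is only an illustration of that idea; the variable names and the small data set are hypothetical.

import statistics

def pearson_r(x, y):
    """Pearson correlation between test scores x and criterion scores y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = (sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)) ** 0.5
    return num / den

test_scores    = [42, 55, 61, 48, 70, 66]   # social studies test scores
teacher_rating = [3,  4,  4,  3,  5,  4]    # criterion obtained at the same time
print(round(pearson_r(test_scores, teacher_rating), 2))  # a high r suggests high concurrent validity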
Conclusion:
In short we can say that validity is specific to the purpose and
situation for which a test is used. A test can be reliable without being
valid, but the converse is not true. In other words, it is conceivable that a
test can measure some quality with a high degree of consistency without
measuring at all the quality it was actually intended to measure.
Reliability (KR-21) = [K / (K − 1)] × [1 − M(K − M) / (K × S²)]

where:
K = the number of items in the test;
M = the mean of the test scores;
S = the standard deviation of the test scores.
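A short Python sketch of the KR-21 formula above follows. The list of total scores is hypothetical, and K is assumed to be the number of dichotomously scored items on the test.

import statistics

def kr21_reliability(scores, k):
    """KR-21: r = (K/(K-1)) * (1 - M*(K-M)/(K*S^2)), using the definitions above."""
    m = statistics.mean(scores)     # mean of the test scores
    s = statistics.pstdev(scores)   # standard deviation of the test scores
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * s ** 2))

total_scores = [32, 28, 35, 22, 30, 26, 38, 24, 29, 33]  # total scores on a 40-item test
print(round(kr21_reliability(total_scores, k=40), 2))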
Summary: The following methods are used for determining the reliability of a test:
A. Test-retest method: with a time interval between the two administrations
B. Equivalent-form method: immediate (without interval) or with a time interval
C. Split-half method: immediate (single administration)
D. Kuder-Richardson formula: immediate (single administration)
5. Parallel form Reliability:
When different sets or different parts of a test (say questionnaire A and questionnaire B) are developed that are linked in the knowledge, skills and behaviours they cover, and these assessment instruments are then administered to the same group, the results obtained are correlated; the correlation shows the reliability of the test with regard to the alternate sets of instruments.
6. Inter - rater method of Reliability:
The measure of the extent to which different judges agree in their decisions about an assessment is called the inter-rater method of reliability. When answers cannot be scored mechanically and must be interpreted by human observers, inter-rater reliability is of utmost importance.
The examinee:
Fatigue, boredom, lack of motivation, carelessness.
Trait of Test:
Ambiguous items, poorly worded directions, tricky questions in an unfamiliar format.
Conditions of test- taking and marking:
Poor examination conditions, excessive heat or cold, carelessness in marking, disregard of or lack of clear standards for scoring, computational errors.
There are also some other factors which affect reliability, which are as follows:
1. A very important factor influencing test reliability is the number
of test items. That is, the greater the number of items in a test, the more reliable the test.
2. Other things being equal, the narrower the range of difficulty of the items of a test, the greater the reliability.
3. Evenness in scaling is a factor influencing the reliability of a test: other things being equal, a test evenly scaled is more reliable than a test that has gaps in the scale of difficulty of its items.
4. Other things being equal, inter-dependent items tend to decrease
the reliability of a test.
5. The more objective the scoring of a test the more reliable is the
test.
6. Chance in getting the correct answer to an item is a factor which lowers test reliability.
7. Other things being equal, the more homogeneous the material of a test, the greater its reliability.
8. Other things being equal, the more common the experiences called for in a test are to the members of the group taking the test, the more reliable the test.
9. Other things being equal, the same test given late in the school year (i.e. after covering the unit in class) is more reliable than when given early in the year (i.e. without teaching the unit).
10. Other things being equal, novelty in the questions of a test lowers its reliability. A test answered by the systematic recall or recognition of orderly facts or experiences is more reliable than a test answered by sudden insight occasioned by novelty.
11. Lengthy items lower the reliability because certain factors in the item will be over- or under-estimated.
12. Inadequate or faulty directions, or failure to provide suitable illustrations of the task, lower the reliability.
13. Strange or unusual wording of items lowers the reliability.
14. The accuracy with which a test is timed is an important factor in
test reliability.
15. Difference in incentive and effort tend to make tests unreliable.
The appeal of a test is stronger with some individuals than with
others, and is stronger with an individual at one time than at
another.
16. Accidents occurring during the examination such as breaking a
pencil, running out of ink, or defective test booklets influence
the reliability of the test. Outside disturbances also lower the
reliability.
17. The interval between the test and retest is important for
reliability estimate.
18. Cheating in the examination is another factor which lowers the
reliability because the score of the individual may increase or
decrease unduly.
19. Illness, worry and excitement, though less important, still influence the reliability of the test.
2.5 PRACTICALITY:
Meaning:
The word "practicality" means "feasibility" or "usability".
A test will be practicable if it is easy to administer, easy to interpret and economical in operation. A good test has instructions simple enough that it can be administered even by a person with little specialised training. Tests having difficult instructions, requiring a high level of training to administer, or too expensive for wide use in schools are said to have low usability or practicability. Practicality refers to the economy of time, effort and money in testing. In other words, a test should be:
Easy to design
Easy to administer
Easy to interpret
Item Number
This is the question number taken from the student answer
sheet, and the ScorePak® Key Sheet. Up to 150 items can be scored on
the Standard Answer Sheet.
Item Difficulty
For items with one correct alternative worth a single point, the
item difficulty is simply the percentage of students who answer an item
correctly. In this case, it is also equal to the item mean. The item
difficulty index ranges from 0 to 100; the higher the value, the easier the
question. When an alternative is worth other than a single point, or when
there is more than one correct alternative per question, the item difficulty
is the average score on that item divided by the highest number of points
for any one alternative. Item difficulty is relevant for determining
whether students have learned the concept being tested. It also plays an
important role in the ability of an item to discriminate between students
who know the tested material and those who do not. The item will have
low discrimination if it is so difficult that almost everyone gets it wrong
or guesses, or so easy that almost everyone gets it right.
To maximize item discrimination, desirable difficulty levels are
slightly higher than midway between chance and perfect scores for the
item. (The chance score for five-option questions, for example, is 20
because one-fifth of the students responding to the question could be
expected to choose the correct option by guessing.) Ideal difficulty levels
for multiple-choice items in terms of discrimination potential are:
Format Ideal Difficulty
Five-response multiple-choice 70
Four-response multiple-choice 74
Three-response multiple-choice 77
True-false (two-response multiple-choice) 85
(from Lord, F.M. "The Relationship of the Reliability of Multiple-Choice
Test to the Distribution of Item Difficulties," Psychometrika, 1952, 18,
181-194.)
ScorePak® arbitrarily classifies item difficulty as "easy" if the
index is 85% or above; "moderate" if it is between 51 and 84%; and
"hard" if it is 50% or below.
Item Discrimination
Item discrimination refers to the ability of an item to
differentiate among students on the basis of how well they know the
material being tested. Various hand calculation procedures have
traditionally been used to compare item responses to total test scores
using high and low scoring groups of students. Computerized analyses
provide more accurate assessment of the discrimination power of items
because they take into account responses of all students rather than just
high and low scoring groups.
The item discrimination index provided by ScorePak® is a
Pearson Product Moment correlation2 between student responses to a
particular item and total scores on all other items on the test. This index
is the equivalent of a point-biserial coefficient in this application. It
provides an estimate of the degree to which an individual item is
measuring the same thing as the rest of the items.
Because the discrimination index reflects the degree to which an
item and the test as a whole are measuring a unitary ability or attribute,
values of the coefficient will tend to be lower for tests measuring a wide
range of content areas than for more homogeneous tests. Item
discrimination indices must always be interpreted in the context of the
type of test which is being analyzed. Items with low discrimination
indices are often ambiguously worded and should be examined. Items
with negative indices should be examined to determine why a negative
value was obtained. For example, a negative value may indicate that the
item was mis-keyed, so that students who knew the material tended to
choose an unkeyed, but correct, response option.
Tests with high internal consistency consist of items with mostly
positive relationships with total test score. In practice, values of the
discrimination index will seldom exceed .50 because of the differing
shapes of item and total score distributions. ScorePak® classifies item
discrimination as "good" if the index is above .30; "fair" if it is between .
10 and.30; and "poor" if it is below .10.
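The discrimination index described above is a correlation between responses to a single item and total scores on the remaining items (a corrected item-total, point-biserial style correlation). The Python sketch below illustrates the idea with a small hypothetical response matrix; the function names and data are assumptions for illustration only.

import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def item_discrimination(responses, item_index):
    """Correlate scores on one item with total scores on all *other* items."""
    item = [row[item_index] for row in responses]
    rest = [sum(row) - row[item_index] for row in responses]
    return pearson_r(item, rest)

# rows = students, columns = items (1 = correct, 0 = incorrect)
matrix = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
print(round(item_discrimination(matrix, 0), 2))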
Alternate Weight
This column shows the number of points given for each
response alternative. For most tests, there will be one correct answer
which will be given one point, but ScorePak® allows multiple correct
alternatives, each of which may be assigned a different weight.
Means
The mean total test score (minus that item) is shown for students
who selected each of the possible response alternatives. This information
should be looked at in conjunction with the discrimination index; higher
total test scores should be obtained by students choosing the correct, or
most highly weighted alternative. Incorrect alternatives with relatively
high means should be examined to determine why "better" students chose
that particular alternative.
All the other NRT and CRT item analysis techniques that I will
discuss here and in the next column are based on this notion of item
facility. For instance, item discrimination can be calculated by first
figuring out who the upper and lower students are on the test (using their
total scores to sort them from the highest score to the lowest). The upper
and lower groups should probably be made up of equal numbers of
students who represent approximately one third of the total group each.
In Screen 1, I have sorted the students from high to low based on their
total test scores from 77 for Hide down to 61 for Hachiko. Then I
separated the three groups such that there are five in the top group, five
in the bottom group, and six in the middle group. Notice that Issaku and
Naoyo both had scores of 68 but ended up in different groups (as did
Eriko and Kimi with their scores of 70). The decision as to which group
they were assigned to was made with a coin flip.
To calculate item discrimination (ID), I started by calculating IF for the upper group using the following: = AVERAGE(C2:C6), as shown in row 22. Then, I calculated IF for the lower group using the following: = AVERAGE(C15:C19), as shown in row 23. With IFupper and IFlower in hand, calculating ID simply required subtracting IFupper minus IFlower. I did this by subtracting C22 minus C23, or = C22 - C23, as shown in row 24, which resulted in an ID of .20 for I1.
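For readers who would rather script the same steps than use a spreadsheet, here is a minimal Python sketch of the IF and upper/lower-group ID calculation described above; the group size of five mirrors the example, but the response codes and total scores are placeholders rather than the actual Screen 1 data.

def item_facility(responses):
    """IF = proportion of students who answered the item correctly (responses coded 1/0)."""
    return sum(responses) / len(responses)

def item_discrimination(item_responses, total_scores, group_size):
    """ID = IF(upper group) - IF(lower group), with groups formed from total test scores."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    upper = [item_responses[i] for i in order[:group_size]]
    lower = [item_responses[i] for i in order[-group_size:]]
    return item_facility(upper) - item_facility(lower)

# Placeholder data for 16 students: responses to item 1 and total test scores.
item1 = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
totals = [77, 75, 73, 72, 70, 70, 69, 68, 68, 67, 66, 65, 64, 63, 62, 61]
print(item_facility(item1))                     # IF for the whole group
print(item_discrimination(item1, totals, 5))    # IF(upper five) - IF(lower five)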
Once I had calculated the four item analysis statistics shown in Screen 1 for I1, I then simply copied them and pasted them into the spaces below the other items, which resulted in all the other item statistics you see in Screen 1. [Note that the statistics didn't always fit in the available spaces, so I got results that looked like ### in some cells; to fix that, I blocked out all the statistics and typed Alt, O, C, A, which adjusted the column widths to fit the statistics. You may also want to adjust the number of decimal places, which is beyond the scope of this article. You can learn about this by looking in the Help menu or in the Excel manual.]
Ideal items in an NRT should have an average IF of .50. Such items would thus be well centered, i.e., 50 percent of the students would have answered correctly, and by extension, 50 percent would have answered incorrectly. In reality, however, items rarely have an IF of exactly .50, so those that fall in a range between .30 and .70 are usually considered acceptable for NRT purposes.
Once those items that fall within the .30 to .70 range of IFs are
identified, the items among them that have the highest IDs should be
further selected for inclusion in the revised test. This process would help
the test designer to keep only those items that are well centered and
discriminate well between the high and the low scoring students. Such
items are indicated in Screen 1 by an asterisk in row 25 (cleverly labeled
"Keepers").
For more information on using item analysis to develop NRTs,
see Brown (1995, 1996, 1999). For information on calculating NRT
statistics for weighted items (i.e., items that cannot be coded 1 or 0 for
correct and incorrect), see Brown (2000). For information on calculating
item discrimination using the point-biserial correlation coefficient instead
of ID, see Brown (2001). For an example NRT development and revision
project, see Brown (1988).
Conclusion
I hope you have found my explanation of how to do norm-
referenced item analysis statistics (item facility and item discrimination)
in a spreadsheet clear and helpful. I must emphasize that these statistics
are only appropriate for developing and analyzing norm-referenced tests,
which are usually used at the institutional level, like, for example, overall
English language proficiency tests (to help with, say, admissions
decisions) or placement tests (to help place students into different levels
of English study within a program). However, these statistics are not
appropriate for developing and analyzing classroom oriented criterion-
referenced tests like the diagnostic, progress, and achievement tests of
interest to teachers. For an explanation of item analysis as it is applied to
CRTs, read the Statistics Corner column in the next issue of this
newsletter, where I will explain the distinction between the difference
index and the B-index.
P = 24/30 = .80
A rough "rule of thumb" is that if the item difficulty is greater than .75, it is an easy item; if the difficulty is below .25, it is a difficult item. Given these parameters, this item could be regarded as moderately easy: 24 (80%) of the students got it correct. In contrast, consider Question #2:
P = 12/30 = .40
In fact, on Question #2, more students selected an incorrect answer (B) than selected the correct answer (A). This item should be carefully analyzed to ensure that B is an appropriate distracter.
Therefore "item difficulty" should really have been named "item easiness;" it expresses the proportion or percentage of students who answered the item correctly.
i = Range / No. of class intervals
where i is the length of each class interval.
Grade     Tallies                                                 N
90 - 99   ||||| |||                                               8
80 - 89   ||||| ||||| ||||| ||||| ||||| |||||                     30
70 - 79   ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| |||||   45
60 - 69   ||||| ||||| ||||| ||||| ||                              22
50 - 59   ||||| |||||                                             10
40 - 49   |||                                                     3
30 - 39   ||                                                      2
Frequency
The number of scores lying in a class interval is called the
frequency of that class interval. For example, two scores lie in the class interval 30-39. Therefore, 2 is the frequency of the class interval 30-39.
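A short Python sketch of this tallying step, assuming class intervals of width 10 beginning at 30; the score list is invented for illustration.

from collections import Counter

def frequency_table(scores, lowest=30, width=10, intervals=7):
    """Count how many scores fall in each class interval (30-39, 40-49, ..., 90-99)."""
    counts = Counter((score - lowest) // width for score in scores)
    rows = []
    for k in range(intervals):
        lower = lowest + k * width
        rows.append((f"{lower}-{lower + width - 1}", counts.get(k, 0)))
    return rows

scores = [34, 38, 45, 52, 55, 61, 63, 68, 72, 74, 75, 77, 81, 83, 85, 88, 92, 95]
for interval, frequency in frequency_table(scores):
    print(interval, frequency)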
Mid-Point or Class Mark
The middle of a class interval is called the mid-point or class mark and is usually denoted by x. It is calculated as the average of the lower and upper class boundaries of the interval.
The Mean:
For ungrouped data the mean is
X̄ = Σx / N
and for grouped data
X̄ = Σfx / Σf
Where
Σfx means the sum of the products of f and x, f means the frequency of the scores, x means the score (or the mid-point of the class interval), and Σf means the sum of all the frequencies of the distribution.
The Median:
The median of a set of scores is the middle score, or the arithmetic mean of the two middle scores, in an array. 50% of the scores are less than the median and 50% of the scores are greater than the median.
Formula for calculating the median for ungrouped data:
Median = ((N + 1) / 2)th score
Formula for grouped data:
Median = L + (i / f)(N/2 - C)
Where
L = lower class boundary of the median class interval.
i = length of the median class interval.
f = the frequency of the median class interval.
N = Σf
C = the cumulative frequency of the class interval below the median class interval.
The Mode
The mode is the score that occurs the greatest number of times in a data set. The mode does not always exist: if each score occurs the same number of times, there is no mode. There may also be more than one mode; if two or more scores occur the greatest number of times, then there is more than one mode.
The mode can be calculated for grouped data with the help of the following formula:
Mode = L + ((fm - f1) / (2fm - f1 - f2)) × i
Where
L = lower class boundary of the modal class interval.
fm = the maximum frequency.
f1 = the frequency of the class preceding the modal class.
f2 = the frequency of the class succeeding the modal class.
i = the length of the modal class interval.
Note: The mode lies in the class interval having maximum frequency.
This class interval is called the modal class.
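The grouped-data formulas for the mean, median and mode can be sketched in Python as below. The exact class boundaries (29.5-39.5 and so on) are my assumption, but the frequencies echo the tally table given earlier; with those figures the grouped mean and median come out close to the 73.42 and 74.61 quoted in the empirical relationship that follows.

def grouped_mean(intervals):
    """Mean = sum(f * x) / sum(f), where x is the mid-point of each class interval."""
    total_f = sum(f for _, _, f in intervals)
    return sum(f * (low + high) / 2 for low, high, f in intervals) / total_f

def grouped_median(intervals):
    """Median = L + (i / f) * (N/2 - C), located in the median class interval."""
    n = sum(f for _, _, f in intervals)
    cumulative = 0
    for low, high, f in intervals:
        if cumulative + f >= n / 2:
            return low + (high - low) / f * (n / 2 - cumulative)
        cumulative += f

def grouped_mode(intervals):
    """Mode = L + (fm - f1) / (2*fm - f1 - f2) * i, located in the modal class interval."""
    k = max(range(len(intervals)), key=lambda j: intervals[j][2])
    low, high, fm = intervals[k]
    f1 = intervals[k - 1][2] if k > 0 else 0
    f2 = intervals[k + 1][2] if k + 1 < len(intervals) else 0
    return low + (fm - f1) / (2 * fm - f1 - f2) * (high - low)

# (lower boundary, upper boundary, frequency) for each class interval, in ascending order.
data = [(29.5, 39.5, 2), (39.5, 49.5, 3), (49.5, 59.5, 10), (59.5, 69.5, 22),
        (69.5, 79.5, 45), (79.5, 89.5, 30), (89.5, 99.5, 8)]
print(round(grouped_mean(data), 2), round(grouped_median(data), 2), round(grouped_mode(data), 2))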
Empirical Relationship between Mean, Median and Mode:
For moderately skewed distributions, we have the following
empirical relation:
Mode = 3 Median — 2 Mean
Mode = 3 (74.61) — 2 (73.42)
Mode = 76.99
Comparison of Measures of Central Tendency:
The numerical value of every score in a data set contributes to
the mean. This is not true of the mode or median because only the mean
is based on the sum of all the scores. In a single peaked symmetrical
distribution mean = median = mode. In practice, no distribution is exactly
symmetrical, so the mode, median and mean usually have different
values. If a population is not symmetrical, the mean, median and mode
will not be equal. The mean is affected by the presence of a few extreme
scores, whereas the median and mode are not. The mean is preferred if
extreme values are not present in the data. Median is preferred if interest
is centered on the typical rather than the total score and if the distribution
is skewed. If some scores are missing so that the mean cannot be
computed directly, the median is appropriate. Mode is preferred only if
the distribution is multimodal and a multi-valued index is satisfactory.
The Quartiles
The values that divide a set of scores into four equal parts are
called quartiles and are denoted by Q1, Q2, and Q3. Q1 is called the lower quartile and Q3 is called the upper quartile. 25% of the scores are less than Q1 and 75% of the scores are less than Q3. Q2 is the median.
The formulas for the quartiles are given as:
Q1 = ((N + 1) / 4)th score
Q2 = (2(N + 1) / 4)th score = ((N + 1) / 2)th score
Q3 = (3(N + 1) / 4)th score
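A small Python sketch of the (k(N + 1)/4)th-score rule, interpolating between the two neighbouring scores when the position is not a whole number; the score list is invented.

def quartile(scores, k):
    """k-th quartile of ungrouped scores, taken as the (k*(N+1)/4)th ordered score."""
    ordered = sorted(scores)
    position = k * (len(ordered) + 1) / 4      # 1-based position in the ordered list
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

scores = [55, 61, 63, 68, 70, 72, 74, 77, 81, 85, 90]     # N = 11
print(quartile(scores, 1), quartile(scores, 2), quartile(scores, 3))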
The most common methods of computing the Coefficient of
correlation are:
1. Rank-difference method:
This method is useful when the number of scores to be correlated is small or the exact magnitude of the scores cannot be ascertained. The scores are ranked according to size or some other criterion using the numbers 1, 2, 3, ..., N. The rank-difference coefficient of correlation can be computed by the following formula:
Rs = 1 - 6ΣD² / (N(N² - 1))
2. Product-moment method:
r = (ΣXY/N - (ΣX/N)(ΣY/N)) / (√(ΣX²/N - (ΣX/N)²) × √(ΣY²/N - (ΣY/N)²))
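A minimal Python sketch of the rank-difference (Spearman) formula, assuming there are no tied scores; the two score lists are invented for illustration.

def ranks(values):
    """Rank 1 = highest score, rank N = lowest score (assumes no tied scores)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def rank_difference_correlation(xs, ys):
    """Rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference between paired ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

maths = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
english = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
print(round(rank_difference_correlation(maths, english), 2))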
Measures of Variability:
Measures of central tendency measure the centre of a set of
scores. However, two data sets can have the same mean, median and
mode and yet be quite different in other respects. For example, consider
the heights (in inches) of the players of two basketball teams.
Team-1: 72 73 76 76 78
Team-2: 67 72 76 76 84
The two teams have the same mean height, 75 inches, but it is clear that the heights of the players of Team 2 vary much more than those of Team 1. If we have information about the centre of the scores and the manner in which they are spread out, we know much more about a set of scores. The degree to which scores tend to spread about an average value is called dispersion.
The Range
It is the simplest measure of dispersion. The range of a set of
scores is the difference between maximum scores and minimum scores.
In symbols
Range = Xm – Xo
Where Xm is the maximum score and Xo is the minimum score.
Quartile Deviation:
The quartile deviation is defined as half of the difference
between the third and the first quartiles.
In symbols
Q. D. = (Q3 – Ql) / 2
Where
Q1 is the first quartile and
Q3 is the third quartile.
The Mean Deviation or Average Deviation:
The average deviation is defined as the arithmetic mean of the
deviations of the scores from the mean or median; the deviations are
taken as positive. In symbols
M.D. = Σ|X - X̄| / N
and, for grouped data,
M.D. = Σf|X - X̄| / Σf
The Standard Deviation:
The standard deviation is the positive square root of the mean of the squared deviations of the scores from the mean. In symbols
S = √(Σ(X - X̄)² / N)
Coefficient of Variation:
C.V. = (S / X̄) × 100
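The mean deviation, standard deviation and coefficient of variation defined above can be sketched in Python as follows; the score list is invented.

import math

def mean_deviation(scores):
    """M.D. = sum(|X - mean|) / N"""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)

def standard_deviation(scores):
    """S = sqrt(sum((X - mean)^2) / N)"""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

def coefficient_of_variation(scores):
    """C.V. = (S / mean) * 100"""
    return standard_deviation(scores) / (sum(scores) / len(scores)) * 100

scores = [55, 61, 63, 68, 70, 72, 74, 77, 81, 85, 90]
print(round(mean_deviation(scores), 2))
print(round(standard_deviation(scores), 2))
print(round(coefficient_of_variation(scores), 2))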
Standard Scores:
A frequently used quantity in statistical analysis is the standard score or Z-score. The standard score for a data value is the number of standard deviations that the data value lies away from the mean of the data set.
Z = (X - X̄) / S
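A one-line Python version of the Z-score formula; the mean of 75 and standard deviation of 6 are invented figures.

def z_score(x, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# A score of 84 on a test with mean 75 and standard deviation 6.
print(z_score(84, 75, 6))    # 1.5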
4.4 PROFILE:
One advantage of converting raw scores to derived scores is that
a pupil’s performance on different tests can be compared directly. This is
usually done by means of a test profile, like the one presented in Figure
14.3. Such a graphic representation of test data makes it easy to identify a
pupil’s relative strengths and weaknesses. Most standardized tests have
provisions for plotting test profiles.
The profile shown in Figure 14.3 indicates a desirable trend in profile construction: instead of plotting test scores as specific points on the scales, test performance is recorded in the form of bands that extend one standard error of measurement above and below the pupil's obtained scores. Recall from our discussion of reliability that there are approximately two chances out of three that a pupil's true score will fall within one standard error of the obtained score. Thus, these confidence bands indicate the ranges of scores within which we can be reasonably certain of finding the pupil's true standings. Plotting them on the profile enables us to take into account the inaccuracy of the test scores when comparing performance on different tests. Interpreting differences between tests is simple with these score bands. If the bands for two tests overlap, we can assume that performance on the two tests does not differ significantly, and if the bands do not overlap, we can assume that there is probably a real difference in performance.
The score bands used with the differential aptitude test can be plotted by hand or by computer. The computer-produced profile shown in Figure 14.3 is based on same-sex percentiles. These are recorded down the left side of the profile and were obtained from the percentile norms in Table 14.3. The opposite-sex percentiles are listed down the right side of the report as well, to show how the scores compare with the female norms. The score bands for the tests are plotted directly on the profile. The use of such bands minimizes the tendency of test profiles to present a misleading picture. Without the bands we are apt to attribute significance to differences in test performance that can be accounted for by chance alone.
When profiles are used to compare test performance, it is
essential that the norms for all tests be comparable. Many test publishers
provide for this by standardizing a battery of achievement tests and a
scholastic aptitude test on the same population.
Profile Narrative Reports
Some test publishers are now making available a profile of each pupil's scores, accompanied by a narrative report that describes how well the pupil is achieving. The graphic profile provides a quick view of the pupil's strengths and weaknesses, and the narrative report aids in interpreting the scores and in identifying areas in which instructional emphasis is needed. A typical report of this type, for a widely used test battery, is shown in Figure 14.4.
Narrative reports should be especially useful in communicating test results to parents. They are, of course, also helpful to those teachers who have had little or no training in the interpretation and use of scores from published tests.
UNIT-5:
EVALUATING PRODUCT, PROCEDURES &
PERFORMANCE
Oral language
Reading
Writing
Digital literacy
Learning environment
Engaged in learning
Learning to learn
Engaged in Learning
Are students interested and enthused by the content and teaching
approaches used?
Do we encourage pupil questioning, and have we considered the balance of teacher input versus pupil participation in the classroom?
How active are pupils while the teacher is working?
Collaborative and independent learning.
Progressive skill learning and skill development.
Challenge and support.
Pupils enjoy learning in the classroom and are eager to find out more.
All students in the classroom are afforded the opportunity to participate in lessons and engage with learning.
Learning Environment
Involve the students in developing rules which recognize the rights and responsibilities of the community.
Proper supervision of pupils, both within the class and at break times within the school setting.
All resources well organized, labelled and clear to all learners.
Celebrate pupils' learning and achievements through a range of displays: concrete and visual materials, centres of interest and displays of pupil work.
Learning to Learn
Learning to learn is the third sub-theme of learning experiences.
Engage the pupils in monitoring their own progress in learning, and use learning-to-learn techniques properly in the classroom to develop the skills of the learner through proper planning of lessons.
Allow the learners to communicate and share work with others in the class.
How do we enable students to develop their personal organization and to plan out their own work? What study and revision skills do we teach?
Teach the pupils how to organize and present their work.
Make the pupils creative and give them the opportunity for collaborative work.
3. Teacher's Practice
The quality of the teacher's practice is the third theme of the teaching and learning framework. Under this theme four sub-themes are identified.
Assessment
Reflect on the use of assessment as an aid to teaching and learning: how do we plan for assessment?
Does our planning reflect the whole-school assessment policy?
How do we incorporate best practice as ????? in the assessment guidelines 2007 into our teaching and learning?
Teaching Approach
Learning Outcome
Effects:
In addition to its immediate effects on a student's course grade and grade point average, a term paper can also be valuable when searching for a job or advancing in a career.
Term Paper Evaluation
The term paper has been graded according to the following criteria.
The Theoretical Section
Range, depth and quality of the literature research on your topic.
The author largely points out the most striking results; at the same time, however, he/she concentrates too much on discussing aspects that are not entirely relevant to the research question at hand.
Timing of Evaluation:
Evaluation can occur at any time during the unit of study or program, but it usually occurs at the end of the semester or at the end of the task that is being undertaken and evaluated.
Ideally, students should be given time to reflect upon their experiences prior to completing any form of evaluation, especially if the evaluator desires some specific information about their experiences of group work or there is a specific reflection component within the work being evaluated.
Autonomy.
Opportunity to get to know my classmates.
Opportunity to work on real-life problems.
Students are usually willing to complete these lists, but again the disadvantage is that they do not give detailed responses or answer "why" or "how" questions.
Evaluation Hand-Out:
Some academics design their own evaluation hand-outs that combine a number of evaluation methods and are anonymous, quick and easy to complete. They can take any form and use images, diagrams, comment boxes, or questions and lists as above.
Interview:
Interviews can be done individually or in small groups and provide the opportunity for the evaluator to probe for a deeper analysis of the process and experience.
The disadvantage of this method is that it can be time consuming for both the evaluator and the students, and in a larger group some students may be more vocal than others.
Focus Group:
A focus group uses a facilitative rather than direct questioning approach and is a useful way of having students discuss the process of group work. This method allows students to work off and build upon each other's answers.
The disadvantage is that it is time consuming for both the evaluator and the students, and there is the added difficulty of arranging a time that will suit everyone.
Practicality of the Evaluation Process:
Before making a choice about evaluation method also consider
the following questions:
Benefits of Portfolio:
One of the most important benefits of using portfolios is the enhancement of critical thinking skills, which results from the need for students to develop evaluation criteria.
Students are pleased to observe their personal growth,
they have better attitudes toward their work, and
they are more likely to think of themselves as writers.
1. First, you must decide the purpose of your portfolio. For example,
the portfolios might be used to show student growth, to identify
weak spots in student work, and/or to evaluate your own
teaching methods.
2. After deciding the purpose of the portfolio, you will need to determine how you are going to grade it. In other words, what would a student need in their portfolio for it to be considered a success and for them to earn a passing grade?
3. The answers to the previous two questions help form the answer to the third: What should be included in the portfolio? Are you going to have students put in all of their work or only certain assignments? Who gets to choose?
How to Build a Student Portfolio
The following suggestions will help you effectively design a
student portfolio.
1. Set a Purpose for the Portfolio. First, we need to decide what the purpose of the portfolio is. Is it going to be used to show student growth or to identify specific skills? Are we looking for a concrete way to quickly show parents student achievement, or are we looking for a way to evaluate our own teaching methods? Once we have figured out the goal of the portfolio, we can then think about how to use it.
2. Decide How You Will Grade It. Next, we will need to establish how we are going to grade the portfolio. There are several ways to grade students' work: we can use a rubric, a letter grade, or, the most efficient way, a rating scale. Is the work completed correctly and completely? Can we comprehend it? We can use a grading scale of 4-1: 4 = Meets All Expectations, 3 = Meets Most Expectations, 2 = Meets Some Expectations, 1 = Meets No Expectations. Determine what skills you will be evaluating, then use the rating scale to establish a grade.
3. Decide What Will Be Included in It. How will we determine what will go into the portfolio? Assessment portfolios usually include specific pieces that students are required to know, for example, work that correlates with the Common Core Learning Standards. Working portfolios include whatever the student is currently working on, and display portfolios showcase only the best work students produce. Keep in mind that we can create a portfolio for one unit and not the next. We get to choose what is included and how it is included. If we want to use it as a long-term project and include various pieces throughout the year, we can. But we can also use it for short-term projects as well.
4. Decide How Much You Will Involve the Students. How much we involve the students in the portfolio depends upon the students' age. It is important that all students understand the purpose of the portfolio and what is expected of them. Older students should be given a checklist of what is expected and how it will be graded. Younger students may not understand the grading scale, so we can give them the option of choosing what will be included in their portfolio. Ask them questions such as: why did you choose this particular piece, and does it represent your best work? Involving students in the portfolio process will encourage them to reflect on their work.
5. Decide Whether You Will Use a Digital Portfolio. With the fast-paced world of technology, paper portfolios may become a thing of the past. Electronic portfolios (e-portfolios/digital portfolios) are great because they are easily accessible, easy to transport and easy to use. Today's students are tuned into the latest must-have technology, and electronic portfolios are part of that. With students using an abundance of multimedia outlets, digital portfolios seem like a great fit. The uses of these portfolios are the same; students still reflect upon their work, but in a digital way.
The key to designing a student portfolio is to take the time to think about what kind it will be and how we will manage it. Once we do that and follow the steps above, we will find it will be a success.
Types of Portfolios
1) Best Work Portfolio
This type of portfolio highlights and shows evidence of the best work of learners. Frequently, this type of portfolio is called a display or showcase portfolio. For students, best work is often associated with pride and a sense of accomplishment and can result in a desire to share their work with others. Best work can include both product and process. It is often correlated with the amount of effort that the learners have invested in their work. A major advantage of this type of portfolio is that learners can select items that reflect their highest level of learning and can explain why these items represent their best effort and achievement. Best work portfolios are used for the following purposes:
Student Achievement. Students may select a given number of entries
(e.g., 10) that reflect their best effort or achievement (or both) in a course
of study. The portfolio can be presented in a student-led parent
conference or at a community open house. As students publicly share
their excellent work, work they have chosen and reflected upon, the
experience may enhance their self-esteem.
Post-Secondary Admissions. The preparation of a post-secondary portfolio targets work samples from high school that can be submitted for consideration in the process of admission to college or university. This
portfolio should show evidence of a range of knowledge, skills, and
attitudes, and may highlight particular qualities relevant to specific
programs. Many colleges and universities are adding portfolios to the
initial admissions process while others are using them to determine
particular placements once students are admitted.
Employability. The audience for this portfolio is an employer. This collection of work needs to be focused on specific knowledge, skills, and attitudes necessary for a particular job or career. The school-to-work movements in North America are influencing an increase in the use of employability portfolios. The Conference Board of Canada (1992), for example, outlines the academic, personal management, and teamwork skills that are the foundation of a high-quality Canadian workforce. An
employability portfolio is an excellent vehicle for showcasing these
skills.
2) Growth Portfolio
A growth portfolio demonstrates an individual's development
and growth over time. Development can be focused on academic or
thinking skills, content knowledge, self-knowledge, or any area that is
important in your setting. A focus on growth connects directly to
identified educational goals and purposes. When growth is emphasized, a
portfolio will contain evidence of struggle, failure, success, and change. The growth will likely be an uneven journey of highs and lows, peaks and valleys, rather than a smooth continuum. What is significant is that learners recognize growth whenever it occurs and can discern the reasons behind that growth. The goal of a growth portfolio is for learners to see their own changes over time and, in turn, share their journey with others.
A growth portfolio can be culled to extract a best work sample. It also helps learners see how achievement is often a result of their capacity to self-evaluate, set goals, and work over time. Growth portfolios can be used for the following purposes:
Knowledge. This portfolio shows students' growth in knowledge in a
particular content area or across several content areas over time. This
kind of portfolio can contain samples of both satisfactory and
unsatisfactory work, along with reflections to guide further learning.
Skills and Attitudes. This portfolio shows students' growth in skills and attitudes in areas such as academic disciplines, social skills, thinking skills, and work habits. In this type of portfolio, challenges, difficult experiences, and other growth events can be included to demonstrate students' developing skills. In a thinking skills portfolio, for example, students might include evidence showing growth in their ability to recall, comprehend, apply, analyze, synthesize, and evaluate information.
Teamwork. This portfolio demonstrates growth in social skills in a
variety of cooperative experiences. Peer responses and evaluations are
vital elements in this portfolio model, along with self-evaluations.
Evidence of changing attitudes resulting from team experiences can also be included, especially as expressed in self-reflections and peer evaluations.
Career. This portfolio helps students identify personal strengths related to potential career choices. The collection can be developed over several years, perhaps beginning in middle school and continuing throughout high school. The process of selecting pieces over time empowers young people to make appropriate educational choices leading toward meaningful careers. Career portfolios may include items from outside the school setting that substantiate students' choices and create a holistic view of the students as learners and people. This type of portfolio may be modified for employment purposes.
3) Showcase Portfolios
Showcase portfolios highlight the best products over a particular
time period or course. For example, a showcase portfolio in a
composition class may include the best examples of different writing
genres, such as an essay, a poem, a short story, a biographical piece, or a literary analysis. In a business class, the showcase portfolio may include a resume, sample business letters, a marketing project, and a collaborative assignment that demonstrates the individual's ability to work in a team. Students are often allowed to choose what they believe to be their best work, highlighting their achievements and skills. Showcase reflections typically focus on the strengths of the selected pieces and discuss how each met or exceeded the required standards.
4) Process Portfolios
Process portfolios, by contrast, concentrate more on the journey of learning rather than the final destination or end products of the learning process. In the composition class, for example, different stages of the process, such as an outline, a first draft, peer and teacher responses, early revisions, and a final edited draft, may be required. A process reflection may discuss why a particular strategy was used, what was useful or ineffective for the individual in the writing process, and how the student went about making progress in the face of difficulty in meeting requirements. A process reflection typically focuses on many aspects of the learning process, including the following: which approaches work best, which are ineffective, information about oneself as a learner, and strategies or approaches to remember in future assignments.
5) Evaluation Portfolios.
Evaluation portfolios may vary substantially in their content. Their basic purpose, however, remains to exhibit a series of evaluations over a course and the learning or accomplishments of the student in regard to previously determined criteria or goals. Essentially, this type of portfolio documents tests, observations, records, or other assessment artifacts required for successful completion of the course. A math evaluation portfolio may include tests, quizzes, and written explanations of how one went about solving a problem or determining which formula to use, whereas a science evaluation portfolio might also include laboratory experiments, science project outcomes with photos or other artifacts, and research reports, as well as tests and quizzes. Unlike the showcase portfolio, evaluation portfolios do not simply include the best work, but rather a selection of predetermined evaluations that may also demonstrate students' difficulties and unsuccessful struggles as well as their better work. Students who reflect on why some work was successful and other work was less so continue their learning as they develop their metacognitive skills.
6) Online or e-portfolios
Online or e-portfolios may be one of the above portfolio types
or a combination of different types, a general requirement being that all
information and artifacts are somehow accessible online. A number of
colleges require students to maintain a virtual portfolio that may include
digital, video, or Web-based products. The portfolio assessment process may be linked to a specific course or an entire program. As with all portfolios, students are able to visually track and show their accomplishments to a wide audience.
Conclusion: The portfolio process will continue to be refined and efforts made to improve students' perceptions of the process, as it is intended to develop the self-assessment skills they will need to improve their knowledge and professional skills throughout their educational careers.
Select objectives.
Set the criteria for judging the work (rating scales, rubrics, checklists) and make sure students understand the criteria.
Null Hypothesis:
The hypothesis to be tested in a test of hypothesis is called the null hypothesis. It is a hypothesis which is tested for possible rejection or nullification under the assumption that it is true. It is denoted by H0 and usually contains an equal sign.
For example, if we want to test that the population mean is 80, then we write
H0: μ = 80
A Few Examples
For a moment suppose that we know the mean of a data set is 25 and that the values are 20, 10, 50, and one unknown value. To find the mean of a list of data, we add all of the data and divide by the total number of values. This gives us the formula (20 + 10 + 50 + x)/4 = 25, where x denotes the unknown. Despite not knowing this value, we can use some algebra to determine that x = 20.
Let's alter this scenario slightly. Instead, we suppose that we know the mean of a data set is 25, with values 20, 10, and two unknown values. These unknowns could be different, so we use two different variables, x and y, to denote them. The resulting formula is (20 + 10 + x + y)/4 = 25. With some algebra we obtain y = 70 - x. The formula is written in this form to show that once we choose a value for x, the value for y is determined. This shows that there is one degree of freedom.
Now we'll look at a sample size of one hundred. If we know that the mean of this sample data is 20, but do not know the values of any of the data, then there are 99 degrees of freedom. All values must add up to a total of 20 x 100 = 2000. Once we have the values of 99 elements in the data set, then the last one has been determined.
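The first of these examples can be checked with a couple of lines of Python; the function name is mine.

def missing_value(known_values, mean, n):
    """Recover the single unknown value when the mean of n values is fixed."""
    return mean * n - sum(known_values)

# The mean of four values is 25 and three of them are 20, 10 and 50.
print(missing_value([20, 10, 50], mean=25, n=4))    # 20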
Example
To compute the variance, I first sum the squared deviations from the mean. The mean is a parameter: it is a characteristic of the variable under examination as a whole and is part of describing the overall distribution of values. If you know all the parameters, you can accurately describe the data. The more parameters you know, that is to say, the more you fix, the fewer samples fit this model of the data. If you know only the mean, there will be many possible sets of data that are consistent with this model, but if you know the mean and the standard deviation, fewer possible sets of data fit this model.
So in computing the variance I had first to calculate the mean. Once I had calculated the mean, I could vary any of the scores in the data except for one. If I leave one score unexamined, it can always be calculated accurately from the rest of the data and the mean itself. Maybe an example can make this clearer.
I take the ages of a class of students and find the mean. If I fix
the mean, how many of the other scores (there are N of them remember)
could still vary? The answer is N-1. There are N-1 independent pieces of
information that could vary while the mean is known. These are the
degrees of freedom. One piece of information cannot vary because its
value is fully determined by the parameter (in this case the mean) and the
other scores. Each parameter that is fixed during our computations
constitutes the loss of a degree of freedom.
If we imagine starting with a small number of data points and
then fixing a relatively large number of parameters as we compute some
statistic, we see that as more degrees of freedom are lost, fewer and
fewer different situations are accounted for by our model since fewer and
fewer pieces of information could in principle be different from what is
actually observed.
So, the interest, to put it very informally, in our data is determined by the degrees of freedom: if there is nothing that can vary once our parameter is fixed (because we have so very few data points, maybe just one), then there is nothing to investigate. Degrees of freedom
can be seen as linking sample size to explanatory power.
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance.
To calculate the variance follow these steps:
Work out the Mean (the simple average of the numbers)
Then for each number: subtract the Mean and square the result
(the squared difference).
Then work out the average of those squared differences.
Let us suppose we have five values, i.e., 600, 470, 170, 430 and 300.
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
The deviations from the mean are 206, 76, -224, 36 and -94, so
Variance: σ² = (206² + 76² + (-224)² + 36² + (-94)²) / 5
= (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5
= 108,520 / 5 = 21,704
Standard Deviation: σ = √21,704 ≈ 147.32
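The same calculation can be reproduced in a few lines of Python, using the five values above.

import math

values = [600, 470, 170, 430, 300]
mean = sum(values) / len(values)                                 # 394
variance = sum((v - mean) ** 2 for v in values) / len(values)    # 21,704
print(mean, variance, round(math.sqrt(variance), 2))             # standard deviation is about 147.32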
UNIT-8:
SELECTED TESTS OF SIGNIFICANCE
8.1 T-TEST:
Definition:
i) A t-test helps you compare whether two groups have different average values (for example, whether men and women have different average heights).
ii) A t-test asks whether a difference between two groups' averages is unlikely to have occurred because of random chance in sample selection. A difference is more likely to be meaningful and "real" if (a) the difference between the averages is large, (b) the sample size is large, and (c) responses are consistently close to the average values and not widely spread out (the standard deviation is low).
iii) A statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size. For example, a t-test could be used to compare the average floor routine score of the U.S. women's Olympic gymnastics team to the average floor routine score of China's women's team.
The t-test's statistical significance and the t-test's effect size are the two primary outputs of the t-test. Statistical significance indicates whether the difference between sample averages is likely to represent an actual difference between populations, and the effect size indicates whether that difference is large enough to be practically meaningful.
The "one sample t-test" is similar to the "independent samples t-test" except that it is used to compare one group's average value to a single number x. For practical purposes you can look at the confidence interval around the average value to gain this same information.
The "paired t-test" is used when each observation in one group is paired with a related observation in the other group. For example, do Kansans spend a different amount of money on movies in January than in February, where each respondent is asked about both their January and their February spending? A paired t-test subtracts each respondent's January spending from their February spending (yielding the increase in spending), then takes the average of all those increases in spending and looks to see whether that average is statistically significantly greater than zero (using a one-sample t-test).
The "ranked independent t-test" asks a similar question to the typical unranked test, but it is more robust to outliers (a few bad outliers can make the results of an unranked t-test invalid).
T-test (Independent Samples)
Dollars spent on movies per month. Statwing represents t-test results as distribution curves. Assuming there is a large enough sample size, the difference between these samples probably represents a "real" difference between the populations from which they were sampled.
Example:
Let's say you are curious about whether New Yorkers and Kansans spend a different amount of money per month on movies. It is impractical to ask every New Yorker and Kansan about their movie spending, so instead you ask a sample of each, maybe 300 New Yorkers and 300 Kansans, and the averages are 14 dollars and 18 dollars. The t-test asks whether that difference is probably representative of a real difference between Kansans and New Yorkers generally or whether that is most likely a meaningless statistical fluke.
Technically, it asks the following. If there were in fact no
difference between Kansans and New Yorkers generally, what are
the chances that randomly selected groups from those populations would be
as different as these randomly selected groups are?
For example, if Kansans and New Yorkers as a whole actually spent the same amount of money on average, it would be very unlikely for 300 randomly selected Kansans to average 14 dollars while 300 randomly selected New Yorkers average 18 dollars. So if your sampling yielded those results, you would conclude that the difference in the sample groups is most likely representative of a meaningful difference between the populations as a whole.
Statistical Analysis of the T-test:
The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference.
Signal-to-noise:
The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group. We add these two values and then take the square root of the sum. The specific formula is given below.
SE(X̄T - X̄C) = √(varT / nT + varC / nC)
t = (X̄T - X̄C) / SE(X̄T - X̄C)
Formula for the t-test.
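A minimal Python sketch of this ratio; the means, variances and group sizes below are invented figures that loosely echo the movie-spending example, not real survey results.

import math

def t_statistic(mean_t, var_t, n_t, mean_c, var_c, n_c):
    """t = (difference between means) / SE, with SE = sqrt(var_T/n_T + var_C/n_C)."""
    se = math.sqrt(var_t / n_t + var_c / n_c)
    return (mean_t - mean_c) / se

# Hypothetical figures: 300 New Yorkers (mean 18, variance 36) vs 300 Kansans (mean 14, variance 25).
print(round(t_statistic(18.0, 36.0, 300, 14.0, 25.0, 300), 2))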
Uses of X2 Distribution:
1. X2 is used to test the goodness of fit.
2. X2 is used to test the independence of attributes.
3. X2 is used to test the validity of hypothetical ratios.
4. X2 is used to test the homogeneity of several variances.
5. X2 is used to test whether a hypothetical value of the population variance is true or not.
6. X2 is used to test the equality of several population correlation coefficients.
Goodness of Fit Test:
This test is based on the property that the goodness-of-fit statistic still follows the X2 distribution when the cell probabilities depend upon unknown parameters, provided that the unknown parameters are replaced with their estimates and that one degree of freedom is deducted for each parameter estimated. To see whether there is evidence of small or large differences, the test statistic to use is:
X2 = Σ (oi - ei)² / ei = Σ (oi - npi)² / npi
where the sum runs over the k categories (i = 1, 2, ..., k) and ei = npi is the expected frequency of category i.
For a contingency table the expected frequency of the cell in row i and column j is
Eij = (Ai)(Bj) / n
where Ai is the total of row i, Bj is the total of column j, and n is the grand total, so that
X2 = ΣΣ (Oij - Eij)² / Eij, summed over all rows and columns.
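A small Python sketch of the goodness-of-fit statistic; the die-rolling counts are invented for illustration.

def chi_square(observed, expected):
    """X2 = sum((o_i - e_i)^2 / e_i) over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Is a die fair? 60 rolls, so each face is expected 10 times.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6
print(chi_square(observed, expected))    # compare with the X2 table at 6 - 1 = 5 degrees of freedom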
8.3 REGRESSION:
In statistics, regression analysis is a statistical technique for
estimating the relationships among variables. It includes many
techniques for modelling and analysing several variables, when the focus
is on the relationship between a dependent variable and one or more
independent variables.
In other words regression is a statistical measure that attempts to
determine the strength of the relationship between one dependent
variable (usually denoted by Y) and a series of other changing variables
(known as independent variables).
Types of 'Regression'
There are two basic types of regression:
(i) Linear regression
(ii) Multiple regression.
Linear regression uses one independent variable to explain
and/or predict the outcome of Y, while multiple regression uses two or
more independent variables to predict the outcome. The general form of
each type of regression is:
Linear Regression: Y = a + bX + u
Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ...... + btXt + u
Where:
Y = the variable that we are trying to predict
X = the variable that we are using to predict Y
a = the intercept
b = the slope
u = the regression residual.
In multiple regression, the separate variables are differentiated
by using subscripted numbers.
Regression takes a group of random variables, thought to be
predicting Y, and tries to find a mathematical relationship between them.
This relationship is typically in the form of a straight line (linear
regression) that best approximates all the individual data points.
Regression is often used to determine how much specific factors such as
the price of a commodity, interest rates, particular industries or sectors
influence the price movement of an asset.
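A minimal Python sketch of fitting Y = a + bX by least squares; the hours-studied and test-score figures are invented.

def linear_regression(xs, ys):
    """Least-squares fit of Y = a + b*X; returns the intercept a and slope b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]
a, b = linear_regression(hours, scores)
print(round(a, 2), round(b, 2))    # predicted score = a + b * hours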