
Analyzing & Interpreting Data

Assessment Institute Summer 2005

Categorical vs. Continuous Variables


Categorical Variables
Examples: student's major, enrollment status, gender, ethnicity; also whether or not the student passed the cut-off on a test.

Continuous Variables
Examples: GPA, test scores, number of credit hours.

Why make this distinction?
Whether a variable is categorical or continuous affects whether a particular statistic can be used. It doesn't make sense to calculate the average ethnicity of students!

Averages

The typical value of a variable. In assessment we commonly compare averages of:

Different groups
Each group consists of different people (e.g., the average score on a test for students in different classes).

Different occasions
The same people are tested on each occasion (e.g., the average score on a test for students who took the test as freshmen and then again when they were seniors).

Before calculating an average

Check to make sure that the variable:

Is continuous
Has values in your data set that are within the possible limits (check the minimum and maximum values)
Does not have a distribution that is overly skewed (if so, consider using the median)
Does not have any values that would be considered outliers (check a histogram)
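As a rough illustration (not part of the original workshop materials), the sketch below runs these checks with pandas. The file and column names are hypothetical; in practice you would use your own data set in SPSS or any statistics package.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("wellness_data.csv")
scores = df["kwh_total"].dropna()

# 1. Values within the possible limits (e.g., 0-35 for a 35-item test)?
print("min:", scores.min(), "max:", scores.max())

# 2. Overly skewed? If so, report the median instead of (or alongside) the mean.
print("skewness:", scores.skew())
print("mean:", scores.mean(), "median:", scores.median())

# 3. Any outliers? Flag values more than 1.5 IQRs beyond the quartiles,
#    and inspect a histogram as well.
q1, q3 = scores.quantile([0.25, 0.75])
iqr = q3 - q1
print("possible outliers:", scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)].tolist())
```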

Correlations (r)

Captures the linear relationship between two continuous variables (X and Y).
Ranges from -1 to 1, with values closer to |1| indicating a stronger relationship than values closer to 0 (no relationship).
Positive values: high X associated with high Y; low X associated with low Y.
Negative values: high X associated with low Y; low X associated with high Y.

Relationship between KWH1 Fall 2003 Total Scores and KWH1 Spring 2005 Total Scores

[Scatterplot: KWH1 Total Score Fall 2003 (x-axis) vs. KWH1 Total Score Spring 2005 (y-axis)]

In this example, dropping cases that appeared to be outliers did not change the relationship between the two administrations (r = .30), nor their averages.

When examining a scatterplot, ask: Does the relationship appear linear? Is there a problem with restriction of range? Do there appear to be outliers?
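For readers who want to reproduce this kind of check outside SPSS, here is a minimal sketch using SciPy. The paired scores are invented for illustration and are not the KWH1 data.

```python
import numpy as np
from scipy import stats

# Invented paired scores; in practice, the Fall 2003 and Spring 2005 totals
# for the same students.
fall_2003 = np.array([12, 18, 22, 25, 15, 30, 20, 17])
spring_2005 = np.array([14, 20, 21, 27, 18, 31, 19, 22])

r, p = stats.pearsonr(fall_2003, spring_2005)
print(f"r = {r:.2f}, p = {p:.3f}")
# Also inspect the scatterplot: is the relationship linear, is the range
# restricted, and are there outliers?
```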

Standards

Another common statistic calculated in assessment is the percentage of students meeting or exceeding a standard. You may want to use standard-setting procedures to establish cut-offs for proficiency on the test. It could be that students are gaining knowledge/skills over time, but are they gaining enough?

[Bar chart: average test scores of Incoming Freshmen (about 30) and Keystone Course Completers (about 55), plotted against a horizontal line marking the standard set by faculty]
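A small sketch of the "percent meeting or exceeding a standard" calculation (the scores and cut score below are made up, not the faculty-set standard shown above):

```python
import numpy as np

scores = np.array([62, 48, 71, 55, 80, 44, 67, 59])  # made-up test scores
cut_score = 55                                        # hypothetical cut-off

pct_meeting = 100 * np.mean(scores >= cut_score)
print(f"{pct_meeting:.0f}% of students met or exceeded the standard")
```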

A. Are the 29 senior music majors in Spring 2005 scoring higher on the Vocal Techniques 10-item test than last year's 20 senior music majors?

Compare averages of different groups.

Yes, this year's seniors scored higher (M = 6.72) than last year's (M = 6.65).

B. Are senior kinesiology majors in different concentrations (Sport Management vs. Therapeutic Recreation) scoring differently on a test used to assess their core kinesiology knowledge?

Compare averages of different groups.

Concentration Name (Acronym)                               # Unique Items   N    Mean     SD       Min   Max
Exercise Science and Leadership (ESL)                      50               24   60.58%    7.66%   50%   74%
Physical and Health Education Teacher Education (PHETE)    50               13   54.16%    8.66%   42%   74%
Recreation Management (RM)                                 50                9   59.78%   10.12%   44%   74%
Sport Management (SM)                                      50               64   51.84%    8.54%   22%   68%
Therapeutic Recreation (TR)                                50                4   57.00%    4.76%   52%   62%
Overall                                                    50              114   54.78%    9.08%   22%   74%

[Bar chart: Mean Core Test Total (Percent Correct) by Concentration, comparing each concentration mean (ESL 60.58%, PHETE 54.16%, RM 59.78%, SM 51.84%, TR 57.00%) with the overall mean]

C. On the Information Seeking Skills Test (ISST), what percent of incoming freshmen in Fall 2004 met or exceeded the score necessary to be considered as having proficient information literacy skills?

Percent of students meeting and exceeding a standard.

Of the 2862 students attempting the ISST, 2751 (96%) met or exceeded the proficiency standard.

D. Are the well-being levels (as measured using six subscales, e.g., self-acceptance, autonomy) of incoming JMU freshmen different than the well-being levels of adults?

Compare averages of different groups (JMU students vs. adults) on more than one variable (six different subscales).

Subscale                          Ryff & Keyes '95   F04 (JMU Freshmen)
Self-Acceptance                   14.6               15.3
Positive Relations with Others    14.8               15.0
Autonomy                          15.2               13.9
Environmental Mastery             14.9               14.4
Purpose in Life                   14.0               14.8
Personal Growth                   15.7               15.7

While the practical significance of the differences for Self-Acceptance and Purpose in Life is considered small (d = .14 and d = .25), the differences for Autonomy (d = .50) and Environmental Mastery (d = .35) are considered medium and small-to-medium, respectively.

Similarities
JMU incoming freshmen seem to be similar to the adult sample (N = 1100) in Positive Relations with Others and Personal Growth.

Differences
JMU incoming freshmen have significantly lower Autonomy and Environmental Mastery well-being compared to the adult sample, and significantly higher Self-Acceptance and Purpose in Life.

E. Are students scoring higher on the Health and Wellness Questionnaire as sophomores compared to when they were freshmen? Does the difference depend on whether or not they have completed their wellness course requirement?

Comparing means across different occasions for different groups.

[Bar chart: HWQ1-Part1 Mean Total Scores from Fall 2003 and Spring 2005 by Number of Wellness Courses Completed. Non-Completers (No Courses, N = 21) averaged 40.52 and 41.38 across the two administrations; Completers (1 Course, N = 283) averaged 38.91 and 40.19]

F. Are the writing portfolios collected in the fall semester yielding higher ratings than writing portfolios collected in the spring semester? Are the differences between the semesters the same across three academic years?

Compare averages of different groups: six different groups (fall and spring for each academic year).

[Bar chart: Average Portfolio Rating of Students Who Took the GWRIT Course in Different Semesters (Fall vs. Spring) by Academic Year, 2001-2002 through 2003-2004; average ratings ranged from about 2.19 to 2.56 on a 4-point scale]

In the 2001-2002 and 2002-2003 academic years, fall portfolios were rated slightly higher than spring portfolios. In the most recent academic year, the fall and spring portfolio averages were about the same. There doesn't seem to be overwhelming evidence that the difference between fall and spring portfolios is of importance.

G. Are students who obtained transfer or AP credit for their general education sociocultural domain course scoring differently on the 27-item Sociocultural Domain Assessment (SDA) than students who completed their courses at JMU?

Compare averages of different groups.

JMU students: N = 369, M = 18.63, SD = 3.83. AP/transfer students: N = 29, M = 18.55, SD = 3.68. The difference was not statistically significant, t(335) = .11, p = .92, nor practically significant (d = .02).

G. What is the relationship between a student's general education sociocultural domain course grade and their score on the 27-item Sociocultural Domain Assessment (SDA)?

A relationship between two variables, finally!

[Scatterplots: GPSYC160 Course Grade vs. SDA Total Score and GPSYC101 Course Grade vs. SDA Total Score; the two correlations were r = .31 and r = .23]

Inferential Statistics

How likely is it to have found results such as mine in a population where the null hypothesis is true?

Comparing averages of different groups: independent samples t-test
Null: the groups do not differ in population means

Comparing averages across different occasions: paired samples t-test
Null: the occasions do not differ in population means

Correlation
Null: there is no relationship between the variables in the population

Typically, we want to reject the null: p-value < .05.
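A minimal sketch of the three tests named above, using SciPy with invented data (not the data sets used in this presentation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(60, 8, size=30)         # e.g., one concentration's scores
group_b = rng.normal(55, 8, size=30)         # another concentration's scores
time_1 = rng.normal(40, 5, size=30)          # same students, first occasion
time_2 = time_1 + rng.normal(1, 3, size=30)  # same students, second occasion

t_ind, p_ind = stats.ttest_ind(group_a, group_b)  # do the groups differ in means?
t_rel, p_rel = stats.ttest_rel(time_1, time_2)    # do the occasions differ in means?
r, p_corr = stats.pearsonr(time_1, time_2)        # is there a relationship?

print(p_ind, p_rel, p_corr)  # typically we hope to reject the null: p < .05
```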

Effect Sizes and Confidence Intervals

Statistical significance is a function of both the magnitude of the effect (e.g., the difference between means) and the sample size, so supplement significance tests with confidence intervals and effect sizes.
SPSS provides you with confidence intervals. You can use Wilson's Effect Size Calculator to obtain effect sizes.
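Wilson's online calculator is one option; the standardized mean difference (Cohen's d) can also be computed directly. A sketch under the usual pooled-standard-deviation definition:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Example: d for two made-up groups of test scores.
print(cohens_d([60, 58, 65, 72, 55], [52, 61, 49, 57, 60]))
```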

Wellness Domain Example Goals & Objectives


Wellness Domain Learning Objectives

Goal 1: Students should be able to understand the dimensions of wellness, the various factors affecting each dimension, and how dimensions are interrelated.
  a. Identify the dimensions of wellness.
  b. Identify factors that influence each dimension of wellness.
  c. Recognize how dimensions of wellness are interrelated.

Goal 2: Students should be able to understand the relationship between personal behaviors and lifelong health and wellness.
  a. Recognize the importance of lifestyle in disease prevention.
  b. Recognize the relationship between health behaviors and wellness.
  c. Identify and apply the theories of health behavior change.
  d. Examine the role of consumer health issues related to overall wellness.

Goal 3: Students will recognize an individual's level of health and wellness and understand how these levels impact quality of life.
  a. Assess one's levels of health and wellness.
  b. Evaluate how one's levels of health and wellness compare to recommended levels.
  c. Recognize how genetics, environment, and lifestyle behaviors influence health and wellness levels.

Goal 4: Students will identify and implement strategies to improve their wellness.
  a. Identify a realistic and adjustable personal wellness plan.
  b. Recognize how to use self-management skills relating to healthy lifestyle behaviors.
  c. Participate in a greater number of healthy wellness-related activities.

Students take one of two courses to fulfill this requirement, either GHTH 100 or GKIN 100.

* Not an actual goal/objective; created only for Assessment Institute instructional purposes.

Knowledge of Health and Wellness (KWH) Test Specification Table

Goal   Objective   KWH Assessment Items (Spring 2005)   # Items/Objective   % of KWH Items
1      a           30, 16                               2                     5.71
1      b           6, 24                                2                     5.71
1      c           5, 8, 10, 13                         4                    11.43
       Total for Goal 1:                                8                    22.86
2      a           25, 11, 26                           3                     8.57
2      b           2, 12, 27, 19                        4                    11.43
2      c           14, 20, 35                           3                     8.57
2      d           22, 23, 18, 3                        4                    11.43
       Total for Goal 2:                                14                   40.00
3      a           17, 28                               2                     5.71
3      b           9, 29, 7, 4, 31                      5                    14.29
3      c           1                                    1                     2.86
       Total for Goal 3:                                8                    22.86
4      a           32, 33, 15                           3                     8.57
4      b           21, 34                               2                     5.71
4      c           Assessed via HWQ                     --                     --
       Total for Goal 4:                                5                    14.29
       Total # Items:                                   35                  100.00

* Not an actual goal/objective; created only for Assessment Institute instructional purposes.

Data Management Plan: Wellness_Data.sav (N = 105)

Name       Label                                                    Possible Values                                                When Data Collected        Type
id         Student identifier                                       --                                                             --                         Numeric
kwhtot03   KWH Total Fall 2003                                      0 - 35                                                         Fall 2003                  Numeric
kwhtot05   KWH Total Spring 2005                                    0 - 35                                                         Spring 2005                Numeric
ghth100    Personal Wellness Course Grade                           Letter Grade                                                   Fall 2003 thru Fall 2004   Character
gkin100    Lifetime Fitness & Wellness Course Grade                 Letter Grade                                                   Fall 2003 thru Fall 2004   Character
hth100     Personal Wellness Course Grade                           Numeric Grade                                                  Fall 2003 thru Fall 2004   Numeric
kin100     Lifetime Fitness & Wellness Course Grade                 Numeric Grade                                                  Fall 2003 thru Fall 2004   Numeric
took_hth   Did the student take GHTH100?                            0 = "Did NOT take GHTH100", 1 = "Did take GHTH100"             --                         Numeric
took_kin   Did the student take GKIN100?                            0 = "Did NOT take GKIN100", 1 = "Did take GKIN100"             --                         Numeric
numwell    Has the student completed their wellness domain req.?    0 = "Requirement NOT completed", 1 = "Requirement Completed"   --                         Numeric

Missing data are indicated for all variables by ".".


Item Analysis

Item Difficulty

The proportion of people who answered the item correctly (p).
Used with dichotomously scored items:
  Correct answer: score = 1
  Incorrect answer: score = 0
Item difficulty is also known as the p-value.
For dichotomous items, the item mean equals p and the item variance equals pq, where q = 1 - p.
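A short sketch of these definitions for dichotomously scored (0/1) items; the score matrix is invented for illustration:

```python
import numpy as np

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect (made-up data).
item_scores = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])

p = item_scores.mean(axis=0)   # item difficulty: proportion answering correctly
q = 1 - p
print("difficulty (p):", p)
print("variance (pq):", p * q)
```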

SPSS output for the first 6 items of the 35-item GKIN100 Test 3, Spring 2005. The Mean is the item difficulty (p); the Std Dev is a measure of the variability in the item scores; Cases is the sample size on which the analysis is based.

         Mean    Std Dev   Cases
ITEM1    .5609   .4972     271.0
ITEM2    .9520   .2141     271.0
ITEM3    .8598   .3479     271.0
ITEM4    .7454   .4364     271.0
ITEM5    .6089   .4889     271.0
ITEM6    .5793   .4946     271.0

58% of the sample obtained the correct response to Item 6; the difficulty or p-value of Item 6 is .58.

Easiest & Hardest Items

EASIEST (p = .99)
25. Causes of mortality today are:
A. the same as in the early 20th century.
B. mostly related to lifestyle factors.
C. mostly due to fewer vaccinations.
D. a result of contaminated water.

HARDEST (p = .14)
34. Which of the following is a healthy lifestyle that influences wellness?
A. brushing your teeth
B. physical fitness
C. access to health care
D. obesogenic environment

Item Difficulty Guidelines

High p-values mean the item is easy; low p-values mean the item is hard.
If the p-value is 1.0 (or 0), everyone answered the question correctly (or incorrectly) and there will be no variability in item scores.
If the p-value is too low, the item is too difficult and may need revision, or perhaps the test is too long.
It is good to have a mixture of item difficulties on a test.
Once the difficulty of the items is known, they are usually sorted from easiest to hardest on the test.

Item Discrimination

The correlation between the item score and the total score on the test.
Since we are dealing with dichotomous items, this correlation is usually either a biserial or point-biserial correlation.
Can range in value from -1 to 1; positive values closer to 1 are desirable.
Item discrimination asks: can the item separate the men from the boys (or the women from the girls)? Can the item differentiate between low and high scorers?

Item Discrimination Guidelines

We want high item discrimination! Consider dropping or revising items with discriminations lower than .30.
Discrimination can be negative; if so, check the scoring key, and if the key is correct you may want to drop or revise the item.
Also known as rpbis or the Corrected Item-Total Correlation.

[Scatterplot: item 2 score (0 or 1) vs. total score, rpbis = .52]
If I know your item score, I have a pretty good idea as to what your ability level or total score is.

[Scatterplot: item 17 score (0 or 1) vs. total score, rpbis = .18]
If I know your item score, I DO NOT have a pretty good idea as to what your ability level or total score is.

SPSS output for the first 6 items of the 35-item GKIN100 Test 3, Spring 2005. The Corrected Item-Total Correlation is the item discrimination (rpbis).

         Scale Mean if   Scale Variance if   Corrected Item-     Alpha if
         Item Deleted    Item Deleted        Total Correlation   Item Deleted
ITEM1    26.5424         18.0047             0.2046              0.7772
ITEM2    26.1513         18.1141             0.5242              0.7673
ITEM3    26.2435         18.1553             0.2830              0.7722
ITEM4    26.3579         17.5418             0.3783              0.7672
ITEM5    26.4945         17.8805             0.2408              0.7750
ITEM6    26.5240         17.4578             0.3418              0.7691

Why is it called the corrected item-total correlation? The "corrected" implies that the total is NOT the sum of all item scores, but the sum of item scores WITHOUT including the item in question.
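To make the "corrected" part concrete, here is a small sketch that removes each item from the total before correlating (an illustration of the idea, not SPSS's own routine):

```python
import numpy as np

def corrected_item_total(item_scores):
    """Corrected item-total correlations for an examinee-by-item 0/1 score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    total = item_scores.sum(axis=1)
    r_values = []
    for j in range(item_scores.shape[1]):
        rest = total - item_scores[:, j]  # total WITHOUT the item in question
        r_values.append(np.corrcoef(item_scores[:, j], rest)[0, 1])
    return np.array(r_values)
```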

Distractor analysis for item 2 (SPSS crosstab TOTAL * RR2). The first column shows the percentage of the sample choosing each alternative; the Mean column shows the average total test score for students who chose each alternative. Alternatives are coded A = 1, B = 2, C = 3, D = 4, 9 = missing; the correct alternative is marked with *.

RR2       % of Total N   Mean
1 (A)       .7%          15.5000
2 (B)*    95.2%          27.6512
3 (C)      2.6%          17.7143
4 (D)       .7%          25.5000
9           .7%           2.5000
Total    100.0%          27.1033

Notice how the highest average total test score (M = 27.65) is associated with the correct alternative (B). All other means are quite a bit lower. This indicates that the item is functioning well and will discriminate.

This information is for item 2, where the item difficulty and discrimination were p = .95 and rpbis = .52.

Distractor analysis for item 17 (SPSS crosstab TOTAL * RR17), in the same format as for item 2.

RR17      % of Total N   Mean
1 (A)     15.1%          26.2927
2 (B)      3.3%          25.6667
3 (C)*    69.7%          27.9048
4 (D)     11.1%          25.2333
9           .7%           2.5000
Total    100.0%          27.1033

Notice how the highest average total test score (M = 27.90) is associated with the correct alternative (C). Unlike item 2, with this item all other means are fairly close to 27.90. This indicates that the item does not discriminate as well as item 2.

This information is for item 17, where the item difficulty and discrimination were p = .697 and rpbis = .18.
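The same distractor summaries can be produced outside SPSS with a simple group-by. The data file and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical columns: 'rr17' holds the alternative each student chose
# (1-4, 9 = missing) and 'total' holds the total test score.
df = pd.read_csv("gkin100_test3.csv")

summary = df.groupby("rr17").agg(
    pct_of_total=("total", lambda s: 100 * len(s) / len(df)),
    mean_total=("total", "mean"),
)
print(summary)  # compare the mean total for the keyed alternative vs. the distractors
```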

Distractor graphs

We took the information from the SPSS distractor analysis output and put it into the graphs below, showing the percentage choosing each response and the average total score for those choosing each response. We did this mainly for items that were difficult (p < .50) or had low discrimination (rpbis < .20).

[Distractor graph for Item 4: about 75% of students chose the correct alternative (C); the average total scores for students choosing each alternative ranged from 22.65 to 28.28]

4. The DSHEA of 1994 has:
A. labeled certain drugs illegal based on their active ingredient.
B. caused health food companies to lose significant business.
C. made it easier for fraudulent products to stay on the market.
D. caused an increase in the cost of many dietary supplements.

[Distractor graph for Item 31: the four alternatives were chosen by 42%, 27%, 17%, and 13% of students, with average total scores for each alternative ranging from 25.47 to 29.28]

Hard item, but the pattern of means indicates it is not problematic.

31. ____ aging relates to lifestyle.
A. Time-dependent
B. Acquired
C. Physical
D. Mental

[Distractor graph for Item 10: alternatives A and D were each chosen by 2% or fewer students; the correct alternative (B) and alternative C together accounted for about 95% of responses (56% and 39%); average total scores by alternative were 27.67, 27.19, 21.17, and 18.50]

This item may be problematic: students choosing "C" scored almost as high on the test overall as those choosing "B".

10. Chris MUST get a beer during the commercials each time he watches the NFL. Which stage of addiction does this demonstrate?
a) Exposure
b) Compulsion
c) Loss of control
d) This is not an example of addiction.

Other Information from SPSS

Descriptive statistics for the SCALE (the total score):

Mean      Variance   Std Dev   N of Variables
27.1033   19.1152    4.3721    35

N of Variables is the number of items on the test; the Mean is the average total score; the Std Dev is the average number of points by which total scores vary from the mean.

SPSS also reports a measure of the internal consistency reliability of your test called coefficient alpha: Alpha = .7779. Alpha ranges from 0 to 1, with higher values indicating higher reliability. We want it to be > .60.

Test Score Reliability

Reliability defined: the extent or degree to which a scale/test consistently measures a person.
We need a test/scale to be reliable in order to trust the test scores!
If I administered a test to you today, wiped out your memory, and administered it again tomorrow, you should receive the same score on both administrations.
How much would you trust a bathroom scale if you consecutively weighed yourself 4 times and obtained weights of 145, 149, 142, and 150?

Internal Consistency Reliability

Internal consistency reliability: the extent to which items on a test are highly intercorrelated. SPSS reports Cronbach's coefficient alpha. Alpha may be low if:
The test is short
Items are measuring very different things (several different content areas or dimensions)
There is low variability in your total scores, or a small range of ability in the sample you are testing
The test contains only very easy items or only very hard items
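Coefficient alpha can also be computed directly from an item-score matrix. A minimal sketch of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha for an examinee-by-item score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                              # number of items
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```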
