
Meade, A. W., & Craig, S. B. (2011, April). Identifying careless responses in survey data.

Paper presented at the 26th Annual Meeting of the Society for Industrial and Organizational Psychology, Chicago, IL.

Identifying Careless Responses in Survey Data


Adam W. Meade & S. Bartholomew Craig
North Carolina State University
Eleven indices used to screen survey data for careless responses were examined in order to estimate the prevalence of careless responding in undergraduate Internet survey data. Between 5% and 15% of respondents appear to respond carelessly at times during long surveys. Recommended data screening indices are described.

In any type of survey research, inattentive or careless responses are a concern. Accordingly, it is important for researchers to be able to screen such data for careless, partially random, or otherwise inattentive responses. Such data could lead to spurious within-group variability and lower reliability (Clark, Gironda, & Young, 2003), which in turn will tend to attenuate correlations and potentially create Type II errors in hypothesis testing. This study makes two significant contributions to the extant literature. First, we provide the first investigation of a comprehensive set of screening methods for careless responding. By examining multiple methods for identifying inattentive respondents, we are able to make recommendations for standard screening methods that should be commonly used but currently are not. Second, we provide the first empirical estimate of a base rate for careless responding in a typical university online survey setting using multiple indicators of inattentive response.

What Is Careless Responding and Why Does It Matter?
We focus on raw data provided directly by respondents that do not accurately represent their true levels of the constructs being measured. Nichols, Greene, and Schmolck (1989) delineate two such types of problematic response. The first they term content responsive faking, which has two hallmarks: (a) responses are not completely accurate and (b) the response is influenced by the item content. The primary focus of the current study is on Nichols' second category of response bias, content nonresponsivity. Content nonresponsivity is defined as responding without regard to item content. This includes data that have been variously described as random response (Beach, 1989; D. T. R. Berry et al., 1992), careless responding (Curran, Kotrba, & Denison, 2010), and protocol invalidity (Johnson, 2005). We prefer the terms inattentive or careless response rather than random response, as the resultant data may be decidedly non-random in the statistical sense.

There are several reasons to be concerned about inattentive or careless responding. First, and perhaps most intuitively, a clean dataset is highly desirable, and data screening is a commonly recommended part of the data analytic process (e.g., Tabachnick & Fidell, 2007). Unfortunately, common recommendations typically entail only cursory data screening methods, such as univariate outlier analysis. A second reason to be concerned with careless responses is that they can have serious psychometric implications. Random responses constitute error variance, which will attenuate correlations, reduce internal consistency reliability estimates, and potentially result in erroneous factor analytic results (Johnson, 2005).

Base Rate
Relatively few studies have examined the prevalence of inattentive response, and among those that have, prevalence estimates vary widely. Three studies have used highly motivated samples: Johnson (2005) cites a base rate of 3.5% for careless response; Ehlers, Greene-Shortridge, Weekley, and Zajack (2009) estimated random responding to be around 5%; and Curran et al. (2010) examined three indicators of random response and found prevalence around 5%, 20%, or 50%, depending on the criteria by which they assessed inattentive response. Using a more typical sample, Kurtz and Parrish (2001) found random responding prevalence to be around 10.6% among college students completing the NEO-PI-R for course credit. However, their results led them to question the efficacy of the index with which they classified persons as inattentive. One commonality across these studies is that they used indices designed to detect pervasive careless responding across the entire survey. We believe it is much more likely that respondents will only occasionally respond inattentively. Previous work supports this notion.


For instance, Berry et al. (1992) found that across three studies, 50-60% of college student respondents self-reported answering randomly on one or more MMPI items. Similarly, Baer, Ballenger, Berry, and Wetter (1997) found that 73% of respondents to the MMPI self-reported responding randomly to one or more items. Berry et al. (1992) found that even among respondents completing the MMPI as part of a job application, 52% self-reported responding inattentively to at least one item. While participant motivation has always been a concern, there are reasons to suspect that data collected via the Internet from uncontrolled environments are likely to be of poorer quality than is typical of paper-and-pencil measures administered under controlled conditions. Internet-based survey studies using undergraduates may involve low intrinsic respondent interest, long measures, virtually no social exchange, and very little control over the environment. From the perspective of likely data quality, the situation is very poor (Buchanan, 2000).
Our purposes in this paper were twofold. First, we wanted to estimate approximately how many undergraduate respondents in an Internet-based survey may be providing careless responses. Second, we wanted to examine potential tools for screening such data and identifying potentially inattentive responders.

Methods for Identifying Careless Responders
Methods of screening can be broken into roughly two types. The first type requires special items or scales to be inserted into the survey prior to administration. Examples include social desirability (e.g., Paulhus, 2002) and lie scales (e.g., the MMPI L scale), bogus items (e.g., Beach, 1989), and special scales designed to assess consistent responding (e.g., the MMPI VRIN and TRIN scales). The second type of screening tool can be described as post-hoc in that it does not require specialized scales but instead requires special analyses, such as examining response patterns, after data collection is complete. Many variations of these approaches are examined in the current study and are detailed in the Method section. We examine a number of research questions with the goal of better understanding careless response and data cleaning methods.
Also, one strategy is to try to prevent inattentive responses in the first place via instruction sets. Survey research tends to provide anonymity for respondents under the assumption that anonymity will afford respondents the freedom to be honest when asked items with potentially socially desirable response options. Ironically, however, such instructions may result in less respondent accountability. As such, it is possible that forcing respondents to respond in an identified manner would lead to fewer random responses.

Thus, we ask:
Research Question 1: Does manipulation of survey anonymity via instructions affect the prevalence of careless responding?
We also sought to determine whether different data screening methods are highly correlated. If so, a single method may be sufficient to identify and remove inattentive responders. Thus we ask:
Research Question 2: What are the correlations among data screening measures?
Research Question 3: What percentage of respondents provide potentially flawed data?
Research Question 4: Which data screening methods are most/least liberal in identifying careless responses?
Given previous research suggesting that inattentive response may be more common in later parts of a lengthy survey (Baer et al., 1997; D. T. Berry, Baer, & Harris, 1991), we ask:
Research Question 5: Do some respondents begin by responding diligently and then degrade in response quality later in a survey?
Research Question 6: Are self-report measures of data quality sufficient for data screening?

Method
Participants
Participants were 350 respondents drawn from a participant pool of students enrolled in introductory psychology courses at a large southeastern U.S. university. This university is a state-supported school with approximately 24,000 undergraduate students; average SAT Verbal + Quantitative scores of entering freshmen are between 1175 and 1200.
Survey Design
Survey items were spread across 12 web pages of approximately 50 items per page. Items on pages 1 to 11 used a 1 to 7 Likert-type response scale. The system was unable to prevent multiple entries from the same respondent, so only the 317 participants with a single submission for each page were retained for the study.
Procedure
A JavaScript routine was used to randomly assign participants to one of three survey conditions: anonymous, identified, and stern warning.


Participants in the identified condition were asked to enter their name on each page. In the stern warning condition, respondents were asked to enter their name beside a statement indicating that inattentive responses represented a violation of the university's honor policy.
Measures
Personality. Our primary measure was the 300-item International Personality Item Pool (IPIP) version of the NEO-PI (Goldberg, 1999). Additionally, a 26-item measure of psychopathy (Levenson, Kiehl, & Fitzpatrick, 1995) and a 40-item measure of narcissism (Raskin & Terry, 1988) were included. Several social desirability scales were also included. There were no research questions related to these constructs, but they were included to be typical of the content commonly found on long surveys.
Bogus Items. We included one bogus item (an item with a clear correct answer) on each webpage (see Table 1 for items).
Self-Reported Scales. Scale items (available from the authors) were developed to examine attitudes and effort with respect to the response process. The results of a factor analysis indicated two clear factors, one related to diligence of response and one related to affective attitudes regarding the survey (see Table 2).
Self-Reported Single Item (SRSI) Indicators. SRSI-Effort was assessed as the response to the item "I put forth ____ effort towards this study" with five potential response options. SRSI-Attention was assessed as "I gave this study ____ attention" with five response options. Lastly, we asked, "In your honest opinion, should we use your data in our analyses in this study?" with a 1 = yes or 0 = no response (referred to as SRSI-UseMe).
Indices of Careless Responding
While most of our careless responding indicators were continuous variables, some of our research questions required a method of dichotomizing respondents as either appropriate or careless responders. As such, we created dichotomous "flag" variables based on whether the underlying continuous indices met some threshold. Typically, we developed more than one flag variable, with varying degrees of sensitivity, per index. These data quality indices and flag variables are described below and are summarized in Table 3.
Bogus Item Flags. If participants indicated a response of either 6 (agree) or 7 (strongly agree) to a true bogus item, the item was scored as correct. Initial analyses indicated that the bogus item on page four of the survey was not interpreted as literally as intended by several respondents, so it was dropped from further consideration. We computed an overall bogus item variable (range 0 to 9) as the sum of the scored bogus items.
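To make the scoring rule concrete, the sketch below shows one way the bogus item index and its flags might be computed. The function and variable names are ours rather than the authors', and it assumes responses are stored in a respondents-by-items array coded 1-7, with NaN for missing data.

import numpy as np

def score_bogus_items(responses, reverse_coded):
    """Score bogus items and build the Sum of Bogus index and flags.

    responses     : 2D float array (respondents x bogus items), coded 1-7,
                    with np.nan for missing responses.
    reverse_coded : boolean array, True for items whose correct answer is
                    agreement (e.g., "I am using a computer currently").
    """
    responses = np.asarray(responses, dtype=float)
    # Correct = Strongly Disagree/Disagree (1-2), or Agree/Strongly Agree
    # (6-7) for reverse-coded items.
    correct = np.where(reverse_coded, responses >= 6, responses <= 2)
    # An item counts against the respondent only if it was answered and
    # the correct option was not chosen.
    missed = (~correct) & ~np.isnan(responses)
    sum_of_bogus = missed.sum(axis=1)            # 0-9 once item 4 is dropped
    bogus_1_or_more = (sum_of_bogus >= 1).astype(int)
    bogus_2_or_more = (sum_of_bogus >= 2).astype(int)
    return sum_of_bogus, bogus_1_or_more, bogus_2_or_more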

Outlier Analysis. A multivariate outlier analysis can be used to identify respondents who consistently provide responses far from the mean of a set of items (Ehlers et al., 2009). Three dichotomous flag variables were computed, one each for whether the averaged Mahalanobis distance (distributed as chi-square) exceeded the critical value corresponding to p = .05, p = .01, or p = .001.
Consistency Indices. Consistency indices can be formed by examining the differences between two items that are highly similar in content. Goldberg (2000, cited in Johnson, 2005) suggested a method called Psychometric Antonyms, in which correlations among all survey items are computed post-hoc and the 30 item pairs with the largest negative correlations are identified. The Psychometric Antonyms index is then computed as the within-person correlation across these 30 pairs of items. We used a similar index; however, we retained only the five item pairs with correlations below -.60 to ensure that the items were sufficiently opposite in meaning. We developed a parallel index, Psychometric Synonyms, which is formed in the same way as the Psychometric Antonyms except that the item pairs with the largest positive correlations are used; in this case there were 27 pairs with r > +.60. An additional index, recommended by Jackson (1976, as cited in Johnson, 2005), is what we term the Even-Odd Consistency measure. With this approach, unidimensional scales are divided using an even-odd split based on the order of appearance of the items. An even subscale score and an odd subscale score are then computed as the average response across the respective subscale items. A within-person correlation is then computed based on the two sets of subscale scores for each scale (where the number of cases equals the number of scales). Jackson also recommended that the measure be corrected using the Spearman-Brown split-half prophecy formula. We formed subscales from all 30 IPIP facets as well as the psychopathy scale (Levenson et al., 1995) and the narcissism scale (Raskin & Terry, 1988). To create flag variables for these indices, we converted each coefficient to a z-score, then used cut points associated with z values of -1.96 and -1.65 (i.e., two- and one-tailed tests).
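A minimal sketch of the multivariate outlier screen described above, assuming a complete respondents-by-items matrix with no missing data. The authors averaged Mahalanobis distances across item sets before applying the chi-square cutoffs; that averaging step is omitted here, and all names below are illustrative assumptions rather than the authors' code.

import numpy as np
from scipy.stats import chi2

def mahalanobis_flags(item_matrix, alphas=(.05, .01, .001)):
    """Squared Mahalanobis distance of each respondent from the sample
    centroid, flagged against chi-square critical values (df = number of items)."""
    X = np.asarray(item_matrix, dtype=float)
    diff = X - X.mean(axis=0)
    # Pseudo-inverse protects against a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)   # squared distances
    df = X.shape[1]
    flags = {a: (d2 > chi2.ppf(1 - a, df)).astype(int) for a in alphas}
    return d2, flags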

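The consistency indices just described might be computed roughly as follows. The |r| > .60 pair-selection thresholds and the Spearman-Brown correction follow the text, but the sign convention for the antonyms index (whether one member of each pair is reverse-scored so that higher values indicate more consistent responding) and all function names are our assumptions, not the authors' exact procedure.

import numpy as np

def correlated_pairs(item_matrix, threshold=.60, negative=False):
    """Item pairs whose inter-item correlation exceeds +threshold
    (psychometric synonyms) or falls below -threshold (psychometric antonyms)."""
    R = np.corrcoef(np.asarray(item_matrix, dtype=float), rowvar=False)
    n = R.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if (R[i, j] < -threshold if negative else R[i, j] > threshold)]

def within_person_correlation(item_matrix, pairs):
    """Correlation, within each respondent, between the two members of each
    item pair. Respondents with no variance across the pairs yield NaN and
    should be inspected separately."""
    X = np.asarray(item_matrix, dtype=float)
    a = X[:, [i for i, _ in pairs]]
    b = X[:, [j for _, j in pairs]]
    return np.array([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)])

def even_odd_consistency(item_matrix, scale_columns):
    """Even-odd consistency: correlate even- and odd-item subscale means
    across scales within each person, then apply the Spearman-Brown
    correction r_sb = 2r / (1 + r)."""
    X = np.asarray(item_matrix, dtype=float)
    evens = np.column_stack([X[:, cols[0::2]].mean(axis=1) for cols in scale_columns])
    odds = np.column_stack([X[:, cols[1::2]].mean(axis=1) for cols in scale_columns])
    r = np.array([np.corrcoef(e, o)[0, 1] for e, o in zip(evens, odds)])
    return 2 * r / (1 + r)

Flag variables could then be created by standardizing each index and marking z-scores beyond the -1.65 or -1.96 cut points described above.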

Response Pattern. Johnson (2005) created an index called LongString, computed as the maximum number of consecutive items answered with the same response option. We computed the average of the LongString values across the nine webpages that included 50 items, and a second index as the maximum LongString value found on any one webpage. Cutoff values for the Average and Maximum LongString indices were chosen based on clear break points in the frequency distributions and in the sorted raw data (see Figures 1 and 2). The first flag variable was based on an average LongString index > 3.78; the second required an average > 4.56. Similarly, the first flag associated with the maximum LongString value (FMaxLS1) was assigned if the maximum LongString was > 7, while a second (FMaxLS2) was assigned if the maximum LongString was > 10.
Self-Report Cutoffs. Responses of 3 or lower on the SRSI-Effort item and the SRSI-Attention item were treated as flagged, as was an SR-Diligence scale mean of 4.5 or lower. The SR-Attitude scale deals more with the affective component and is not suitable for data screening.

Results
Research Question (RQ) 1 concerned whether instruction set could affect the quality of the data. There were small but significant differences for some of our variables by condition. Table 4 presents results of one-way ANOVAs for continuous variables, and Table 5 presents chi-square tests for categorical indicators. In comparison to the standard anonymous approach, identified surveys resulted in respondents missing fewer bogus items on average and self-reporting paying more attention. However, the stern warning was associated with a poorer attitude regarding the study.
RQ2 concerned the correlations among indices (see Table 6). RQs 3 and 4 concerned the prevalence of careless responses. Because the bogus item results tended to differ across conditions, frequency results for that variable are displayed by condition in Table 7. As can be seen, a full 47% of respondents in the typical anonymous responding condition were flagged by at least one bogus item, and 28% were flagged by two or more. Other indicators did not vary by condition and are reported in aggregate in Table 8. On the whole, the bogus item flags were among the more sensitive measures. Among post-hoc indicators, Mahalanobis distance with a relatively lenient p value (.05) was among the most liberal, whereas requiring a maximum LongString greater than 10 was among the most conservative. We also computed a new index (SumFlag) as the sum of the Bogus1orMore, Mahalanobis p.05, AvgLongString1, MaxLongString1, Psychometric Antonyms 1, Psychometric Synonyms 1, and Even-Odd 1 variables (see Table 3).

The frequency distribution of this variable is found in Table 9. Some persons were flagged by many of these quite different indices. Such persons are likely to be responding with very little attention (i.e., randomly, or with a large number of identical consecutive responses). We suggest that around 2% to 5% of respondents provide data that are meaningless or nearly so. Note that this is consistent with previous estimates of random response (Ehlers et al., 2009; Johnson, 2005). However, a full 15% of respondents were flagged by more than one index. Given the diversity of the indices that compose this variable, we suggest that these persons can be thought of as occasionally responding in an inattentive way (cf. Baer et al., 1997; D. T. R. Berry et al., 1992).
RQ5: Fading. Initially, very few (<5%) respondents failed to respond appropriately to the bogus items. However, at the close of the survey, a full 25% of the sample responded inappropriately to a clearly bogus item (see Table 1). The LongString variable showed a steady increase in its mean value over pages, presumably as more persons engaged in inattentive responding (Table 10).
RQ6: Self-Report. The SRSI-UseMe variable provides a straightforward dichotomous assessment of respondents' opinions. On the whole, 89.9% of respondents indicated that their data should be used. Using the cutoff points described in the Method section, the SR-Diligence scale flagged 43 respondents (14%), the SRSI-Attention item flagged 38 (12%), and the SRSI-Effort item flagged 53 (17%). Examination of Table 11 indicates that a number of persons who, according to our indices, did not respond in a desired way were nonetheless comfortable with allowing their data to be used. The SRSI-UseMe variable seemed to perform better than most other self-report indicators in that only four persons showed no flags and yet still indicated that their data should not be used.

Discussion
This study provides the first comprehensive investigation of the prevalence of inattentive response patterns present in survey data from undergraduate samples. Additionally, the current study compared a much larger number of data quality indicators than any previous research. As such, this study answers several questions not considered by the extant literature. First, we found that the different indicator variables do not always flag the same individuals, as is evident from the correlations among the quality indicators. The most effective data screening approach will utilize several data quality indicators simultaneously.


Second, we found a significant but small effect of instruction sets on data quality. There appeared to be small advantages to using identified surveys: fewer bogus items were endorsed under such conditions, and respondents in the identified condition self-reported paying more attention than those in the anonymous condition. However, strong wording about violations of the honor code served to decrease the favorability of attitudes about the study. Third, our best estimates indicate that around 3-5% of respondents engage in rampant inattentive responding and around 15% engage in sporadic careless responding. We also found evidence to suggest that inattentive responses become more common over time. This has implications for longer surveys in which important criterion variables may be placed toward the end. It also implies that, in a long survey examined via scale development practices, randomly placed items toward the end may be less likely to load onto a latent factor than those at the beginning. Self-report screening methods were better than doing nothing. However, a considerable number of respondents were happy to have us use their data despite considerable evidence that their responses were not trustworthy. On the whole, self-report measures were not sufficient for thorough data screening.
Limitations
While we have every reason to believe that our respondent population is typical of that of most large universities, we cannot be certain that this is the case. Although we have included a far more comprehensive set of indices than any previous study on this topic, there is an almost unlimited number of approaches that could be explored.
Recommendations
Every Internet-based survey research study would benefit from incorporating at least one data screening method. For instances in which robust correlations are of interest, the easiest method would be a simple self-report measure directly asking the respondent whether his or her data should be used. A more rigorous approach would be to compute many indices and sum across them. For instance, a consistency index, the LongString, and Mahalanobis distance measures could all be created post-hoc and together would provide a holistic view of data integrity, as sketched below. If possible, we also recommend the inclusion of bogus items, as their correct/incorrect nature removes the need to estimate a cutoff value to apply to the data, perhaps erroneously.
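To illustrate this post-hoc combination, the sketch below computes the LongString index page by page and then sums several dichotomous flags into a single screening score. The cutoffs mirror those used in this study, but in practice they should be chosen from break points in one's own data; the function names and arguments are illustrative rather than the authors' code.

import numpy as np

def longstring(page_responses):
    """Longest run of identical consecutive responses for each respondent
    on one survey page."""
    X = np.asarray(page_responses)
    longest = np.ones(X.shape[0], dtype=int)
    for i, row in enumerate(X):
        run = best = 1
        for prev, cur in zip(row[:-1], row[1:]):
            run = run + 1 if cur == prev else 1
            best = max(best, run)
        longest[i] = best
    return longest

def sum_of_flags(pages, bogus_flag, mahalanobis_flag, consistency_flags,
                 avg_cut=3.78, max_cut=7):
    """Combine dichotomous screening flags into a single score; respondents
    with two or more flags warrant close inspection (cf. the SumFlag index)."""
    per_page = np.column_stack([longstring(p) for p in pages])
    avg_ls_flag = (per_page.mean(axis=1) > avg_cut).astype(int)
    max_ls_flag = (per_page.max(axis=1) > max_cut).astype(int)
    return (bogus_flag + mahalanobis_flag + avg_ls_flag + max_ls_flag
            + sum(consistency_flags))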

References
Baer, R. A., Ballenger, J., Berry, D. T. R., & Wetter, M. W. (1997). Detection of random responding on the MMPI-A. Journal of Personality Assessment, 68(1), 139-151. doi:10.1207/s15327752jpa6801_11
Beach, D. A. (1989). Identifying the random responder. Journal of Psychology: Interdisciplinary and Applied, 123(1), 101-103.
Berry, D. T., Baer, R. A., & Harris, M. J. (1991). Detection of malingering on the MMPI: A meta-analysis. Clinical Psychology Review, 11(5), 585-598. doi:10.1016/0272-7358(91)90005-F
Berry, D. T. R., Wetter, M. W., Baer, R. A., Larsen, L., Clark, C., & Monroe, K. (1992). MMPI-2 random responding indices: Validation using a self-report methodology. Psychological Assessment, 4(3), 340-345. doi:10.1037/1040-3590.4.3.340
Buchanan, T. (2000). Potential of the internet for personality research. In M. H. Birnbaum (Ed.), Psychological experiments on the internet (pp. 121-140). San Diego, CA: Academic Press.
Clark, M. E., Gironda, R. J., & Young, R. W. (2003). Detection of back random responding: Effectiveness of MMPI-2 and Personality Assessment Inventory validity indices. Psychological Assessment, 15(2), 223-234. doi:10.1037/1040-3590.15.2.223
Curran, P. G., Kotrba, L., & Denison, D. (2010). Careless responding in surveys: Applying traditional techniques to organizational settings. Paper presented at the 25th Annual Conference of the Society for Industrial/Organizational Psychology, Atlanta, GA.
Ehlers, C., Greene-Shortridge, T. M., Weekley, J. A., & Zajack, M. D. (2009). The exploration of statistical methods in detecting random responding. Paper presented at the Annual Meeting of the Society for Industrial/Organizational Psychology, Atlanta, GA.
Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality psychology in Europe (pp. 7-28). Tilburg, The Netherlands: Tilburg University Press.
Johnson, J. A. (2005). Ascertaining the validity of individual protocols from web-based personality inventories. Journal of Research in Personality, 39(1), 103-129. doi:10.1016/j.jrp.2004.09.009


Kurtz, J. E., & Parrish, C. L. (2001). Semantic response consistency and protocol validity in structured personality assessment: The case of the NEO-PI-R. Journal of Personality Assessment, 76(2), 315-332. doi:10.1207/S15327752JPA7602_12
Levenson, M. R., Kiehl, K. A., & Fitzpatrick, C. M. (1995). Assessing psychopathic attributes in a noninstitutionalized population. Journal of Personality and Social Psychology, 68(1), 151-158.
Nichols, D. S., Greene, R. L., & Schmolck, P. (1989). Criteria for assessing inconsistent patterns of item endorsement on the MMPI: Rationale, development, and empirical trials. Journal of Clinical Psychology, 45(2), 239-250. doi:10.1002/1097-4679(198903)45:2<239::AID-JCLP2270450210>3.0.CO;2-1
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49-69). Mahwah, NJ: Lawrence Erlbaum Associates.
Raskin, R., & Terry, H. (1988). A principal-components analysis of the Narcissistic Personality Inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54(5), 890-902.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson/Allyn & Bacon.

Author Contact Info:

Adam W. Meade
Department of Psychology
North Carolina State University
Campus Box 7650
Raleigh, NC 27695-7650
Phone: 919-513-4857
Fax: 919-515-1716
E-mail: awmeade@ncsu.edu

S. Bart Craig
Department of Psychology
North Carolina State University
Campus Box 7650
Raleigh, NC 27695-7650
Phone: 919-513-0518
Fax: 919-515-1716
E-mail: bart_craig@ncsu.edu


Table 1
Bogus Items and Response Rates

                                                           Str.D    D   Sl.D  Neither  Sl.A    A   Str.A   Flagged n (%)
Item                                                         1      2    3       4      5      6     7
1.  I am using a computer currently. (R)                     1      1    3       2      3     44   263      10 (3.2)
2.  I have been to every country in the world.             254     46    6       5      2      4     0      17 (5.4)
3.  I am enrolled in a Psychology course currently. (R)      1      4    5       3      1     44   259      14 (4.4)
4.  I have never spoken to anyone who was listening.        98    101   31      67      8     10     2     118 (37)*
5.  I sleep less than one hour per night.                  236     51   11       9      5      3     2      30 (9.45)
6.  I do not understand a word of English.                 252     40    5       8      4      8     0      25 (7.9)
7.  I have never brushed my teeth.                         269     26    5       5      8      3     1      22 (6.9)
8.  I am paid biweekly by leprechauns.                     222     32   12      32      6      4     9      63 (19.9)
9.  All my friends are aliens.                             223     38   12      22      8      5     9      56 (17.7)
10. All my friends say I would make a great poodle.        177     56   17      50      5     10     2      84 (26.5)

Note: D = Disagree, A = Agree, Str. = Strongly, Sl. = Slightly. Items were flagged if Strongly Disagree or Disagree was not chosen (for reverse-coded items, marked R, if Strongly Agree or Agree was not chosen), except in cases of missing data. *Item 4 was dropped as a bogus item based on frequent response.


Table 2
Exploratory Factor Analysis of Participant Engagement Items

Item                                                                   Diligence   Interest
1.  I carefully read every survey item.                                   .71        .01
2.  I could've paid closer attention to the items than I did.             .71       -.16
3.  I probably should have been more careful during this survey.          .71       -.08
4.  I worked to the best of my abilities in this study.                   .71        .12
5.  I put forth my best effort in responding to this survey.              .65        .05
6.  I didn't give this survey the time it deserved.                       .59       -.04
7.  I was dishonest on some items.                                        .55       -.05
8.  I was actively involved in this study.                                .48        .19
9.  I rushed through this survey.                                         .45        .15
10. I enjoyed participating in this study.                               -.02        .82
11. This study was a good use of my time.                                -.16        .71
12. I was bored during the study.                                        -.04        .65
13. This survey was too long.                                            -.06        .59
14. The work I did for this study is important to me.                     .09        .57
15. I care about my performance in this study.                            .29        .50
16. I would be interested in reading about the results of this study.     .18        .38
17. I'm in a hurry right now.                                             .20        .29

Note: Items 16 and 17 were not retained.


Table 3
Summary of Data Screening Methods and Associated Flag Variables

Sum of Bogus
  Description: Sum of nine dichotomously scored bogus items with clear correct/incorrect answers.
  Flags: Bogus1orMore (1 if any bogus item was answered incorrectly); Bogus2orMore (1 if two or more bogus items were answered incorrectly).

Psychometric Antonyms
  Description: Within-person correlation across item pairs with strong negative correlations.
  Flags: Psychometric Antonyms 1 (1 if z-score of index exceeded -1.65); Psychometric Antonyms 2 (1 if z-score of index exceeded -1.96).

Psychometric Synonyms
  Description: Within-person correlation across item pairs with strong positive correlations.
  Flags: Psychometric Synonyms 1 (1 if z-score of index exceeded -1.65); Psychometric Synonyms 2 (1 if z-score of index exceeded -1.96).

Even-Odd Consistency
  Description: Within-person correlation across subscales formed by an even-odd split of unidimensional scales, with the Spearman-Brown split-half formula applied.
  Flags: Even-Odd 1 (1 if z-score of index exceeded -1.65); Even-Odd 2 (1 if z-score of index exceeded -1.96).

LongString Average
  Description: Average of the 10 LongString values; LongString is the maximum number of identical consecutive responses on a webpage.
  Flags: AvgLongString1 (1 if average LongString was > 3.78); AvgLongString2 (1 if average LongString was > 4.56).

LongString Maximum
  Description: Maximum of the 10 LongString values.
  Flags: MaxLongString1 (1 if maximum LongString was > 7); MaxLongString2 (1 if maximum LongString was > 10).

Mahalanobis D
  Description: Multivariate distance between a respondent's response vector and the vector of sample means.
  Flags: Mahalanobis p.05 (1 if p value associated with the Mahalanobis D chi-square < .05); Mahalanobis p.01 (1 if p < .01); Mahalanobis p.001 (1 if p < .001).

SRSI UseMe
  Description: Dichotomous self-reported single-item yes/no response as to whether the respondent feels his or her data should be used for analysis.
  Flag: Inherent in index.

SR Diligence
  Description: Mean of the self-reported diligence scale.
  Flag: SR Diligence Flag (1 if scale mean < 4.5).

SRSI Attention
  Description: Self-reported single item on attention given to the study.
  Flag: SRSI Attention Flag (1 if response < 4).

SRSI Effort
  Description: Self-reported single item on effort expended on the study.
  Flag: SRSI Effort Flag (1 if response < 4).

SumOfFlag
  Description: Sum of the dichotomous variables Bogus1orMore, Mahalanobis p.05, AvgLongString1, MaxLongString1, Psychometric Antonyms 1, Psychometric Synonyms 1, and Even-Odd 1.
  Flag: N/A.


Table 4
Results of One-Way ANOVA and Tukey Post-Hoc Pairwise Tests by Study Condition

                                 One-Way ANOVA            Tukey Post-Hoc Pairwise Tests
Indicator                        F      p     R²       M       SD       M       SD       d
# Bogus items                   3.49   .03    .02    1.36¹    2.08¹    .77²    1.61²    .31
SR attitude                     3.31   .04    .02    4.13¹    1.06¹    3.75³   1.16³    .34
SRSI attention                  3.60   .03    .02    4.08¹     .80¹    4.33²    .69²   -.33
Psychometric synonyms           1.13   .33
Psychometric antonyms           2.72   .07
Even-odd consistency             .65   .52
Average LongString               .11   .89
Maximum LongString               .20   .82
Average Mahalanobis distance    2.26   .11
SR diligence                     .16   .85
SRSI effort                      .25   .78
Crowne-Marlowe                   .07   .93
BIDR-SD                          .31   .73
BIDR-IM                         2.29   .10
IPIP SD                          .67   .51

Note: df = 2, 314. ¹ anonymous condition, ² identified condition, ³ stern warning condition. Only significant Tukey post-hoc tests are reported.


Table 5
Chi-Square Tests of Indicators

Indicator                   χ²(2)    p     Follow-up comparison            χ²(1)    p
1 or more bogus items        6.38   .04    Anonymous vs. identified         6.29   .01
                                           Anonymous vs. stern warning      1.83   .18
                                           Identified vs. stern warning     1.38   .24
2 or more bogus items        7.29   .03    Anonymous vs. identified         5.04   .02
                                           Anonymous vs. stern warning      4.86   .03
                                           Identified vs. stern warning      .01   .92
UseMe                        2.27   .32
Avg LongString > 3.78        2.02   .37
Avg LongString > 4.56         .98   .63
Max LongString > 7           1.29   .53
Max LongString > 10          3.93   .14


Table 6
Correlations among Indicators

                                 1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
1.  Total Minutes              1.00
2.  Sum of Bogus               -.44  1.00
3.  Psy Antonyms                .27  -.41  1.00
4.  Psy Synonyms                .39  -.70   .49  1.00
5.  Even-Odd Cons.              .42  -.63   .39   .75  1.00
6.  LongString Avg.            -.21   .25  -.20  -.27  -.35  1.00
7.  LongString Max             -.23   .27  -.21  -.30  -.30   .89  1.00
8.  Mahalanobis D              -.37   .47  -.27  -.57  -.60   .21   .25  1.00
9.  SRSI UseMe                  .34  -.52   .33   .50   .48  -.29  -.29  -.40  1.00
10. SR Diligence                .43  -.49   .34   .54   .43  -.24  -.28  -.31   .50  1.00
11. SR Attitude                 .05  -.12   .18   .24   .23  -.15  -.14  -.11   .18   .43  1.00
12. SRSI Attention              .23  -.43   .28   .47   .33  -.08  -.18  -.32   .45   .52   .27  1.00
13. SRSI Effort                 .48  -.46   .33   .50   .45  -.27  -.34  -.33   .53   .66   .33   .63  1.00
14. Crowne-Marlowe              .07   .00   .03   .12   .14  -.05  -.08  -.21   .05   .17   .22   .01   .12  1.00
15. BIDR Self-Deception         .10  -.12   .03   .18   .21  -.06  -.07  -.24   .06   .14   .07   .06   .09   .41  1.00
16. BIDR Impression Mgmt.       .06   .02   .03   .08   .06   .01  -.03  -.13   .02   .12   .16   .06   .12   .67   .31  1.00
17. IPIP Social Desirability    .13  -.15   .11   .23   .22  -.11  -.12  -.19   .13   .23   .17   .12   .19   .63   .35   .77  1.00

Note: SR = self-report; SI = single item. N = 317. Correlations greater than |.12| are significant at p < .01.


Table 7
Percentage of Persons per Condition Flagged by Bogus Items

One or more bogus items missed
          Anonymous   Identified   Stern Warning   Total
No          52.54       69.47         61.54        60.57
Yes         47.46       30.53         38.46        39.43

Two or more bogus items missed
          Anonymous   Identified   Stern Warning   Total
No          71.19       84.21         83.65        79.18
Yes         28.81       15.79         16.35        20.82


Table 8
Percentage of Persons Identified as Careless by Each Data Screening Flag

Flag                            N       %
Mahalanobis p.05               51    16.09
Mahalanobis p.01               32    10.09
Mahalanobis p.001              19     5.99
Avg. LongString 1              50    15.77
Avg. LongString 2              19     5.99
Max LongString 1               28     8.83
Max LongString 2               11     3.47
Psychometric Antonyms 1        31     9.78
Psychometric Antonyms 2        29     9.15
Psychometric Synonyms 1        28     8.83
Psychometric Synonyms 2        21     6.62
Even-Odd Consistency 1         21     6.62
Even-Odd Consistency 2         19     5.99

Note: N = 317.


Table 9
Frequency Distribution of the Sum of Flag Variable

Sum of Flag    Frequency     %
     0            142      44.8
     1            104      32.8
     2             26       8.2
     3             20       6.3
     4             13       4.1
     5              7       2.2
     6              4       1.3
     7              1       0.3


Table 10
Means and SDs for the LongString Indicator by Page

Page    Mean     SD
  1     3.02    0.90
  2     3.26    1.55
  3     3.57    2.25
  4     3.57    2.87
  5     3.74    3.05
  6     3.78    4.08
  7*     NA      NA
  8     3.81    3.40
  9     4.00    3.75
 10     4.03    4.59
 11**   4.25    3.07

Note: N = 317. *Page 7 contained only emails to read. **Page 11 contained 30 items rather than 50.


Table 11
Frequency of Sum of Flag Variables by Self-Report Indicators

                                   Sum of Behavioral Flag Variables
                               0     1    2    3    4    5    6    7
SRSI-UseMe = no                4     5    3    5    6    5    3    1
SRSI-UseMe = yes             138    99   23   15    7    2    1    0
SRSI-Effort flag = yes        11    12    5    7    8    5    4    1
SRSI-Effort flag = no        131    92   21   13    5    2    0    0
SRSI-Attention flag = yes      9     5    3    5    8    5    2    1
SRSI-Attention flag = no     133    99   23   15    5    2    2    0
SR Diligence flag = yes        8     6    6    6    9    4    3    1
SR Diligence flag = no       134    98   20   14    4    3    1    0


Figure 1. Frequency Distribution of Average LongString Variable


Figure 2. Frequency Distribution of Max LongString Variable
