Authors:
Daniel M. Doolan RN, PhD
Jennifer Winters RN, PhD
Sahar Nouredini RN, PhD

Affiliation:
California State University East Bay; CA, USA

Authors’ e-mail addresses:
daniel.doolan@csueastbay.edu
jennifer.winters@csueastbay.edu
sahar.nouredini@csueastbay.edu

Correspondence author:
Daniel M. Doolan RN, PhD
daniel.doolan@csueastbay.edu

Abstract:
It is often advisable for researchers to use an existing data set to answer research questions. In particular, using an existing data set can help a researcher obtain results much more quickly, at a lower cost, and without exposing new research subjects to many of the potential harms associated with research participation. However, the many researchers seeking to use an existing data set face a variety of challenges specific to this research methodology. This article reviews some of the key differences associated with using an existing data set as compared with those conducting research by recruiting research subjects. Advantages and disadvantages associated with the use of existing data sets are discussed as are ethical issues, strategies to obtain an optimal data set, and special considerations associated with this methodology. Additionally, suggestions are given relevant to reporting results when conducting research using an existing data set or a “secondary analysis”.

Key Words:
Secondary Analysis
Data Sets
Research Methodology
Research Techniques
TABLE I: Checklist for Secondary Analysis as Compared With Designing a Prospective Study
by engaging in a prospective research study. It has been erroneously implied that research using an existing data set is inherently more suspect than other research.5 As a general guideline, it is highly preferable to answer research questions using an existing data set if one is available to answer the research questions. This is preferable for several reasons.

First, use of an existing data set tends to save a great deal of time, as opposed to designing a new research protocol to recruit subjects and gather data. This benefit of obtaining results more quickly is especially true for longitudinal studies, in which subjects need not only be recruited, but also followed over months or years. Second, it tends to be much less expensive to use data that has already been collected. Third, as most prospective research studies include at least some level of risk to the subjects, a researcher using an existing data set has the benefit of not putting subjects in harm’s way, as those subjects have already experienced any burdens associated with participating in the research.3 Table II contains several successful examples of researchers using an existing data set to answer new research questions.6 7 8 9 10 11 12 13
Example 1, Secondary Analysis
Authors/Title: Park et al./Changes in health outcomes as a function of abstinence and reduction in illicit psychoactive drug use: A prospective study in primary care
Design/Year/Sample size: Statistical analysis using ANOVA, Spearman’s correlation, and linear regression models/2015/528
Sample: Same as above
Purpose: (1) test the associations between longitudinal illicit drug use patterns including abstinence and health outcomes and (2) test whether these associations vary by drug type among illicit drug users in primary care.
Data collection: Same as above
Variables: Same as above
Findings: Abstinence, but not decreased drug use without abstinence, was associated with improvement in drug use consequences, compared to those with continued or increased drug use. This relationship was found among those whose main drug was cocaine and opioids, but not marijuana

Example 2, Parent Study
Authors/Title: Kopansky-Giles D & Papadopoulos C./Canadian chiropractic resources databank (CCRD): a profile of Canadian chiropractors
Design/Year/Sample size: Survey/2011/2,529
Sample: Members of the Canadian Chiropractic Association (CCA)
Purpose: To profile practice and concerns of DCs
Data collection: Survey
Variables: Professional activities, education, research and teaching activities, main sectors of activity, care provided to patients, chiropractic techniques used, type of conditions treated, and referral practices
Findings: Resource data bank profiling Canadian chiropractors

Example 2, Secondary Analysis
Authors/Title: Blanchette, M., Cassidy, J. D., Rivard, M., & Dionne, C. E./Chiropractors' characteristics associated with their number of workers' compensation patients
Design/Year/Sample size: Cross sectional analysis/2015/2,529
Sample: Same as above
Purpose: To describe the characteristics of Canadian DCs who tend to treat more workers’ compensation cases
Data collection: Same as above
Variables: Same as above
Findings: Canadian DCs with practices oriented toward the treatment of injured workers that collaborate with other health care providers and facilitate workers’ access to care reported more workers’ compensation patients

Example 3, Parent Study
Authors/Title: Algase, D. L. et al./Are wandering and physically nonaggressive agitation equivalent?
Design/Year/Sample size: Cross-sectional correlational design/2008/181
Sample: Ambulatory residents of 22 SNFs and 6 ALFs meeting DSM-IV criteria for dementia
Purpose: Examined equivalence of wandering and physically nonaggressive agitation (PNA) as concepts
Data collection: Video-tapes for up to twelve 20-minute observations per participant
Variables: Wandering and PNA behaviors
Findings: Wandering and PNA as overlapping, but nonequivalent phenomena

Example 3, Secondary Analysis
Authors/Title: Lee, K. H., Boltz, M., Lee, H., & Algase, D. L./Is an Engaging or Soothing Environment Associated With the Psychological Well-Being of People With Dementia in Long-Term Care?
Design/Year/Sample size: Cross-sectional correlational design/2017/177
Sample: Same as above/participants who completed more than three emotional expression observations that evaluated psychological well-being
Purpose: To examine the relationship between environmental ambience and psychological well-being of persons with dementia
Data collection: Same as above/observed displays of positive and negative emotional expressions
Variables: Emotional Expressions
Findings: An engaging environment was associated with more positive emotional expressions, however, a soothing environment was associated with neither positive nor negative emotional expressions.

Example 4, Parent Study
Authors/Title: Verbeek, H., et al./Dementia care redesigned: Effects of small-scale living facilities on residents, their family caregivers, and staff
Design/Year/Sample size: Quasi-experimental/2009/404 (229 Family caregivers, 259 residents, 305 nursing staff)
Sample: Residents, their family caregivers and nursing staff from small and large scale dementia facilities
Purpose: The effects of small-scale living facilities in dementia care
Data collection: Questionnaire/scales
Variables:
Residents: quality of life, neuropsychiatric symptoms, and agitation
Family Caregivers: perceived burden, satisfaction, and involvement with care
Nursing: Job satisfaction and motivation.
Findings:
Residents: No effects were found for total quality of life, neuropsychiatric symptoms, and agitation.
Family Caregivers: Family caregivers in small-scale living reported significantly less burden and were more satisfied with nursing staff than family caregivers in regular wards. No differences were found in their involvement with care.
Nursing: No significant differences were found for staff’s job satisfaction and motivation, although subgroup analyses using contrast groups revealed more job satisfaction and motivation in small-scale living compared with regular wards.

Example 4, Secondary Analysis
Authors/Title: Adams, J., Verbeek, H., & Zwakhalen, S. G./The Impact of Organizational Innovations in Nursing Homes on Staff Perceptions: A Secondary Data Analysis
Design/Year/Sample size: Cross sectional/2017/138
Sample: Nursing staff from small and large scale dementia facilities
Purpose: (a) explore staff perceptions about skills warranted in both small and large scale dementia facilities (b) determine differences in job satisfaction, motivation, and job characteristics of staff between the two care settings
Data collection: Same as above
Variables: Job satisfaction, motivation, and job characteristics
Findings: Job satisfaction, job motivation, Job autonomy, and social support were significantly higher in typical small-scale nursing homes, compared to those in typical large scale (traditional) nursing homes.
Table III contains some potential sources of data sets. Upon identifying a potential data source, the PI should consider the overall quality of the data set. Areas of interest may include whether the data set has the appropriate variables to evaluate the research question(s); sampling methodology; how variables are defined and measured; whether the sample size is sufficient such that the inquiry would be adequately powered; whether there is excessive missing data and/or a high loss to follow up in a longitudinal study. 1 3 4 The PI should consider specific aspects of the data, including original research proposals, recruitment plans, and research protocols.

The PI of the secondary analysis will need to build a persuasive case that the original data set is a good fit. The measures used in the original data set, most often, will not be a perfect match with what the PI may have hoped for, but they must be sufficient to potentially make meaningful scientific discovery. If the PI determines that a data set is of insufficient quality or lacking in variables required to answer the research question, it is best to seek out a more appropriate data set. Attempting to make a data set work if it is not well suited to answer the research questions is futile. However, it may be possible to fine-tune the research questions to better match the available data set, so long as the revised research questions are still relevant and important.

For proprietary data sets, the parent study PI may be saving certain analyses for other researchers. Thus, it is important to discuss clear guidelines with the original PI to ensure the specific analysis of data being requested is permissible. 1 If what is available to the PI to analyze is insufficient, then alternative plans should be made to avoid wasting time. 14

When focusing in on a data source, the PI should collaborate with individuals that are expert in the specific data source being considered. Especially for large data sets, it is possible that coding and other data entry processes change over time, and the PI will need to make sure that the data is being correctly deciphered. Erroneous information could be reported if the PI has not gained sufficient knowledge of how to correctly interpret information from the data source. Ideally, the PI will work closely with the investigator from the original study, receiving codebooks and other information that can be used to correctly interpret the
data. Some successful studies have occurred without substantial collaboration with the PI of the parent study; however, this should be done with some caution to avoid misinterpreting information. 3

Certain types of databases tend to have specific strengths and weaknesses. Administrative data sets, which are commonly used in oncology research, tend to lack the level of clinical detail that might be useful since they tend to be originally obtained for non-research purposes. However, they are often well suited for examining healthcare practice variation and other questions for which an overview of trends would be enlightening, whereas patient registries are well suited for more disease and/or treatment specific data. 4 Indeed, a registry tends to be best suited for research questions relevant to the population that is predominant within the registry.

There is the potential to merge multiple data sets together. However, exercise caution as data corruption is an issue when manipulating data sets in this way. Specifically, there is a risk of merging variables that may not have been measured as reliably in some of the merged studies.

1.6. IRB Approval
Institutional Review Board (IRB) approval provides a mechanism for the researcher to demonstrate a plan to protect subjects’ private information. Also, obtaining IRB approval prior to analyzing data from an existing data set is one important way to make sure that the answers being sought from the data set are prudent. Analysis of existing data sets should be purposeful. Given an alpha of 0.05, common in social sciences, we would expect about 1 in 20 comparisons to reach statistical significance by pure coincidence even when no true relationship exists. So, it would be a major mistake to aimlessly compare variables of an existing data set simply because the data are available. Having a sound research plan based on a competent literature review and approved by the IRB is a good way to make sure that comparisons and analyses being conducted with an existing data set have merit and are appropriate. Additionally, the researcher should do a power analysis to make sure that the data set is large enough to be likely to detect a significant result for areas of interest if such a result really exists. 5

1.7. Special Circumstances
Using an existing data set may require the PI to transfer data from one storage format to another. Increasingly, existing data sets are digitally formatted. However, when transferring a data set into another statistical software program and/or otherwise manipulating the storage mechanism, there is a risk of corrupting the data. The PI should engage in rigorous quality controls, with backup files, to ensure that the data remains accurate and that no human error corrupts the data set. 1 Sound practice involves running descriptive analyses and making sure that post-data transfer results match those obtained prior to transferring the data set. As a general guideline, it is inadvisable to attempt to manually transpose data line by line. Such manual transcriptions are notoriously prone to entry errors, and current technology tends to have much more reliable methods for data conversion.

PIs conducting a prospective study recruit as many participants as appropriate given the power analysis and then tend to stop recruiting. However, PIs using existing data sets may have extremely large sample sizes, far larger than would be required by a power analysis. Samples this large can show statistically significant differences with very small effect sizes. If this is the case, the researcher must consider what is a relevant effect size so as not to overemphasize trivial differences that may
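
The concern about very large samples can be illustrated with a small simulation. The sketch below is not taken from the article: the data are simulated, the group means (100 and 100.5) and standard deviation (15) are arbitrary assumptions, and `compare_groups` is a hypothetical helper name. It uses a large-sample normal approximation for the p-value, which is reasonable only because both groups are very large, and reports Cohen's d as the effect size.

```python
import math
import random
from statistics import NormalDist


def mean_sd(xs):
    """Return the sample mean and sample standard deviation of xs."""
    n = len(xs)
    m = math.fsum(xs) / n
    var = math.fsum((x - m) ** 2 for x in xs) / (n - 1)
    return m, math.sqrt(var)


def compare_groups(a, b):
    """Return (cohens_d, two_sided_p) for two independent samples.

    The p-value comes from a z approximation, so this helper is only
    appropriate when both samples are large.
    """
    n_a, n_b = len(a), len(b)
    m_a, s_a = mean_sd(a)
    m_b, s_b = mean_sd(b)
    # Pooled standard deviation, used to standardize the mean difference.
    pooled = math.sqrt(
        ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
    )
    d = (m_b - m_a) / pooled
    # Standard error of the difference in means.
    se = math.sqrt(s_a**2 / n_a + s_b**2 / n_b)
    z = (m_b - m_a) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return d, p


# Simulated data (assumed values): two very large groups whose true
# means differ by only half a point on a scale with SD 15.
rng = random.Random(0)
n = 200_000
group_a = [rng.gauss(100.0, 15.0) for _ in range(n)]
group_b = [rng.gauss(100.5, 15.0) for _ in range(n)]

d, p = compare_groups(group_a, group_b)
print(f"Cohen's d = {d:.3f}, p = {p:.2g}")
```

With groups this large, the difference is statistically significant even though the standardized effect is far below the conventional threshold for a "small" effect (d = 0.2). Reporting the effect size alongside the p-value makes that distinction visible to the reader.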
References
14. Westin, GF, Dias, AL, & Go, RS. Exploring big data in hematological malignancies: Challenges and opportunities. Current Hematologic Malignancy Reports. 2016;11:271-279. DOI: 10.1007/s11899-016-0331-4

15. The QSEN Institute. Quality and Safety Education for Nurses. www.qsen.org. Accessed August 1, 2017.

16. Bell, BA, Onwuegbuzie, AJ, Ferron, JM, Jiao, QG, Hibbard, ST, & Kromrey, JD. Use of design effects and sample weights in complex health survey data: A review of published articles using data from 3 commonly used adolescent health surveys. Research and Practice. 2012;102(7):1399-1405.