
Sleep Medicine Reviews 25 (2016) 52–73


CLINICAL REVIEW

The Pittsburgh sleep quality index as a screening tool for sleep dysfunction in clinical and non-clinical samples: A systematic review and meta-analysis
Tatyana Mollayeva a,*, Pravheen Thurairajah b, Kirsteen Burton c, Shirin Mollayeva d,e, Colin M. Shapiro f,g, Angela Colantonio h

a Graduate Department of Rehabilitation Science, Collaborative Program in Neuroscience, University of Toronto, Toronto Rehabilitation Institute-University Health Network, 550 University Avenue, Rm 11120, Toronto, ON M5G 2A2, Canada
b The Hospital for Sick Children, Canada
c Department of Medical Imaging, Institute of Health Policy, Management & Evaluation, University of Toronto, Canada
d Department of Cell and Systems Biology, Faculty of Arts & Science, Acquired Brain Injury Research Laboratory, University of Toronto, Youthdale Child & Adolescent Sleep Clinic, Ontario, Canada
e Department of Ecology and Evolutionary Biology, Faculty of Arts & Science, Acquired Brain Injury Research Laboratory, University of Toronto, Youthdale Child & Adolescent Sleep Clinic, Ontario, Canada
f Department of Psychiatry and Ophthalmology, University of Toronto, Toronto Western Hospital, University Health Network, Ontario, Canada
g Youthdale Child & Adolescent Sleep Clinic, Ontario, Canada
h Saunderson Family Chair in Acquired Brain Injury Research, Toronto Rehabilitation Institute-University Health Network, CIHR Chair in Gender, Work and Health, Department of Occupational Science and Occupational Therapy, University of Toronto, Canada

Article info

Article history:
Received 14 September 2014
Received in revised form 16 January 2015
Accepted 26 January 2015
Available online 17 February 2015

Keywords:
The Pittsburgh sleep quality index
Psychometric properties
Sensibility
Systematic review
Meta-analysis

Summary

This review appraises the process of development and the measurement properties of the Pittsburgh sleep quality index (PSQI), gauging its potential as a screening tool for sleep dysfunction in non-clinical and clinical samples; it also compares non-clinical and clinical populations in terms of PSQI scores. MEDLINE, Embase, PsycINFO, and HAPI databases were searched. Critical appraisal of studies of measurement properties was performed using COSMIN. Of 37 reviewed studies, 22 examined construct validity, 19 known-group validity, 15 internal consistency, and three test-retest reliability. Study quality ranged from poor to excellent, with the majority designated fair. Internal consistency, based on Cronbach's alpha, was good. Discrepancies were observed in factor analytic studies. In non-clinical and clinical samples with known differences in sleep quality, the PSQI global scores and all subscale scores, with the exception of sleep disturbance, differed significantly. The best evidence synthesis for the PSQI showed strong reliability and validity, and moderate structural validity in a variety of samples, suggesting the tool fulfills its intended utility. A taxonometric analysis can contribute to better understanding of sleep dysfunction as either a dichotomous or continuous construct.
© 2015 Elsevier Ltd. All rights reserved.

Introduction

Disturbed sleep is among the most frequent health complaints clinicians encounter [1,2]. It is common in the general population: more than one half of adults in the Western world experience intermittent sleep disturbances and between 15 and 20% of adults report chronic sleep problems [2]. Sleep dysfunction can lead to serious impairment in daytime performance [3,4], increase the risk of involvement in motor-vehicle and occupational accidents [5,6], exacerbate medical, neurologic, and/or psychiatric conditions [7,8], and result in diminished quality of life [9]. In the past, sleep complaints were treated with hypnotic medications without further diagnostic evaluation [10]. The last three decades of research have culminated in the understanding of sleep dysfunction as a complex entity [11,12], wherein a range of primary sleep disorder symptoms overlap with neurophysiological, psychological, and behavioral factors, requiring targeted diagnostic and treatment intervention (Fig. 1).

* Corresponding author. Tel.: +1 416 596 3422x7848; fax: +1 416 946 8570.
E-mail address: tatyana.mollayeva@utoronto.ca (T. Mollayeva).

http://dx.doi.org/10.1016/j.smrv.2015.01.009
1087-0792/© 2015 Elsevier Ltd. All rights reserved.

Glossary of terms

Construct validity: the degree to which the measure scores reflect the hypotheses; includes 1) convergent, 2) divergent, and 3) known-group
Convergent validity: the degree of relatedness between two constructs hypothesized to be related
Divergent validity: the degree of relatedness between two constructs hypothesized to be different
Known-group validity: ability of the measure to discriminate between a group of individuals known to have a particular trait and those who do not have that trait (same as discriminative validity)
Sensibility: enlightened common sense [82]; consists of domains 1) purpose, population, and setting, 2) content validity, 3) face validity, and 4) feasibility
Content validity: the fitness of the domains covered in the measure; reflects the appropriateness of the method by which items were selected and reduced for inclusion in the measure during its development
Face validity: the appearance that the measure, by its wording of items, response options, and score meanings, is suitable to measure the desired construct
Feasibility: practicality of administering the measure; for self-administered questions, the measure's self-explanatory nature for valid responses and limited non-responses, completion time, and scoring formula
Reliability: the extent to which the measure is reliable, that is, free of errors in score not due to true state of construct measured in the patient; consists of 1) internal consistency, 2) test-retest, 3) inter-rater, and 4) intra-rater
Internal consistency: the degree to which the components making up the measure are related
Test-retest reliability: consistency of scores for the same patient over time
Inter-rater reliability: degree of agreement between the score given by one rater and that by another at one time with respect to the same respondent; addresses the interpretability of the measure; falls under the broader test-retest reliability category
Intra-rater reliability: degree of agreement between scores given by the same respondent or rater at one time and those given at another time; falls under the broader test-retest reliability category

Abbreviations

CFA: confirmatory factor analysis
COSMIN: consensus-based standards for the selection of health status measurement instruments
DD: daytime dysfunction
DSM-IV: diagnostic and statistical manual of mental disorders, 4th edition
EFA: exploratory factor analysis
ESS: Epworth sleepiness scale
HSE: habitual sleep efficiency
ICC: intraclass correlation coefficient
ICSD-2: international classification of sleep disorders, 2nd edition
MSLT: multiple sleep latency test
PCC: Pearson product-moment correlation coefficient
PSG: polysomnography
PSQI: Pittsburgh sleep quality index
SD: sleep duration
SDI: sleep disturbance
SL: sleep latency
SM: sleep medications
SQ: sleep quality
TBI: traumatic brain injury
WMD: weighted mean difference
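Internal consistency, as defined in the glossary, is summarized in this review with Cronbach's alpha. The following is a minimal Python sketch of the standard alpha formula, included only to make the definition concrete. The function name and the example scores are hypothetical and are not data from any reviewed study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of respondents, each a list of k item (or component) scores.
    Uses alpha = k / (k - 1) * (1 - sum(item variances) / variance(total)).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same k items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical component scores (0 to 3) for five respondents.
    example = [
        [0, 1, 0, 0, 1, 0, 1],
        [1, 1, 1, 0, 1, 0, 1],
        [2, 2, 1, 1, 2, 1, 2],
        [2, 3, 2, 2, 2, 1, 3],
        [3, 3, 3, 2, 3, 2, 3],
    ]
    print(round(cronbach_alpha(example), 2))
```

Alpha approaches 1 as the items vary together, which is why it is read as an index of how strongly the components of a measure are related.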

Measurement need

The multifactorial construct of sleep dysfunction causes diagnostic confusion in determining which persons need to be extensively investigated for the etiology of their complaints to be established. Issues of self-insight and awareness are also important to note: some persons may not be fully aware of their sleep impairment and will thus not emphasize such issues in the physician's office, or do not appreciate the extent or impact of their sleep problems [13]. As such, the main challenge today for primary care and specialist clinicians is to identify the patients who may have undetected sleep dysfunction and to direct further diagnostic investigation. A tool for this purpose would be discriminative, according to criteria defined by Kirshner and Guyatt [14], and thus should be evaluated by its 1) intra-rater reliability; 2) internal consistency and reliability; and 3) construct validity.

There are numerous instruments, both subjective and objective, that can be used to measure sleep functioning [15]. In objective measures, the expectation is limited involvement of personal judgment, that is, results are to be influenced neither by the person doing the measuring nor the person being measured. In subjective measurements, both roles can impact the outcome to some extent. Given that even objective measures have a subjective component, often requiring an expert to read and interpret the measures, many feel that a patient's opinion and appraisal of his or her own status is of great value [16]. This view is evident in the recent initiative of the US federal government, which seeks a balance between outcomes that are of interest to investigators (i.e., results of laboratory testing, etc.), and those of primary interest to the patient (i.e., satisfaction, self-perceived quality, etc.) in using patient reports of health status [17].

Measurement concept of ‘sleep quality’

Self-perceived sleep quality represents something of a challenge to measure because there is no generally accepted reference or gold standard [18]. One approach would be to use a carefully constructed questionnaire incorporating the recommendations of the American Psychological Association pertaining to clinical sleep dysfunction evaluation [19]. This starts with the main complaints of a patient, classified into: 1) inability to get adequate nighttime sleep given the opportunity for sleep (i.e., insomnia), 2) negative daytime consequences as a result of poor sleep (e.g., daytime sleepiness, fatigue, and cognitive impairment), 3) episodic nocturnal movements or behaviors, or 4) a combination of these concerns.

Fig. 1. Construct of sleep dysfunction. Unidirectional arrows from construct (i.e., circle) to items (i.e., rectangles) represent reflective models, and arrows from items to constructs represent formative models. Bidirectional arrows represent a combination of reflective and formative elements. Adapted from Fayer and Hand (1997) [96].

Although no specific quantitative sleep parameters define insomnia disorder [20], an average sleep latency over 30 min, wake after sleep onset lasting more than 30 min, sleep efficiency less than 85%, and/or a total sleep duration of less than six and a half hours are common manifestations. When reported together with difficulty initiating or maintaining sleep, waking too early, or chronically non-restorative sleep, these are considered clinically significant if occurring three or more nights per week, and suggestive of chronic insomnia disorder if lasting one or more months [21].

Reports of daytime impairment almost invariably accompany the report of inadequate nighttime sleep [18], and these symptoms are often the main complaints in patients seeking medical care. In many patients, daytime impairment will include excessive sleepiness, fatigue, low energy, low motivation, and/or cognitive symptoms related to poor attention, concentration, and memory.

Sleep-related movements and/or behaviors are often reported by the spouse or bed partner, as the patient is usually not aware of episodes such as snoring, twitching or kicking of legs, bruxism (i.e., teeth grinding), sleep walking or talking, or violent behaviors arising from sleep [18].

An ideal screening instrument would incorporate all items relevant to the concept of sleep dysfunction, and be able to differentiate “good” and “poor” sleepers. Given the issues with self-insight and awareness in some persons, the descriptive tool would incorporate items for the significant other (i.e., bed partner or caregiver), including those related to behavioral manifestations in sleep.

Measurement tool

The Pittsburgh sleep quality index (PSQI) [22] is the most commonly used generic measure in clinical and research settings. A search conducted in March of 2014 for PubMed articles containing “Pittsburgh sleep quality index” as a search term returned a total of 1512 articles and showed a growth trend over time, with 141 articles published in 2010 and 323 articles published in 2013. By contrast, a search for “Leeds sleep evaluation questionnaire” [23] returned 66 articles in total; “sleep disorder questionnaire” [24] returned 52 articles, and “medical outcomes study sleep scale” [25] returned a total of 32 articles. The PSQI was developed in 1988, with no particular clinical population in mind, to: 1) provide a reliable, valid, standardized measure of sleep quality; 2) discriminate “good” and “poor” sleepers, and 3) provide an easy index for patients to complete and for clinicians and researchers to interpret [22]. Consequently, the developers' targeted concept and purpose conform to our measurement need (i.e., discriminate “good” and “poor” sleepers).

Given the PSQI's widespread use, a comprehensive review of its measurement properties is long overdue. Moreover, while individual papers examining the various aspects of the measurement properties of the PSQI have been published, its applicability to different clinical and non-clinical groups (i.e., persons with and without medical or psychological conditions, respectively) has not been examined. Therefore, we undertook a systematic review of the literature pertaining to the psychometric properties of the PSQI with the purpose of: 1) appraising the clinical sensibility of the instrument, 2) systematically evaluating its psychometric properties, specifically construct validity and reliability, and 3) summarizing sex-stratified results pertaining to the PSQI. Finally, this systematic review featured a meta-analytic component, reporting the weighted mean difference in PSQI global and subscale scores for clinical and non-clinical samples. The present work was intended to provide information for both researchers and clinicians on the PSQI's ability to serve as a descriptive tool for sleep dysfunction in non-clinical and clinical populations, while also identifying pitfalls and providing ideas for future research utilizing the measure.

Methods

Search strategy

In collaboration with a medical information specialist (JB) and utilizing proposed PubMed search filters for finding studies on a measurement's properties [26], we utilized a comprehensive search strategy to identify studies of the measurement properties of the PSQI. Four electronic databases, MEDLINE, Embase, PsycINFO, and health and psychosocial instruments (HAPI, or HaPI), were searched. Table S1 displays the terms used in searches of each database.

Selection criteria

All English language peer-reviewed studies found through the aforementioned databases and published before January 30, 2014, were considered eligible. Only full text original articles focusing on the development or evaluation of the measurement properties of an English version of the PSQI were included. Since the PSQI was not developed for a specific population, no restrictions were applied to the populations studied. The reference lists of eligible articles were scanned for other relevant publications. Studies where the primary objective did not include evaluation of the psychometric properties of the PSQI (e.g., where the PSQI was used as one independent variable in a multivariable regression model, or where models examined predictors of the PSQI), were excluded.

Assessment of item generation, reduction, sensibility

The assessment of the process of development of the PSQI, including item generation, item reduction, and the sensibility of the instrument, was performed using Bombardier's framework [27], developed based on the work of Feinstein [28] and Rowe & Oxman [29]. We refer the reader to the Glossary of terms for the description of difficult terminology.

Measurement properties (reliability and validity)

Studies of the PSQI's measurement properties were appraised by the “COnsensus-based Standards for the selection of health status Measurement INstruments” (COSMIN) checklist [30]. This instrument was developed to evaluate the methodological quality of studies on the measurement properties of health-status-related questionnaires.

The reliability domain consists of internal consistency, reliability and measurement error [30]. The construct validity domain was defined as hypothesis testing, the degree to which the global score garnered by the measure was consistent with the assumption that the instrument measures the construct it is intended to measure, covering convergent and divergent validity, and known-group validity. An a priori hypothesis about the expected direction and magnitude of the relationship between the PSQI and a comparison instrument is an important component of validity assessment within the COSMIN checklist.

As the purpose of this review is appraisal of the PSQI for descriptive purposes, under the framework of Kirshner and Guyatt [14], such domains as reliability and construct validity were evaluated. Given the absence of a gold standard for measuring sleep dysfunction, criterion validity was not assessed.

Quality assessment

Each measurement element was appraised by two independent reviewers (TM and PT), separately, using the COSMIN checklist. The authors met for a calibration review, in which they independently reviewed one study, then met and discussed each item of the COSMIN list to clarify its meaning and interpretation. Following this, the methodological quality of each study was rated across a set of items related to each attribute, independently by the same two reviewers. In cases of disagreement between the two reviewers, a third reviewer (KB) performed a separate appraisal, following which consensus was reached in each case. Each item was assigned a rating based on descriptors on a four-point rating scale (i.e., poor, fair, good, and excellent) [30]. We took the lowest rating across the items to assess an overall methodological quality score per measurement property; we kept separate ratings for known-group validity (Table 1).

Second, the consistency of the level of evidence was summarized. Evidence was designated “strong” when consistent findings were found in multiple good or at least one excellent quality study and the total sample size of combined eligible studies was ≥100; “moderate” when consistent findings were found in multiple fair or one good quality study with a total sample ≥50, or at least one good or excellent quality study with a total sample of 50–99; “limited” when findings were found in at least one fair, good or excellent quality study with total sample size between 25 and 49; and “unknown” when findings were of indeterminate rating, in studies with poor methodological quality or with a sample of <25 [31].

Meta-analysis

To compare PSQI global and subscale scores across samples, a sorting procedure was established to separate reported scores for non-clinical (i.e., control participants in good health, no medical or psychological conditions) and clinical (i.e., participants reporting a medical or psychological problem or condition, or meeting criteria for a clinical diagnosis as indicated by a clinical scale or physician's interview) samples. The meta-analytic features of this work were conducted in accordance with the Cochrane handbook for systematic reviews [32]. To obtain summary estimates for mean PSQI global and subscale scores for exposure and control participants (i.e., clinical and non-clinical, respectively), the mean, standard deviation and sample size were recorded; data were then pooled to obtain weighted means for group comparison; the results were expressed with the weighted mean difference (WMD) and 95% confidence interval (CI). Heterogeneity between studies was examined using the Cochran Q statistic and the Higgins I² statistic. A random-effects model was utilized in the presence of statistical heterogeneity between studies.

Results

Search results

Of 1376 articles identified, 50 [9,22,33–80] were selected for full-text review and 37 [46–80] were included in the final review (Fig. 2). Fifteen studies evaluated internal consistency [22,48–53,58,61–63,69,70,76–78], 19 evaluated known-group validity [9,22,46,49–51,53,56,58–61,64,67–69,71,73,79], and 22 evaluated convergent or divergent construct validity [46,47,49,51–54,57–60,64–68,70,73–77]. No studies evaluating intra-rater reliability of the PSQI were identified. Tables 2–4 display characteristics of these studies.
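The pooling procedure described under Meta-analysis (weighted mean difference, Cochran's Q, Higgins' I², and a random-effects model under heterogeneity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes inverse-variance weighting and the DerSimonian-Laird estimator of between-study variance, neither of which is specified in the text. The function name and the study values in the usage example are hypothetical.

```python
import math


def pooled_wmd(studies, z=1.96):
    """Pool mean differences (clinical minus non-clinical) across studies.

    studies: iterable of (mean_clin, sd_clin, n_clin, mean_non, sd_non, n_non).
    Returns fixed- and random-effects estimates with 95% CIs, Cochran's Q,
    Higgins' I^2 (percent), and the between-study variance tau^2.
    """
    studies = list(studies)
    if len(studies) < 2:
        raise ValueError("need at least two studies to assess heterogeneity")

    diffs, variances = [], []
    for m_c, sd_c, n_c, m_n, sd_n, n_n in studies:
        var = sd_c ** 2 / n_c + sd_n ** 2 / n_n  # variance of the mean difference
        if var <= 0:
            raise ValueError("each study needs a positive variance")
        diffs.append(m_c - m_n)
        variances.append(var)

    def weighted(weights):
        total = sum(weights)
        est = sum(w * d for w, d in zip(weights, diffs)) / total
        se = math.sqrt(1.0 / total)
        return est, (est - z * se, est + z * se)

    # Fixed-effect (inverse-variance) estimate.
    w_fixed = [1.0 / v for v in variances]
    fixed, fixed_ci = weighted(w_fixed)

    # Cochran's Q and Higgins' I^2.
    q = sum(w * (d - fixed) ** 2 for w, d in zip(w_fixed, diffs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (assumed estimator).
    sum_w = sum(w_fixed)
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects estimate: weights shrink toward equality as tau^2 grows.
    w_random = [1.0 / (v + tau2) for v in variances]
    random_est, random_ci = weighted(w_random)

    return {
        "wmd_fixed": fixed, "ci_fixed": fixed_ci,
        "wmd_random": random_est, "ci_random": random_ci,
        "q": q, "df": df, "i2": i2, "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical PSQI global scores: (clinical mean, SD, n, control mean, SD, n).
    example = [
        (9.1, 3.8, 60, 4.2, 2.1, 55),
        (7.4, 3.5, 120, 4.9, 2.6, 110),
        (10.3, 4.1, 40, 3.8, 1.9, 45),
    ]
    result = pooled_wmd(example)
    print({key: result[key] for key in ("wmd_random", "ci_random", "i2")})
```

When the studies agree closely, tau² is estimated as zero and the random-effects result coincides with the fixed-effect result; as heterogeneity grows, the random-effects confidence interval widens.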

Table 1
Methodological quality of included studies assessed by the COSMIN checklist [26].

Columns: Study, year, reference number | Reliability: internal consistency | Reliability: test-retest | Hypothesis testing: convergent | Hypothesis testing: divergent | Hypothesis testing: known-group

Aloba et al. 2007 [45] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Alsaadi et al. 2013 [47] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Babson et al. 2012 [48] | Poor (Q7)a | . | . | . | Fair (Q4)a
Beaudreau et al. 2012 [49] | Poor (Q5,7)a | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Beck et al. 2004 [50] | Poor (Q5,7)a | . | . | . | Fair (Q4)a
Bush et al. 2012 [51] | Poor (Q5)a | . | Fair (Q4)a | Fair (Q4)a | Fair (Q4)a
Buysse et al. 1989 [22] | Fair (Q5)a | Fair (Q7,9)a | Fair (Q2,4)a | Fair (Q2,4)a | Fair (Q2,4)a
Buysse et al. 1991 [73] | . | . | . | Fair | Fair
Buysse et al. 2008 [52] | . | . | . | . | Fair (Q2)a
Carpenter & Andrykowski 1998 [53] | Poor (Q3,5)a | . | Fair (Q2)a | Fair (Q2)a | Fair (Q2)a
Casement et al. 2012 [54] | . | . | Good (Q4)a | Good (Q4)a | Good (Q4)a
Cole et al. 2006 [55] | . | . | . | . | Fair (Q4)a
Elsenbruch et al. 1999 [56] | . | . | . | . | Fair
Fichtenberg et al. 2001 [80] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Glicklich et al. 2014 [57] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Grandner et al. 2006 [58] | Poor (Q9)a | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Hall et al. 2009 [59] | . | . | . | . | Fair (Q2)a
Hancock & Larner 2009 [60] | . | . | Fair | . | Fair
Knutson et al. 2006 [61] | . | Fair | . | . | Fair (Q2,4)a
Magee et al. 2008 [62] | Good | . | . | | Poor (Q4)a
Mariman et al. 2012 [63] | Poor (Q7)a | . | . | | Fair (Q2,4)a
Masel et al. 2001 [64] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Mondal et al. 2013 [65] | . | . | Good | . | Good
Morin et al. 2011 [66] | . | . | Fair (Q4)a | . | Fair (Q4)a
Neau et al. 2012 [67] | . | . | Fair (Q2,4)a | Fair (Q2,4)a | Fair (Q2,4)a
Neu et al. 2007 [68] | . | . | Fair (Q2)a | . | Fair (Q2)a
Nikassio et al. 2014 [70] | Poor (Q7)a | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Osorio et al. 2006 [71] | . | . | . | . | Fair
Otte et al. 2013 [72] | . | . | . | . | Good
Parcell et al. 2008 [9] | . | . | . | . | Poor (Q3)a
Rener-Sitar et al. 2014 [69] | Good | Good | Good | . | Good
Ritsner et al. 2004 [74] | Poor (Q5)a | . | Good | Good | Good
Scarlata et al. 2013 [75] | . | . | Fair (Q2,4)a | . | Fair (Q2,4)a
Skouteris et al. 2009 [76] | Fair (Q3,5)a | Fair (Q11)a | Fair (Q2)a | . | Fair (Q2)a
Spira et al. 2011 [77] | Poor (Q5)a | . | Good | Good | Good
Tomfohr et al. 2013 [78] | Excellent | . | . | . | Fair (Q4)a
Zohal et al. 2013 [79] | . | . | . | . | Fair (Q2,4)a

a Received “fair” or “poor” for one or two parameter(s) only; other parameters were “good” or “excellent”.
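Each cell in Table 1 follows the rule stated under Quality assessment: the overall rating for a measurement property is the lowest rating given to any checklist item for that property. A minimal Python sketch of that rule is shown below. The function name and the item ratings in the example are hypothetical and are not taken from any reviewed study.

```python
# Four-point scale, ordered from worst to best.
RATING_ORDER = ("poor", "fair", "good", "excellent")


def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items for one property.

    item_ratings: dict mapping an item label (e.g. "Q2") to a rating string.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    normalized = [rating.strip().lower() for rating in item_ratings.values()]
    unknown = [rating for rating in normalized if rating not in RATING_ORDER]
    if unknown:
        raise ValueError(f"unrecognized ratings: {unknown}")
    # The worst-rated item determines the overall score.
    return min(normalized, key=RATING_ORDER.index)


if __name__ == "__main__":
    # Hypothetical item ratings for one study's known-group validity assessment.
    ratings = {"Q1": "excellent", "Q2": "fair", "Q3": "good", "Q4": "fair"}
    print(overall_quality(ratings))
```

Under this rule a single weak item caps the overall rating, which is consistent with the table footnote noting that many “fair” or “poor” ratings stem from only one or two items.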

Fig. 2. Flow chart documenting process of article selection for review. Embase (1974 to 30/01/2013); Medline (1946 to wk 4, 01/2014); Medline in process (until 30/01/2014); PsycINFO (1806 to wk 3, 01/2014); HAPI (1985 to 01/2014).

A. The process of development of the PSQI

The first publication on the PSQI described pilot testing of the instrument in three groups of subjects over 18 mo [22]. Tested were “good” sleepers, featuring 52 healthy persons without sleep complaints, as controls; “poor” sleepers, 34 subjects with major depressive disorder, 10 of whom were inpatients; and a third group of “poor” sleepers, who unlike the previous group, featured 62 clinical outpatients referred by physicians to a sleep center for a variety of sleep/wake complaints (i.e., difficulty initiating and maintaining sleep (n = 45) and excessive daytime sleepiness (n = 17)). Internal homogeneity of the index items was assessed by Cronbach's alpha and corrected item-total correlations [81]. The Pearson product-moment correlation coefficient (i.e., Pearson correlation coefficient (PCC)) was used to determine correlation of component and item scores with the PSQI global score.

In the primary analysis of construct validity, the developers assessed the degree to which the index detected differences between groups recognized as clinically distinct (i.e., “good” versus “poor” sleepers). The relevant “gold standard” diagnoses were based on a combination of clinical interviews, structured interviews, and polysomnographic (PSG) data. An analysis of covariance was used to compare groups in PSQI global and component scores, and the Student-Newman-Keuls technique was used for pairwise comparisons. Age and sex were used as covariates, due to the age/sex ratio differences between groups. The PSQI's estimates of sleep latency, duration, and sleep efficiency were compared to the same components derived from PSG data, using t-tests and PCCs. For test-retest reliability, 91 patients completed the PSQI on two separate occasions (range: 1–265 d, mean = 28.2 d between first and second test). Paired t-tests for the global PSQI score and the seven individual component scores showed no significant differences between time-points, and the PCCs also demonstrated stability in global and component scores. The authors reported that all seven component scores of the PSQI (subjective sleep quality (SQ), sleep latency (SL), sleep duration (SD), habitual sleep efficiency (HSE), sleep disturbances (SDI), use of sleep medications (SM), and daytime dysfunction (DD)) showed overall reliability, as indicated by a Cronbach's alpha of 0.83. The authors concluded that each component score measured a particular aspect of the same construct of sleep quality [22].

B. Assessment of the sensibility of the PSQI

Sensibility is a term proposed by Feinstein that describes common sense aspects of an instrument [82]; of great importance is the definition of the construct to be measured [28,29]. The PSQI developers' construct of “sleep quality” was defined based on clinical judgment alone. The target population, featuring “good” and “poor” sleepers, was not involved in the item generation process to provide insight on their understanding of “sleep quality”.

The PSQI demonstrates reasonable face validity pertaining to elements, which are phrased in a suitable way. While the four response categories may not appear optimal for an index with discriminative purposes (i.e., where more concrete response categories “yes” or “no” are preferable) [14], there is clinical meaning in the format chosen. To explain, the index covers a period of one month, and the response categories “not during the last month”, “less than once a week”, “once or twice a week”, and “three or more times per week” are clinically relevant to assess the burden/severity of the components relevant to sleep quality. Moreover, the response category can be translated into a wide enough numeric range, which can be important for generation of the subscale score and the total score for a more sensitive discriminative purpose.

The content validity of the PSQI is appropriate, and the PSQI seems to cover multiple aspects relevant to the sleep quality construct. Lack of input from patients lowers sensibility ratings for content validity, however.

The PSQI is simple to implement in clinical practice. Completion requires five to 10 min, and so the index can be administered to a patient and his/her significant other/caregiver in the waiting room. Questions one to nine cannot be missing, as scores will be inaccurate. During the field-testing, ten of the 158 participants (i.e., 6.3%) failed to complete the index in its entirety [22]. Later studies reported omission in the range between 5.1% [46] and 10% [48], falling within the accepted range (i.e., no greater than 15% of respondents) [83]. The scoring process can possibly be viewed as burdensome, as it requires mental arithmetic on the part of the scorer to calculate the subscale scores that are summed to obtain the global score. Nevertheless, scoring of the PSQI can be performed without a calculator in less than 10 min, making it suitable for use in clinical practice and research [84].
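The final scoring step described above, in which the subscale (component) scores are summed to obtain the global score, can be sketched in Python as follows. This is a minimal illustration and not the published scoring form: it assumes the seven component scores have already been derived from the questionnaire items and that each lies on a 0 to 3 scale, giving a global score of 0 to 21. The function name and the example values are hypothetical. In keeping with the caution that missing answers make scores inaccurate, the sketch refuses to score an incomplete record.

```python
# Component abbreviations as used in this review.
COMPONENTS = ("SQ", "SL", "SD", "HSE", "SDI", "SM", "DD")


def psqi_global_score(component_scores):
    """Sum the seven component scores into a global score.

    component_scores: dict mapping each component abbreviation to an
    integer score. Assumes a 0 to 3 scale per component (see lead-in).
    """
    missing = [name for name in COMPONENTS if component_scores.get(name) is None]
    if missing:
        raise ValueError(f"cannot score: missing components {missing}")
    for name in COMPONENTS:
        score = component_scores[name]
        if not isinstance(score, int) or not 0 <= score <= 3:
            raise ValueError(f"{name} must be an integer from 0 to 3, got {score!r}")
    return sum(component_scores[name] for name in COMPONENTS)


if __name__ == "__main__":
    # Hypothetical component scores for one respondent.
    respondent = {"SQ": 1, "SL": 2, "SD": 0, "HSE": 1, "SDI": 1, "SM": 0, "DD": 2}
    print(psqi_global_score(respondent))
```

Raising an error on a missing component, rather than substituting zero, avoids silently understating the global score.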

The PSQI's specific properties

The PSQI has been studied in strictly clinical (n = 15) or non-clinical (n = 13) samples, and in both (i.e., featuring non-clinical and clinical groups of participants within the same study) (n = 9) (Tables 2–4). The samples represented a variety of clinical disorders, including cancer [35,36,50,72], schizophrenia, and chronic fatigue syndrome [63,68], as well as healthy participants of varying ages [40,41,49,51], sexes [46–80], and races and ethnicities [49,78].

Reliability

For quality assessment of reviewed studies, we refer the reader to Table 1.

With focus on the discriminative ability of the PSQI, the central measurement properties determinative of its applicability are reliability and construct validity. The reliability domain consists of internal consistency, reliability and measurement error. Internal consistency is defined by the COSMIN [30] panel as the degree of interrelatedness among the items, given the PSQI developers' claim of the instrument's unidimensionality. In this case, assessment of internal consistency did not follow factor analysis, and thus unidimensionality of the PSQI was not established. Reliability parameters assess how well patients can be distinguished from each other, while the related but distinct parameter of measurement error assesses the magnitude of the error present in the measured differences. Both of the aforementioned properties are of great value to clinicians, as they provide information on the proportion of total variance in the measurements that comprises “true” differences between patients, commonly assessed by test-retest reliability. The COSMIN checklist requires that: 1) the two tests are administered independently; 2) the underlying concept measured is consistent throughout measurements; and 3) the time between tests is appropriate. The hypothesis testing construct validity domain of the COSMIN checklist was determined for the PSQI. Hypothesis testing assesses the degree to which the scores of an instrument are consistent with the assumption that the instrument validly measures the construct intended for measurement. Convergent, divergent, and known-group validity were assessed for the PSQI. The COSMIN checklist assesses the presence of a priori hypotheses about the expected direction and magnitude of the relationship between the compared instruments, and an adequate description of the properties of the comparator tool [30].

Factor structure

Several studies examining unidimensionality of the PSQI raised concerns over the factor structure of the instrument (Table 5). In studies using factor analysis, eight [48,54,55,62,63,70,72,78] out of eleven studies reported that a single-factor model fit the data poorly, and the PSQI is better represented in a two- or three-factor

Four studies [48,55,62,78] using factor analysis reported low factor loading for the SM sub-scale and another reported low factor loading for the DD sub-scale [70]. The authors reported that improved fit statistics could be obtained after removal of these items.

Reliability: internal consistency

Cronbach's alpha coefficient was reported in 12 studies (Table 2). For all studies but three [62,63,77], these values met the cut-point for a positive rating for within- and between-group comparisons (i.e., 0.70 [85]), ranging from 0.70 [76] to 0.83 [22]. No studies reported Cronbach's alpha within the ideal range for use in individual patients (i.e., 0.9–0.95) [86]. The three studies that reported Cronbach's alphas below 0.70 featured patients with chronic fatigue syndrome (α = 0.64) [63] and non-clinical samples (α = 0.67 [62] and 0.69 [77]).

Test-retest reliability

Test-retest reliability of the PSQI was evaluated in three studies [22,61,69]; see Table 2 for specifics. One study [69] reported the intraclass correlation coefficient (ICC), the preferred statistic, another reported PCC [22], and a third reported both statistics [61]. The study by Rener-Sitar et al. featured patients with temporomandibular disorder, reporting an ICC of 0.86 [69] for a period of two weeks between test and retest. Buysse et al. reported a PCC of 0.82 for their sample including healthy, depressed and sleep disordered persons, with all but the depressed group showing no significant differences between test and retest over a mean period of 28 d [22]. Knutson et al. reported statistics for a test-retest period of one year [61]. The ICC was 0.81 for the population-based sample of early middle-aged adults, with 0.79 and 0.83 for white and black women, respectively, and 0.70 and 0.83 for black and white men, respectively [61]. For the same sample, a PCC of 0.68 was reported. All ICC reports met the required cut-point for the groups (i.e., 0.70) [88], but not for individual patients (i.e., >0.90) [85]. See Table 1 for specifics.

Construct validity

Tables 3 and 4 summarize the evidence for convergent and divergent construct validity, and known-group validity, respectively. Absolute value correlations above 0.70 were considered strong, between 0.3 and 0.7 moderate, and less than 0.3 weak [89]. These are equivalent, in variance terms, to shared variance of >50%, 10–50%, and <10%, respectively [90].

The convergent construct validity

Studies found the PSQI was of value in screening for insomnia according to the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [19], and the International
model. Different factor structure models were proposed. For Classification of Sleep Disorders (ICSD-2), revised criteria [91], with
example, Cole et al. proposed that the seven PSQI sub-scales are the best cut-off for students at five, with sensitivity and specificity
best represented by three factors e sleep efficiency, including the SD at 72% and 55%, respectively [46]; greater than six for patients with
and HSE PSQI sub-scales, perceived sleep quality, including SQ, SL, low back pain, with sensitivity of 100% and specificity of 49% [47];
and SM sub-scales, and daily disturbance, featuring the SDI and DD and greater than or equal to eight for TBI patients, with sensitivity
sub-scales of the PSQI [55]. At the same time, Cole et al. also re- of 100% and specificity of 83% [80].
ported a two-factor model fit for the data: sleep efficiency with A strong association was uncovered between the PSQI total
strong loading from HSE (i.e., 0.89) and SD (i.e., 0.60) and perceived score and the insomnia severity index (ISI) total score (r ¼ 0.80)
sleep quality, with strong loading from daytime sleepiness (i.e., [66], sleep problems from symptom experience reports
0.55) and SQ (i.e., 0.77), while SM showed poor loading on both (r ¼ 0.72e0.77) [53], short form-36 health survey vitality score
factors [55]. While these models were both later replicated by (r ¼ 0.74e0.77) [53,70], sleep restlessness score (r ¼ 0.72e0.77)
others, using exploratory and confirmatory factor analyses (EFAs [53], and sleep efficiency score from the sleep diary (r ¼ 0.76)
and CFAs, respectively), still others proposed their own two- or- [58]. A moderate association was found between the PSQI and
three factors models [72,76], endorsing the multidimensionality of disability scores (r ¼ 0.31e0.58) [49,67], depression (r ¼ 0.50) [53],
the PSQI. This again was supported by Aloba et al.'s findings uti- tension/anxiety (r ¼ 0.36e0.62) [53], and confusion (r ¼ 0.45e0.46)
lizing principal component analysis [46]. [53]. There was evidence for associations between the PSQI and PSG

Table 2
Reported measurement properties of the PSQI: internal consistency and test-retest reliability.

Study: Babson et al., 2012 [48]
Population (n): US veterans w/severe PTSD (n = 226)
Internal consistency: Cronbach's alphas. PSQI global score: using individual items (α = 0.78; α = 0.79); using component scales (α = 0.67; α = 0.68 w/SM removed); SDI subscale (α = 0.72)
Test-retest: NA

Study: Beaudreau et al., 2012 [49]
Population (n): Community-dwelling black (n = 306) and white (n = 2659) women 65 years of age
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit, for older: white women (α = 0.72); black women (α = 0.74). The subscale-total correlation: from 0.24 to 0.62 (i.e., <0.30 for the SM and DD)
Test-retest: NA

Study: Beck et al., 2004 [50]
Population (n): 1) Oncology patients receiving care (n = 214); 2) Secondary analysis from RCT for management of cancer-related fatigue (n = 259)
Internal consistency: 1) Component score-global score r range = 0.38–0.64; Cronbach's α = 0.81. 2) Component score-global score r range = 0.32–0.63; Cronbach's α = 0.77. SM and DD lowest in both 1) and 2) (0.39 and 0.40, and 0.32 and 0.37 for SM and DD, respectively)
Test-retest: NA

Study: Buysse et al., 1989 [22]
Population (n): 1) Healthy (n = 52); 2) Depressed (n = 34); 3) Sleep-disordered: DIMS (n = 45) and DOES (n = 17)
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.83); Component score-global score r range = 0.35–0.76; Mean component-global score r = 0.58; Individual items strong r
Test-retest: Paired t test T1 & T2; Pearson product-moment r. PSQI global score and the seven components: difference between T1 & T2 = ns, except for depressed patients in SDI (t = 2.32, p = 0.03) and DD (t = 3.46, p = 0.002) at T2. Pearson product-moment: stable in global and component scores: r = 0.82, p < 0.001

Study: Bush et al., 2012 [51]
Population (n): Older adults w GAD, primary care (n = 134)
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.80); Component score-global score r range = 0.53–0.76; Inter-item between component scores r range = 0.1–0.56
Test-retest: NA

Study: Carpenter & Andrykowski, 1998 [53]
Population (n): 1) BMT (n = 155); 2) RT (n = 56); 3) Women w/BC (n = 102); 4) Women w/BBP (n = 159)
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.80); 8-item SDI component (α range = 0.70–0.78). Homogeneity by Pearson's r btw PSQI global score and component scores of SQ (r range = 0.79–0.83); DD (r range = 0.53–0.60); SDI (r range = 0.42–0.58)
Test-retest: NA

Study: Grandner et al., 2006 [58]
Population (n): Non-clinical sample (N = 120): 1) Younger adults (n = 53); 2) Older adults (n = 59)
Internal consistency: Homogeneity by Spearman's rho btw PSQI global score and all component scores (p = 0.0005), except use of SM (younger group rho = 0.243, NS; older group rho = 0.178, NS); SDI (older group rho = 0.372, NS); SL (older group rho = 0.804, NS)
Test-retest: NA

Study: Knutson et al., 2006 [61]
Population (n): A population-based sample of early mid-aged adults (N = 610): 1) Black men (n = 90) & women (n = 167); 2) White men (n = 166) & women (n = 187)
Internal consistency: NA
Test-retest: T1 & T2 (Year 1 & 2); ICC and Pearson's r (95% CI) T1 & T2. ICC: Black men: 0.70 (0.55–0.80); Black women: 0.83 (0.77–0.88); White men: 0.80 (0.73–0.85); White women: 0.79 (0.72–0.84); All: 0.81 (0.78–0.84). Pearson's r: Black men: 0.54 (0.38–0.67); Black women: 0.72 (0.63–0.78); White men: 0.67 (0.57–0.74); White women: 0.66 (0.57–0.73); All: 0.68 (0.63–0.72)

Study: Magee et al., 2008 [62]
Population (n): Adults (n = 364)
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.67); SM removal (α = 0.70). Component score-global score r range = 0.34–0.56, except SM r = 0.13. Btw individual component scores rs < 0.43
Test-retest: NA

Study: Mariman et al., 2012 [63]
Population (n): Patients w/CFS
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.64). Homogeneity by Spearman's rho: SD and HSE rho = 0.71; several rs = NS
Test-retest: NA

Study: Nicassio et al., 2014 [70]
Population (n): Patients w/RA
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.73); SM removal (α = 0.75). Homogeneity by Pearson's r: Component score-global score r ≥ 0.40, except SM r = 0.29, excluded
Test-retest: NA

Study: Rener-Sitar et al., 2014 [69]
Population (n): Patients w/TMD
Internal consistency: Cronbach's alphas. Pain-related TMD: α = 0.75; 0.72. Pain-free TMD: α = 0.66; 0.58. Cases w/test-retest data: α = 0.70; 0.63. Removal of SM in pain-free TMD: α = 0.73. Homogeneity by Pearson's r: inter-item correlation r range = 0.22–0.30
Test-retest: ICC T1 & T2. Median: 0.86 (range = 0.57–0.91). Median diff btw T1 & T2 (range = 0.08, 0.09, ns). Limits of agreement: range = 1.28–1.36 for all components, except for HSE: range = 2.50–2.35

Study: Skouteris et al., 2009 [76]
Population (n): Pregnant women (n = 252)
Internal consistency: Cronbach's alphas. At T1 & T2: α = 0.70; 0.76. Removal of SM at T1 & T2: α = 0.72; 0.78. Item-total scores for all components r range = 0.30–0.72
Test-retest: ANOVA btw T1 & T2 (18.3 ± 1.61 & 34.6 ± 1.7 wks of gestation)

Study: Spira et al., 2011 [77]
Population (n): Older men (n = 3059)
Internal consistency: Cronbach's alphas. PSQI global score, seven subscales as a unit (α = 0.69); SM & DD removal (α = 0.72). Component score-global score r range = 0.36–0.57, except SM r = 0.28, DD = 0.25
Test-retest: NA

Study: Tomfohr et al., 2013 [78]
Population (n): Community-dwelling adults (N = 2352): English (n = 654); Non-Hispanic whites (n = 1698)
Internal consistency: Homogeneity by Pearson's r w/SM removed. Cronbach's alphas: English: α = 0.741; Non-Hispanic whites: α = 0.775. Factor loading from six subscales range = 0.470–0.768
Test-retest: NA

Abbreviations: BBP, benign breast problems; BC, breast cancer; BMT, bone marrow transplant; CFS, chronic fatigue syndrome; DD, daytime dysfunction; DIMS, disorders
of initiating and maintaining sleep; DOES, disorders of excessive somnolence; GAD, generalized anxiety disorder; HSE, habitual sleep efficiency; ICC, intraclass correlation
coefficient; NA, not applicable; ns, not significant; PSQI, Pittsburgh sleep quality index; PTSD, posttraumatic stress disorder; r(s), correlation(s); RA, rheumatoid arthritis;
RCT, randomized controlled trial; RT, renal transplant; SD, sleep duration; SDI, sleep disturbance; SEM, structural equation modeling; SL, sleep latency; SM, sleep
medications; SQ, sleep quality; TMD, temporomandibular disorder; w/, with.
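
The Cronbach's alpha values collected in Table 2 rest on the ratio of the summed component variances to the variance of the global score. As a minimal sketch, assuming a small hypothetical matrix of seven PSQI component scores (each 0 to 3) per respondent that is not taken from any reviewed study, and a helper name (`cronbach_alpha`) introduced here purely for illustration, the calculation could be written as:

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same k >= 2 items")

    # Variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed (global) score across respondents.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: six respondents x seven PSQI components (SQ, SL, SD, HSE, SDI, SM, DD).
hypothetical_components = [
    [1, 1, 0, 0, 1, 0, 1],
    [2, 2, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0],
    [3, 2, 2, 2, 2, 1, 2],
    [1, 0, 1, 0, 1, 0, 1],
    [2, 3, 2, 1, 2, 3, 1],
]

alpha = cronbach_alpha(hypothetical_components)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Dropping a column (for example the SM component) and recomputing gives the "SM removal" style of comparison that several rows of Table 2 report.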

data variables. Studies reported non-significant or low correlation coefficients ranging from 0.11 to 0.3 for the apnea hypopnea index [49,57] and 0.21 for the number of oxygen desaturation events [57]. Moderate associations were reported between the PSQI global score and PSG sleep maintenance (rho = 0.33), sleep efficiency (rho = 0.34), and microarousal index in younger (rho = 0.39), but not in older healthy subjects [73]. The reported associations between the PSQI and variables derived from actigraphy data were variable, with only some researchers reporting significant findings. We refer the reader to Table 4 for specifics.

The divergent construct validity
Divergent validity of the PSQI was evidenced in the minimal associations with psychosocial constructs (i.e., perceived social support, r = 0.14) [51], and spasticity, bladder dysfunction [67], and psychopathology [74], all of which were not significant. We refer the reader to Table 4 for specifics.

Known-group validity
The evidence for known-group validity was strong (Table 3). Studies that examined differences in PSQI global score between healthy subjects and patients suffering from a variety of disorders known to be associated with poor sleep showed significant differences between groups [9,22,45,49,51,56,60,68,71,79]. Studies that examined differences within groups of people (i.e., race, age, sex, different symptom clusters within the same population, etc.) showed non-significant differences [49,50,53,58,61,64,67,73].

Cut-off scores
Three studies conducted receiver operating characteristic analyses to determine how well the PSQI distinguishes persons meeting insomnia diagnostic criteria by the DSM-IV or by the ICSD-2 from those who do not [46,47,80]. To optimize the sensitivity-specificity balance, the cut-off scores were as follows: five for students (sensitivity = 72%, specificity = 54.5%) [46]; >6 for middle age adults with low back pain (sensitivity = 100%, specificity = 49%) [47]; ≥8 in post-acute adults with TBI of varying severities (sensitivity = 83%, specificity = 100%) [80]. Scarlata et al. reported that a PSQI global score cut-off of 5 was able to differentiate patients with obstructive sleep apnea syndrome diagnoses in a consecutive sample of adults at high risk for the syndrome (sensitivity = 69.7%, specificity = 31%) [75].

Sex differences
Five studies [52,53,61,65,73] featured in this review provided sex-stratified results pertaining to the PSQI. Buysse et al. [73] found no significant sex differences in their evaluation of healthy young and elderly men and women using the PSQI. Carpenter & Andrykowski [53], in their samples of bone marrow and renal transplant recipients, found no significant sex differences in global or subscale PSQI scores, but all participants fell above the cut-off designating "poor quality sleep" (i.e., >5). Knutson et al. [61], in their race-sex comparison within the general population, also reported no significant differences between white women and men, and black women and men. They did however find a significant difference in the scores of white women and black men when first measured, a difference that was no longer present when measured one year later.
Buysse et al. [52], in a community sample, performed cluster analysis to obtain subgroups based on PSQI and Epworth sleepiness scale (ESS) scores (i.e., low PSQI/ESS; low PSQI/high ESS; high PSQI/low ESS; high PSQI/ESS). Results showed significant sex differences for all subgroups. Mondal et al. [65] found sex differences between similarly characterized PSQI and ESS score subgroups in their sample

Table 3
Reported measurement properties of the PSQI: known-group validity.

Study Group Group Group Group P-value


Subgroups Subgroups Subgroups Subgroups
N N N N
Global score (mean Global score (mean Global score (mean Global score (mean
(SD)/median (IQR)) (SD)/median (IQR)) (SD)/median (IQR)) (SD)/median (IQR))

Aloba et al., 2007 [45] University students University students NA NA <0.001


w/insomnia (DSM-IV) w/o insomnia (DSM-IV):
25 495
6.92 (3.64) 4.30 (2.55)
Beaudreau et al., 2012 [49] Race Education Age Self-reported sleep Age: NS
Black women <High school 70–79 disturbance Education:
306 512 244 No reported diagnosis Some college vs < high
6.6 (3.9) 6.7 (3.8) 6.6 (3.9) 2688 school: <0.05
White women High school 80e84 6.0 (3.4) College vs < high school,
2662 1244 1724 Insomnia high school: <0.05
6.3 (3.6) 6.4 (3.6) 6.3 (3.6) 108 Race: NS
Some college 85e89 12.1 (3.5) Sleep disorder diagnosis:
634 819 Restless legs Insomnia, restless legs,
6.3 (3.7) 6.4 (3.7) 129 sleep apnea vs no reported
College 90+ 9.1 (3.8) diagnosis: <0.001
254 181 Sleep apnea
5.9 (3.5) 6.6 (3.7) 26
Graduate education 8.5 (3.9)
322
6.0 (3.5)
Beck et al., 2004 [50] Study 1 Study 2 NA NA Study 1
Low fatigue cancer patients Low fatigue Low fatigue vs high
104 cancer patients fatigue: <0.001
6.82 (4.57) 214 Study 2
High fatigue cancer patients 6.88 (3.77) Low fatigue vs high
56 High fatigue fatigue: <0.001
10.20 (4.28) cancer patients
42
9.50 (4.52)
Bush et al., 2012 [51] (Co-)principal GAD* Other anxiety Other-no anxiety No diagnosis GAD vs no diagnosis: 0.04
134 29 19 34 All others: NS
8.74 (4.05) 7.86 (3.67) 7.68 (4.11) 6.65 (3.70)
Buysse et al., 1989 [22] Controls Depressives DIMS DOES Controls vs depressives,
52 34 45 17 DIMS, DOES; DOES vs
2.67 (1.70) 11.09 (4.31) 10.38 (4.57) 6.53 (2.98) depressives, DIMS: 0.001
Buysse et al., 1991 [73] Young Elderly NA NA Age: 0.0003
Men Men Gender: NS
23 20 Age-gender: NS
3.1 (1.6) 4.4 (2.8)
Women Women
12 24
1.9 (1.4) 5.1 (3.2)
Carpenter & Bone marrow transplant Renal transplant Breast cancer Benign breast Sex: NS
Andrykowski Men Men Women problems
1998 [53] 98 26 102 Women
5.4 (3.6) 7.3 (4.3) 7.0 (4.4) 159
Women Women 6.4 (4.2)
57 30
6.0 (4.3) 7.9 (4.5)
Elsenbruch et al., 1999 [56] IBS Controls NA NA <0.001
15 15
7.5 (1.0) 3.3 (0.3)
Grandner et al., 2006 [58] Younger non-clinical Older non-clinical NA NA NS
sample sample
53 59
4.07 (3.00) 3.92 (5.00)
Hall et al., 2009 [59] Caucasian African American Chinese NA African American vs
171 138 59 Caucasian: <0.001
5.96 (2.06) 7.42 (2.63) 6.27 (2.25) All others: NS
Hancock & Larner 2009 [60] Dementia diagnosis Non-dementia NA NA <0.001
(DSM-IV) 155
155 7.6 (5.1)
5.1 (4.2)
Knutson et al., 2006 [61] Year 1 Year 2 NA NA Year 1:
White women White women White women vs white men: NS
187 187 Black women vs black men: NS
5.1 (2.8) 5.5 (3.1) White women vs black men: sign
White men White men Black women vs white women: NS


166 166 Black women vs white men: NS
5.0 (2.3) 5.3 (2.6) Black men vs
Black women Black women white men: NS
167 167 Year 2:
6.9 (3.8) 6.7 (3.6) White women vs
Black men Black men white men: NS
90 90 Black women vs
6.1 (3.0) 5.9 (2.8) black men: NS
Black women vs
white women: NS
Black women vs
white men: NS
Black men vs
white men: NS
White women vs
black men: NS
Masel et al., 2001 [64] Non-hypersomnolent Posttraumatic Hypersomnolent NA All groups: NS
subjects: hypersomnia subjects with an
38 subjects: abnormal AHI
5.0 (2.8) 21 or PLMI:
5.0 (3.8) 12
5.8 (4.2)
Neau et al., 2001 [67] MS patients MS patients NA NA NS
w/fatigue w/EDS: w/fatigue w/o EDS:
8 17
10.8 (4.6) 6.2 (3.4)
Neu et al., 2007 [68] CFS: Controls: NA NA 0.001
28 12
9.54 (4.0) 2.42 (1.2)
Osorio et al., 2006 [71] Fibromyalgia patients: Controls: NA NA <0.001
30 30
12 (6) 3.0 (3.0)
Parcell et al., 2008 [9] TBI: Controls: NA NA Unadjusted: <0.01
Unadjusted: Unadjusted: Adj. for anxiety: 0.02
10 10 Adj. for depression: 0.01
11.33 (1.53) 3.90 (0.62)
Adj. for anxiety: Adj. for anxiety:
10 10
10.13 (1.27) 4.99 (1.19)
Adj. for depression: Adj. for depression:
10 10
10.80 (1.39) 4.38 (1.30)
Rener-Sitar et al., 2014 [69] Pain-related TMD: Pain-free TMD: NA NA <0.05
496 113
7.1 (4.0) 5.1 (3.1)
Zohal et al., 2013 [79] COPD patients: Controls: NA NA 0.02
120 120
8.03 (3.66) 4.2 (2.8)

Abbreviations: AHI, apnea-hypopnea index; BBP, benign breast problems; BC, breast cancer; BMT, bone marrow transplant; COPD, chronic obstructive pulmonary disease;
DIMS, disorders of initiating and maintaining sleep; DOES, disorders of excessive somnolence; DSM-IV, diagnostic and statistical manual of mental disorders, 4th edition; EDS,
excessive daytime sleepiness; GAD, generalized anxiety disorder; HAI, hypersomnolent with abnormalities; IQR, interquartile range; NA, not applicable; NH, non-hypersomnolent;
NS, not significant; PLMI, periodic leg movement index; PSG, polysomnography; PSQI, Pittsburgh sleep quality index; PTH, posttraumatic hypersomnia; RT, renal transplant;
TBI, traumatic brain injury; TMD, temporomandibular disorder.
*
principal: GAD diagnosis most severe diagnosis; co-principal: GAD diagnosis is one of the two most severe diagnoses
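
Table 3 lists group sizes, means and SDs, which is the information a known-group contrast needs. The sketch below is an illustration rather than the authors' analysis: it computes unstandardized mean differences for three clinical-versus-comparison rows of Table 3 and pools them with a DerSimonian-Laird random-effects estimator. The function names, the choice of rows, and the normal-approximation 95% interval (z = 1.96) are assumptions made here, so the output is not expected to reproduce the pooled estimate reported in the review.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Unstandardized mean difference (group 1 minus group 2) and its variance."""
    md = m1 - m2
    var = sd1 ** 2 / n1 + sd2 ** 2 / n2
    return md, var


def pool_random_effects(effects):
    """DerSimonian-Laird pooling of (estimate, variance) pairs.

    Returns (pooled estimate, (ci_low, ci_high), I-squared in percent).
    """
    w = [1.0 / v for _, v in effects]
    fixed = sum(wi * md for wi, (md, _) in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (md - fixed) ** 2 for wi, (md, _) in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1.0 / (v + tau2) for _, v in effects]
    pooled = sum(wi * md for wi, (md, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


# Rows copied from Table 3 as (mean, SD, n) for clinical and comparison groups.
# This subset is an assumption for illustration, not the review's study set.
studies = {
    "Aloba 2007 [45]": ((6.92, 3.64, 25), (4.30, 2.55, 495)),
    "Neu 2007 [68]": ((9.54, 4.0, 28), (2.42, 1.2, 12)),
    "Zohal 2013 [79]": ((8.03, 3.66, 120), (4.2, 2.8, 120)),
}

effects = [mean_difference(*clin, *ctrl) for clin, ctrl in studies.values()]
pooled, (ci_low, ci_high), i2 = pool_random_effects(effects)
print(f"pooled MD = {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), I2 = {i2:.0f}%")
```

A high I² in such a pooling signals that the study-level differences vary more than sampling error alone would explain, which is the situation in which a random-effects model is usually preferred over a fixed-effect one.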

of patients at risk for disorders (i.e., PSQI "good quality sleep" (i.e., ≤5) and ESS "normal score" (i.e., ≤10); PSQI "good quality sleep" and ESS "sleepy" (i.e., >10); PSQI "poor quality sleep" (i.e., >5) and ESS "normal score"; PSQI "poor quality sleep" (i.e., >5) and ESS "sleepy").

Meta-analysis of known-group validity
The meta-analytic component of this review was performed in accordance with the Cochrane Handbook for Systematic Reviews [32] with the purpose of identifying possible differences in PSQI global and subscale scores between clinical and non-clinical samples (Tables S2a and S2b for raw data). Seven studies [22,46,49,51,56,68,79], six of fair quality [22,46,49,51,56,79] and one of good quality [68], provided global scores for both groups. The clinical samples combined equated to 801 individuals, and the non-clinical samples comprised 3433 persons. The results revealed a significantly higher mean global PSQI score in clinical versus non-clinical persons, utilizing a random effect model (WMD = 4.74; 95% CI 3.43–6.06, p < 0.0001; I² 93%) (Fig. 3).
Six of the aforementioned studies also provided subscale scores [22,46,51,56,69,79]. The clinical samples combined comprised 538 individuals, and the non-clinical samples 745 individuals. Analysis revealed statistically significant differences in scores for all subscales: sleep quality: WMD = 0.74 (95% CI 0.34–1.14), sleep duration: WMD = 0.48 (95% CI 0.21–0.74), habitual sleep latency: WMD = 0.60 (95% CI 0.13–1.07), sleep medications: WMD = 0.51

Table 4
Reported measurement properties of the PSQI: hypothesis testing (validity).

Study Sample Convergent (C)/divergent (D) validity Results

Aloba et al., 2007 [45] Nigerian university C: PSQI vs.:
students w/and w/o  Insomnia by DSM-IV Insomnia by DSM-IV: ROC curve analysis: AUC = 0.685 (95% CI 0.565–0.805);
insomnia diagnosis by  GHQ-12 best cut-off for insomnia is five (sensitivity = 0.720,
DSM-IV (n = 520) D: – specificity = 0.545, efficiency = 0.554, LR +ve = 1.565, LR -ve = 0.514);
GHQ-12: r = 0.252, p < 0.001
Alsaadi et al., 2013 [47] Patients with lower- C: PSQI vs.:
back pain (n = 79)  ISI Insomnia (sleep diary): AUC = 0.79 (95% CI 0.68–0.87); best cut-off score
 RMDQ sleep item for insomnia >6 (sensitivity = 100% (83–100), specificity = 49% (36–63),
 ESS LR +ve = 1.9 (1.5–2.5), LR -ve = 0)
 Sleep diary (insomnia ISI: diffs between areas = 0.01 (95% CI 0.07 to 0.09, z statistics = 0.18,
by ICSD-2) p = 0.85)
D: – RMDQ sleep item: diffs between areas = 0.14 (95% CI 0.01–0.28, z
statistics = 2.11, p = 0.03)
ESS: diffs between areas = 0.26 (95% CI 0.09–0.41, z statistics = 3.09,
p = 0.002)
Beaudreau et al., Black (n = 306) and C: Correlations (Spearman's r):
2012 [49] white (n = 2662)  ESS PSQI vs.:
women 70 yoa  Actigraphy (DI, TSI, WASO) ESS: r = 0.11, p < 0.001; DI: r = 0.05, p = 0.01; TSI: NS; WASO: r = 0.14,
 Trails B p < 0.001; Trails B: r = 0.05, p = 0.007; AHI: NS; GDS: r = 0.31, p < 0.001;
 AHI Mobility/IADL: r = 0.18, p < 0.001
 GDS
 Mobility/IADL
D: –
Bush et al., 2012 [51] Primary care (co) C: Correlation/p value order: PSWQ, BDI-II, MSPSS, LOT
principal GAD patients  PSWQ PSQI global score: r = 0.25, p < 0.01, r = 0.50, p < 0.001, NS, r = 0.26,
60 yoa (n = 134)  BDI-II p < 0.01; PSQI SQ: r = 0.25, p < 0.01, r = 0.44, p < 0.001, NS, NS; PSQI SL: all
D: NS; PSQI SD: r = 0.19, p < 0.05, r = 0.38, p < 0.001, NS, NS; PSQI HSE:
 MSPSS r = 0.19, p < 0.05, r = 0.28, p < 0.01, NS, NS; PSQI SDI: r = 0.28, p < 0.01,
 LOT r = 0.28, p < 0.01, NS, NS; PSQI SM: NS, r = 0.22, p < 0.05, NS, r = 0.24,
p < 0.01; PSQI DD: r = 0.22, p < 0.05, r-0.53, p < 0.001, r = 0.24, p < 0.01,
r = 0.24, p < 0.001
Buysse et al., 1991 [73] Healthy subjects C: Correlations (Spearman's r):
80 yoa (n = 44)  PSG (sleep latency) PSQI global score/subscales scores vs.:
Healthy subjects 20–30  PSG (sleep efficiency) PSG data: NS, except
yoa (n = 35)  PSG (time spent asleep) Younger subjects:
 PSG (%delta) sleep maintenance: rho = 0.36, p < 0.04
 PSG (%REM) PSQI SQ vs PSG SE: rho = 0.34, p < 0.04
 PSG (MAI) sleep maintenance: rho = 0.48, p < 0.003
 HRSD delta%: rho = 0.46, p < 0.005
D: MAI: rho = 0.39, p = 0.02
 CTQ (vigor) CTQ Vigor: rho = 0.43, p < 0.02
 Maudsley neuroticism Maudsley neuroticism: rho = 0.38, p < 0.03
Elderly subjects:
HRSD score: rho = 0.48, p < 0.002
PSQI SQ score vs HRSD score: rho = 0.42, p < 0.006
Buysse et al., 2008 [52] Community-dwelling C: PSQI global score vs ESS total score: r = 0.16/0.03
adults (n = 187)  ESS PSQI DD vs ESS total score: r = 0.34/<0.001 => when component deleted:
D: – r = 0.10/0.16
Carpenter &  Bone marrow C: Correlation/p value order: BMT, Renal, BC, BBP
Andrykowski, transplant patients  SER (sleep problems) PSQI global score vs.:
1998 [53] (BMT) (n = 155)  CES-D (sleep restlessness) SER subscales (past week): sleep problems: r = 0.72, 0.74, 0.65, 0.77, all
 Renal transplant  SEAS (energy) p < 0.001; feeling tired: r = 0.54, 0.67, 0.44, 0.49, all p < 0.001; weakness:
patients (Renal)  SF-36 (vitality) r = 0.52, 0.43, 0.43, 0.52, p < 0.001; nausea: r = 0.17, p < 0.05, r = 0.37,
(n = 56)  POMS (vigor) p < 0.01, r = 0.36, p < 0.001, r = 0.27, p < 0.001; vomiting: NS, NS, r = 0.26,
 Women w/breast  FLIC total p < 0.01, r = 0.04, p < 0.001; change in taste: NS, NS, r = 0.34, p > 0.001,
cancer (BC) (n = 102) D: r = 0.28, p < 0.001
 Women w/benign  SER (nausea) POMS subscales: total mood disturbance: r = 0.59, 0.47, 0.53, 0.62, all
breast problems  SER (change in taste) p < 0.001; tension/anxiety: r = 0.60, p < 0.001, r = 0.36, p < 0.01, r = 0.50,
(BBP) (n = 159)  SER (vomiting) p < 0.001, r = 0.62, p < 0.001; fatigue/inertia: r = 0.58, 0.54, 0.48, 0.53, all
p < 0.001; depression/dejection: r = 0.48, 0.42, 0.49, 0.58, all p < 0.001;
confusion/bewilderment: r = 0.45, p < 0.001, r = 0.32, p < 0.05, r = 0.50,
p < 0.001, r = 0.46, p < 0.001; vigor/activity: r = 0.45, 0.56, 0.39, 0.45,
all p < 0.001; anger/hostility: r = 0.36, p < 0.001, NS, r = 0.45, p < 0.001,
r = 0.45, p < 0.001
FLIC total (correlation/p value order: BMT, Renal): r = 0.63, 0.57, both
p < 0.001
SEAS subscales (past week) (correlation/p value order: BMT, Renal):
energy: r = 0.54, 0.52, both p < 0.001; sleep quality: r = 0.75, 0.76,
both p < 0.001; appetite: r = 0.37, p < 0.001, r = 0.27, p < 0.05
SF-36 vitality subscale (correlation/p value order: BC, BBP):
r = 0.53, 0.59, both p < 0.001
CES-D (past week) (correlation/p value order: BC, BBP): sleep restlessness:
r = 0.69, 0.75, all p < 0.001; total: r = 0.50, 0.65, all p < 0.001
Casement et al., 2012 [54] Women w/PTSD related C: Correlation/p value order: CAPS, BDI, STAXI trait anger, PILL
to sexual or physical  CAPS PSQI global score: r = 0.48, p < 0.05, r = 0.41, p < 0.05, NS, r = 0.43, p < 0.05
assault (n = 319)  BDI PSQI daily disturbances: r = 0.37, p < 0.05, r = 0.47, p < 0.05, NS, r = 0.53,
 PILL p < 0.05
PSQI perceived sleep quality: r = 0.40, p < 0.05, r = 0.34, p < 0.05, NS,
D: r = 0.34, p < 0.05
PSQI sleep efficiency: r = 0.36, p < 0.05, r = 0.22, p < 0.05, NS, r = 0.20,
 STAXI trait anger p < 0.05
Gliklich et al., 2014 [57] Consecutive patients C: Correlation/p value order: PSQI SQ, SL, SD, HSE, SDI, SM, DD, global
w/sleep disturbances  PSG (RDI, #O2 desaturations RDI: r = 0.109, p = 0.024, NS, NS, NS, r = 0.143, p = 0.005, NS, NS, r = 0.114,
requiring PSG (n = 435) <85%) p = 0.048
D: – #O2 desaturations <85%: r = 0.143, p = 0.004, NS, r = 0.149, p = 0.002,
r = 0.015, p = 0.015, r = 0.161, p = 0.002, NS, r = 0.099, p = 0.049, r = 0.214,
p = 0.0001, r = 0.190, p = 0.0001, r = 0.206, p = 0.0003
Grandner et al., 2006 [58] Non-clinical younger C: Correlation/p value order: PSQI global, SQ, SL, SD, HSE, SDI, SM, DD
(n = 53) and older  Actigraphy (sleep efficiency, total Younger group:
(n = 59) adults sleep time, WASO, sleep latency) Actigraphy: sleep efficiency: all NS; total sleep time: NS, NS, r = 0.303,
 Sleep diary (sleep efficiency, total p < 0.05, r = 0.275, p < 0.05, NS, NS, NS, NS; WASO: all NS; sleep latency:
sleep time, WASO, sleep latency) all NS
 CESD Sleep diary: sleep efficiency: NS, NS, NS, NS, NS, NS, NS, r = 0.322,
D: – p < 0.01; total sleep time: NS, NS, NS, r = 0.331, p < 0.001, NS, NS, NS, NS;
WASO: all NS; sleep latency: r = 0.349, p < 0.01, NS, r = 0.432, p < 0.01, NS,
NS, r = 0.367, p < 0.01, NS, NS
CESD total: all NS
Older group:
Actigraphy: sleep efficiency: all NS; total sleep time: all NS; WASO: all NS;
sleep latency: all NS
Sleep diary: sleep efficiency: r = 0.764, r = 0.707, r = 0.644,
r = 0.631, r = 0.789, all p < 0.01, NS, NS, r = 0.430, p < 0.01; total sleep
time: r = 0.548, r = 0.581, r = 0.417, r = 0.581, r = 0.642, all
p < 0.01, NS, NS, r = 0.329, p < 0.01; WASO: r = 0.561, r = 0.542, r = 0.555,
r = 0.530, r = 0.611, all p < 0.01, NS, NS, NS; sleep latency: r = 0.557,
p < 0.01, r = 0.511, p < 0.01, r = 0.560, p < 0.01, r = 0.464, p < 0.01, r = 0.495,
p < 0.01, NS, NS, r = 0.312, p < 0.05
CESD total: r = 0.364, p < 0.01, r = 0.281, p < 0.05, r = 0.396, p < 0.01,
r = 0.331, p < 0.05, NS, NS, NS, r = 0.421, p < 0.01
Combined:
Actigraphy: sleep efficiency: all NS, total sleep time: NS, NS, r = 0.275,
p < 0.01, r = 0.204, p < 0.05, NS, NS, NS, NS; WASO: all NS, sleep latency: all
NS
Sleep diary: sleep efficiency: r = 0.562, r = 0.432, r = 0.378,
r = 0.473, r = 0.563, all p < 0.01, NS, r = 0.199, p < 0.05, r = 0.307,
p < 0.01; total sleep time: r = 0.307, p < 0.01, r = 0.240, p < 0.05, NS,
r = 0.454, p < 0.01, r = 0.411, p < 0.01, NS, NS, NS; WASO: r = 0.262,
p < 0.01, r = 0.210, p < 0.05, r = 0.241, p < 0.05, r = 0.235, p < 0.05, r = 0.260,
p < 0.01, NS, NS, NS; sleep latency: r = 0.480, r = 0.354, r = 0.488,
r = 0.275, all p < 0.01, r = 0.242, p < 0.05, NS, NS, r = 0.206, p < 0.05
CESD total: r = 0.305, p < 0.01, NS, r = 0.193, p < 0.05, r = 0.205, p < 0.05, NS,
NS, NS, r = 0.317, p < 0.01
Hancock & Patients w/dementia C: ROC curve analysis: AUC = 0.64 (95% CI 0.58–0.70); best cut-off for
Larner, 2009 [60] diagnosis by DSM-IV  Dementia by DSM-IV dementia 8 (sensitivity = 0.79 (0.73–0.86), specificity = 0.41 (0.33–0.48),
(n = 155) and w/o D: – LR +ve = 1.33 (1.15–1.56), LR -ve = 0.51 (0.44–0.59), test accuracy = 0.60
(n = 155) (0.56–0.64))
Masel et al., 2001 [64] Adults (N = 71) w/brain C: Correlations (Spearman's rho):
injuries: NH (n = 38);  MSLT (sleep latency) PSQI vs.:
PTH (n = 21); HAI D: – MSLT (sleep latency): NS
(n = 12)
Mondal et al., 2013 [65] Patients (n = 236) C: Correlations (Spearman's rho):
undergoing overnight  ESS PSQI vs.:
PSG D: – ESS: r = 0.13, p = 0.5
Morin et al., 2011 [66] Community sample C: Correlations (Pearson's r):
(n = 959)  ISI PSQI vs.:
D: – ISI: r = 0.80, p < 0.5
Neau et al., 2012 [67] Patients w MS (n = 205) C: Correlation Mann–Whitney/p value order: depression (EDSS), HADS,
 Depression (EDSS) FMGPQ, spasticity (Ashworth), bladder dysfunction (EDSS) vs
 HADS PSQI global score: NR/p < 0001, NR/p < 0001, NR/p < 0001, NR/NS, NR/NS
 FMGPQ

D:

 Spasticity (Ashworth)
 Bladder dysfunction (EDSS)
Neu et al., 2007 [68] Patients w CFS (n = 28) C: Correlations (Mann–Whitney's r):
 PSG MAI PSQI global score/subscales vs.:
D: – PSG MAI: r = 0.195, p = NS
Nicassio et al., 2014 [70] Patients w RA (n = 107) C: Correlations (Pearson's r):
 CESD PSQI global score vs.:
 MAF MAF: r = 0.513, p < 0.001
 SF-36 CESD: r = 0.337, p < 0.001
D: – SF-36 Vitality: r = 0.524, p < 0.001
Fichtenberg et al., Patients w TBI, mTBI- C: PSQI global score ≥8 vs.:
2001 [80] 33%, modTBI-21%,  Insomnia by DSM-IV DSM-IV for Insomnia: sensitivity = 83%, specificity = 100%
sevTBI- 46% (n = 50)  Sleep diary (sleep onset latency) PSQI global score ≥9 vs.:
 Sleep diary (sleep disturbance) DSM-IV for Insomnia: sensitivity = 93%, specificity = 100%
 Sleep diary (sleep efficiency) Correlations (r):
 BDI PSQI global score vs.:
 ESS Sleep diary SOL: r = 0.796, NS
 MPI Sleep diary SD: r = 0.633, NS
D: – Sleep diary SE: r = 0.641, NS
PSQI mood item vs. BDI (sign. differences btw groups w/depression and w/out)
PSQI daytime sleepiness item vs. ESS (sign. differences btw groups w/excessive sleepiness and w/out)
PSQI pain item vs. MPI (sign. differences btw groups w/pain and w/out)
Ritsner et al., 2004 [74] Patients w stable C: Correlations (Pearson's r):
schizophrenia (n = 145)  Depression (MADRS) PSQI global score vs.:
 TBDI MADRS: r = 0.32, p < 0.001
 Q-LES-Q TBDI: r = 0.52, p < 0.001
D: Q-LES-Q: r = 0.53, p < 0.001
 PANSS PANSS: ns
Scarlata et al., 2013 [75] Consecutive patients at C: PSQI global score 5 vs.:
high risk for OSAS  PSG AHI OSAS diagnoses: sensitivity = 69.7 (60.2–78.1)%, specificity = 31 (22.8–40.3)%
(n = 254) D: –
Diagnostic accuracy = 49.8 (43.1–56.5)
PPV = 48.7 (40.6–56.8)
NPV = 52.2 (39.8–64.3)
AUC = 0.509
PSQI SQ vs AHI: r = 0.142, p = 0.029
PSQI SD vs AHI: r = 0.139, p = 0.034
*Remaining PSQI's items/factors vs. AHI: NS
Skouteris et al., 2009 [76] Pregnant women C: PSQI global score vs.
(n = 252)  Depression (BDI) T1: BDI: r = 0.47, p < 0.001
D: – T2: BDI: r = 0.36, p < 0.001
Spira et al., 2011 [77] Participants in the C: Correlations (Spearman's rho):
Osteoporotic Fractures  ESS PSQI global score vs.
in Men Study  Actigraphy (WASO) ESS: rho = 0.13, p < 0.001
(n = 3059)  Depression (GDS) WASO: rho = 0.18, p < 0.001
 Mobility (IADL) GDS: rho = 0.34, p < 0.001
D: IADL mobility: rho = 0.23, p < 0.001
 SF-12 (physical) SF-12 mental: rho = 0.30, p < 0.001
 SF-12 (mental) SF-12 physical: rho = 0.31, p < 0.0001

Abbreviations: btw, between; DD, daytime dysfunction; DSM-IV, diagnostic and statistical manual of mental disorders, 4th edition; EDSS, expanded disability status scale; ESS, Epworth sleepiness scale; FLIC, functional living index for cancer patients; FMGPQ, French version of the McGill pain questionnaire; GAD, generalized anxiety disorder; GDS, geriatric depression scale; GDS, global disability scale; GHQ-12, general health questionnaire; HADS, hospital anxiety and depression scale; HAI, hypersomnolent with abnormal indices; HRSD, Hamilton rating scale for depression; IADL, instrumental activities of daily living; ICSD-2, international classification of sleep disorders, 2nd edition; ISI, insomnia severity index; LOT, life orientation test; LR, likelihood ratio; MADRS, Montgomery and Asberg depression rating scale; MAI, microarousal index; MPI, multidimensional pain inventory; MS, multiple sclerosis; MSLT, multiple sleep latency test; MSPSS, multidimensional scale of perceived social support; NH, non-hypersomnolent; NPV, negative predictive value; NR, not reported; NS, not significant; OSAS, obstructive sleep apnea syndrome; PANSS, positive and negative syndrome scale; PILL, Pennebaker inventory of limbic languidness; POMS, profile of mood states; PPV, positive predictive value; PSG, polysomnography; PSQI, Pittsburgh sleep quality index; PSWQ, Penn State worry questionnaire; PTH, posttraumatic hypersomnia; PTSD, posttraumatic stress disorder; Q-LES-Q, quality of life enjoyment and satisfaction questionnaire; RA, rheumatoid arthritis; RDI, respiratory disturbance index; ROC, receiver operating characteristic; RT, renal transplant; SD, sleep duration; SM, sleep medications; SDI, sleep disturbance; SE, sleep efficiency; SEAS, sleep, energy and appetite scale; SER, symptom experience report; SF-12, short form (12) health survey; SF-36, short form (36) health survey; SL, sleep latency; SOL, sleep-onset latency; SQ, sleep quality; STAXI, state-trait anger expression index; TBDI, Talbieh brief distress inventory; TST, total sleep time; w/, with; WASO, wake after sleep onset.
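
The screening statistics reported in Table 4 (sensitivity, specificity, PPV, NPV, diagnostic accuracy) all derive from a 2×2 cross-classification of PSQI result against the reference diagnosis. The following is a minimal sketch of that arithmetic; the function name `screening_metrics` and the counts are hypothetical and are not taken from any reviewed study.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute screening statistics from 2x2 counts.

    tp/fn: reference-positive cases scored above/below the PSQI cut-off.
    fp/tn: reference-negative cases scored above/below the PSQI cut-off.
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of true cases detected
        "specificity": tn / (tn + fp),            # share of non-cases cleared
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp=45, fp=20, fn=5, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common the condition is in the sample, the same cut-off can yield different predictive values in clinical and non-clinical samples.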

(95% CI 0.1–0.92), daytime dysfunction: WMD = 0.89 (95% CI 0.54–1.24) but the sleep disturbance subscale (WMD = 0.25, 95% CI −0.26 to 0.77) between non-clinical and clinical samples (Table S3).

Discussion

Sensibility

Since its development, the PSQI has been widely used in research and clinical practice, providing information on a respondent's sleep quality, discriminating “good” and “poor” sleepers, and in clinical assessment of a variety of sleep disturbances. Interestingly, the PSQI, conceptualized and developed as a clinimetric measure (i.e., aimed for all items to measure a particular aspect of a complex clinical construct in the absence of a gold standard for said construct; emphasis on heterogeneity or “clinical phenomena”) [28], had its properties subsequently evaluated and tested as a measure developed by a psychometric strategy (i.e., all items measure a particular construct or aspect of a construct; emphasis on homogeneity) [94]. In adopting either approach, according to the practical guidelines for the development of a measurement tool [83,92,93],

Fig. 3. Results of random effects analysis of the PSQI global scores, comparing clinical and non-clinical samples.
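
As a rough illustration of the kind of random-effects pooling summarized in Fig. 3, the sketch below pools mean differences using the DerSimonian–Laird estimator of between-study variance. This is an assumption made for illustration: the review does not state which estimator or software was used, the function name `random_effects_md` is hypothetical, and the input values are invented.

```python
import math


def random_effects_md(studies):
    """Pool mean differences with a DerSimonian-Laird random-effects model.

    studies: list of (mean_a, sd_a, n_a, mean_b, sd_b, n_b) tuples,
    e.g. group a = clinical sample, group b = non-clinical sample.
    Returns (pooled difference, (ci_low, ci_high), tau squared).
    """
    effects = [m1 - m2 for m1, _, _, m2, _, _ in studies]
    variances = [s1**2 / n1 + s2**2 / n2 for _, s1, n1, _, s2, n2 in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Invented PSQI global-score summaries (clinical vs. non-clinical).
example = [
    (9.1, 3.8, 60, 5.2, 2.9, 80),
    (10.4, 4.1, 45, 4.8, 2.5, 50),
    (7.6, 3.5, 120, 5.9, 3.0, 110),
]
pooled, ci, tau2 = random_effects_md(example)
print(f"pooled MD = {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, tau^2 = {tau2:.2f}")
```

With few studies, the estimate of the between-study variance is itself imprecise, which is the limitation the authors raise about their own random-effects analysis.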

measurements in medicine should be performed using the most adequate method. When constructing the PSQI, developers acknowledged the clinical construct of sleep quality as a “complex phenomenon that is difficult to define and measure objectively” [22], and declared that, from a clinical perspective, the concept of “sleep quality” includes quantitative aspects of sleep (i.e., sleep duration, sleep latency, number of arousals), as well as purely subjective (i.e., self-perceived) aspects such as “depth” or “restfulness” of sleep, where the exact elements that compose sleep quality, and their relative importance, can vary between individuals.

Although the sensibility of the PSQI has not been formally evaluated previously, its features are important criteria for determining the success or failure of a clinical tool and should precede any psychometric evaluation [83]. As we have shown, the PSQI is currently the most commonly used generic sleep measure in clinical and research settings, indirectly supporting the sensibility of the instrument. Our structural evaluation of the sensibility using the Bombardier framework supported the view of the PSQI as a sensible clinical index, which warranted all of the observed subsequent efforts that were applied to describe its psychometric properties in various clinical and non-clinical settings.

Construct validity and reliability

The COSMIN panel defines validity as “the degree to which an instrument truly measures the construct(s) it purports to measure” [30]. Dealing with unobservable constructs such as sleep quality makes it difficult to determine exactly what the tool measures. Construct validity applies in situations where there is no gold standard, and refers to whether the instrument provides the scores that are expected based on existing knowledge about the construct [83].

Internal consistency, dimensionality and factor analytic research

In most studies, Cronbach's alphas ranged from 0.70 to 0.83, meeting the cut-point for a positive rating for within- and between-group comparisons (i.e., 0.70). No studies reported Cronbach's alpha within the ideal range for use in individual patients (i.e., 0.9–0.95). Three studies, featuring patients with chronic fatigue syndrome and a non-clinical sample, reported Cronbach's alphas below 0.70. The results indirectly support the notion of sleep quality as being based on both reflective and formative models (Fig. 1).

Knowledge of the sleep quality construct under study and a priori hypotheses on its relationship to other constructs in a given population are crucial for a rigorous validation of the PSQI in future studies. Moreover, each individual dimension measured in the PSQI (i.e., subscale) should be validated separately by formulating independent hypotheses, specifically if the instrument is applied in another target population, another language, or by another means of administration (i.e., interview vs self-report) [87].

While the PSQI was developed with no specific population in mind, it has been used and validated in a variety of populations other than those to whom the instrument was administered in the original publication. This may explain the conflicting results we observed in the factor analytic research: different two- and three-factor models for the scoring structure of the PSQI were proposed, and item #7 (i.e., during the past month, how often have you taken medicine to help you sleep (prescribed or “over the counter”)?) loaded poorly onto various factor structures, with removal of this item improving the fit in several studies, but not all (Table 5). The research varied in choice of sample (i.e., university students, military veterans with posttraumatic stress disorder, community dwelling depressed and non-depressed adults, patients with chronic fatigue syndrome, etc.). These groups have highly variable characteristics in terms of societal stressors, medical pathology, sleep medication use, pain, etc. Therefore it is reasonable to expect that the PSQI might function differently in different populations and settings. The caveat to this point is that a “poor” sleeper as defined by the PSQI may not share the same set of symptoms with another “poor” sleeper. Moreover, high levels of a symptom item may imply that a person is a “poor” sleeper, and low symptom levels may indicate “good” sleep; in fact, this may not be the case. Given the results were not consistent, and taking into account the various populations tested, there appears limited value in applying specific changes to the factor structure of the original instrument (Table 5). Fayers and Hand have shown the inadequacy of factor analysis in analyzing models including a mixture of causal and effectual indicators, pointing to the minimal importance of internal consistency for subscales and irrelevance of construct validity in testing for homogeneity [96]. They proposed broad indicator coverage for clinical indices, which can be achieved by asking patients “Is there anything else which has caused your problem?”. The PSQI includes such an item. Moreover, important issues such as lack of self-insight and awareness in some persons can lead to under-reporting of sleep issues, in which case the responses of significant others can guide clinical decision making. Therefore, where clinicians are working one-on-one with individuals with the aim of directing further investigation or intervention, focus on the individual items of an instrument is key. In research, if the sample size is sufficiently large, the mean PSQI global score will provide a sufficient estimate for sleep quality in a given population. De Vet et al. provide theoretical and practical points of
Table 5
Reported measurement properties of PSQI: structural validity.

Columns: Study; Structural validity (Population, Methods); Results: Factor Models (1-, 2-, 3-)

Aloba et al. 2006 [46]
  Population: University students (n = 520)
  Methods: Factor analysis: Eigenvalues >0.4 considered as loading on a factor; All tests two-tailed; Level of significance at p < 0.05
  1-factor: NA
  2-factor: NA
  3-factor: 1st factor: SQ (0.587), SL (0.443), HSE (0.487), SDI (0.642), SM (0.562), DD (0.671); 2nd factor: SD (0.832), SDI (0.421)*; 3rd factor: SQ (0.560), SM (0.561), HSE (0.518)*. *Variance not reported

Babson et al. 2012 [48]
  Population: Military veterans w PTSD (n = 226)
  Methods: EFA: Fit of 1-, 2-, and 3-factor models; Factor structure replication in randomly split samples (n = 111), (n = 115)
  1-factor: Removal of SM factor: poor model fit for 1-factor solution*. *All factor loading from a Geomin rotation
  2-factor: Removal of SM factor: good model fit for a 2-factor solution* χ2 (52) = 74.62, p = 0.08, CFI = 0.98, TLI = 0.98, RMSEA = 0.03, SRMR = 0.04. *All factor loading from a Geomin rotation
  3-factor: Removal of SM factor: 3-factor solution did not converge*. *All factor loading from a Geomin rotation

Buysse et al. 1989 [22]
  Population: 1) Healthy subjects (n = 52); 2) Depressed patients (n = 34); 3) Sleep-disordered patients: DIMS (n = 45) and DOES (n = 17)
  Methods: Internal consistency Cronbach's α = 0.83; Component score-global score rs: Largest: HSE and SQ (0.76 each); Smallest: SDI (0.35); Mean component score-global score r = 0.58; Individual items strongly correlated with each other
  1-factor: All seven components measure particular aspect of the same construct
  2-factor: NA
  3-factor: NA

Casement et al. 2012 [54]
  Population: Women w/ PTSD due to sexual/physical assault (n = 319)
  Methods: CFA: GFI statistics; Chi-squared difference test for 1-, 2-, and 3-factor model
  1-factor: Poor fit
  2-factor: Kotonoulas et al.'s (Greek version) 2-factor model: poor fit (χ2diff = 0.32; p = 0.57); Magee et al.'s 2-factor model: acceptable (χ2diff = 22.49; p < 0.001) for two GFI indices (WRMR, CFI)
  3-factor: Cole et al.'s [50] 3-factor: acceptable, best fit (χ2diff = 33.48; p < 0.001)

Cole et al. 2006 [55]
  Population: Community-dwelling depressed (n = 143) and non-depressed (n = 207) adults 60 yoa
  Methods: EFA (i.e., >0.71 – excellent loading; 0.63–0.70 – very good; 0.55–0.62 – good; 0.45–0.54 – fair; 0.32–0.44 – poor; <0.32 – discarded); GFI and AGFI: >0.90; CFI: >0.95; RMSEA: <0.06
  1-factor: Poor fit; loading of individual components = 0.63
  2-factor: 1st factor: SE – strong loading from HSE (0.89) and SD (0.60); 2nd factor: SQ – strong loading from SQ (0.77) and DS (0.55); SM – poor loading on both factors; SE – 39.9% of the variance; SQ – 17.4% of variance; correlation between two (r = 0.33)
  3-factor: Best fit: 1st factor: SE; 2nd factor: SQ; 3rd factor: DD; Loading of individual components = 0.73; range = 0.43–0.91; rs between factors 0.42–0.82

Mariman et al. 2012 [63]
  Population: N = 413, CFS patients
  Methods: EFA and CFA; EFA: validity of the single factor; validity of all 2-factor models; Maximum Likelihood Algorithm for model's fit
  1-factor: poor fit (χ2 = 109.90, d.f. = 14, p < 0.001; GFI = 0.92, AGFI = 0.85, CFI = 0.84; RMSEA = 0.13, CAIC = 208.23)
  2-factor: poor fit (significant χ2; results not provided)
  3-factor: Cole et al.'s 3-factor: acceptable fit (χ2 = 14.70, d.f. = 11, p = 0.20; GFI = 0.99, AGFI = 0.97, CFI = 0.99; RMSEA = 0.03, CAIC = 134.10)

Magee et al. 2008 [62]
  Population: N = 364, adults
  Methods: EFA and CFA; EFA: to identify factor structure; CFA: to test 2-factor model from EFA; single factor structure, 3-factor structure by Cole et al.'s; Maximum Likelihood Algorithm used to investigate model's fit
  1-factor: poor fit (χ2 = 26.44, d.f. = 14, p < 0.05; GFI = 0.96, AGFI = 0.92, CFI = 0.94; RMSEA = 0.064); 1-factor w SM removed: poor fit (χ2 = 21.82, d.f. = 9, p < 0.05; GFI = 0.96, AGFI = 0.91, CFI = 0.93; RMSEA = 0.09)
  2-factor: 2-factors from EFA: good fit (χ2 = 16.84, d.f. = 13; GFI = 0.97, AGFI = 0.94, CFI = 0.98; RMSEA = 0.04); 2-factors from EFA w/ SM removed: no improvement in model fit indices (χ2 = 12.57, d.f. = 8; GFI = 0.98, AGFI = 0.94, CFI = 0.98; RMSEA = 0.06)
  3-factor: Cole et al.'s 3-factor model: acceptable fit, similar to 2-factors from EFA (χ2 = 14.20, d.f. = 11; GFI = 0.98, AGFI = 0.94, CFI = 0.98; RMSEA = 0.04); Cole et al.'s [50] 3-factor model w SM removed: all factor loadings significant, all goodness-of-fit acceptable, a little weaker (χ2 = 10.42, d.f. = 6, GFI = 0.98, AGFI = 0.93, CFI = 0.98; RMSEA = 0.06)

Nicassio et al. 2014 [70]
  Population: N = 107, patients w RA
  Methods: CFA to evaluate 1-, 2-, and 3-factor(s) solution
  1-factor: poor fit (χ2 = 19.88, d.f. = 9, p = 0.019; χ2/d.f. = 2.21; CFI = 0.894; RMSEA = 0.107)
  2-factor: satisfactory fit; internal consistency Cronbach's α = 0.70–0.71 => optimal solution (χ2 = 3.63, d.f. = 8, p = 0.889; χ2/d.f. = 0.45; CFI = 1.00; RMSEA < 0.001)
  3-factor: best fit, but Cronbach's α = 0.58 and DD factor (α = 0.53) => not considered in further analyses (χ2 = 0.60, d.f. = 6, p = 0.996; χ2/d.f. = 0.10; CFI = 1.00; RMSEA < 0.001)

Rener-Sitar et al. 2014 [69]
  Population: N = 609, patients w TMD
  Methods: EFA and CFA; Principal factors method; Scree plot; Orthogonal varimax or oblique promax technique
  1-factor: EFA: Factors explained 41% of variance in pain-related TMD and 37% in pain-free TMD. CFA: 1-factor: good fit (χ2 = 32.75, d.f. = 14, p = 0.003; CFI and TMI > 0.95; SRMR = 0.04; RMSEA = 0.05). Robust method for estimation: model fit worsened (χ2 = 66.24, d.f. = 14, p < 0.001; CFI = 0.93; TMI = 0.90; SRMR = 0.04; RMSEA = 0.08)
  2-factor: NR
  3-factor: NR

Otte et al. 2013 [72]
  Population: N = 1174, non-depressed BC patients, African-American and Caucasian
  Methods: CFA to evaluate 1-, 2-, and 3-factor(s) solution
  1-factor: poor fit (χ2 = 199.88, d.f. = 14, p < 0.05; SRMR = 0.071; CFI = 0.96; RMSEA = 0.105)
  2-factor: Cole et al.'s [50] 2-factor: not acceptable fit (χ2 = 89.70, d.f. = 13, p < 0.05; SRMR = 0.0048; CFI = 0.98; RMSEA = 0.075); New 2-factor model (i.e., SE, SQ): good fit (χ2 = 51.25, d.f. = 35, p < 0.37; SRMR = 0.028; CFI = 1.00; RMSEA = 0.026)
  3-factor: Cole et al.'s [50] 3-factor (i.e., SE, SQ, DD): not acceptable fit (χ2 = 80.32, d.f. = 11, p < 0.05; SRMR = 0.0044; CFI = 0.98; RMSEA = 0.076); New 3-factor: acceptable fit (χ2 = 3.34, d.f. = 6, p < 0.76; SRMR = 0.012; CFI = 1.00; RMSEA = 0.000)

Skouteris et al. 2009 [76]
  Population: N = 252, pregnant women; T1: 18.32 (1.61) wks pregnancy; T2: 34.63 (1.71) wks pregnancy
  Methods: SEM to test 2-factor models at T1 and T2; Model 1: Factor 1 – overall SQ, SL, SD, and SE; Factor 2 – SDI and DD
  1-factor: NR
  2-factor: Model 1: T1: χ2 = 41.3, d.f. = 8; CFI = 0.91; RMSEA = 0.13; ECVI = 0.27; T2: χ2 = 6.5, d.f. = 7; CFI = 1.00; RMSEA = 0.00; ECVI = 0.14. Model 2: improved fit
  3-factor: NR

Table 5 (continued)

Skouteris et al. 2009 [76] (continued)
  Methods: Model 2 (revised)
  2-factor: Model 2: T1: χ2 = 53.4, d.f. = 8; CFI = 0.91; RMSEA = 0.15; ECVI = 0.32; T2: χ2 = 18.2, d.f. = 7; CFI = 0.98; RMSEA = 0.08; ECVI = 0.18

Tomfohr et al. 2013 [78]
  Population: N = 2352, community-dwelling, non-Hispanic whites (n = 1698) and English (n = 654)
  Methods: CFA to examine the fit of a 1-factor structure; 3-factor structure found by Cole et al. tested
  1-factor: 1-factor model (7 subscales): poor fit; SM low (0.314) => removed from the model; 1-factor model (6 subscales): fit improved. Groups reported: non-Hispanic whites; English. Fit statistics (assignment to model and group not recoverable from the source layout): (χ2 = 424.83, d.f. = 26, p < 0.0001; CFI = 0.861; but SRMR = 0.049 => fit well descriptively); (χ2 = 333.81, d.f. = 19, p < 0.0001; CFI = 0.881; SRMR = 0.045); (χ2 = 155.89, d.f. = 19, p < 0.0001; CFI = 0.843; SRMR = 0.053)
  2-factor: NR
  3-factor: Both groups: Cole et al.'s 3-factor model: good fit; SM low (0.45) => removed => improved fit. Groups reported: non-Hispanic whites; English. Fit statistics (assignment to model and group not recoverable from the source layout): (χ2 = 213.295, d.f. = 19, p < 0.0001; CFI = 0.95; SRMR = 0.028); (χ2 = 104.15, d.f. = 19, p < 0.0001; CFI = 0.93; SRMR = 0.037); (χ2 = 118.289, d.f. = 12, p < 0.0001; CFI = 0.96; SRMR = 0.026); (χ2 = 53.50, d.f. = 12, p < 0.0001; CFI = 0.91; SRMR = 0.04)

Abbreviations: AGFI, adjusted goodness-of-fit index; BC, breast cancer; CAIC, consistent Akaike information criterion; CFA, confirmatory factor analysis; CFI, comparative fit index; CFS, chronic fatigue syndrome; DD, daytime dysfunction; DIMS, disorders of initiating and maintaining sleep; DOES, disorders of excessive somnolence; ECVI, expected cross validation index; EFA, exploratory factor analysis; GFI, goodness-of-fit index; HSE, habitual sleep efficiency; NA, not applicable; NR, not reported; PTSD, posttraumatic stress disorder; RA, rheumatoid arthritis; RMSEA, root mean square error of approximation; SD, sleep duration; SDI, sleep disturbances; SEM, structural equation modeling; SL, sleep latency; SM, sleep medications; SQ, sleep quality; SRMR, standardized root mean square residual; TLI, Tucker-Lewis index; TMD, temporomandibular disorder; WRMR, weighted root mean squared residual.

view in discussing scoring of multidimensional instruments, stating that although summation of subscales results in the loss of insight with regard to individual subscales and their impact, a global score is practical and often sufficient to address the main aim [26], which in our case is the overall quality of one's sleep. With regard to a clinical setting, it is important to know which item or subscale is most affected.

We therefore suggest that the global score is a practical approach for a researcher looking to quantify a population's sleep quality, or compare populations, as initially proposed by Dr. Buysse's group [22], and to discriminate “good” and “poor” sleepers, while in the clinical setting the focus should be on individual items within the PSQI to detect the attributes of poor sleep quality needed to direct targeted investigation and treatment. The reasons for this conclusion are: 1) the global PSQI score demonstrated acceptable internal consistency across various populations and clinical settings, and correlations between most subscale scores and global scores suggest that the global score sufficiently represents each domain of sleep quality; 2) the factor structure of sleep quality is not constant between and within groups, so structural variance is expected between and within groups; 3) because the prevalence of sleep dysfunction and its main attributes varies across and within clinical and non-clinical samples, an optimal cut-off score is not a stable value. However, the comprehensive coverage of the main domains of sleep quality within the PSQI, as per the DSM-IV and ICSD-2, is obvious, and individual items can direct clinical decision-making for individual patients.

Convergent/divergent and known-group validity

The published findings we assessed outlined high correlations of the PSQI with other measures of sleep quality (i.e., clinical diagnosis of insomnia, the ISI score, some variables of PSG and actigraphy, etc.) and weak or no association with less related constructs (i.e., vomiting, anger, spasticity, bladder dysfunction, dementia, etc.). The basic principle of construct validation is that hypotheses are formulated about the relationship between scores of the instrument of interest (i.e., the PSQI) and scores of other instruments measuring a similar or different construct [30]. These hypotheses should be set a priori; the specific expectations with regard to certain relationships can be based either on an underlying conceptual model or on the data from the literature [83]. Our data have shown that only a few researchers tested hypotheses related to the relationship between sleep quality and other constructs by stating in advance the expected direction and magnitude of associations based on what was known about the constructs under study. In future studies, to assess similarity or dissimilarity between constructs, when formulating hypotheses, one should first have insight into the content of comparable measurement instruments, as conceptually similar items within any given tool are expected to be correlated strongly with conceptually similar scales. Furthermore, there should be a clear description of what is known about the population under study. For example, in a population with hypothalamic dysfunction (i.e., due to tumor, increased intracranial pressure that can exert pressure on various hypothalamic nuclei, etc.), disturbances of neuroendocrine, autonomic, and homeostatic functions, sleep–wake cycles, and emotional behavior occur together. Thus, the construct of sleep dysfunction in this population could be hypothesized to be related to all of the named constructs. Therefore, future testing of divergent or convergent validity of the PSQI should be driven by a priori hypotheses based on what is known about the population under study and the content of utilized instruments.

The evidence for the PSQI's known-group validity was strong, in line with clinical expectations (Table 3). Intra-rater reliability questions whether a person (i.e., respondent) would give the same

responses to a test administered by different raters on different occasions, a short period during which no true changes to the measured construct can occur [30]. Such data was not provided in the reviewed studies. Repeated measurements of any construct may differ due to day-to-day variation, the instrument used, the persons administering the measure, or the circumstances under which the measurements are taken [83,95]. All sources of variation play a role in test-retest reliability. There is limited evidence on test-retest for the PSQI: while three studies have shown stability in measured sleep quality dimensions between test and retest, it is still unclear what the appropriate length of time between test and retest is.

Sex differences in PSQI

Lack of consistency in sex differences between studies can be attributed to a variety of factors, the individual impacts of each a challenge to interpret. To determine whether differences in sleep quality between study samples were attributable to age, general health, clinical disorders, psychosocial stressors, cultural differences, sex, or a combination, is a complex task. At this time the nature and implications of conflicting results have not been adequately explored given limited information available. Nevertheless, considering extensive evidence on morphologic differences between sexes in circadian clock genes, respiratory control, stress responses and the action of sex hormones on sleep mechanisms, sleep quality is likely to be influenced by sex.

Meta-analysis

We aimed to evaluate the potential significance of the PSQI score for screening purposes. Our results highlight the statistically and potentially clinically meaningful mean score difference in the global and all subscale scores between non-clinical and clinical samples, with the exception of sleep disturbance, suggesting that the PSQI score may be helpful in identifying poor sleep quality.

Limitations

There are potential limitations to this review. The review team did not contact authors of reviewed papers for additional methodological details that were not available from the publication, with the exception of the developer of the scale, who was contacted for clarification pertaining to the first research goal (i.e., appraisal of tool development). Further, we appraised methodological quality utilizing the COSMIN approach, a relatively new and still-evolving method [30]. We provided reasons for poor to fair overall quality ratings in each of the featured studies to clarify the resultant assessment grades. Also, our analysis was limited to the English version of the PSQI, omitting cross-cultural adaptations of the instrument and their possibly relevant results. Further, while the potential application of the PSQI may also include predictive and evaluative categories, the main focus of our work was examining the properties of the PSQI as a discriminative index (i.e., its ability to distinguish between individuals or groups on an underlying dimension), when no external criterion is available for validating the measure [14]. Thus, the qualities of the PSQI as a predictive and evaluative tool of sleep quality require future evaluation.

In our meta-analysis, study heterogeneity problems arose. Although our procedure was a practical way to generate a more powerful estimate of mean score differences between clinical and non-clinical samples, it did come with potential limitations due to the small number of studies, and the random-effects analysis, where our estimate of the error may itself have been unreliable. However, many researchers would argue that when faced with any number of studies, there is a tendency to draw conclusions, usually ad hoc summaries, which can often be misleading [97]. Therefore, a meta-analysis with known limitations may nevertheless be preferable to an ad hoc summary.

Conclusion

The PSQI is currently the only standardized clinical instrument that covers a broad range of indicators relevant to sleep quality. Items pertaining to circadian rhythm disorders and medication effects other than those by sleep aids, although not covered, may be inferred based on analysis of data from available items, together with a detailed clinical history of the patient. We found strong positive evidence for reliability and validity (hypothesis testing), and moderate positive evidence for structural validity testing in a variety of non-clinical and clinical samples. While the internal consistency of the PSQI did not reach the level recommended for individual level comparison, it is worth noting that, to date, no study of agreement among clinicians in assessing quality of sleep in a patient has been conducted [98]. It would not be a surprise, however, if the agreement is low, given the limited knowledge about sleep function; such differences of opinion have been demonstrated [98–100]. Thus, the utility of a standardized tool such as the PSQI holds great potential for clinical practice, as the agreement is potentially much higher and the findings more consistent, specifically if progress is made on focusing on individual items and reports of a significant other.

Practice points

• The PSQI developers' construct of “sleep quality” was defined based on clinical judgment alone. Nevertheless, it covers a broad range of indicators relevant to sleep quality.
• In the majority of studies, Cronbach's alphas ranged from 0.70 to 0.83, meeting the cut-point for a positive rating for within- and between-group comparisons (i.e., 0.70). No studies reported Cronbach's alpha within the ideal range for use in individual patients (i.e., 0.9–0.95).
• Several studies examining the unidimensionality of the PSQI raised concerns over the factor structure of the instrument; in studies using factor analysis, eight out of eleven reported that a single-factor model fits the data poorly, and the PSQI is better represented in a two- or three-factor model. The results were not consistent.
• The published findings outlined high correlations of the PSQI with other measures of sleep quality (i.e., clinical diagnosis of insomnia, the ISI score, some variables of PSG and actigraphy, etc.) and weak or no associations with less related constructs (i.e., vomiting, anger, spasticity, bladder dysfunction, etc.).
• Known-group construct validity was established, and was in line with clinical expectations.
• There is limited evidence on test-retest; it is still unclear what the appropriate length of time between test and retest is, requiring further testing and definition.
• Lack of consistency in sex differences was observed between studies.
• In non-clinical and clinical samples with known differences in sleep quality, the PSQI global scores and all subscale scores, with the exception of sleep disturbance, differed significantly.
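
The internal-consistency cut-points quoted in this review (0.70 for group comparisons, 0.9–0.95 for use in individual patients) refer to Cronbach's alpha computed over the PSQI component scores. A minimal sketch of that calculation follows; the function name `cronbach_alpha` and the response data are hypothetical, with each row holding one respondent's seven component scores (0–3 each).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of respondents, each a list of k item (component) scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented component scores for six respondents (seven components each).
responses = [
    [1, 1, 0, 1, 1, 0, 1],
    [2, 2, 1, 2, 2, 0, 2],
    [0, 1, 0, 0, 1, 0, 0],
    [3, 2, 2, 3, 2, 1, 3],
    [1, 0, 1, 1, 1, 0, 1],
    [2, 3, 2, 2, 2, 3, 2],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Alpha rises when items covary strongly, so an index built to cover heterogeneous clinical phenomena may not reach the values expected of a homogeneous psychometric scale.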

Research agenda

• Future studies should focus on establishing the universal applicability of the PSQI to respondents and identify important components of the sleep dysfunction domains relevant to the population of interest.
• Future divergent or convergent validity study of the PSQI should be driven by a priori hypotheses.
• High quality methodological studies in clinical settings utilizing the PSQI are warranted, specifically in such domains as test-retest reliability; relationship between index and external measures at a single point of time; study of interperson variation in reporting.
• A taxonometric analysis of the PSQI may contribute to our understanding of sleep dysfunction as dichotomous or continuous construct in various clinical and non-clinical samples.
• One of the most promising applications of the discriminative function of the PSQI is in attempting to quantify the burden of sleep dysfunction in clinical and non-clinical settings.

Conflict of interest

The authors have no conflict of interest or outside funding sources to disclose.

Acknowledgments

Our study had no external funding source. The first author was supported by 2012/2013 Toronto Rehabilitation Institute Scholarship, the Ontario Graduate Scholarship 2012/2013 and the 2013/2015 Frederick Banting and Charles Best Doctoral Research Award from the Canadian Institutes of Health Research. Angela Colantonio was supported by the Saunderson Family Chair in Acquired Brain Injury Research and the Canadian Institutes for Health Research Grant–Institute for Gender and Health. (#CGW-126580). Support was also provided through the Ontario Work Study Program and the Youthdale Foundation. We gratefully acknowledge the contributions of Jessica Babineau, information specialist at the Toronto Rehabilitation Institute, for her help with the literature search, Suhail Doi, professor of clinical epidemiology at University of Queensland, for statistical consultation, and Wayne Khuu for involvement in meta-analysis featured in this study.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.smrv.2015.01.009.

References

*[1] National Sleep Foundation: sleep in America, Poll. Washington DC: National Sleep Foundation; 2008.
[2] Czeisler AC, Winkelman LW, Richardson GS. Chapter 27. Sleep disorders. In Harrison's 15th ed. Principles of internal medicine: eds Brauman E, Fauci AS, Kasper DL, Houser SL, Longo DL, Jameson J.
[3] Shekleton JA, Flynn-Evans EE, Miller B, Epstein LJ, Kirsch D, Brogna LA, et al. Neurobehavioral performance impairment in insomnia: relationships with self-reported sleep and daytime functioning. Sleep 2014;37:107–16.
[4] Fortier-Brochu E, Beaulieu-Bonneau S, Ivers H, Morin CM. Insomnia and daytime cognitive performance: a meta-analysis. Sleep Med Rev 2012;16:83–94.
[5] Leger D, Bayon V, Ohayon MM, Philip P, Ement P, Metlaine A, et al. Insomnia and accidents: cross-sectional study (EQUINOX) on sleep-related home, work and car accidents in 5293 subjects with insomnia from 10 countries. J Sleep Res 2014;23:143–52.
[6] Laugsand LE, Strand LB, Vatten LJ, Janszky I, Bjørngaard JH. Insomnia symptoms and risk for unintentional fatal injuries-the HUNT study. Sleep 2014;37:1777–86.
[7] Soldatos CR, Paparrigopoulos TJ. Sleep physiology and pathology: pertinence to psychiatry. Int Rev Psychiatry 2005;17:213–28.
[8] Barthlen GM, Stacy C. Dyssomnias, parasomnias, and sleep disorders associated with medical and psychiatric diseases. Mt Sinai J Med 1994;61:139–59.
[9] Guallar-Castillón P, Bayan-Bravo A, León-Muñoz LM, Balboa-Castillo T, Lopez-García E, Gutierrez-Fisac JL, et al. The association of major patterns of physical activity, sedentary behavior and sleep with health-related quality of life: a cohort study. Prev Med 2014;67:248–54.
[10] Schenck CH, Mahowald MW. Long-term, nightly benzodiazepine treatment of injurious parasomnias and other disorders of disrupted nocturnal sleep in 170 adults. Am J Med 1996;100:333–7.
[11] Buysse DJ. Sleep health: can we define it? Does it matter? Sleep 2013;37:9–17.
*[12] Bassetti CL, Dogas Z, Dijk DJ, Levy P, Nobili LL, Peigneux P, et al. The future of sleep research and sleep medicine in Europe: a need for academic multidisciplinary sleep centres. In: Bassetti CL, Knobl B, Schulz H, editors. European Sleep Research Society (1972–2012). 40th Anniversary of the ESRS, Regensburg, Bern; 2012. p. 7–8.
[13] Morin CM, LeBlanc M, Daley M, Gregoire JP, Mérette C. Epidemiology of insomnia: prevalence, self-help treatments, consultations, and determinants of help-seeking behaviors. Sleep Med 2006;7:123–30.
*[14] Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chron Dis 1985;38:27–36.
[15] Shahid A, Wilkinson K, Marcu S, Shapiro CM. Stop, that and one hundred other sleep scales. Toronto: Springer; 2011.
[16] Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, et al. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health 2007;10(Suppl. 2):S125–37.
*[17] Quatrano LA, Cruz TH. Future of outcome measurement: impact on research in medical rehabilitation and neurologic populations. Arch Phys Med Rehabil 2011;92(Suppl. 10):S7–11.
[18] Shelgikar AV, Chervin R. Approach to and evaluation of sleep disorders. Contin (Minneap Minn) 2013;19:32–49.
[19] American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Arlington: American Psychiatric Association; 2000.
[20] Buysse DJ. Insomnia. JAMA 2013;309:706–16.
[21] Lichstein KL, Durrence HH, Taylor DJ, Bush AJ, Riedel BW. Quantitative criteria for insomnia. Behav Res Ther 2003;41:427–45.
*[22] Buysse DJ, Reynolds III CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psychiatry Res 1989;28:193–213.
[23] Parrott AC, Hindmarch I. The Leeds sleep evaluation questionnaire for psychopharmacology research. In: Parrott AC, Hindmarch I, editors. Sleep disorders: diagnosis and therapeutics. London: Informa Health Care; 2008. p. 685–9.
[24] Douglass AB, Bornstein R, Nino-Murcia G, Keenan S, Miles L, Zarcone VP, et al. The sleep disorders questionnaire I: creation and multivariate structure of SDQ. Sleep 1994;17:160–7.
[25] Hays RD, Martin SA, Sesti AM, Spritzer KL. Psychometric properties of the medical outcomes study sleep measure. Sleep Med 2005;6:41–4.
[26] Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res 2009;18:1115–23.
[27] Bombardier C, Tugwell P. A methodological framework to develop and select indices for clinical trials: statistical and judgmental approaches. J Rheumatol 1982;9:169–72.
*[28] Feinstein AR. The theory and evaluation of sensibility. In: Feinstein AR, editor. In clinimetrics. New Haven, London: Yale University Press; 1987. p. 141–66.
[29] Rowe BH, Oxman AD. An assessment of the sensibility of the quality of life instrument. Am J Emerg Med 1993;11:364–80.
*[30] Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology and definition of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;6:737–45.
[31] van Tulder M, Furlan A, Bombardier C, Bouter L. Editorial board of the Cochrane Collaboration Back Review Group. Spine (Phila Pa 1976) 2003;28:1290–9.
[32] Higgins JPT, Green S (eds). Cochrane handbook for systematic reviews of interventions, version 5.0.1. Available at: http://www.cochrane-handbook.org/.
[33] Fung CH, Martin JL, Chung C, Fiorentino L, Mitchell M, Josephson KR, et al. Sleep disturbance among older adults in assisted living facilities. Am J Geriatr Psychiatry 2012;20:485–93.
[34] Nokes KM, Kendrew J. Correlates of sleep quality in persons with HIV disease. J Assoc Nurses AIDS Care 2001;12:17–22.

[35] Sandadi S, Frasure HE, Broderick MJ, Waggoner SE, Miller J, von Gruenigen VE. The effect of sleep disturbance on quality of life in women with ovarian cancer. Gynecol Oncol 2011;123:351–5.
[36] Sanford SD, Wagner LI, Beaumont JL, Butt Z, Sweet JJ, Cella D. Longitudinal prospective assessment of sleep quality: before, during, and after adjuvant chemotherapy for breast cancer. Support Care Cancer 2013;21:959–67.
[37] Soehner AM, Kennedy KS, Monk TH. Circadian preference and sleep-wake regularity: associations with self-report sleep parameters in daytime-working adults. Chronobiol Int 2011;28:802–9.
[38] Toor P, Kim K, Buffington CK. Sleep quality and duration before and after bariatric surgery. Obes Surg 2012;22:890–5.
[39] Sharkey KM, Kurth ME, Anderson BJ, Corso RP, Millman RP, Stein MD. Assessing sleep in opioid dependence: a comparison of subjective ratings, sleep diaries, and home polysomnography in methadone maintenance patients. Drug Alcohol Depen 2011;11:245–8.
[40] Cross NE, Lagopoulos J, Duffy SL, Cockayne NL, Hickie IB, Lewis SJG, et al. Sleep quality in healthy older people: relationship with 1H magnetic resonance spectroscopy markers of glial and neuronal integrity. Behav Neurosci 2012;127:803–10.
[41] Blackwell T, Redline S, Ancoli-Israel S, Schneider JL, Surovec S, Johnson NL, et al. Comparison of sleep parameters from actigraphy and polysomnography in older women: the SOF study. Sleep 2008;31:283–91.
[42] Burkhalter H, Serelka SM, Engberg S, Wirz-Justice A, Steiger J, De Geest S. Validity of 2 sleep quality items to be used in a large cohort of kidney transplant recipients. Prog Transpl 2011;21:27–35.
[43] Broderick JE, Junghaenel DU, Schneider S, Pilosi JJ, Stone AA. Pittsburgh and Epworth sleep scale items: accuracy of ratings across different reporting periods. Behav Sleep Med 2013;11:173–88.
[44] Backhaus J, Junghanns K, Broocks A, Riemann D, Hohagen F. Test-retest reliability and validity of the Pittsburgh sleep quality index in primary insomnia. J Psychosom Res 2002;53:737–40.
[45] Afsar B, Elsurer R. The relationship between sleep quality and daytime sleepiness and various anthropometric parameters in stable patients undergoing hemodialysis. J Ren Nutr 2013;23:296–301.
[46] Aloba OO, Adewuya AO, Ola BA, Mapayi BM. Validity of the Pittsburgh sleep quality index among Nigerian university students. Sleep Med 2007;8:266–70.
[47] Alsaadi SM, McAuley JH, Hush JM, Bartlett DJ, Henschke N, Grunstein RR, et al. Detecting insomnia in patients with low back pain: accuracy of four self-report measures. BMC Musculoskel Dis 2013;14:196.
[48] Babson KA, Blonigen DM, Boden MT, Drescher KD, Bonn-Miller MO. Sleep quality among U.S. military veterans with PTSD: a factor analysis and structural model of symptoms. J Trauma Stress 2012;25:665–74.
[49] Beaudreau SA, Spira AP, Stewart A, Kezirian EJ, Lui L, Ensrud K, et al. Validation of the Pittsburgh sleep quality index and the Epworth sleepiness scale in older black and white women. Sleep Med 2012;13:36–42.
[50] Beck S, Schwartz AL, Towsley G, Dudley W, Barsevick A. Psychometric evaluation of the Pittsburgh sleep quality index in cancer patients. J Pain Symptom Manage 2004;27:140–8.
[51] Bush AL, Armento MEA, Weiss BJ, Rhoades HM, Novy DM, Wilson NL, et al. The Pittsburgh sleep quality index in older primary care patients with generalized anxiety disorder: psychometrics and outcomes following cognitive behavioral therapy. Psychiatry Res 2012;199:24–30.
[52] Buysse DJ, Hall ML, Strollo PJ, Kamarck TW, Owens J, Lee L, et al. Relationships between the Pittsburgh sleep quality index (PSQI), Epworth sleepiness scale (ESS), and clinical/polysomnographic measures in a community sample. J Clin Sleep Med 2008;4:563–71.
[53] Carpenter JS, Andrykowski MA. Psychometric evaluation of the Pittsburgh sleep quality index. J Psychosom Res 1998;45:5–13.
[54] Casement MD, Harrington KM, Miller MW, Resick PA. Associations between Pittsburgh sleep quality index factors and health outcomes in women with posttraumatic stress disorder. Sleep Med 2012;13:752–8.
[55] Cole JC, Motivala SJ, Buysse DJ, Oxman MN, Levin MJ, Irwin MR. Validation of a 3-factor scoring model for the Pittsburgh sleep quality index in older adults. Sleep 2006;29:112–6.
[56] Elsenbruch S, Harnish MJ, Orr WC. Subjective and objective sleep quality in irritable bowel syndrome. Am J Gastroenterol 1999;94:2447–52.
[57] Gliklich RE, Taghizadeh F, Winkelman JW. Health status in patients with disturbed sleep and obstructive sleep apnea. Otolaryngol Head Neck Surg 2000;122:542–6.
[58] Grandner MA, Kripke DF, Yoon I, Youngstedt SD. Criterion validity of the Pittsburgh sleep quality index: investigation in a non-clinical sample. Sleep Biol Rhythms 2006;4:129–39.
[59] Hall MH, Matthews KA, Kravitz HM, Gold EB, Buysse DJ, Bromberger JT, et al. Race and financial strain are independent correlates of sleep in midlife women: the SWAN sleep study. Sleep 2009;32:73–82.
[60] Hancock P, Larner AJ. Diagnostic utility of the Pittsburgh sleep quality index in memory clinics. Int J Geriatr Psychiatry 2009;24:1237–41.
[61] Knutson KL, Rathouz PJ, Yan LL, Liu K, Lauderdale DS. Stability of the Pittsburgh sleep quality index and the Epworth sleepiness questionnaires over 1 year in middle-aged adults: the CARDIA study. Sleep 2006;29:1503–6.
[62] Magee CA, Caputi P, Iverson DC, Huang X. An investigation of the dimensionality of the Pittsburgh sleep quality index in Australian adults. Sleep Biol Rhythms 2008;6:222–7.
[63] Mariman A, Vogelaers D, Hanoulle I, Delesie L, Tobback E, Pevernagie D. Validation of the three-factor model of the PSQI in a large sample of chronic fatigue syndrome (CFS) patients. J Psychosom Res 2012;72:111–3.
[64] Masel BE, Scheibel RS, Kimbark T, Kuna ST. Excessive daytime sleepiness in adults with brain injuries. Arch Phys Med Rehabil 2001;82:1526–32.
[65] Mondal P, Gjevre JA, Taylor-Gjevre RM, Lim HJ. Relationship between the Pittsburgh sleep quality index and the Epworth sleepiness scale in a sleep laboratory referral population. Nat Sci Sleep 2013;5:15–21.
[66] Morin CM, Belleville G, Belanger L, Ivers H. The insomnia severity index: psychometric indicators to detect insomnia cases and evaluate treatment response. Sleep 2011;34:601–8.
[67] Neau J, Paaquereau J, Auche V, Mathis S, Godeneche G, Ciron J, et al. Sleep disorders and multiple sclerosis: a clinical and polysomnography study. Eur Neurol 2012;68:8–15.
[68] Neu D, Mairesse O, Hoffman G, Dris A, Lambrecht LJ, Linkowski P, et al. Sleep quality perception in the chronic fatigue syndrome: correlations with sleep efficiency, affective symptoms and intensity of fatigue. Neuropsychobiology 2007;56:40–6.
[69] Rener-Sitar K, John MT, Bandyopadhyay D, Howell MJ, Schiffman EL. Exploration of dimensionality and psychometric properties of the Pittsburgh sleep quality index in cases with temporomandibular disorders. Health Qual Life Out 2014;12:10.
[70] Nicassio PM, Ormseth SR, Custodio MK, Olmstead R, Weisman MH, Irwin MR. Confirmatory analysis of the Pittsburgh sleep quality index in rheumatoid arthritis patients. Behav Sleep Med 2014;12:1–12.
[71] Osorio CD, Gallinaro AL, Lorenzi-Filho G, Lage LV. Sleep quality in patients with fibromyalgia using the Pittsburgh sleep quality index. J Rheumatol 2006;33:1863–5.
[72] Otte JL, Rand KL, Carpenter JS, Russell KM, Champion VL. Factor analysis of the Pittsburgh sleep quality index in breast cancer survivors. J Pain Symptom Manage 2013;45:620–7.
[73] Buysse DJ, Reynolds III CF, Monk TH, Hoch CC, Yeager AL, Kupfer DJ. Quantification of subjective sleep quality in healthy elderly men and women using the Pittsburgh sleep quality index (PSQI). Sleep 1991;14:331–8.
[74] Ritsner M, Kurs R, Ponizovsky A, Hadjez J. Perceived quality of life in schizophrenia: relationships to sleep quality. Qual Life Res 2004;12:783–91.
[75] Scarlata S, Pedone C, Curcio G, Cortese L, Chiurco D, Fontana D, et al. Pre-polysomnographic assessment using the Pittsburgh sleep quality index questionnaire is not useful in identifying people at higher risk for obstructive sleep apnea. J Med Screen 2013;20:220.
[76] Skouteris H, Wertheim EH, Germano C, Paxton SJ, Milgrom J. Assessing sleep during pregnancy. A study across two time points examining the Pittsburgh sleep quality index and associations with depressive symptoms. Women Health Iss 2009;19:45–51.
[77] Spira AP, Beaudreau SA, Stone KL, Kezirian EJ, Lui L, Redline S, et al. Reliability and validity of the Pittsburgh sleep quality index and the Epworth sleepiness scale in older men. J Gerontol A Biol Sci Med Sci 2012;67A:433–9.
[78] Tomfohr LM, Schweizer CA, Dimsdale JE, Loredo JS. Psychometric characteristics of the Pittsburgh sleep quality index in English speaking non-Hispanic Whites and English and Spanish speaking Hispanics of Mexican descent. J Clin Sleep Med 2013;9:61–6.
[79] Zohal MA, Yazdi Z, Kazemifar AM. Daytime sleepiness and quality of sleep in patients with COPD compared to control group. Glob J Health Sci 2013;5:150–5.
[80] Fichtenberg NL, Zafonte RD, Putnam S, Mann NR, Millard AE. Insomnia in a post-acute brain injury sample. Brain Inj 2002;16:197–206.
[81] Cronbach LJ, Warrington WG. Time-limit tests: estimating their reliability and degree of speeding. Psychometrika 1951;16:167–88.
*[82] Feinstein AR, Wells CK, Joyce CM, Josephy BR. The evaluation of sensibility and the role of patient collaboration in clinimetric indexes. Trans Assoc Am Physicians 1985;98:146–9.
[83] deVet CHW, Terwee CB, Mokkink LB, Knol DL. Measurements in medicine. A practical guide. Cambridge University Press; 2011.
[84] Starfield B. Primary care: balancing health needs, services, and technology. Cambridge Oxford University Press; 1998.
[85] McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995;4:293–307.
[86] Nunnally JC. In: Assessment of reliability. Psychometric theory. New York: McGraw-Hill Book Co; 1978. p. 225–55.
*[87] FSDA Guidance for Industry. Patient-reported outcome measures: use in medical product development to support labelling claims. Silver Spring, USA: US Department of Health and Human Services, Food and Drug Administration; 2009.
[88] Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL, Perrin EB, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther 1996;18:979–92.
*[89] Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11:193–205.

[90] McHorney CA, Ware Jr JE, Raczek AE. The MOS 36-item short-form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247–63.
[91] Schutte-Rodin S, Broch L, Buysse D, Dorsey C, Sateia M. Clinical guideline for the evaluation and management of chronic insomnia in adults. J Clin Sleep Med 2008;4:487–504.
[92] Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol 1999;52:105–11.
[93] Wright JG, Feinstein AR. A comparative contrast of clinimetric and psychometric methods for constructing indexes and rating scales. J Clin Epidemiol 1992;42:1201–18.
[94] Browne MW. An overview of analytic rotation in exploratory factor analysis. Multivar Behav Res 2001;36:111–50.
[95] Thomadsen B, Lin SW. Taxonometric guidance for developing quality assurance. Int J Radiat Oncol Biol Phys 2008;71(Suppl1):S204–9.
[96] Fayers PM, Hand DJ. Factor analysis, causal indicators, and quality of life. Qual Life Res 1997;6:139–50.
[97] Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. J Wiley & Sons, Ltd; 2009.
[98] Soderberg E, Alexanderson K. Sickness certification practices of physicians: a review of the literature. Scand J Public Health 2003;31:460–74.
*[99] Lax MB, Manetti FA, Klein RA. Medical evaluation of work-related illness: evaluations by a treating occupational medicine specialist and by independent medical examiners compared. Int J Occup Environ Health 2004;10:1–12.
[100] Wolfson AM, Doctor JN, Burns SP. Clinical judgments of functional outcomes: how bias and perceived accuracy affect rating. Arch Phys Med Rehabil 2000;81:1567–74.
