CLINICAL REVIEW
Article info

Article history:
Received 14 September 2014
Received in revised form 16 January 2015
Accepted 26 January 2015
Available online 17 February 2015

Keywords:
The Pittsburgh sleep quality index
Psychometric properties
Sensibility
Systematic review
Meta-analysis

Summary

This review appraises the process of development and the measurement properties of the Pittsburgh sleep quality index (PSQI), gauging its potential as a screening tool for sleep dysfunction in non-clinical and clinical samples; it also compares non-clinical and clinical populations in terms of PSQI scores. MEDLINE, Embase, PsycINFO, and HAPI databases were searched. Critical appraisal of studies of measurement properties was performed using COSMIN. Of 37 reviewed studies, 22 examined construct validity, 19 known-group validity, 15 internal consistency, and three test-retest reliability. Study quality ranged from poor to excellent, with the majority designated fair. Internal consistency, based on Cronbach's alpha, was good. Discrepancies were observed in factor analytic studies. In non-clinical and clinical samples with known differences in sleep quality, the PSQI global scores and all subscale scores, with the exception of sleep disturbance, differed significantly. The best evidence synthesis for the PSQI showed strong reliability and validity, and moderate structural validity in a variety of samples, suggesting the tool fulfills its intended utility. A taxonometric analysis can contribute to better understanding of sleep dysfunction as either a dichotomous or continuous construct.
© 2015 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.smrv.2015.01.009
1087-0792/© 2015 Elsevier Ltd. All rights reserved.
T. Mollayeva et al. / Sleep Medicine Reviews 25 (2016) 52–73
Measurement need

The multifactorial construct of sleep dysfunction causes diagnostic confusion in determining which persons need to be extensively investigated for the etiology of their complaints to be established. Issues of self-insight and awareness are also important to note: some persons may not be fully aware of their sleep impairment and will thus not emphasize such issues in the physician's office, or do not appreciate the extent or impact of their sleep problems [13]. As such, the main challenge today for primary care and specialist clinicians is to identify the patients who may have undetected sleep dysfunction and to direct further diagnostic investigation. A tool for this purpose would be discriminative, according to criteria defined by Kirshner and Guyatt [14] and thus, should be evaluated by its 1) intra-rater reliability; 2) internal consistency and reliability; and 3) construct validity.

There are numerous instruments, both subjective and objective, that can be used to measure sleep functioning [15]. In objective measures, the expectation is limited involvement of personal judgment, that is, results are to be influenced neither by the person doing the measuring nor the person being measured. In subjective measurements, both roles can impact the outcome to some extent. Given that even objective measures have a subjective component, often requiring an expert to read and interpret the measures, many feel that a patient's opinion and appraisal of his or her own status is of great value [16]. This view is evident in the recent initiative of the US federal government, which seeks a balance between outcomes that are of interest to investigators (i.e., results of laboratory testing, etc.), and those of primary interest to the patient (i.e., satisfaction, self-perceived quality, etc.) in using patient reports of health status [17].

Measurement concept of ‘sleep quality’

Self-perceived sleep quality represents something of a challenge to measure because there is no generally accepted reference or gold standard [18]. One approach would be to use a carefully constructed questionnaire incorporating the recommendations of the American Psychological Association pertaining to clinical sleep dysfunction
Fig. 1. Construct of sleep dysfunction. Unidirectional arrows from construct (i.e., circle) to items (i.e., rectangles) represent reflective models, and arrows from items to constructs represent formative models. Bidirectional arrows represent a combination of reflective and formative elements. Adapted from Fayer and Hand (1997) [96].
evaluation [19]. This starts with the main complaints of a patient, classified into: 1) inability to get adequate nighttime sleep given the opportunity for sleep (i.e., insomnia), 2) negative daytime consequences as a result of poor sleep (e.g., daytime sleepiness, fatigue, and cognitive impairment), 3) episodic nocturnal movements or behaviors, or 4) a combination of these concerns.

Although no specific quantitative sleep parameters define insomnia disorder [20], an average sleep latency over 30 min, wake after sleep onset lasting more than 30 min, sleep efficiency less than 85%, and/or a total sleep duration of less than six and a half hours are common manifestations. When reported together with difficulty initiating or maintaining sleep, waking too early, or chronically non-restorative sleep, these are considered clinically significant if occurring three or more nights per week, and suggestive of chronic insomnia disorder if lasting one or more months [21]. Reports of daytime impairment almost invariably accompany the report of inadequate nighttime sleep [18], and these symptoms are often the main complaints in patients seeking medical care. In many patients, daytime impairment will include excessive sleepiness, fatigue, low energy, low motivation, and/or cognitive symptoms related to poor attention, concentration, and memory. Sleep-related movements and/or behaviors are often reported by the spouse or bed partner, as the patient is usually not aware of episodes such as snoring, twitching or kicking of legs, bruxism (i.e., teeth grinding), sleep walking or talking, or violent behaviors arising from sleep [18].

An ideal screening instrument would incorporate all items relevant to the concept of sleep dysfunction, and be able to differentiate “good” and “poor” sleepers. Given the issues with self-insight and awareness in some persons, the descriptive tool would incorporate items for the significant other (i.e., bed partner or caregiver), including those related to behavioral manifestations in sleep.

Measurement tool

The Pittsburgh sleep quality index (PSQI) [22] is the most commonly used generic measure in clinical and research settings. A search conducted in March of 2014 for PubMed articles containing “Pittsburgh sleep quality index” as a search term returned in total 1512 articles and a growth trend over time, with 141 articles published in 2010 and 323 articles published in 2013. By contrast, a search for “Leeds sleep evaluation questionnaire” [23] returned 66 articles in total; “sleep disorder questionnaire” [24] returned 52 articles, and “medical outcomes study sleep scale” [25] returned a total of 32 articles. The PSQI was developed in 1988, with no particular clinical population in mind, to: 1) provide a reliable, valid, standardized measure of sleep quality; 2) discriminate “good” and “poor” sleepers, and 3) provide an easy index for patients to complete and for clinicians and researchers to interpret [22]. Consequently, the developers' targeted concept and purpose conform to our measurement need (i.e., discriminate “good” and “poor” sleepers).

Given the PSQI's widespread use, a comprehensive review of its measurement properties is long overdue. Moreover, while individual papers examining the various aspects of the measurement properties of the PSQI have been published, its applicability to different clinical and non-clinical groups (i.e., persons with and without medical or psychological conditions, respectively) has not been examined. Therefore, we undertook a systematic review of the literature pertaining to the psychometric properties of the PSQI with the purpose of: 1) appraising the clinical sensibility of the instrument, 2) systematically evaluating its psychometric properties, specifically construct validity and reliability, and 3) summarizing sex-stratified results pertaining to the PSQI. Finally, this systematic review featured a meta-analytic component, reporting the weighted mean difference in PSQI global and subscale scores for
clinical and non-clinical samples. The present work intended to provide information for both researchers and clinicians on the PSQI's ability to serve as a descriptive tool for sleep dysfunction in non-clinical and clinical populations, while also identifying pitfalls and providing ideas for future research utilizing the measure.

Methods

Search strategy

In collaboration with a medical information specialist (JB) and utilizing proposed PubMed search filters for finding studies on a measurement's properties [26], we utilized a comprehensive search strategy to study the measurement properties of the PSQI. Four electronic databases, MEDLINE, Embase, PsycINFO, and health and psychosocial instruments (HAPI, or HaPI), were searched. Table S1 displays the terms used in searches of each database.

Selection criteria

All English language peer-reviewed studies found through the aforementioned databases and published before January 30, 2014, were considered eligible. Only full text original articles focusing on the development or evaluation of the measurement properties of an English version of the PSQI were included. Since the PSQI was not developed for a specific population, no restrictions were applied to the populations studied. The reference lists of eligible articles were scanned for other relevant publications. Studies where the primary objective did not include evaluation of the psychometric properties of the PSQI (e.g., where the PSQI was used as one independent variable in a multivariable regression model, or where models examined predictors of the PSQI), were excluded.

Assessment of item generation, reduction, sensibility

The assessment of the process of development of the PSQI, including item generation, item reduction, and the sensibility of the instrument was performed using Bombardier's framework [27], developed based on the work of Feinstein [28] and Rowe & Oxman [29]. We refer the reader to Glossary of terms for the description of difficult terminology.

Measurement properties (reliability and validity)

Studies of the PSQI's measurement properties were appraised by the “COnsensus-based Standards for the selection of health status Measurement INstruments” (COSMIN) checklist [30]. This instrument was developed to evaluate the methodological quality of studies on the measurement properties of health-status-related questionnaires.

The reliability domain consists of internal consistency, reliability and measurement error [30]. The construct validity domain was defined as hypothesis testing, the degree to which the global score garnered by the measure was consistent with the assumption that the instrument measures the construct it is intended to measure, covering both convergent or divergent validity, and known-group validity. An a priori hypothesis about the expected direction and magnitude of the relationship between the PSQI and a comparison instrument is an important component of validity assessment within the COSMIN checklist.

As the purpose of this review is appraisal of the PSQI for descriptive purposes, under the framework of Kirshner and Guyatt [14], such domains as reliability and construct validity were evaluated. Given the absence of a gold standard for measuring sleep dysfunction, criterion validity was not assessed.

Quality assessment

Each measurement element was appraised by two independent reviewers (TM and PT), separately, using the COSMIN checklist. The authors met for a calibration review, in which they independently reviewed one study, then met and discussed each item of the COSMIN list to clarify its meaning and interpretation. Following this, the methodological quality of each study was rated across a set of items related to each attribute, independently by the same two reviewers. In cases of disagreement between the two reviewers, a third reviewer (KB) performed a separate appraisal, following which consensus was reached in each case. Each item was assigned a rating based on descriptors on a four-point rating scale (i.e., poor, fair, good, and excellent) [30]. We took the lowest rating across the items to assess an overall methodological quality score per measurement property; we kept separate ratings for known-group validity (Table 1).

Second, the consistency of the level of evidence was summarized. Evidence was designated “strong” when consistent findings were found in multiple good or at least one excellent quality study and the total sample size of combined eligible studies was ≥100; “moderate” when consistent findings were found in multiple fair or one good quality study with a total sample ≥50, or at least one good or excellent quality study with a total sample of 50–99; “limited” when findings were found in at least one fair, good or excellent quality study with total sample size between 25 and 49; and “unknown” when findings were of indeterminate rating, in studies with poor methodological quality or with a sample of <25 [31].

Meta-analysis

To compare PSQI global and subscale scores across samples, a sorting procedure was established to separate reported scores for non-clinical (i.e., control participants in good health, no medical or psychological conditions) and clinical (i.e., participants reporting a medical or psychological problem or condition, or meeting criteria for a clinical diagnosis as indicated by a clinical scale or physician's interview) samples. The meta-analytic features of this work were conducted in accordance with the Cochrane handbook for systematic reviews [32]. To obtain summary estimates for mean PSQI global and subscale scores for exposure and control participants (i.e., clinical and non-clinical, respectively), the mean, standard deviation and sample size were recorded; data were then pooled to obtain weighted means for group comparison; the results were expressed with the weighted mean difference (WMD) and 95% confidence interval (CI). Heterogeneity between studies was examined using Cochran's Q statistic and Higgins' I² statistic. A random-effects model was utilized in the presence of statistical heterogeneity between studies.

Results

Search results

Of 1376 articles identified, 50 [9,22,33–80] were selected for full-text review and 37 [46–80] were included in the final review (Fig. 2). Fifteen studies evaluated internal consistency [22,48–53,58,61–63,69,70,76–78], 19 evaluated known-group validity [9,22,46,49–51,53,56,58–61,64,67–69,71,73,79], and 22 evaluated convergent or divergent construct validity [46,47,49,51–54,57–60,64–68,70,73–77]. No studies evaluating intra-rater reliability of the PSQI were identified. Tables 2–4 display characteristics of these studies.
Table 1
Methodological quality of included studies assessed by the COSMIN checklist [26].
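The rating logic behind this table, as described under Quality assessment, can be sketched in Python. This is one possible reading of the rules, not the authors' implementation: the function names are hypothetical, the sample-size thresholds for "strong" and "moderate" are assumed to mean "at least" 100 and 50, and cases the text does not cover (for example a single fair study with a total sample of 50 or more) fall through to "limited" by assumption.

```python
# Ordered rating scale from the COSMIN checklist as used in this review.
RANK = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def overall_quality(item_ratings):
    """Lowest rating across items gives the overall rating for a property."""
    return min(item_ratings, key=RANK.__getitem__)


def evidence_level(study_qualities, total_n, consistent=True):
    """Classify the level of evidence for one measurement property.

    study_qualities: overall quality rating of each contributing study.
    total_n: combined sample size of the eligible studies.
    Assumption: thresholds are read as >=100 ("strong") and >=50 ("moderate").
    """
    usable = [q for q in study_qualities if q != "poor"]
    n_excellent = usable.count("excellent")
    n_good_plus = sum(q in ("good", "excellent") for q in usable)
    n_fair = usable.count("fair")

    # Indeterminate findings, only poor-quality studies, or a tiny sample.
    if not consistent or not usable or total_n < 25:
        return "unknown"
    if total_n >= 100 and (n_excellent >= 1 or n_good_plus >= 2):
        return "strong"
    if total_n >= 50 and (n_fair >= 2 or n_good_plus >= 1):
        return "moderate"
    return "limited"


print(overall_quality(["good", "fair", "excellent"]))
print(evidence_level(["fair", "fair"], total_n=60))
```

Keeping the item-level rule and the synthesis rule in separate functions mirrors the two steps in the text: ratings are first collapsed per study and property, and only then combined across studies.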
A. The process of development of the PSQI

The first publication on the PSQI described pilot testing of the instrument in three groups of subjects over 18 mo [22]. Tested were “good” sleepers, featuring 52 healthy persons without sleep complaints, as controls; “poor” sleepers, 34 subjects with major depressive disorder, 10 of whom were inpatients; and a third group of “poor” sleepers, who unlike the previous group, featured 62 clinical outpatients referred by physicians to a sleep center for a variety of sleep/wake complaints (i.e., difficulty initiating and maintaining sleep (n = 45) and excessive daytime sleepiness (n = 17)). Internal homogeneity of the index items was assessed by Cronbach's alpha and corrected item–total correlations [81]. Pearson product-moment correlation coefficient (i.e., Pearson correlation coefficient (PCC)) was used to determine correlation of component and item scores with the PSQI global score.

In the primary analysis of construct validity, the developers assessed the degree to which the index detected differences between groups recognized as clinically distinct (i.e., “good” versus “poor” sleepers). The relevant “gold standard” diagnoses were based on a combination of clinical interviews, structured interviews, and polysomnographic (PSG) data. An analysis of covariance was used to compare groups in PSQI global and component scores, and the Student-Newman-Keuls technique was used for pairwise comparisons. Age and sex were used as covariates, due to the age/sex ratio differences between groups. The PSQI's estimates of sleep latency, duration, and sleep efficiency were compared to the same components derived from PSG data, using t-tests and PCCs. For test-retest reliability, 91 patients completed the PSQI on two separate occasions (range: 1–265 d, mean = 28.2 d between first and second test). Paired t-tests for the global PSQI score, and the seven individual component scores, showed no significant differences between time-points and the PCCs also demonstrated stability in global and component scores. The authors reported that all seven component scores of the PSQI, subjective sleep quality (SQ), sleep latency (SL), sleep duration (SD), habitual sleep efficiency (HSE), sleep disturbances (SDI), use of sleep medications (SM), and daytime dysfunction (DD), showed overall reliability, as indicated by a Cronbach's alpha of 0.83. The authors concluded that each component score measured a particular aspect of the same construct of sleep quality [22].

B. Assessment of the sensibility of the PSQI

Sensibility is a term proposed by Feinstein that describes common sense aspects of an instrument [82]; of great importance is the definition of the construct to be measured [28,29]. The PSQI developers' construct of “sleep quality” was defined based on clinical judgment alone. The target population, featuring “good” and “poor” sleepers, was not involved in the item generation process to provide insight on their understanding of “sleep quality”.

The PSQI demonstrates reasonable face validity pertaining to elements, which are phrased in a suitable way. While the four response categories may not appear optimal for an index with
Fig. 2. Flow chart documenting process of article selection for review. Embase (1974-30/01/2013); Medline (1946-wk4/01/2014); Medline in process (until 30/01/2014); PsycINFO (1806-wk3/1/2014); HAPI (1985-01/2014).
discriminative purposes (i.e., where more concrete response categories “yes” or “no” are preferable) [14], there is clinical meaning in the format chosen. To explain, the index covers a period of one month, and the response categories “not during the last month”, “less than once a week”, “once or twice a week”, “three or more times per week”, are clinically relevant to assess the burden/severity of the components relevant to sleep quality. Moreover, the response category can be translated into a wide enough numeric range, which can be important for generation of the subscale score and the total score for a more sensitive discriminative purpose.

The content validity of the PSQI is appropriate, and the PSQI seems to cover multiple aspects relevant to the sleep quality construct. Lack of input from patients lowers sensibility ratings for content validity, however.

The PSQI is simple to implement in clinical practice. Completion requires five to 10 min, and so the index can be administered to a patient and his/her significant other/caregiver in the waiting room. Questions one to nine must not be left unanswered, as the scores would otherwise be inaccurate. During the field-testing, ten of the 158 participants (i.e., 6.3%) failed to complete the index in its entirety [22]. Later studies reported omission in the range between 5.1% [46] and 10% [48], falling within the “acceptable” range (i.e., no greater than 15% of respondents) [83]. The scoring process can possibly be viewed as burdensome, as it requires head-work on the part of the scorer to calculate the subscale scores that are summed to obtain the global score. Nevertheless, scoring of the PSQI can be performed without a calculator in less than 10 min, making it suitable for use in clinical practice and research [84].
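The final scoring step, summing the component scores into a global score, can be sketched as below. This is an illustration rather than the official scoring algorithm: it assumes each of the seven components (abbreviated as in this review) has already been scored from 0 to 3 as in the published instrument, it does not derive component scores from the raw questionnaire items, and the function names are hypothetical. The default cut-off of >5 for “poor” sleep follows the threshold quoted in this review.

```python
# Component abbreviations as used in this review.
COMPONENTS = ("SQ", "SL", "SD", "HSE", "SDI", "SM", "DD")


def psqi_global(components):
    """Sum the seven component scores into the PSQI global score.

    components: dict mapping each abbreviation to a score of 0-3.
    Raises ValueError on missing or out-of-range components, since an
    incomplete questionnaire cannot be scored accurately.
    """
    missing = [c for c in COMPONENTS if components.get(c) is None]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    for name in COMPONENTS:
        if not 0 <= components[name] <= 3:
            raise ValueError(f"{name} must be between 0 and 3")
    return sum(components[name] for name in COMPONENTS)


def is_poor_sleeper(global_score, cutoff=5):
    """Flag a respondent as a 'poor' sleeper when the score exceeds the cut-off."""
    return global_score > cutoff


# Hypothetical respondent, for illustration only.
example = {"SQ": 1, "SL": 2, "SD": 1, "HSE": 0, "SDI": 1, "SM": 0, "DD": 2}
score = psqi_global(example)
print(score, is_poor_sleeper(score))
```

Rejecting incomplete input rather than imputing it reflects the point made above that unanswered questions make the scores inaccurate.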
The PSQI's specific properties

The PSQI has been studied in strictly clinical (n = 15) or non-clinical (n = 13) samples, and both (i.e., featuring non-clinical and clinical groups of participants within the same study) (n = 9) (Tables 2–4). The samples represented a variety of clinical disorders, including cancer [35,36,50,72], schizophrenia, and chronic fatigue syndrome [63,68], as well as healthy participants of varying ages [40,41,49,51], sexes [46–80], and races and ethnicities [49,78].

Reliability

For quality assessment of reviewed studies, we refer the reader to Table 1.

With focus on the discriminative ability of the PSQI, the central measurement properties determinative of its applicability are reliability and construct validity. The reliability domain consists of internal consistency, reliability and measurement error. Internal consistency is defined by the COSMIN [30] panel as the degree of interrelatedness among the items, given the PSQI developers' claim of the instrument's unidimensionality. In this case, assessment of internal consistency did not follow factor analysis, and thus unidimensionality of the PSQI was not established. Reliability parameters assess how well patients can be distinguished from each other, while the related but distinct parameter of measurement error assesses the magnitude of the error present in the measured differences. Both of the aforementioned properties are of great value to clinicians, as they provide information on the proportion of total variance in the measurements that comprises “true” differences between patients, commonly assessed by test-retest reliability. The COSMIN checklist requires that: 1) the two tests are administered independently; 2) the underlying concept measured is consistent throughout measurements; and 3) the time between tests is appropriate. The hypothesis testing construct validity domain of the COSMIN checklist was determined for the PSQI. Hypothesis testing assesses the degree to which the scores of an instrument are consistent with the assumption that the instrument validly measures the construct intended for measurement. Convergent, divergent, and known-group validity were assessed for the PSQI. The COSMIN checklist assesses the presence of an a priori hypothesis about the expected direction and magnitude of the relationship between the compared instruments, and an adequate description of the properties of the comparator tool [30].

Factor structure

Several studies examining unidimensionality of the PSQI raised concerns over the factor structure of the instrument (Table 5). In studies using factor analysis, eight [48,54,55,62,63,70,72,78] out of eleven studies reported that a single-factor model fit the data poorly, and that the PSQI is better represented in a two- or three-factor model. Different factor structure models were proposed. For example, Cole et al. proposed that the seven PSQI sub-scales are best represented by three factors: sleep efficiency, including the SD and HSE PSQI sub-scales; perceived sleep quality, including the SQ, SL, and SM sub-scales; and daily disturbance, featuring the SDI and DD sub-scales of the PSQI [55]. At the same time, Cole et al. also reported a two-factor model fit for the data: sleep efficiency with strong loading from HSE (i.e., 0.89) and SD (i.e., 0.60) and perceived sleep quality, with strong loading from daytime sleepiness (i.e., 0.55) and SQ (i.e., 0.77), while SM showed poor loading on both factors [55]. While these models were both later replicated by others, using exploratory and confirmatory factor analyses (EFAs and CFAs, respectively), still others proposed their own two- or three-factor models [72,76], endorsing the multidimensionality of the PSQI. This again was supported by Aloba et al.'s findings utilizing principal component analysis [46].

Four studies [48,55,62,78] using factor analysis reported low factor loading for the SM sub-scale and another reported low factor loading for the DD sub-scale [70]. The authors reported that improved fit statistics could be obtained after removal of these items.

Reliability: internal consistency

Cronbach's alpha coefficient was reported in 12 studies (Table 2). For all studies but three [62,63,77], these values met the cut-point for a positive rating for within- and between-group comparisons (i.e., 0.70 [85]), ranging from 0.70 [76] to 0.83 [22]. No studies reported Cronbach's alpha within the ideal range for use in individual patients (i.e., 0.9–0.95) [86]. The three studies that reported Cronbach's alphas below 0.70 featured patients with chronic fatigue syndrome (α = 0.64) [63] and non-clinical samples (α = 0.67 [62] and 0.69 [77]).

Test-retest reliability

Test-retest reliability of the PSQI was evaluated in three studies [22,61,69]; see Table 2 for specifics. One study [69] reported the intraclass correlation coefficient (ICC), the preferred statistic, another reported PCC [22], and a third reported both statistics [61]. The study by Rener-Sitar et al. featured patients with temporomandibular disorder, reporting an ICC of 0.86 [69] for a period of two weeks between test and retest. Buysse et al. reported a PCC of 0.82 for their sample including healthy, depressed and sleep disordered persons, with all but the depressed group showing no significant differences between test and retest, over a mean period of 28 d [22]. Knutson et al. reported statistics for a test-retest period of one year [61]. The ICC was 0.81 for the population-based sample of early middle-aged adults, with 0.79 and 0.83 for white and black women, respectively, and 0.70 and 0.83 for black and white men, respectively [61]. For the same sample, a PCC of 0.68 was reported. All ICC reports met the required cut-point for groups (i.e., 0.70) [88], but not for individual patients (i.e., >0.90) [85]. See Table 1 for specifics.

Construct validity

Tables 3 and 4 summarize the evidence for known-group validity, and for convergent and divergent construct validity, respectively. Absolute value correlations above 0.70 were considered strong, between 0.3 and 0.7 moderate, and less than 0.3 weak [89]. These are equivalent, in variance terms, to shared variance of >50%, 10–50%, and <10%, respectively [90].

The convergent construct validity

Studies found the PSQI was of value in screening for insomnia according to the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [19], and the International Classification of Sleep Disorders (ICSD-2), revised criteria [91], with the best cut-off for students at five, with sensitivity and specificity at 72% and 55%, respectively [46]; greater than six for patients with low back pain, with sensitivity of 100% and specificity of 49% [47]; and greater than or equal to eight for patients with traumatic brain injury (TBI), with sensitivity of 100% and specificity of 83% [80].

A strong association was uncovered between the PSQI total score and the insomnia severity index (ISI) total score (r = 0.80) [66], sleep problems from symptom experience reports (r = 0.72–0.77) [53], short form-36 health survey vitality score (r = 0.74–0.77) [53,70], sleep restlessness score (r = 0.72–0.77) [53], and sleep efficiency score from the sleep diary (r = 0.76) [58]. A moderate association was found between the PSQI and disability scores (r = 0.31–0.58) [49,67], depression (r = 0.50) [53], tension/anxiety (r = 0.36–0.62) [53], and confusion (r = 0.45–0.46) [53]. There was evidence for associations between the PSQI and PSG
Table 2
Reported measurement properties of the PSQI: internal consistency and test-retest reliability.
Abbreviations: BBP, benign breast problems; BC, breast cancer; BMT, bone marrow transplant; CFS, chronic fatigue syndrome; DD, daytime dysfunction; DIMS, disorders of initiating and maintaining sleep; DOES, disorders of excessive somnolence; GAD, generalized anxiety disorder; HSE, habitual sleep efficiency; ICC, intraclass correlation coefficient; NA, not applicable; ns, not significant; PSQI, Pittsburgh sleep quality index; PTSD, posttraumatic stress disorder; r(s), correlation(s); RA, rheumatoid arthritis; RCT, randomized controlled trial; RT, renal transplant; SD, sleep duration; SDI, sleep disturbance; SEM, structural equation modeling; SL, sleep latency; SM, sleep medications; SQ, sleep quality; TMD, temporomandibular disorder; w/, with.
data variables. Studies reported non-significant or low correlation coefficients ranging from 0.11 to 0.3 for the apnea hypopnea index [49,57] and 0.21 for the number of oxygen desaturation events [57]. Moderate associations were reported between the PSQI global score and PSG sleep maintenance (rho = 0.33), sleep efficiency (rho = 0.34), and microarousal index in younger (rho = 0.39), but not in older healthy subjects [73]. The reported associations between the PSQI and variables derived from actigraphy data were variable, with only some researchers reporting significant findings. We refer the reader to Table 4 for specifics.

diagnostic criteria by the DSM-IV or by the ICSD-2 from those who do not [46,47,80]. To optimize the sensitivity-specificity balance, the cut-off scores were as follows: five for students (sensitivity 72%, specificity 54.5%) [46]; >6 for middle-aged adults with low back pain (sensitivity = 100%, specificity = 49%) [47]; ≥8 in post-acute adults with TBI of varying severities (sensitivity 83%, specificity = 100%) [80]. Scarlata et al. reported that a PSQI global score cut-off of 5 was able to differentiate patients with obstructive sleep apnea syndrome diagnoses in a consecutive sample of adults at high risk for the syndrome (sensitivity = 69.7%, specificity = 31%) [75].
Table 3
Reported measurement properties of the PSQI: known-group validity.
Abbreviations: AHI, apnea-hypopnea index; BBP, benign breast problems; BC, breast cancer; BMT, bone marrow transplant; COPD, chronic obstructive pulmonary disease;
DIMS, disorders of initiating and maintaining sleep; DOES, disorders of excessive somnolence; DSM-IV, diagnostic and statistical manual of mental disorders, 4th edition; EDS,
Ehlers Danlos syndrome; GAD, generalized anxiety disorder; HAI, hypersomnolent with abnormalities; IQR, interquartile range; NA, not applicable; NH, non-hypersomnolent;
NS, not significant; PLMI, periodic leg movement index; PSG, polysomnography; PSQI, Pittsburgh sleep quality index; PTH, posttraumatic hypersomnia; RT, renal transplant;
TBI, traumatic brain injury; TMD, temporomandibular disorder.
* principal: GAD diagnosis is the most severe diagnosis; co-principal: GAD diagnosis is one of the two most severe diagnoses.
of patients at risk for disorders (i.e., PSQI “good quality sleep” (i.e., ≤5) and ESS “normal score” (i.e., ≤10); PSQI “good quality sleep” and ESS “sleepy” (i.e., >10); PSQI “poor quality sleep” (i.e., >5) and ESS “normal score”; PSQI “poor quality sleep” (i.e., >5) and ESS “sleepy”).

Meta-analysis of known-group validity

The meta-analytic component of this review was performed in accordance with the Cochrane Handbook for Systematic Reviews [32] with the purpose of identifying possible differences in PSQI global and subscale scores between clinical and non-clinical samples (Tables S2a and S2b for raw data). Seven studies [22,46,49,51,56,68,79], six of fair quality [22,46,49,51,56,79] and one of good quality [68], provided global scores for both groups. The clinical samples combined equated to 801 individuals, and the non-clinical samples comprised 3433 persons. The results revealed a significantly higher mean global PSQI score in clinical versus non-clinical persons, utilizing a random-effects model (WMD = 4.74; 95% CI 3.43–6.06, p < 0.0001; I² 93%) (Fig. 3).

Six of the aforementioned studies also provided subscale scores [22,46,51,56,69,79]. The clinical samples combined comprised 538 individuals, and the non-clinical samples 745 individuals. Analysis revealed statistically significant differences in scores for all sleep quality: WMD = 0.74 (95% CI 0.34–1.14), sleep duration: WMD = 0.48 (95% CI 0.21–0.74), habitual sleep latency: WMD = 0.60 (95% CI 0.13–1.07), sleep medications: WMD = 0.51
Table 4
Reported measurement properties of the PSQI: hypothesis testing (validity).
Casement et al., 2012 [54]
Population: Women w/PTSD related to sexual or physical assault (n = 319)
C: CAPS; BDI; PILL
D: STAXI trait anger
Results: Correlation/p value order: CAPS, BDI, STAXI trait anger, PILL
PSQI global score: r = 0.48, p < 0.05, r = 0.41, p < 0.05, NS, r = 0.43, p < 0.05
PSQI daily disturbances: r = 0.37, p < 0.05, r = 0.47, p < 0.05, NS, r = 0.53, p < 0.05
PSQI perceived sleep quality: r = 0.40, p < 0.05, r = 0.34, p < 0.05, NS, r = 0.34, p < 0.05
PSQI sleep efficiency: r = 0.36, p < 0.05, r = 0.22, p < 0.05, NS, r = 0.20, p < 0.05
Gliklich et al., 2014 [57]
Population: Consecutive patients w/sleep disturbances requiring PSG (n = 435)
C: PSG (RDI, #O2 desaturations <85%)
D: –
Results: Correlation/p value order: PSQI SQ, SL, SD, HSE, SDI, SM, DD, global
RDI: r = 0.109, p = 0.024, NS, NS, NS, r = 0.143, p = 0.005, NS, NS, r = 0.114, p = 0.048
#O2 desaturations <85%: r = 0.143, p = 0.004, NS, r = 0.149, p = 0.002, r = 0.015, p = 0.015, r = 0.161, p = 0.002, NS, r = 0.099, p = 0.049, r = 0.214, p = 0.0001, r = 0.190, p = 0.0001, r = 0.206, p = 0.0003
Grandner et al., 2006 [58]
Population: Non-clinical younger (n = 53) and older (n = 59) adults
C: Actigraphy (sleep efficiency, total sleep time, WASO, sleep latency); Sleep diary (sleep efficiency, total sleep time, WASO, sleep latency); CESD
D: –
Results: Correlation/p value order: PSQI global, SQ, SL, SD, HSE, SDI, SM, DD
Younger group:
Actigraphy: sleep efficiency: all NS; total sleep time: NS, NS, r = 0.303, p < 0.05, r = 0.275, p < 0.05, NS, NS, NS, NS; WASO: all NS; sleep latency: all NS
Sleep diary: sleep efficiency: NS, NS, NS, NS, NS, NS, NS, r = 0.322, p < 0.01; total sleep time: NS, NS, NS, r = 0.331, p < 0.001, NS, NS, NS, NS; WASO: all NS; sleep latency: r = 0.349, p < 0.01, NS, r = 0.432, p < 0.01, NS, NS, r = 0.367, p < 0.01, NS, NS
CESD total: all NS
Older group:
Actigraphy: sleep efficiency: all NS; total sleep time: all NS; WASO: all NS; sleep latency: all NS
Sleep diary: sleep efficiency: r = 0.764, r = 0.707, r = 0.644, r = 0.631, r = 0.789, all p < 0.01, NS, NS, r = 0.430, p < 0.01; total sleep time: r = 0.548, r = 0.581, r = 0.417, r = 0.581, r = 0.642, all p < 0.01, NS, NS, r = 0.329, p < 0.01; WASO: r = 0.561, r = 0.542, r = 0.555, r = 0.530, r = 0.611, all p < 0.01, NS, NS, NS; sleep latency: r = 0.557, p < 0.01, r = 0.511, p < 0.01, r = 0.560, p < 0.01, r = 0.464, p < 0.01, r = 0.495, p < 0.01, NS, NS, r = 0.312, p < 0.05
CESD total: r = 0.364, p < 0.01, r = 0.281, p < 0.05, r = 0.396, p < 0.01, r = 0.331, p < 0.05, NS, NS, NS, r = 0.421, p < 0.01
Combined:
Actigraphy: sleep efficiency: all NS; total sleep time: NS, NS, r = 0.275, p < 0.01, r = 0.204, p < 0.05, NS, NS, NS, NS; WASO: all NS; sleep latency: all NS
Sleep diary: sleep efficiency: r = 0.562, r = 0.432, r = 0.378, r = 0.473, r = 0.563, all p < 0.01, NS, r = 0.199, p < 0.05, r = 0.307, p < 0.01; total sleep time: r = 0.307, p < 0.01, r = 0.240, p < 0.05, NS, r = 0.454, p < 0.01, r = 0.411, p < 0.01, NS, NS, NS; WASO: r = 0.262, p < 0.01, r = 0.210, p < 0.05, r = 0.241, p < 0.05, r = 0.235, p < 0.05, r = 0.260, p < 0.01, NS, NS, NS; sleep latency: r = 0.480, r = 0.354, r = 0.488, r = 0.0.275, all p < 0.01, r = 0.242, p < 0.05, NS, NS, r = 0.206, p < 0.05
CESD total: r = 0.305, p < 0.01, NS, r = 0.193, p < 0.05, r = 0.205, p < 0.05, NS, NS, NS, r = 0.317, p < 0.01
Hancock & Larner, 2009 [60]
Population: Patients w/dementia diagnosis by DSM-IV (n = 155) and w/o (n = 155)
C: Dementia by DSM-IV
D: –
Results: ROC curve analysis: AUC = 0.64 (95% CI 0.58–0.70); best cut-off for dementia 8 (sensitivity = 0.79 (0.73–0.86), specificity = 0.41 (0.33–0.48), LR +ve = 1.33 (1.15–1.56), LR −ve = 0.51 (0.44–0.59), test accuracy = 0.60 (0.56–0.64))
Masel et al., 2001 [64]
Population: Adults (N = 71) w/brain injuries: NH (n = 38); PTH (n = 2); HAI (n = 12)
C: MSLT (sleep latency)
D: –
Results: Correlations (Spearman's rho): PSQI vs. MSLT (sleep latency): NS
Mondal et al., 2013 [65]
Population: Patients (n = 236) undergoing overnight PSG
C: ESS
D: –
Results: Correlations (Spearman's rho): PSQI vs. ESS: r = 0.13, p = 0.5
Morin et al., 2011 [66]
Population: Community sample (n = 959)
C: ISI
D: –
Results: Correlations (Pearson's r): PSQI vs. ISI: r = 0.80, p < 0.5
Neau et al., 2012 [67]
Population: Patients w MS (n = 205)
C: Depression (EDSS); HADS; FMGPQ
D: Spasticity (Ashworth); Bladder dysfunction (EDSS)
Results: Correlation Mann–Whitney/p value order: depression (EDSS), HADS, FMGPQ, spasticity (Ashworth), bladder dysfunction (EDSS) vs PSQI global score: NR/p < 0001, NR/p < 0001, NR/p < 0001, NR/NS, NR/NS
Neu et al., 2007 [68]
Population: Patients w CFS (n = 28)
C: PSG MAI
D: –
Results: Correlations (Mann–Whitney's r): PSQI global score/subscales vs. PSG MAI: r = 0.195, p = NS
Abbreviations: btw, between; DD, daytime dysfunction; DSM-IV, diagnostic and statistical manual of mental disorders, 4th edition; EDSS, extended disability status scale; ESS,
Epworth sleepiness scale; FLIC, functional living index for cancer patients; FMGPQ, French version of the McGill pain questionnaire; GAD, generalized anxiety disorder; GDS,
geriatric depression scale; GDS, global disability scale; GHQ-12, global health questionnaire; HADS, hospital anxiety and depression scale; HAI, hypersomnolent with abnormal
indices; HRSD, Hamilton rating scale for depression; IADL, instrumental activities of daily living; ICSD-2, international classification of sleep disorders, 2nd edition; ISI,
insomnia severity index; LOT, life orientation test; LR, likelihood ratio; MADRS, Montgomery and Asberg depression rating scale; MAI, microarousal index; MPI,
multidimensional pain inventory; MS, multiple sclerosis; MSLT, multiple sleep latency test; MSPSS, multidimensional scale of perceived social support; NH, non-
hypersomnolent; NPV, negative predictive value; NR, not reported; NS, not significant; OSAS, obstructive sleep apnea syndrome; PANSS, positive and negative syndrome
scale; PILL, Pennebaker inventory of limbic languidness; POMS, profile of mood states; PPV, positive predictive value; PSG, polysomnography; PSQI, Pittsburgh sleep
quality index; PSWQ, Penn State worry questionnaire; PTH, posttraumatic hypersomnia; PTSD, posttraumatic stress disorder; Q-LES-Q, quality of life enjoyment and
satisfaction questionnaire; RA, rheumatoid arthritis; RDI, respiratory disturbance index; ROC, receiver operating characteristic; RT, renal transplant; SD, sleep duration;
SM, sleep medications; SDI, sleep disturbance; SE, sleep efficiency; SEAS, sleep, energy and appetite scale; SER, symptom experience report; SF-12, short form (12) health
survey; SF-36, short form (36) health survey; SL, sleep latency; SOL, sleep-onset latency; SQ, sleep quality; STAXI, state-trait anger expression index; TBDI, Talbieh brief
distress inventory; TST, total sleep time; w/, with; WASO, wake after sleep onset.
(95% CI 0.1–0.92), daytime dysfunction: WMD = 0.89 (95% CI 0.54–1.24) but the sleep disturbance subscale (WMD = 0.25, 95% CI −0.26 to 0.77) between non-clinical and clinical samples (Tables S3).
Discussion
Sensibility
Since its development, the PSQI has been widely used in research and clinical practice, providing information on a respondent's sleep quality, discriminating “good” and “poor” sleepers, and in clinical assessment of a variety of sleep disturbances. Interestingly, the PSQI, conceptualized and developed as a clinimetric measure (i.e., aimed for all items to measure a particular aspect of a complex clinical construct in the absence of a gold standard for said construct; emphasis on heterogeneity or “clinical phenomena”) [28], had its properties subsequently evaluated and tested as a measure developed by a psychometric strategy (i.e., all items measure a particular construct or aspect of a construct; emphasis on homogeneity) [94]. In adopting either approach, according to the practical guidelines for the development of a measurement tool [83,92,93],
Fig. 3. Results of random effects analysis of the PSQI global scores, comparing clinical and non-clinical samples.
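The kind of random-effects pooling summarized in Fig. 3 can be sketched in a few lines of Python. This is an illustrative sketch only: the review states that a random effect model was used in accordance with the Cochrane Handbook, but the DerSimonian-Laird estimator chosen here is an assumption, the function name `random_effects_wmd` is hypothetical, and the input values are made-up summary statistics rather than the data from Tables S2a and S2b.

```python
import math


def random_effects_wmd(studies):
    """Pool raw mean differences with a DerSimonian-Laird random-effects model.

    `studies` is a list of tuples:
    (mean_clinical, sd_clinical, n_clinical, mean_control, sd_control, n_control).
    Returns the pooled weighted mean difference (WMD), its 95% CI,
    the between-study variance tau^2, and I^2 in percent.
    """
    # Per-study mean difference and its sampling variance.
    diffs = [m1 - m2 for m1, _, _, m2, _, _ in studies]
    variances = [sd1**2 / n1 + sd2**2 / n2 for _, sd1, n1, _, sd2, n2 in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, diffs))
    df = len(studies) - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_re, diffs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "wmd": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical global PSQI summary data (NOT taken from the reviewed studies).
example = [
    (10.2, 3.9, 60, 5.1, 2.8, 120),
    (8.4, 4.1, 45, 4.9, 3.0, 200),
    (12.0, 3.5, 80, 5.6, 2.9, 150),
]
result = random_effects_wmd(example)
print(f"WMD = {result['wmd']:.2f}, 95% CI {result['ci'][0]:.2f} to {result['ci'][1]:.2f}, "
      f"I2 = {result['i2']:.0f}%")
```

A high I² value, such as the 93% reported for the global score, means that most of the observed variability between studies reflects heterogeneity rather than sampling error, which is the usual motivation for preferring a random-effects over a fixed-effect model.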
measurements in medicine should be performed using the most adequate method. When constructing the PSQI, developers acknowledged the clinical construct of sleep quality as a “complex phenomenon that is difficult to define and measure objectively” [22], and declared that, from a clinical perspective, the concept of “sleep quality” includes quantitative aspects of sleep (i.e., sleep duration, sleep latency, number of arousals), as well as purely subjective (i.e., self-perceived) aspects such as “depth” or “restfulness” of sleep, where the exact elements that compose sleep quality, and their relative importance, can vary between individuals.
Although the sensibility of the PSQI has not been formally evaluated previously, its features are important criteria for determining the success or failure of a clinical tool and should precede any psychometric evaluation [83]. As we have shown, the PSQI is currently the most commonly used generic sleep measure in clinical and research settings, indirectly supporting the sensibility of the instrument. Our structural evaluation of the sensibility using the Bombardier framework supported the view of the PSQI as a sensible clinical index, which warranted all of the observed subsequent efforts that were applied to describe its psychometric properties in various clinical and non-clinical settings.
Construct validity and reliability
The COSMIN panel defines validity as “the degree to which an instrument truly measures the construct(s) it purports to measure” [30]. Dealing with unobservable constructs such as sleep quality makes it difficult to determine exactly what the tool measures. Construct validity applies in situations where there is no gold standard, and refers to whether the instrument provides the scores that are expected based on existing knowledge about the construct [83].
Internal consistency, dimensionality and factor analytic research
Cronbach's alphas ranged from 0.70 to 0.83, meeting the cut-point for a positive rating for within- and between-group comparisons (i.e., ≥0.70). No studies reported Cronbach's alpha within the ideal range for use in individual patients (i.e., 0.9–0.95). Three studies, featuring patients with chronic fatigue syndrome and a non-clinical sample, reported Cronbach's alphas below 0.70. The results indirectly support the notion of sleep quality as being based on both reflective and formative models (Fig. 1).
Knowledge of the sleep quality construct under study and a priori hypotheses on its relationship to other constructs in a given population are crucial for a rigorous validation of the PSQI in future studies. Even more, each individual dimension measured in the PSQI (i.e., subscale) should be validated separately by formulating independent hypotheses, specifically if the instrument is applied in another target population, another language, or by another means of administration (i.e., interview vs self-report) [87].
While the PSQI was developed with no specific population in mind, it has been used and validated in a variety of populations other than those to whom the instrument was administered in the original publication. This may explain the conflicting results we observed in the factor analytic research: different two- and three-factor models for the scoring structure of the PSQI were proposed, and item #7 (i.e., during the past month, how often have you taken medicine to help you sleep (prescribed or “over the counter”)?) loaded poorly onto various factor structures, with removal of this item improving the fit in several studies, but not all (Table 5). The research varied in choice of sample (i.e., university students, military veterans with posttraumatic stress disorder, community dwelling depressed and non-depressed adults, patients with chronic fatigue syndrome, etc.). These groups have highly variable characteristics in terms of societal stressors, medical pathology, sleep medication use, pain, etc. Therefore it is reasonable to expect that the PSQI might function differently in different populations and settings. The caveat to this point is that a “poor” sleeper, as defined by the PSQI, may not share the same set of symptoms with another “poor” sleeper. Moreover, high levels of a symptom item may imply that a person is a “poor” sleeper, and low symptom levels may indicate “good” sleep; in fact, this may not be the case. Given the results were not consistent, and taking into account the various populations tested, there appears limited value in applying specific changes to the factor structure of the original instrument (Table 5). Fayers and Hand have shown the inadequacy of factor analysis in analyzing models including a mixture of causal and effectual indicators, pointing to the minimal importance of internal consistency for subscales and irrelevance of construct validity in testing for homogeneity [96]. They proposed broad indicator coverage for clinical indices, which can be achieved by asking patients “Is there anything else which has caused your problem?”. The PSQI includes such an item. Moreover, important issues such as lack of self-insight and awareness in some persons can lead to under-reporting of sleep issues, in which case the responses of significant others can guide clinical decision making. Therefore, where clinicians are working one-on-one with individuals with the aim of directing further investigation or intervention, focus on the individual items of an instrument is key. In research, if the sample size is sufficiently large, the mean PSQI global score will provide a sufficient estimate for sleep quality in a given population. De Vet et al. provide theoretical and practical points of
Table 5
Reported measurement properties of the PSQI: structural validity.
Study; structural validity (population, methods); results by factor model (1-, 2-, 3-)
Aloba et al. 2006 [46]
Population: University students (n = 520)
Methods: Factor analysis; eigenvalues >0.4 considered as loading on a factor; all tests two-tailed; level of significance at p < 0.05
1-: NA
2-: NA
3-: 1st factor: SQ (0.587), SL (0.443), HSE (0.487), SDI (0.642), SM (0.562), DD (0.671); 2nd factors: SD (0.832), SDI (0.421)*; 3rd factors: SQ (0.560), SM (0.561), HSE (0.518)*. *Variance not reported
Babson et al. 2012 [48]
Population: Military veterans w PTSD (n = 226)
Methods: EFA: fit of 1-, 2-, and 3-factor models; factor structure replication in randomly split samples (n = 111), (n = 115)
1-: Removal of SM factor: poor model fit for 1-factor solution*. *All factor loading from a Geomin rotation
2-: Removal of SM factor: good model fit for a 2-factor solution*, χ²(52) = 74.62, p = 0.08, CFI = 0.98, TLI = 0.98, RMSEA = 0.03, SRMR = 0.04. *All factor loading from a Geomin rotation
3-: Removal of SM factor: 3-factor solution did not converge*. *All factor loading from a Geomin rotation
Maximum Likelihood Algorithm for model's fit
Magee et al. 2008 [62]
Population: N = 364, adults
Methods: EFA and CFA; EFA: to identify factor structure; CFA: to test 2-factor model from EFA; single factor structure, 3-factor structure by Cole et al.'s; Maximum Likelihood Algorithm used to investigate model's fit
1-: 1-factor: poor fit (χ² = 26.44, d.f. = 14, p < 0.05; GFI = 0.96, AGFI = 0.92, CFI = 0.94; RMSEA = 0.064); 1-factor w SM removed: poor fit (χ² = 21.82, d.f. = 9, p < 0.05; GFI = 0.96, AGFI = 0.91, CFI = 0.93; RMSEA = 0.09)
2-: 2-factors from EFA: good fit (χ² = 16.84, d.f. = 13; GFI = 0.97, AGFI = 0.94, CFI = 0.98; RMSEA = 0.04); 2-factors from EFA w/SM removed: no improvement in model fit indices (χ² = 12.57, d.f. = 8; GFI = 0.98, AGFI = 0.94, CFI = 0.98; RMSEA = 0.06)
3-: Cole et al.'s 3-factor model: acceptable fit, similar to 2-factors from EFA (χ² = 14.20, d.f. = 11; GFI = 0.98, AGFI = 0.94, CFI = 0.98; RMSEA = 0.04); Cole et al.'s [50] 3-factor model w SM removed: all factor loadings significant, all goodness-of-fit acceptable, a little weaker (χ² = 10.42, d.f. = 6, GFI = 0.98, AGFI = 0.93, CFI = 0.98; RMSEA = 0.06)
Nicassio et al. 2014 [70]
Population: N = 107, patients w RA
Methods: CFA to evaluate 1-, 2-, and 3-factor(s) solution
1-: 1-factor: poor fit (χ² = 19.88, d.f. = 9, p = 0.019; χ²/
2-: 2-factor: satisfactory fit; internal consistency Cronbach's α = 0.70
3-: 3-factor: best fit, but Cronbach's α = 0.58 and DD factor
Abbreviations: AGFI, adjusted goodness-of-fit index; BC, breast cancer; CAIS, consistent Akaike information criterion; CFA, confirmatory factor analysis; CFI, comparative fit index; CFS, chronic fatigue syndrome; DD, daytime dysfunction; DIMS, disorders of initiating and maintaining sleep; DOES, disorders of excessive somnolence; ECVI, expected cross validation index; EFA, exploratory factor analysis; GFI, goodness-of-fit index; HSE, habitual sleep efficiency; NA, not applicable; NR, not reported; PTSD, posttraumatic stress disorder; RA, rheumatoid arthritis; RMSEA, root mean square error of approximation; SD, sleep duration; SDI, sleep disturbances; SEM, structural equation modeling; SL, sleep latency; SM, sleep medications; SQ, sleep quality; SRMR, standardized root mean square residual; TLI, Tucker-Lewis index; TMD, temporomandibular disorder; WRMR, weighted root mean squared
view in discussing scoring of multidimensional instruments,
regards to a clinical setting, it is important to know which item or subscale is most affected.
approach for a researcher looking to quantify a population's sleep quality, or compare populations, as initially proposed by Dr. Buysse's group [22], and to discriminate “good” and “poor” sleepers, while in the clinical setting, focusing on individual items within the PSQI to detect attributes of poor sleep quality necessary per the DSM-IV and ICSD-2, is obvious, and individual items can
lying conceptual model or on the data from the literature [83]. Our
non-Hispanic whites:
model: good fit
improved fit
English:
English:
T2: χ² = 18.2, d.f. = 7; CFI = 0.98;
1-factor model (7 subscales):
SMRM = 0.053)
from the model
descriptively)
1-factor structure
et al. tested
responses to a test administered by different raters on different occasions, a short period during which no true changes to the measured construct can occur [30]. Such data were not provided in the reviewed studies. Repeated measurements of any construct may differ due to day-to-day variation, the instrument used, the persons administering the measure, or the circumstances under which the measurements are taken [83,95]. All sources of variation play a role in test-retest reliability. There is limited evidence on test-retest reliability for the PSQI: while three studies have shown stability in measured sleep quality dimensions between test and retest, it is still unclear what the appropriate length of time between test and retest is.
Sex differences in PSQI
Lack of consistency in sex differences between studies can be attributed to a variety of factors, the individual impacts of each a challenge to interpret. To determine whether differences in sleep quality between study samples were attributable to age, general health, clinical disorders, psychosocial stressors, cultural differences, sex, or a combination, is a complex task. At this time the nature and implications of conflicting results have not been adequately explored given the limited information available. Nevertheless, considering extensive evidence on morphologic differences between sexes in circadian clock genes, respiratory control, stress responses and the action of sex hormones on sleep mechanisms, sleep quality is likely to be influenced by sex.
Meta-analysis
number of studies, there is a tendency to draw conclusions, usually ad hoc summaries, which can often be misleading [97]. Therefore, a meta-analysis with known limitations may nevertheless be preferable to an ad hoc summary.
Conclusion
The PSQI is currently the only standardized clinical instrument that covers a broad range of indicators relevant to sleep quality. Items pertaining to circadian rhythm disorders and medication effects other than those by sleep aids, although not covered, may be inferred based on analysis of data from available items, together with a detailed clinical history of the patient. We found strong positive evidence for reliability and validity (hypothesis testing), and moderate positive evidence for structural validity testing in a variety of non-clinical and clinical samples. While the internal consistency of the PSQI did not reach the level recommended for individual level comparison, it is worth noting that, to date, no study of agreement among clinicians in assessing quality of sleep in a patient has been conducted [98]. It would not be a surprise, however, if the agreement is low, given the limited knowledge about sleep function; such differences of opinion have been demonstrated [98–100]. Thus, the utility of a standardized tool such as the PSQI holds great potential for clinical practice, as the agreement is potentially much higher and the findings more consistent, specifically if progress is made on focusing on individual items and reports of a significant other.
[35] Sandadi S, Frasure HE, Broderick MJ, Waggoner SE, Miller J, von Gruenigen VE. The effect of sleep disturbance on quality of life in women with ovarian cancer. Gynecol Oncol 2011;123:351–5.
[36] Sanford SD, Wagner LI, Beaumont JL, Butt Z, Sweet JJ, Cella D. Longitudinal prospective assessment of sleep quality: before, during, and after adjuvant chemotherapy for breast cancer. Support Care Cancer 2013;21:959–67.
[37] Soehner AM, Kennedy KS, Monk TH. Circadian preference and sleep-wake regularity: associations with self-report sleep parameters in daytime-working adults. Chronobiol Int 2011;28:802–9.
[38] Toor P, Kim K, Buffington CK. Sleep quality and duration before and after bariatric surgery. Obes Surg 2012;22:890–5.
[39] Sharkey KM, Kurth ME, Anderson BJ, Corso RP, Millman RP, Stein MD. Assessing sleep in opioid dependence: a comparison of subjective ratings, sleep diaries, and home polysomnography in methadone maintenance patients. Drug Alcohol Depen 2011;11:245–8.
[40] Cross NE, Lagopoulos J, Duffy SL, Cockayne NL, Hickie IB, Lewis SJG, et al. Sleep quality in healthy older people: relationship with 1H magnetic resonance spectroscopy markers of glial and neuronal integrity. Behav Neurosci 2012;127:803–10.
[41] Blackwell T, Redline S, Ancoli-Israel S, Schneider JL, Surovec S, Johnson NL, et al. Comparison of sleep parameters from actigraphy and polysomnography in older women: the SOF study. Sleep 2008;31:283–91.
[42] Burkhalter H, Serelka SM, Engberg S, Wirz-Justice A, Steiger J, De Geest S. Validity of 2 sleep quality items to be used in a large cohort of kidney transplant recipients. Prog Transpl 2011;21:27–35.
[43] Broderick JE, Junghaenel DU, Schneider S, Pilosi JJ, Stone AA. Pittsburgh and Epworth sleep scale items: accuracy of ratings across different reporting periods. Behav Sleep Med 2013;11:173–88.
[44] Backhaus J, Junghanns K, Broocks A, Riemann D, Hohagen F. Test-retest reliability and validity of the Pittsburgh sleep quality index in primary insomnia. J Psychosom Res 2002;53:737–40.
[45] Afsar B, Elsurer R. The relationship between sleep quality and daytime sleepiness and various anthropometric parameters in stable patients undergoing hemodialysis. J Ren Nutr 2013;23:296–301.
[46] Aloba OO, Adewuya AO, Ola BA, Mapayi BM. Validity of the Pittsburgh sleep quality index among Nigerian university students. Sleep Med 2007;8:266–70.
[47] Alsaadi SM, McAuley JH, Hush JM, Bartlett DJ, Henschke N, Grunstein RR, et al. Detecting insomnia in patients with low back pain: accuracy of four self-report measures. BMC Musculoskel Dis 2013;14:196.
[48] Babson KA, Blonigen DM, Boden MT, Drescher KD, Bonn-Miller MO. Sleep quality among U.S. military veterans with PTSD: a factor analysis and structural model of symptoms. J Trauma Stress 2012;25:665–74.
[49] Beaudreau SA, Spira AP, Stewart A, Kezirian EJ, Lui L, Ensrud K, et al. Validation of the Pittsburgh sleep quality index and the Epworth sleepiness scale in older black and white women. Sleep Med 2012;13:36–42.
[50] Beck S, Schwartz AL, Towsley G, Dudley W, Barsevick A. Psychometric evaluation of the Pittsburgh sleep quality index in cancer patients. J Pain Symptom Manage 2004;27:140–8.
[51] Bush AL, Armento MEA, Weiss BJ, Rhoades HM, Novy DM, Wilson NL, et al. The Pittsburgh sleep quality index in older primary care patients with generalized anxiety disorder: psychometrics and outcomes following cognitive behavioral therapy. Psychiatry Res 2012;199:24–30.
[52] Buysse DJ, Hall ML, Strollo PJ, Kamarck TW, Owens J, Lee L, et al. Relationships between the Pittsburgh sleep quality index (PSQI), Epworth sleepiness scale (ESS), and clinical/polysomnographic measures in a community sample. J Clin Sleep Med 2008;4:563–71.
[53] Carpenter JS, Andrykowski MA. Psychometric evaluation of the Pittsburgh sleep quality index. J Psychosom Res 1998;45:5–13.
[54] Casement MD, Harrington KM, Miller MW, Resick PA. Associations between Pittsburgh sleep quality index factors and health outcomes in women with posttraumatic stress disorder. Sleep Med 2012;13:752–8.
[55] Cole JC, Motivala SJ, Buysse DJ, Oxman MN, Levin MJ, Irwin MR. Validation of a 3-factor scoring model for the Pittsburgh sleep quality index in older adults. Sleep 2006;29:112–6.
[56] Elsenbruch S, Harnish MJ, Orr WC. Subjective and objective sleep quality in irritable bowel syndrome. Am J Gastroenterol 1999;94:2447–52.
[57] Gliklich RE, Taghizadeh F, Winkelman JW. Health status in patients with disturbed sleep and obstructive sleep apnea. Otolaryngol Head Neck Surg 2000;122:542–6.
[58] Grandner MA, Kripke DF, Yoon I, Youngstedt SD. Criterion validity of the Pittsburgh sleep quality index: investigation in a non-clinical sample. Sleep Biol Rhythms 2006;4:129–39.
[59] Hall MH, Matthews KA, Kravitz HM, Gold EB, Buysse DJ, Bromberger JT, et al. Race and financial strain are independent correlates of sleep in midlife women: the SWAN sleep study. Sleep 2009;32:73–82.
[60] Hancock P, Larner AJ. Diagnostic utility of the Pittsburgh sleep quality index in memory clinics. Int J Geriatr Psychiatry 2009;24:1237–41.
[61] Knutson KL, Rathouz PJ, Yan LL, Liu K, Lauderdale DS. Stability of the Pittsburgh sleep quality index and the Epworth sleepiness questionnaires over 1 year in middle-aged adults: the CARDIA study. Sleep 2006;29:1503–6.
[62] Magee CA, Caputi P, Iverson DC, Huang X. An investigation of the dimensionality of the Pittsburgh sleep quality index in Australian adults. Sleep Biol Rhythms 2008;6:222–7.
[63] Mariman A, Vogelaers D, Hanoulle I, Delesie L, Tobback E, Pevernagie D. Validation of the three-factor model of the PSQI in a large sample of chronic fatigue syndrome (CFS) patients. J Psychosom Res 2012;72:111–3.
[64] Masel BE, Scheibel RS, Kimbark T, Kuna ST. Excessive daytime sleepiness in adults with brain injuries. Arch Phys Med Rehabil 2001;82:1526–32.
[65] Mondal P, Gjevre JA, Taylor-Gjevre RM, Lim HJ. Relationship between the Pittsburgh sleep quality index and the Epworth sleepiness scale in a sleep laboratory referral population. Nat Sci Sleep 2013;5:15–21.
[66] Morin CM, Belleville G, Belanger L, Ivers H. The insomnia severity index: psychometric indicators to detect insomnia cases and evaluate treatment response. Sleep 2011;34:601–8.
[67] Neau J, Paaquereau J, Auche V, Mathis S, Godeneche G, Ciron J, et al. Sleep disorders and multiple sclerosis: a clinical and polysomnography study. Eur Neurol 2012;68:8–15.
[68] Neu D, Mairesse O, Hoffman G, Dris A, Lambrecht LJ, Linkowski P, et al. Sleep quality perception in the chronic fatigue syndrome: correlations with sleep efficiency, affective symptoms and intensity of fatigue. Neuropsychobiology 2007;56:40–6.
[69] Rener-Sitar K, John MT, Bandyopadhyay D, Howell MJ, Schiffman EL. Exploration of dimensionality and psychometric properties of the Pittsburgh sleep quality index in cases with temporomandibular disorders. Health Qual Life Out 2014;12:10.
[70] Nicassio PM, Ormseth SR, Custodio MK, Olmstead R, Weisman MH, Irwin MR. Confirmatory analysis of the Pittsburgh sleep quality index in rheumatoid arthritis patients. Behav Sleep Med 2014;12:1–12.
[71] Osorio CD, Gallinaro AL, Lorenzi-Filho G, Lage LV. Sleep quality in patients with fibromyalgia using the Pittsburgh sleep quality index. J Rheumatol 2006;33:1863–5.
[72] Otte JL, Rand KL, Carpenter JS, Russell KM, Champion VL. Factor analysis of the Pittsburgh sleep quality index in breast cancer survivors. J Pain Symptom Manage 2013;45:620–7.
[73] Buysse DJ, Reynolds III CF, Monk TH, Hoch CC, Yeager AL, Kupfer DJ. Quantification of subjective sleep quality in healthy elderly men and women using the Pittsburgh sleep quality index (PSQI). Sleep 1991;14:331–8.
[74] Ritsner M, Kurs R, Ponizovsky A, Hadjez J. Perceived quality of life in schizophrenia: relationships to sleep quality. Qual Life Res 2004;12:783–91.
[75] Scarlata S, Pedone C, Curcio G, Cortese L, Chiurco D, Fontana D, et al. Pre-polysomnographic assessment using the Pittsburgh sleep quality index questionnaire is not useful in identifying people at higher risk for obstructive sleep apnea. J Med Screen 2013;20:220.
[76] Skouteris H, Wertheim EH, Germano C, Paxton SJ, Milgrom J. Assessing sleep during pregnancy. A study across two time points examining the Pittsburgh sleep quality index and associations with depressive symptoms. Women Health Iss 2009;19:45–51.
[77] Spira AP, Beaudreau SA, Stone KL, Kezirian EJ, Lui L, Redline S, et al. Reliability and validity of the Pittsburgh sleep quality index and the Epworth sleepiness scale in older men. J Gerontol A Biol Sci Med Sci 2012;67A:433–9.
[78] Tomfohr LM, Schweizer CA, Dimsdale JE, Loredo JS. Psychometric characteristics of the Pittsburgh sleep quality index in English speaking non-Hispanic Whites and English and Spanish speaking Hispanics of Mexican descent. J Clin Sleep Med 2013;9:61–6.
[79] Zohal MA, Yazdi Z, Kazemifar AM. Daytime sleepiness and quality of sleep in patients with COPD compared to control group. Glob J Health Sci 2013;5:150–5.
[80] Fichtenberg NL, Zafonte RD, Putnam S, Mann NR, Millard AE. Insomnia in a post-acute brain injury sample. Brain Inj 2002;16:197–206.
[81] Cronbach LJ, Warrington WG. Time-limit tests: estimating their reliability and degree of speeding. Psychometrika 1951;16:167–88.
*[82] Feinstein AR, Wells CK, Joyce CM, Josephy BR. The evaluation of sensibility and the role of patient collaboration in clinimetric indexes. Trans Assoc Am Physicians 1985;98:146–9.
[83] deVet CHW, Terwee CB, Mokkink LB, Knol DL. Measurements in medicine. A practical guide. Cambridge University Press; 2011.
[84] Starfield B. Primary care: balancing health needs, services, and technology. Cambridge Oxford University Press; 1998.
[85] McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995;4:293–307.
[86] Nunnally JC. In: Assessment of reliability. Psychometric theory. New York: McGraw-Hill Book Co; 1978. p. 225–55.
*[87] FSDA Guidance for Industry. Patient-reported outcome measures: use in medical product development to support labelling claims. Silver Spring, USA: US Department of Health and Human Services, Food and Drug Administration; 2009.
[88] Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL, Perrin EB, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther 1996;18:979–92.
*[89] Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11:193–205.
[90] McHorney CA, Ware Jr JE, Raczek AE. The MOS 36-item short-form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-63.
[91] Schutte-Rodin S, Broch L, Buysse D, Dorsey C, Sateia M. Clinical guideline for the evaluation and management of chronic insomnia in adults. J Clin Sleep Med 2008;4:487-504.
[92] Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol 1999;52:105-11.
[93] Wright JG, Feinstein AR. A comparative contrast of clinimetric and psychometric methods for constructing indexes and rating scales. J Clin Epidemiol 1992;45:1201-18.
[94] Browne MW. An overview of analytic rotation in exploratory factor analysis. Multivar Behav Res 2001;36:111-50.
[95] Thomadsen B, Lin SW. Taxonometric guidance for developing quality assurance. Int J Radiat Oncol Biol Phys 2008;71(Suppl 1):S204-9.
[96] Fayers PM, Hand DJ. Factor analysis, causal indicators, and quality of life. Qual Life Res 1997;6:139-50.
[97] Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. John Wiley & Sons, Ltd; 2009.
[98] Soderberg E, Alexanderson K. Sickness certification practices of physicians: a review of the literature. Scand J Public Health 2003;31:460-74.
*[99] Lax MB, Manetti FA, Klein RA. Medical evaluation of work-related illness: evaluations by a treating occupational medicine specialist and by independent medical examiners compared. Int J Occup Environ Health 2004;10:1-12.
[100] Wolfson AM, Doctor JN, Burns SP. Clinical judgments of functional outcomes: how bias and perceived accuracy affect rating. Arch Phys Med Rehabil 2000;81:1567-74.