FROM: J Manipulative Physiol Ther 1999 (Oct); 22 (8): 503-510

Jennifer E. Bolton and Alan C. Breen

Anglo-European College of Chiropractic,
Bournemouth, England

Objective:   Develop and test a short-form comprehensive outcome measure for back pain.

Design:   Prospective longitudinal study of 3 consecutive cohorts of back pain patients.

Setting:   Anglo-European College of Chiropractic outpatient clinic and several field chiropractic practices.

Method:   Domains judged important in the back pain model and responsive to clinical change were identified from the literature. Items were scored on an 11-point numerical rating scale. The instrument was psychometrically tested by use of those tests relevant to an evaluative measure.

Results:   Seven dimensions of the back pain model were included in the questionnaire. Having established face validity, the instrument was shown to demonstrate high internal consistency (Cronbach's alpha = 0.9) and good test-retest reliability (ICC = 0.95). All items were retained on the basis that they contributed to the overall score (item-corrected total score correlations) and to the instrument's responsiveness to clinical change (item change-corrected total change score correlations). The instrument demonstrated acceptable construct and longitudinal construct validity with established external measures. The effect size of the instrument was high (1.29) and comparable with established measures.

Conclusion:   A reliable, valid, and responsive instrument has been developed for use in back pain patients. It is practical for use in investigations of both the efficacy and effectiveness of back pain treatments.

From the Full-Text Article:


Developing and testing a new questionnaire is a time-consuming task that should only be undertaken if a need exists. In this study, 3 separate phases of data collection were undertaken over a considerable time before complete testing of the instrument was possible. The pilot study and phase 1 involved lengthy reworking of the questions in the development stage so as to achieve some degree of face validity, both to patients and researchers. Even so, we might have gone further and interviewed a sample of patients to find out exactly what they understood by each question. Nevertheless, we believe the questionnaire consists of items that are clear, unambiguous, and broadly understood by patients.

The need for a new questionnaire was identified by trawling the literature on back pain outcome measures, which yielded many long and cumbersome measures and very few, if any, multidimensional condition-specific measures suitable for documenting patient outcomes in a busy clinical practice setting. Although such instruments undoubtedly exist, we are aware of little evidence of their psychometric testing. In a recent article by Deyo and coworkers, [35] the case for standardization of outcome measures in back pain research is forcefully made. As in this study, these authors distinguished outcome measures for use in clinical trials from those for use in other types of outcomes research. With reference to the latter, a “parsimonious” 6-item core set of outcome measures was proposed covering pain symptoms, daily activity disability, and satisfaction with treatment. [35] Although each of the items was extracted from an established measure, the psychometric properties of the 6-item set itself were not tested. [35]

In selecting items for an outcome measure suitable for use in clinical practice, we were particularly mindful of the multidimensional nature of back pain on the basis of the biopsychosocial model [36] and the current shifts in outcome measures reflecting the illness, as opposed to disease, and care, as opposed to treatment, models. [9] Consequently, we included those dimensions we judged relevant to the model and that were shown in the literature at the time (and still are) to explain, at least in part, the back pain experience and its consequences. In addition, we used the criterion that all the dimensions included must be responsive to clinically significant change. As a result, the symptom of pain intensity, daily functional activity and social activity, the affective dimensions of anxiety and depression, and the cognitive/behavioral aspects of fear-avoidance beliefs and self-efficacy beliefs of pain control were included in the final instrument. The retention of all the items also maintained the multidimensional nature of the outcome measure. Pain intensity has been shown to decrease after treatment in numerous studies of back pain patients (eg, Meade et al [37]). As with pain intensity, many studies have shown significant changes in functional activity after treatment interventions in back pain patients (eg, Beurskens et al [38]). Similarly, distress levels, [27] fear-avoidance beliefs, [29] and pain locus of control [28] have all been shown to contribute significantly to the back pain experience and to change after treatment. Finally, although these dimensions were singled out for inclusion in the questionnaire, it would be incorrect to conclude that they are the only attributes of the back pain experience that change after treatment. In the trade-off for brevity of the questionnaire, these aspects were selected over others on the basis of a review of the literature and our judgment of the important domains of the back pain model.

Having selected the item pool, we next turned our attention to item scaling, meaning the response options available to patients in answering the questions. Intuitively, increasing response options on a scale will increase item responsiveness and therefore the ability to detect clinically significant change. We, and others, have shown that an 11-point NRS is as responsive as a visual analog scale in detecting change in pain intensity and is easier for patients to complete. [39, 40] We have also shown that asking patients about their pain levels over the previous week is more responsive than asking them to report on their present pain levels. [39] For these reasons, all scales in the questionnaire were 11-point NRSs asking patients to report on their usual levels of the domains of interest over a 1-week period.

To ensure that none of the items is redundant in terms of contributing to the overall score of the questionnaire and that each is responsive to clinically significant change, item-corrected total correlations and item change score-corrected total change score correlations (respectively) were determined. The results of both these analyses determined that all the original items should be retained in the final version of the questionnaire.
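
As an illustration only, the two item-level analyses described above can be sketched in Python. The text does not give the authors' computational details, so this sketch assumes a Pearson correlation between each item and the sum of the remaining items (the "corrected" total), applied first to raw scores and then to change scores. The function names and the patient data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(scores):
    """Correlate each item with the total of the OTHER items.

    scores: one row per patient, one column per item.
    Excluding the item from the total avoids inflating the correlation.
    """
    n_items = len(scores[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        result.append(pearson(item, rest))
    return result

def change_scores(before, after):
    """Per-patient, per-item change (before minus after)."""
    return [[b - a for b, a in zip(rb, ra)] for rb, ra in zip(before, after)]

# Hypothetical 7-item scores (0-10) for 5 patients, before and after treatment.
baseline = [
    [7, 6, 5, 6, 4, 7, 5],
    [3, 2, 4, 3, 2, 3, 4],
    [8, 7, 6, 7, 5, 8, 6],
    [5, 5, 4, 4, 3, 5, 4],
    [2, 3, 2, 1, 2, 2, 3],
]
follow_up = [
    [3, 2, 2, 3, 1, 3, 2],
    [2, 1, 3, 2, 1, 2, 3],
    [4, 4, 3, 3, 2, 4, 3],
    [3, 3, 2, 2, 2, 3, 3],
    [1, 2, 1, 1, 1, 1, 2],
]

# Contribution of each item to the overall score.
print(corrected_item_total(baseline))
# Contribution of each item's change to the overall change.
print(corrected_item_total(change_scores(baseline, follow_up)))
```

Under this reading, an item with a low correlation in the first analysis would add little to the total score, and an item with a low correlation in the second would add little to the instrument's responsiveness.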

Considerable confusion exists in the literature regarding which psychometric tests should be applied in the testing of new questionnaires and the terminology used. For example, in the recent development and testing of new outcome measures for the lumbar spine, no consistency of either approach or terminology is apparent. [17, 18, 19, 38] We adopted the criterion that only those tests applicable to the testing of evaluative measures should be used and consequently based the psychometric testing primarily on the framework advocated by Kirshner and Guyatt. [24] Other psychometric tests, of which there are many, were therefore purposely not included.

The results of the internal consistency analysis (Cronbach's alpha) showed that the questionnaire is a homogeneous instrument tapping different aspects of the same attribute (ie, back pain experience) and that as a result the items can be summed to produce a total overall score. Because the 7 dimensions each have a maximum score of 10, it may be more convenient to express the total score of the BQ as a percentage.
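
A minimal sketch of the two calculations mentioned in this paragraph follows. It assumes the standard formula for Cronbach's alpha, k/(k-1) × (1 − sum of item variances / variance of total scores), and a percentage score obtained by dividing the summed score by its maximum (7 items × 10 = 70). The function names and the data are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a patients-by-items table of scores.

    Uses sample variances throughout; the ratio is unaffected as long as
    the same variance definition is used for items and totals.
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def percent_score(row, max_per_item=10):
    """Total score expressed as a percentage of the maximum possible."""
    return 100 * sum(row) / (max_per_item * len(row))

# Hypothetical 7-item responses (0-10) from 5 patients.
responses = [
    [7, 6, 5, 6, 4, 7, 5],
    [3, 2, 4, 3, 2, 3, 4],
    [8, 7, 6, 7, 5, 8, 6],
    [5, 5, 4, 4, 3, 5, 4],
    [2, 3, 2, 1, 2, 2, 3],
]

print(cronbach_alpha(responses))
print([percent_score(row) for row in responses])
```

A patient scoring 5 on every item would have a total of 35 out of 70, that is, 50%.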

Because no “gold standard” exists, the external construct validity of the questionnaire was tested against established measures purporting to measure the same domains as those included in the questionnaire. The importance of testing longitudinal construct validity in an evaluative measure has been well argued by Kirshner and Guyatt, [24] and as a result we included this particular property and the more commonly investigated construct validity. In all cases the questionnaire robustly correlated with these external measures not only in terms of absolute scores (construct validity) but also in terms of the change scores over time (longitudinal construct validity).

In contrast to reliability and validity, it is surprising that the psychometric property of responsiveness is often not tested in evaluative measures. This is something of a paradox considering that the ability to detect clinical change is an essential requirement of a measure of outcome. [30] In this study, responsiveness of the individual items of the questionnaire in those patients who reported improvement in their condition was tested by correlating the change in item scores with the corrected total change scores of the questionnaire (also termed internal construct validity [38]). As discussed earlier, all items were sensitive to the improvement reported by patients. To assess the overall responsiveness of the instrument and compare it with that of the external measures, the effect size was calculated. The questionnaire demonstrated a large effect size that, apart from that of the CPG, was larger than that of any of the external measures used in this study, confirming the ability of the BQ to detect clinical change.
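
The text above does not state which effect size formula was used. As a sketch under that caveat, one common definition in outcomes research divides the mean change in total score by the standard deviation of the baseline scores. The function name and the numbers below are hypothetical.

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores.

    This is one common definition, assumed here for illustration; the
    source does not specify the formula. A positive value means scores
    fell (improved) between baseline and follow-up.
    """
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

# Hypothetical total scores (0-70) for 6 patients before and after treatment.
before = [42, 35, 50, 28, 39, 46]
after = [30, 27, 33, 22, 31, 29]

print(effect_size(before, after))
```

By Cohen's widely used convention, values of about 0.8 or more are regarded as large, which is the sense in which an effect size can be called large or high.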

Before leaving the discussion on psychometric testing of the questionnaire, we should explain why the data collected in the 3 phases of the study were analyzed in the way they were. Only the data in phase 3 could be subjected to external validity and external responsiveness testing because it was only in this phase that external measures were used. Similarly, only data in phase 2 could be subjected to test-retest reliability testing because only in this phase were the conditions for satisfactory test-retest met. However, data from both phases 2 and 3 could have been subjected to internal consistency and internal responsiveness testing. For clarity in this article, it was decided to present the internal testing of the questionnaire with phase 2 data and the external testing of the questionnaire with phase 3 data. In all cases where data from both phases could be used, the analysis of both sets of data gave rise to the same interpretations and conclusions. It remains the case that development and testing of a questionnaire over a prolonged time inevitably leads to this multiphasic approach and the consequent overlapping of data sets.

The questionnaire has limitations. The patients recruited to the study were convenience samples both from the college teaching clinic and from chiropractors' field practices. As such, they may not be representative of the population of chiropractic patients as a whole, nor may they be representative of other ambulatory back pain patients. No attempt was made to distinguish between acute and chronic back pain patients. Also, the questionnaire was not developed for use in patients with other musculoskeletal disorders. Finally, the mean baseline scores of the pretreatment questionnaire, although comparable to other outcome measures, only registered approximately 50% of the scale, suggesting that further modifications may be possible to increase its responsiveness.


We have developed a multidimensional questionnaire for use in the routine documentation of outcomes in back pain patients attending chiropractic outpatient clinics. The questionnaire can be completed quickly and has been shown to be reliable, valid, and responsive. As such, it may be considered for use in other ambulatory back pain populations and in clinical trials of back pain treatments. We do not claim that the items in the questionnaire entirely cover the back pain experience or that they are necessarily the best available. However, we do believe that together they provide a reasonably comprehensive assessment that encompasses most of the important dimensions of the back pain model. The development of such an instrument makes possible studies directly comparing outcomes in practice-based settings with those in research trials, an important issue in the current moves to evidence-based practice.

