J Manipulative Physiol Ther 2004 (Jan); 27 (1): 26–35
Hugh Hurst and Jennifer Bolton
Anglo-European College of Chiropractic
BACKGROUND: To date, clinical trials have relied almost exclusively on the statistical significance of changes in scores from outcome measures in interpreting the effectiveness of treatment interventions. It is becoming increasingly important, however, to determine the clinical rather than statistical significance of these change scores.
OBJECTIVE: To determine cutoff values for change scores that distinguish patients who have clinically improved from those who have not.
METHOD: Data were obtained from 165 back and 100 neck patients undergoing chiropractic treatment. Patients completed the Bournemouth Questionnaire (BQ) before treatment and the BQ and Patient's Global Impression of Change (PGIC) scale after treatment. Three statistical methods were applied to individual change scores on the BQ. These were (1) the Reliable Change Index (RCI); (2) the effect size (ES); and (3) the raw and percentage change scores. The PGIC scale was used as the "gold standard" of clinically significant change.
RESULTS: The RCI, using the cutoff value of >1.96, appropriately identified clinical improvement in back patients but not in neck patients. An individual ES of approximately 0.5 had the highest sensitivity and specificity in distinguishing back and neck patients who had undergone clinically significant improvement from those who had not. In terms of raw score changes, percentage BQ change scores [(raw change score/baseline score) x 100] of 47% and 34% were identified as having the highest sensitivity and specificity in distinguishing clinically significant improvement from nonimprovement in back and neck patients, respectively.
CONCLUSIONS: This study provides a methodological framework for identifying clinically significant change in patients. This approach has important implications in providing clinically relevant information about the effect of a treatment intervention in an individual patient.
From the Full-Text Article:
In this study, 3 statistical methods derived from different computations of change scores on the BQ were investigated for their ability to distinguish patients who had undergone a clinically significant change from those who had not. The a priori definition of clinically significant improvement was a score of 6 or more on a 7-point numerical rating scale (NRS) on which patients rated their global impression of change in their condition following treatment. This corresponded to feeling better or much better and to a noticeable, worthwhile, and meaningful change. This anchor-based method has been used in many other studies to determine clinically significant change. [11, 21, 22, 23] In the absence of a true gold standard, asking patients themselves what constitutes a meaningful change to them, with all the attendant internal and external factors that might influence such judgment, seems intuitively the best that can be done when investigating issues of clinically important change.
This study found 70% to 80% agreement between 2 ways of categorizing patients as improved or not improved: asking patients directly on a PGIC scale, and applying cutoff values with high sensitivity and specificity to change scores on an outcome measure. Since both methods rely on patients' own subjective judgements about change in their condition, this is reassuring. Many agreement studies rule out agreement that occurs by chance by using the kappa (κ) statistic instead of simple percent agreement. However, in this case, since the data were not recorded as binary variables, the κ statistic was not considered to be an appropriate method of analysis.
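Simple percent agreement, as used here, is the share of patients placed in the same category (improved or not improved) by the PGIC anchor and by a cutoff on a BQ change score. The minimal sketch below assumes each method yields one yes/no label per patient; the function names and the five-patient data are hypothetical, and only the PGIC threshold of 6 comes from the text. Unlike the κ statistic, this measure makes no correction for chance agreement.

```python
# Minimal sketch: simple percent agreement between two binary classifications.
# PGIC threshold (>= 6 on the 7-point scale) is from the text; data are hypothetical.

PGIC_IMPROVED_THRESHOLD = 6  # a priori definition: "better" or "much better"


def pgic_improved(pgic_score: int) -> bool:
    """Anchor-based label: True if the patient reports clinically significant improvement."""
    return pgic_score >= PGIC_IMPROVED_THRESHOLD


def percent_agreement(a: list[bool], b: list[bool]) -> float:
    """Percentage of patients given the same label by both methods (no chance correction)."""
    if len(a) != len(b) or not a:
        raise ValueError("classifications must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical data for 5 patients
pgic_scores = [7, 6, 4, 5, 7]
cutoff_says_improved = [True, False, False, False, True]

anchor = [pgic_improved(s) for s in pgic_scores]
print(percent_agreement(anchor, cutoff_says_improved))
```

Here the two methods disagree only on the second patient, so 4 of 5 labels match.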
One of the 3 statistical methods used to categorize patients as improved or not improved, the RCI, gave anomalous results both in identifying the proportion of neck patients in the sample who improved and in calculations involving the PGIC scale. Neither of these findings was apparent when the RCI was used in back patients. The reliability coefficient of the neck BQ was relatively low, and this may have produced an overly stringent threshold for identifying patients who improved. Caution is therefore warranted when using the RCI to identify clinically important improvement on outcome measures whose reliability is moderate to poor.
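The link between reliability and the RCI threshold can be made concrete. The sketch below assumes the widely cited Jacobson-Truax form of the RCI, in which the change score is divided by the standard error of the difference, S_diff = sqrt(2 × SE²), with SE = SD_baseline × sqrt(1 − r). We assume that using S_diff corresponds to the Christensen and Mendoza correction mentioned later in the text; the baseline SD of 10 points and the two reliability values are hypothetical, not figures from this study.

```python
import math


def s_diff(sd_baseline: float, reliability: float) -> float:
    """Standard error of the difference between two scores (assumed Jacobson-Truax form).

    SE (standard error of measurement) = sd_baseline * sqrt(1 - reliability)
    S_diff = sqrt(2 * SE**2)
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    se = sd_baseline * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * se ** 2)


def rci(baseline: float, follow_up: float, sd_baseline: float, reliability: float) -> float:
    """Reliable Change Index; positive when the BQ score falls (assumes lower is better)."""
    return (baseline - follow_up) / s_diff(sd_baseline, reliability)


def min_reliable_change(sd_baseline: float, reliability: float, z: float = 1.96) -> float:
    """Smallest raw improvement whose RCI reaches the z cutoff."""
    return z * s_diff(sd_baseline, reliability)


# Hypothetical baseline SD of 10 BQ points; compare a higher and a lower reliability
for r in (0.90, 0.64):
    print(r, round(min_reliable_change(10.0, r), 2))
```

With these illustrative numbers, the smallest improvement counted as reliable roughly doubles (from about 8.8 to about 16.6 BQ points) as reliability falls from 0.90 to 0.64, which is the mechanism by which a low-reliability measure can make the RCI overly stringent.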
The results of the sensitivity and specificity analyses showed that the second statistical method used in this study, the individual ES statistic, can distinguish patients who improve from those who do not, using the a priori definition of clinically important improvement from the PGIC scale. The findings show that clinically significant improvement is indicated for individual back patients with an ES statistic of 0.4 or more and for individual neck patients with an ES statistic of 0.5 or more. The similarity of these 2 values suggests that a single individual ES cutoff of 0.5 for both types of patients, rather than the exact group-specific values, would be more convenient for use in a clinical setting and in the design of clinical trials.
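As a rough illustration, an individual ES can be computed as one patient's change divided by a sample standard deviation. The sketch below assumes the denominator is the baseline SD of the sample and that improvement is baseline minus follow-up (lower BQ scores taken to mean less disability); this excerpt does not spell out either detail. The 0.5 cutoff is the single value suggested in the text, while the SD of 12 and the patient scores are hypothetical.

```python
def individual_effect_size(baseline: float, follow_up: float, sd_baseline: float) -> float:
    """Individual ES: one patient's improvement divided by the sample's baseline SD.

    Assumed definition; improvement is baseline minus follow-up because lower
    BQ scores are taken to mean less disability.
    """
    if sd_baseline <= 0:
        raise ValueError("sd_baseline must be positive")
    return (baseline - follow_up) / sd_baseline


ES_CUTOFF = 0.5  # single cutoff suggested in the text for back and neck patients


def improved_by_es(baseline: float, follow_up: float, sd_baseline: float,
                   cutoff: float = ES_CUTOFF) -> bool:
    """True if the individual ES reaches the cutoff ("0.5 or more")."""
    return individual_effect_size(baseline, follow_up, sd_baseline) >= cutoff


# Hypothetical patients from a sample whose baseline BQ SD is 12 points
print(improved_by_es(40, 33, 12.0))  # ES = 7/12, about 0.58
print(improved_by_es(40, 36, 12.0))  # ES = 4/12, about 0.33
```

Passing `cutoff=0.4` reproduces the slightly lower back-patient value reported in the text.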
The study has shown that the third statistical method under test can also be used to distinguish patients who have improved from those who have not. Raw change scores of 14 or more and percentage change scores of 47% or more were best associated with the a priori definition of clinical change in back pain patients. Corresponding cutoff values in neck pain patients were lower, at 9 or more for raw change scores and 34% or more for percentage change scores. Using a similar definition of clinically important improvement, Farrar et al11 showed that a percentage change score of approximately 30% on an 11-point pain intensity NRS best distinguished chronic pain patients who had improved from those who had not. In an accompanying study to this one, which used the BQ in a different sample of neck pain patients and identified clinically improved patients with the RCI (but without the correction factor proposed by Christensen and Mendoza9), the corresponding cutoff values were raw score changes of 13 or more and percentage change scores of 33% (Bolton, submitted for publication). The similarity of the cutoff percentage change scores in the 2 studies suggests that percentage change may be the more appropriate clinical tool for identifying patients who have improved. Moreover, percentage change is a standardized measure that is more easily interpretable, particularly when different outcome measures with different scales are in use. Farrar et al concluded that in studies with high variability in baseline pain levels, the relationship between percentage change and clinical improvement will be more consistent than the relationship between raw change and clinical improvement.
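The raw and percentage cutoffs above can be applied side by side. This sketch uses the back and neck cutoffs reported in the text and the percentage formula from the abstract, [(raw change score/baseline score) x 100]; it assumes improvement is baseline minus follow-up (lower BQ scores better). The function names and the two example patients are hypothetical.

```python
# Cutoffs reported in the text: (raw change, percentage change)
CUTOFFS = {
    "back": (14, 47.0),
    "neck": (9, 34.0),
}


def raw_change(baseline: float, follow_up: float) -> float:
    """Improvement in BQ points (assumes lower scores are better)."""
    return baseline - follow_up


def percentage_change(baseline: float, follow_up: float) -> float:
    """(raw change / baseline) x 100, as defined in the abstract."""
    if baseline == 0:
        raise ValueError("percentage change is undefined for a baseline of 0")
    return 100.0 * raw_change(baseline, follow_up) / baseline


def classify(condition: str, baseline: float, follow_up: float) -> dict[str, bool]:
    """Apply both the raw and the percentage cutoff for 'back' or 'neck' patients."""
    raw_cut, pct_cut = CUTOFFS[condition]
    return {
        "raw": raw_change(baseline, follow_up) >= raw_cut,
        "percentage": percentage_change(baseline, follow_up) >= pct_cut,
    }


# Two hypothetical back patients with very different baselines
print(classify("back", 50, 36))  # raw change 14, percentage change 28%
print(classify("back", 20, 10))  # raw change 10, percentage change 50%
```

The two criteria disagree for both patients: the high-baseline patient passes only the raw cutoff and the low-baseline patient passes only the percentage cutoff. This is the kind of baseline dependence behind Farrar et al's observation about variable baseline levels.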
This article provides a methodological framework for interpreting statistical computations from outcome measures in terms of their clinical significance. In essence, it treats these computations as diagnostic tests for the presence or absence of a clinically significant change. The evaluation of diagnostic tests is open to considerable potential bias, and a strength of this study was that it avoided selection bias by recruiting patients consecutively. However, the study examines scores from only 1 outcome measure, in a limited patient group, and over a relatively short period of time. Moreover, the modified PGIC scale has not been tested for reliability or validity, nor has it been shown to be a valid external criterion for clinically significant change, even though we used it as such. In an area where there is an array of methods to define the minimal important difference (anchor-based and statistical), more work is required to establish what constitutes a clinically important difference, so that it can be used with confidence as a valid external criterion in future studies. Further work is also required on other outcome measures and other conditions. In particular, the reliability of the cutoff values reported in this study should be investigated by repeating the work in different samples of patients. In conditions such as back and neck pain, which are notoriously unpredictable and heterogeneous, reliability is of paramount importance when cutoff values are proposed for use in other settings. Finally, because the study design did not include a control group, no conclusions have been drawn about the cause of the improvement observed in these patients, and therefore about the effect of the treatment intervention.
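Treating a change-score cutoff as a diagnostic test means comparing "score at or above cutoff" with the PGIC anchor label. Sensitivity is the share of anchor-improved patients the cutoff flags, and specificity is the share of anchor-not-improved patients it does not flag. This excerpt does not state the exact rule used to pick the "highest sensitivity and specificity" cutoff, so the sketch below assumes the cutoff that maximizes sensitivity + specificity (equivalently Youden's J). The function names and the seven-patient data are hypothetical.

```python
def sensitivity_specificity(scores: list[float], anchor_improved: list[bool],
                            cutoff: float) -> tuple[float, float]:
    """Treat 'score >= cutoff' as a positive test and the PGIC anchor as the reference."""
    if len(scores) != len(anchor_improved):
        raise ValueError("scores and anchor labels must have equal length")
    tp = fn = tn = fp = 0
    for score, improved in zip(scores, anchor_improved):
        positive = score >= cutoff
        if improved and positive:
            tp += 1
        elif improved:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both improved and not-improved patients")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: list[float], anchor_improved: list[bool]) -> float:
    """Observed score maximizing sensitivity + specificity; ties go to the lowest cutoff."""
    return max(sorted(set(scores)),
               key=lambda c: sum(sensitivity_specificity(scores, anchor_improved, c)))


# Hypothetical percentage change scores and PGIC-based labels (PGIC >= 6)
pct_change = [10, 20, 30, 40, 50, 60, 70]
pgic_improved = [False, False, False, True, False, True, True]
cut = best_cutoff(pct_change, pgic_improved)
print(cut, sensitivity_specificity(pct_change, pgic_improved, cut))
```

For these data the chosen cutoff is 40, which flags all 3 anchor-improved patients and misclassifies 1 of the 4 not-improved patients.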
This study presents a number of threshold values on statistical computations from change scores that best distinguish patients who have undergone clinically significant change from those who have not. This work is based, however, on the PGIC as an external criterion of clinically significant change, and while this may be both conceptually reasonable and clinically relevant, it remains to be seen whether it is a valid assumption. By identifying the proportions of patients who have undergone clinically important change, the number needed to treat (NNT) can be calculated, facilitating the application of group results from clinical trials to individual patients. This transition from the research setting to the clinical setting underpins the principles of evidence-based health care.
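The NNT is conventionally the reciprocal of the absolute difference in the proportion of patients improved between a treatment group and a control group, rounded up to a whole patient. Because this study had no control group, the sketch below assumes a hypothetical two-arm trial in which "improved" means meeting one of the cutoffs discussed above. All counts and the function name are illustrative.

```python
import math
from fractions import Fraction


def number_needed_to_treat(improved_treated: int, n_treated: int,
                           improved_control: int, n_control: int) -> int:
    """NNT = 1 / (proportion improved with treatment - proportion improved in control),
    rounded up to a whole patient. Exact fractions avoid floating-point rounding."""
    arr = Fraction(improved_treated, n_treated) - Fraction(improved_control, n_control)
    if arr <= 0:
        raise ValueError("treatment shows no benefit over control; NNT is undefined here")
    return math.ceil(1 / arr)


# Hypothetical trial: 45 of 100 treated and 30 of 100 control patients
# reach a clinically significant improvement cutoff
print(number_needed_to_treat(45, 100, 30, 100))
```

Here the absolute difference is 0.15, so 1/0.15 ≈ 6.7 rounds up to an NNT of 7: about 7 patients would need treatment for 1 additional patient to reach clinically significant improvement.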