Imperfect Placebos Are Common In Low Back Pain Trials:
A Systematic Review Of The Literature

This section was compiled by Frank M. Painter, D.C.

FROM: Eur Spine J. 2008 (Jul); 17 (7): 889–904

L. A. C. Machado, S. J. Kamper, R. D. Herbert, C. G. Maher, and J. H. McAuley

Back Pain Research Group,
Musculoskeletal Division,
The George Institute for International Health,
Missenden Rd, P.O. Box M201,
Camperdown, NSW, 2050, Australia.

The placebo is an important tool to blind patients to treatment allocation and therefore minimise some sources of bias in clinical trials. However, placebos that are improperly designed or implemented may introduce bias into trials. The purpose of this systematic review was to evaluate the adequacy of placebo interventions used in low back pain trials. Electronic databases were searched systematically for randomised placebo-controlled trials of conservative interventions for low back pain. Trial selection and data extraction were performed by two reviewers independently. A total of 126 trials using over 25 different placebo interventions were included. The strategy most commonly used to enhance blinding was the provision of structurally equivalent placebos. Adequacy of blinding was assessed in only 13% of trials. In 20% of trials the placebo intervention was a potentially genuine treatment. Most trials that assessed patients' expectations showed that the placebo generated lower expectations than the experimental intervention. Taken together, these results demonstrate that imperfect placebos are common in low back pain trials, which suggests that many trials provide potentially biased estimates of treatment efficacy. This finding has implications for the interpretation of published trials and the design of future trials. Implementing strategies to facilitate blinding and to balance expectations between randomised groups needs to be given higher priority in low back pain research.

From the FULL TEXT Article:


Placebo-controlled trials are designed to control for incidental factors such as natural recovery, regression to the mean and placebo effects. In theory, this permits the specific (non-incidental) effects of treatment to be determined. To control for placebo effects, participants must be kept unaware of their group assignment, that is, they must be blinded. If a placebo-controlled trial fails to achieve acceptable blinding, it is possible that the estimates of treatment effects will be biased due to imbalances in the magnitude of placebo effects between groups. Blinding also contributes to the prevention of other sources of bias in trials such as measurement bias, treatment non-compliance and loss to follow-up [79, 125].

Ideally, blinding is achieved by using placebos that are indistinguishable from the experimental intervention. While this is relatively easy to achieve in pharmaceutical trials, it is more difficult in trials of complex interventions such as exercise or psychological interventions [66, 113]. This is because in trials investigating complex interventions, indistinguishability is often achieved at the expense of having placebos that are not inert.

There is controversy surrounding the use of the term “inert” to describe non-pharmaceutical placebos [18, 34], but much of this debate seems to be merely semantic. For example, Rosenthal and Frank [119] extended the notion of an inert placebo to psychotherapy by defining a placebo as “an activity regarded as therapeutically inert from the standpoint of the theory of the therapy being studied” (p. 299). Semantics aside, trialists should avoid non-inert placebos because they may cause treatment effects to be underestimated: any genuine therapeutic effect of the placebo reduces the apparent difference between groups. The alternative is to choose placebos that are clearly inert but distinguishable from the experimental intervention, such as sham electrotherapy in trials of spinal manipulation or exercise. In theory, this could be equally problematic, because dissimilar interventions may not generate placebo effects of similar magnitude in the experimental and control groups [80, 81].

A systematic review of the literature was conducted to ascertain the adequacy of placebo interventions implemented in clinical trials. We chose to review the trials investigating the efficacy of interventions for low back pain because low back pain represents a common and costly health condition for which a wide range of treatment options are currently available [91].


      Search strategy

A systematic search was conducted in MEDLINE, CINAHL, PsycINFO, the Cochrane Central Register of Controlled Trials and EMBASE from the earliest available record to November 2006, using the strategy recommended by the Cochrane Back Review Group [140]. The results were combined with the terms “placebo”, “sham”, “attention-control” and “minimal intervention”.

      Inclusion and exclusion criteria

Eligible studies were randomised placebo-controlled trials evaluating the efficacy of conservative (non-surgical) interventions for non-specific low back pain or sciatica in which outcomes had been reported in terms of pain, disability, quality of life, sick leave, global perceived effect or recurrence. Non-English studies were included when a translation was available. Studies in which participants presented with cauda equina syndrome, infection, neoplasm, fracture, inflammatory disease, pregnancy or spinal surgery in the past 12 months were excluded, as were primary prevention studies.

      Data extraction and analysis

Two independent reviewers used a standard form to extract data. Disagreements were resolved by discussion and consensus. Trial quality was assessed using the PEDro scale, an 11-item checklist in which higher scores represent higher quality; because the first item (eligibility criteria) is not scored, 10 is the maximum possible score [100]
(the full scale can be viewed at
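The scoring rule described above can be sketched in a few lines. This is a minimal sketch, assuming each of the 11 PEDro items is recorded as a simple yes/no answer; the function names are hypothetical, and the cut-off of 3 points mirrors the low-quality threshold used in the post hoc analysis reported later in this review.

```python
def pedro_total(items):
    """Total PEDro score from 11 yes/no items (items[0] is item 1).

    Hypothetical helper for illustration only.
    """
    if len(items) != 11:
        raise ValueError("the PEDro scale has 11 items")
    # Item 1 (eligibility criteria) is not added to the total,
    # so the maximum possible score is 10.
    return sum(1 for met in items[1:] if met)


def is_low_quality(score, cutoff=3):
    """Low quality = 3 points or fewer, as in the review's post hoc analysis."""
    return score <= cutoff


print(pedro_total([True] * 11))  # every item satisfied: maximum score of 10
```

Skipping the first item, rather than subtracting a point afterwards, keeps the code aligned with how the scale itself separates that item from the total.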

In this review, trials were included in the analysis regardless of their quality scores. The reviewers extracted information on the substances or procedures used in placebo groups, whether an assessment of blinding or an evaluation of patients’ expectations was reported, and the results of these assessments. Additionally, trials were coded according to their use of the following strategies, which have the potential to facilitate blinding in trials [9, 38].

      Indistinguishable placebo

The most obvious way of blinding patients to allocation is to provide one group with a placebo intervention that is indistinguishable from the experimental intervention. We examined descriptions of placebo interventions in trial reports to judge whether patients would be able to differentiate them from the experimental intervention. Placebos that replicate the side effects of pharmaceutical interventions are known as “active placebos”. However, in this review a pharmaceutical placebo did not have to replicate side effects to be considered indistinguishable. For interventions delivered by procedures that break the skin, such as acupuncture and injections, the placebo was considered indistinguishable only when it also involved skin penetration. In accordance with Baskin and colleagues [9], placebos of psychological interventions were never considered indistinguishable.

      Inert placebo

In this review, placebo interventions were coded as not inert if they involved a treatment used in current clinical practice. (Note that using inert placebos does not, on its own, secure blinding.)

      Structurally equivalent placebo

One strategy used to promote comparability of intervention and placebo groups is to ensure the structural equivalence of the experimental and placebo interventions. Structural equivalence is particularly important when indistinguishability is not feasible. The structural equivalence of each of the placebo interventions was evaluated by considering a list of criteria adapted from the psychotherapy literature [9]. To qualify as being structurally equivalent in this review, the placebo intervention had to match the experimental intervention in the following criteria: number of sessions, length of sessions, format (group or individual), level of therapist training, individualisation (the degree to which the intervention was tailored to the patient), and relevance of the intervention with regard to the condition (e.g. lying prone was not considered to be a relevant placebo for low back pain [61]).
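The six criteria above can be made concrete as a checklist comparison. The sketch below is illustrative only: the record type, its field names, the exact-match rule and the example values are all hypothetical, and in the review itself each criterion was judged from the trial report rather than computed.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InterventionProfile:
    """Hypothetical record of the six structural features compared in the review."""
    n_sessions: int
    session_minutes: int
    delivery_format: str         # "group" or "individual"
    therapist_training: str      # training level of the person delivering it
    individualised: bool         # tailored to the patient?
    relevant_to_condition: bool  # plausibly relevant to low back pain?


def structurally_equivalent(experimental, placebo):
    """Return (True, []) if every feature matches, else (False, mismatched names)."""
    mismatches = [
        f.name
        for f in fields(InterventionProfile)
        if getattr(experimental, f.name) != getattr(placebo, f.name)
    ]
    return len(mismatches) == 0, mismatches


# Hypothetical example: a sham with the same schedule and format as spinal
# manipulation, but consisting of lying prone, which the review did not
# count as a relevant placebo for low back pain.
manipulation = InterventionProfile(6, 30, "individual", "physiotherapist", True, True)
lying_prone = InterventionProfile(6, 30, "individual", "physiotherapist", True, False)
print(structurally_equivalent(manipulation, lying_prone))
# (False, ['relevant_to_condition'])
```

Exact equality is deliberately strict; a real coding scheme would need agreed tolerances (for example, on session length) before any part of it could be automated.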

      Sample consisting of naïve subjects

Patients were considered “naïve” if the trial reported that they had not been exposed to the active form of the intervention employed in the placebo group. This strategy contributes to blinding because a non-naïve sample would be more likely to know the sensation of true treatment and therefore correctly guess their allocation. To code trials for this feature, their inclusion and exclusion criteria were examined. For example, in a trial in which the placebo consists of inactive transcutaneous electrical nerve stimulation (TENS), patients were considered naïve if an exclusion criterion was previous treatment with TENS.


Electronic searches identified 1,002 studies. Of these, 126 were eligible and included in the analysis (Fig. 1). Because nine of the trials had a third group consisting of a different intervention, 135 comparisons against placebo were available. For simplicity, each comparison was treated as an individual trial. Trials reported on the following categories of interventions:

acupuncture (10 trials) [25, 42, 73, 74, 85, 95, 99, 104, 106, 128]

back school (2 trials) [11, 28]

behavioural (7 trials) [10, 21, 62, 111, 130, 132, 133]

electrotherapy (20 trials) [8, 15, 25, 27, 37, 49–51, 54, 57, 63, 69, 76, 77, 89, 94, 107, 122, 137, 143]

exercise (10 trials) [29, 37, 43, 48, 50, 59, 67, 116, 131, 136]

heatwrap therapy (2 trials) [109, 110]

insoles (1 trial) [127]

magnets (1 trial) [33]

massage (1 trial) [116]

neuroreflexotherapy (1 trial) [93]

pharmaceutical (65 trials) [1–7, 12–14, 17, 19, 20, 22–24, 26, 30–32, 35, 36, 39–41, 45–47, 52, 53, 55, 56, 58, 60, 64, 71, 72, 75, 82–84, 86–88, 90, 92, 97, 101–103, 108, 112, 114, 115, 118, 120, 123, 124, 134, 135, 138, 139, 141, 142, 145]

spinal manipulative therapy (12 trials) [11, 29, 48, 54, 61, 65, 68, 72, 78, 96, 121, 146]

and traction (3 trials) [16, 117, 129].
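The category counts can be cross-checked against the reference lists above. The sketch assumes that each reference number corresponds to one trial and that a trial with a third group is cited once under each intervention it tested; under that assumption it recovers both the 135 comparisons and the 126 trials, and identifies the nine references that appear in two categories.

```python
from collections import Counter

# Reference numbers for each intervention category, copied from the list above.
CATEGORIES = {
    "acupuncture": "25, 42, 73, 74, 85, 95, 99, 104, 106, 128",
    "back school": "11, 28",
    "behavioural": "10, 21, 62, 111, 130, 132, 133",
    "electrotherapy": "8, 15, 25, 27, 37, 49–51, 54, 57, 63, 69, 76, 77, 89, 94, 107, 122, 137, 143",
    "exercise": "29, 37, 43, 48, 50, 59, 67, 116, 131, 136",
    "heatwrap therapy": "109, 110",
    "insoles": "127",
    "magnets": "33",
    "massage": "116",
    "neuroreflexotherapy": "93",
    "pharmaceutical": (
        "1–7, 12–14, 17, 19, 20, 22–24, 26, 30–32, 35, 36, 39–41, 45–47, "
        "52, 53, 55, 56, 58, 60, 64, 71, 72, 75, 82–84, 86–88, 90, 92, 97, "
        "101–103, 108, 112, 114, 115, 118, 120, 123, 124, 134, 135, 138, "
        "139, 141, 142, 145"
    ),
    "spinal manipulative therapy": "11, 29, 48, 54, 61, 65, 68, 72, 78, 96, 121, 146",
    "traction": "16, 117, 129",
}


def expand(spec):
    """Expand a citation string such as "1–7, 12" into a list of integers."""
    refs = []
    for part in spec.split(","):
        part = part.strip()
        if "–" in part:  # an en dash marks an inclusive range
            first, last = (int(x) for x in part.split("–"))
            refs.extend(range(first, last + 1))
        else:
            refs.append(int(part))
    return refs


expanded = {name: expand(spec) for name, spec in CATEGORIES.items()}
counts = Counter(ref for refs in expanded.values() for ref in refs)

comparisons = sum(counts.values())  # one entry per placebo comparison
unique_trials = len(counts)         # distinct reference numbers
multi_arm = sorted(ref for ref, n in counts.items() if n > 1)

print(comparisons, unique_trials, multi_arm)
# 135 126 [11, 25, 29, 37, 48, 50, 54, 72, 116]
```

The nine duplicated references match the nine trials said to have a third group, which supports reading the category counts as counts of comparisons rather than of trials.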

Trial characteristics are presented in Table 1.

Figure 1.   Search and selection of papers.
*Non-English papers

Table 1.   Characteristics of included trials

The quality of the included trials was mostly moderate (range 1–10, median 7 points). Six trials scored 3 points or less on the PEDro scale [48, 50, 121, 131, 132, 136] and two pharmaceutical trials scored the maximum of 10 points [41, 108]. Over 25 different substances or procedures were used as placebo interventions. The placebo tablet/capsule was the most frequent (28%), followed by sham electrotherapy (20%). Pharmaceutical trials were highly consistent in their choices of placebo, whereas exercise and spinal manipulative therapy trials had the largest diversity of placebos.

Only 17 trials (13%) provided information on the success of blinding. Of these, 2 failed to achieve acceptable blinding, as indicated by a significantly greater number of participants in the placebo group correctly guessing their group allocation. Patients’ expectations were assessed in 14 trials (10%), and in 8 of these, higher expectations were observed in the experimental group. The methods used to assess expectations included single questions about expectations for pain relief or for treatment efficacy, modified expectation scales, structured credibility scales, and questions on preferences for future treatment. The last of these was considered a measure of expectations because it is often one of the items included in credibility scales. The time at which expectations were measured differed greatly across trials, ranging from baseline [104] to 6 months after enrolment [85].
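The percentages in this paragraph appear to use the 135 placebo comparisons, not the 126 trials, as their denominator. This is an inference from the arithmetic rather than something the text states: 17 out of either denominator rounds to 13%, but only 14 out of 135 rounds to the reported 10%. The helper name below is hypothetical.

```python
def percent(count, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / denominator)


# Counts reported in the text, tried against both candidate denominators.
for label, count in [("blinding assessed", 17), ("expectations assessed", 14)]:
    print(label, percent(count, 126), percent(count, 135))
# blinding assessed 13 13
# expectations assessed 11 10
```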

Provision of structural equivalence was the strategy most frequently used to facilitate blinding (87% of trials). Indistinguishable placebo interventions were used in 58% of trials. Placebos that were clearly inert were used in 79% of trials. Of the remaining trials (those whose placebo was not clearly inert), most, but not all, used indistinguishable placebos, suggesting that indistinguishability was achieved at the expense of potentially producing specific treatment effects. Few trials (18%) explicitly included only naïve subjects. The proportion of trials with placebos that were indistinguishable, inert, structurally equivalent and used naïve subjects varied with the type of intervention. Figure 2 shows these proportions for groups of interventions tested in ten or more trials. A post hoc analysis excluding low-quality trials (trials scoring 3 points or lower on the PEDro scale) gave results almost identical to those shown in Fig. 2.

Figure 2.   Proportion of trials with

(a) indistinguishable placebo
(b) inert placebo
(c) structurally equivalent placebo
(d) naïve subjects
(e) blinding assessment
(f) successful blinding

successful = both assessed and found to be successful
unsuccessful = not assessed or assessed and found to be unsuccessful.

Assessments of patients’ expectations were also considered under blinding assessment.
Graphs report groups of interventions investigated in ten or more trials.
Acupuncture (n = 10),
electrotherapy (n = 20),
exercise (n = 10),
SMT (n = 12),
pharmaceutical (n = 65).
SMT = spinal manipulative therapy


This review reveals that imperfect placebos are common in low back pain trials, a finding that has implications for the design of future trials and for the interpretation of published trials evaluating treatments for low back pain. Two common problems were identified in the design of trials: the use of placebos that are potentially not inert (because they consist of a treatment used in contemporary practice) and the uncertain success of blinding.

It could be argued that our search strategy inflated the proportion of trials with non-inert placebos, because it included the term “minimal intervention”. However, we only included trials in the review if the authors categorised the control intervention as a placebo intervention, or if they stated in the manuscript that the intervention was designed to control for non-specific effects of treatment [29]. The use of non-inert placebos in trials is usually a consequence of an uncritical attempt to design placebos that are indistinguishable from real interventions. For example, among non-pharmaceutical trials, we found that indistinguishable placebos were more frequent in acupuncture trials, but all of these indistinguishable placebos consisted of potentially genuine treatments. In acupuncture trials, the use of invasive sham acupuncture techniques has been criticised because the effects of acupuncture may depend not on the depth or location of needling but on needling itself [98, 144]. More generally, the lack of a clear understanding of the mechanisms underlying specific therapeutic effects is also a challenge to the design of indistinguishable placebos in other complex interventions [66].

In pharmaceutical trials, “active placebos” are sometimes used to create intervention groups that are more closely matched. These placebos aim to mimic the side effects of drugs (e.g. dry mouth) while otherwise sharing the characteristics of conventional placebos [115, 125]. However, a poorly chosen “active placebo” can also spoil the placebo comparison. Two trials included in this review used an “active placebo” (diphenhydramine) that might have acted as a genuine treatment because of its sedative properties. If so, the results of these trials no longer reflect a placebo-controlled comparison but a comparison of two genuine treatments. The decision whether to use “active placebos” in pharmaceutical trials should weigh their potential benefits against this risk. In antidepressant trials, for example, their use may not be justified given that the incidence of side effects in experimental and placebo groups seems to be similar regardless of whether an “active placebo” is used [4, 5].

The inclusion of naïve subjects is one way to enhance blinding when true indistinguishability is difficult to achieve. Consider a trial in which TENS is delivered by a functioning device and the placebo by a non-functioning device (sham TENS group). Although both interventions look the same, only patients treated with the functioning device will detect the electrical stimulation. To keep patients blinded in trials like this, researchers often tell them that they may or may not feel the stimulation, regardless of whether the treatment provided is a placebo [27]. However, such information is unlikely to stop patients who have previously received a course of TENS from recognising the sensation of true treatment and consequently becoming unblinded. For the same reason, a crossover design might not be appropriate in these trials [50], because patients who receive active TENS in the first period are no longer naïve in the second. Deyo and colleagues [37] have argued for the inclusion of naïve subjects in electrotherapy trials and, consistent with this recommendation, our results showed that naïve subjects were used more frequently in electrotherapy trials than in trials of other interventions.

Of the strategies with the potential to facilitate blinding in placebo-controlled trials, structural equivalence was the most frequently used. When experimental and placebo interventions are structurally equivalent, they might not look the same, but they involve similar degrees of therapeutic contact. Structurally equivalent placebos may therefore control for placebo effects without the risk of using a placebo that is not inert. A meta-analysis of psychotherapy trials has provided some evidence that structural equivalence reduces bias in treatment estimates [9]: trials with structurally equivalent groups reported smaller effects of interventions than trials whose groups were not structurally equivalent. The larger treatment effects observed in the latter would reflect larger placebo effects in the experimental group, owing to the greater amount or quality of therapeutic contact it received. Nevertheless, because many factors potentially influence the magnitude of placebo effects, it seems unlikely that structural equivalence alone can control for all the factors that generate unbalanced placebo effects in trials.

The use of any strategy to facilitate blinding will be worthless if, ultimately, an acceptable level of blinding is not achieved. As noted by Schulz and colleagues [126], “blinding must succeed to reap its benefits”. Accordingly, the CONSORT statement recommends that the success of blinding be reported [105]. Blinding success was poorly documented in a sample of general medicine and psychiatry trials [44]. Likewise, our results show that disappointingly few trials of low back pain report on blinding success. However, a failure to report blinding success is not sufficient to rule out successful blinding. Hill and colleagues [70] contacted the investigators of 40 rheumatology trials and found that the lack of reporting of randomisation, concealed allocation and blinding does not necessarily mean that these procedures were not properly conducted. Nevertheless, although successful blinding might have been achieved in some trials where it was not reported, it would be clearer if future trials reported the results of their blinding assessments.

One way of checking whether blinding is successful is to measure how often patients guess their group assignment correctly. In a two-arm trial in which blinding is successful, guesses would be correct 50% of the time. In placebo-controlled trials, however, the success of blinding is better judged by the difference between groups in the proportion of patients who believed a “real” treatment was provided. That is, if patients in the placebo group are more likely than those in the experimental group to believe that they received a placebo, blinding was unsuccessful. The timing of blinding assessments also deserves special consideration. For instance, if the experimental intervention is highly effective, the difference in the proportion of patients believing they received a “real” treatment will tend to be larger regardless of the strategies used to secure blinding, because patients who improve are more likely to infer that they were treated. For this reason, it is preferable to assess blinding success earlier rather than later in a course of treatment.
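The two checks described above can be computed from a simple tally of patients' guesses. This is a minimal sketch with hypothetical function and argument names; the pooled two-proportion z statistic is one common way to test the between-group difference and is an assumption here, not a method the review prescribes. With equal group sizes, the proportion of correct guesses equals 0.5 plus half the between-group difference in the proportion believing they received a “real” treatment, so the two checks agree that 50% correct means no detectable unblinding.

```python
import math


def blinding_summary(real_exp, n_exp, real_pla, n_pla):
    """Summarise a two-arm blinding assessment (hypothetical helper).

    real_exp, real_pla: patients in the experimental / placebo group who
    believed they received a "real" treatment; n_exp, n_pla: group sizes.
    """
    # Correct guesses: experimental patients answering "real" plus placebo
    # patients answering "placebo". Successful blinding gives about 0.5.
    prop_correct = (real_exp + (n_pla - real_pla)) / (n_exp + n_pla)

    # Between-group difference in the proportion believing treatment was real.
    # A positive value means the placebo group was less convinced.
    p_exp = real_exp / n_exp
    p_pla = real_pla / n_pla
    diff = p_exp - p_pla

    # Pooled two-proportion z statistic (normal approximation; an assumption).
    pooled = (real_exp + real_pla) / (n_exp + n_pla)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_pla))
    z = diff / se if se > 0 else 0.0
    return prop_correct, diff, z


print(blinding_summary(60, 100, 30, 100))
```

With the invented counts in the usage line (60 of 100 experimental and 30 of 100 placebo patients believing they received a “real” treatment), the function returns roughly 0.65 correct guesses, a difference of 0.30 and z ≈ 4.26, well beyond the conventional 1.96 cut-off; by the criterion above, that trial would count as unsuccessfully blinded.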

Some investigators supplement assessments of blinding success with measurements of patients’ expectations of treatment. Important imbalances in patients’ expectations were reported in eight of the 14 trials that assessed them, and such imbalances may well be common across trials of this type, although the small number of trials that assessed expectations makes this uncertain. Health care providers may also transfer their own expectations to patients [125]. As noted by Critelli and Neumann [34], “there appears to be a tendency for experimental placebos to be in some sense weaker, less credible, or applied in a less enthusiastic manner than treatments that have been offered as actual therapies”. In this review, however, we examined expectations only from the patient’s perspective.

Despite the contribution of expectation measurements to the interpretability of the results of clinical trials, these measurements have important limitations. Firstly, there is no consensus on how expectations should be assessed in clinical trials, as reflected in the lack of standardisation of these assessments. In addition, deciding the best timing for these assessments is difficult, which may explain the large variation in timing among the trials included in this review. As with assessments of blinding, treatment effects may confound ratings of patients’ expectations obtained at follow-up. Thus, it is questionable whether assessments of expectations made as late as 6 months after enrolment measure the same construct as assessments of expectations at baseline. If researchers choose to assess expectations at baseline, patients might find it difficult to describe their expectations of interventions with which they are unfamiliar. Optimal ways to assess expectations in trials, and the standardisation of such measurements, are a priority for future studies.


Our results illustrate the complexity inherent in the design of suitable placebo interventions. Unfortunately, many placebo-controlled trials evaluating treatments for low back pain use imperfect placebos and so potentially provide biased estimates of treatment efficacy. This finding has implications for the interpretation of published trials and the design of future trials in this area.


    Since 10-18-2016

         © 1995–2018 ~ The Chiropractic Resource Organization ~ All Rights Reserved