Archives of Internal Medicine 2012 (Oct 22); 172 (19): 1444–1453
Andrew J. Vickers, DPhil, Angel M. Cronin, MS, Alexandra C. Maschino, BS, George Lewith, MD, et al.
Department of Epidemiology and Biostatistics,
Memorial Sloan-Kettering Cancer Center,
New York, NY 10065, USA.
BACKGROUND: Although acupuncture is widely used for chronic pain, there remains considerable controversy as to its value. We aimed to determine the effect size of acupuncture for 4 chronic pain conditions: back and neck pain, osteoarthritis, chronic headache, and shoulder pain.
METHODS: We conducted a systematic review to identify randomized controlled trials (RCTs) of acupuncture for chronic pain in which allocation concealment was determined unambiguously to be adequate. Individual patient data meta-analyses were conducted using data from 29 of 31 eligible RCTs, with a total of 17 922 patients analyzed.
RESULTS: In the primary analysis, including all eligible RCTs, acupuncture was superior to both sham and no-acupuncture control for each pain condition (P < .001 for all comparisons). After exclusion of an outlying set of RCTs that strongly favored acupuncture, the effect sizes were similar across pain conditions. Patients receiving acupuncture had less pain, with scores that were 0.23 (95% CI, 0.13-0.33), 0.16 (95% CI, 0.07-0.25), and 0.15 (95% CI, 0.07-0.24) SDs lower than sham controls for back and neck pain, osteoarthritis, and chronic headache, respectively; the effect sizes in comparison to no-acupuncture controls were 0.55 (95% CI, 0.51-0.58), 0.57 (95% CI, 0.50-0.64), and 0.42 (95% CI, 0.37-0.46) SDs. These results were robust to a variety of sensitivity analyses, including those related to publication bias.
CONCLUSIONS: Acupuncture is effective for the treatment of chronic pain and is therefore a reasonable referral option. Significant differences between true and sham acupuncture indicate that acupuncture is more than a placebo. However, these differences are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to the therapeutic effects of acupuncture.
From the FULL TEXT Article:
Acupuncture is the insertion and stimulation of needles at specific points on the body to facilitate recovery of health. Although acupuncture was initially developed as part of traditional Chinese medicine, some contemporary acupuncturists, particularly those with medical qualifications, understand it in physiologic terms, without reference to pre-modern concepts.
An estimated 3 million American adults receive acupuncture treatment each year, and chronic pain is the most common presentation. Acupuncture is known to have physiologic effects relevant to analgesia [4, 5], but there is no accepted mechanism by which it could have persisting effects on chronic pain. This lack of biological plausibility, and its provenance in theories lying outside of biomedicine, makes acupuncture a highly controversial therapy.
A large number of randomized trials of acupuncture for chronic pain have been conducted. Most have been of low methodologic quality and, accordingly, meta-analyses based on these trials are of questionable interpretability and value [6]. Here we present an individual patient data meta-analysis of randomized trials of acupuncture for chronic pain, where only high-quality trials were eligible for inclusion. Individual patient data meta-analysis is superior to the use of summary data in meta-analysis as it enhances data quality, enables different forms of outcome to be combined, and allows use of statistical techniques of increased precision.
The full protocol of the meta-analysis has been published [6]. In brief, the study was conducted in three phases: identification of eligible trials; collection, checking and harmonization of raw data; individual patient data meta-analysis.
Data Sources and Searches
To identify papers, we searched MEDLINE, the Cochrane Collaboration Central Register of Controlled Trials and the citation lists of systematic reviews (full search strategy in Appendix). There were no language restrictions. The initial search, current to November 2008, was used to identify studies for the individual patient data meta-analysis; a second search was conducted in December 2010 for summary data to use in a sensitivity analysis.
Two reviewers applied inclusion criteria for potentially eligible papers separately, with disagreements about study inclusion resolved by consensus. Randomized trials were eligible for analysis if they included at least one group receiving acupuncture needling and one group receiving either sham (placebo) acupuncture or no acupuncture control. Trials must have accrued patients with one of four indications – non-specific back or neck pain, shoulder pain, chronic headache or osteoarthritis – with the additional criterion that the current episode of pain must be of at least four weeks' duration for musculoskeletal disorders. There was no restriction on the type of outcome measure, although we specified that the primary endpoint must be measured more than four weeks after the initial acupuncture treatment.
It has been demonstrated that unconcealed allocation is the most important source of bias in randomized trials and, as such, we included only those trials where allocation concealment was determined unambiguously to be adequate (further detail in the review protocol). Where necessary, we contacted authors for further information concerning the exact logistics of the randomization process. Trials were excluded if there was any ambiguity about allocation concealment.
Data Extraction and Quality Assessment
The principal investigator of eligible studies was contacted and asked to provide raw data from the trial. To ensure data accuracy, all results reported in the trial publication, including baseline characteristics and outcome data, were then replicated.
Reviewers assessed the quality of blinding for eligible trials with sham acupuncture control. Trials were graded as having a low likelihood of bias if either the adequacy of blinding was checked by direct questioning of patients (e.g. by use of a credibility questionnaire) and no important differences were found between groups, or the blinding method (e.g. the Streitberger sham device [8]) had previously been validated as able to maintain blinding. Trials with a high likelihood of bias from unblinding were excluded from the meta-analysis of acupuncture versus sham; a sensitivity analysis included only trials with a low risk of bias.
Data Synthesis and Analysis
Each trial was reanalyzed by analysis of covariance with the standardized principal endpoint (scores divided by pooled standard deviation) as the dependent variable, with the baseline measure of the principal endpoint and variables used to stratify randomization as covariates. This approach has been shown to have the greatest statistical power for trials with baseline and follow-up measures. [9, 10] The effect size for acupuncture from each trial was then entered into a meta-analysis using the metan command in Stata 11 (Stata Corp., College Station, TX): the meta-analytic statistics were created by weighting each coefficient by the reciprocal of the variance, summing and dividing by the sum of the weights. Meta-analyses were conducted separately for comparisons of acupuncture with sham and no acupuncture control, and within each pain type. We pre-specified that the hypothesis test would be based on the fixed effects analysis as this constitutes a valid test of the null hypothesis of no treatment effect.
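The inverse-variance weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Stata code: the function name `fixed_effect_pool` and the three example trials are hypothetical, and each trial is assumed to contribute a standardized effect estimate together with its standard error.

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate.

    effects    -- per-trial standardized effect sizes (in SD units)
    std_errors -- per-trial standard errors of those effects
    Returns (pooled_effect, pooled_se, z_statistic).
    """
    # Weight each trial by the reciprocal of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    # Weighted sum of effects divided by the sum of the weights.
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se

# Hypothetical trials, for illustration only (not data from the review).
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.05, 0.10]
pooled, se, z = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {pooled:.3f} SD, SE = {se:.3f}, z = {z:.2f}")
```

The more precise trial (standard error 0.05) receives four times the weight of the others, so the pooled estimate is pulled toward its effect size.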
We identified 82 trials (see figure 1 for flowchart) of which 31 were eligible (Table 1 and Appendix online). Four of the studies were organized as part of the German Acupuncture Trials (GERAC) initiative [11–14]; 4 were part of the Acupuncture Randomized Trials (ART) group [15–18]; 4 were Acupuncture in Routine Care (ARC) studies [19–22]; 3 were UK National Health Service acupuncture trials [23–25]. Eleven studies were sham controlled, 10 had no acupuncture control and 10 were three-armed studies including both sham and no acupuncture control. The second search for subsequently published studies identified an additional four eligible studies [26–29], with a total of 1,619 patients.
An important source of clinical heterogeneity between studies concerns the control groups. In the sham controlled trials, the type of sham included acupuncture needles inserted superficially, sham acupuncture devices with needles that retract into the handle rather than penetrate the skin [30] and non-needle approaches such as deactivated electrical stimulation [31] or detuned laser. Moreover, co-interventions varied, with no additional treatment other than analgesics in some trials, whereas in other trials, both acupuncture and sham groups received a course of additional treatment, such as exercise led by physical therapists. Similarly, the no acupuncture control groups varied between usual care, such as a trial in which control group patients were merely advised to “avoid acupuncture”; attention control, such as group education sessions [33]; and guidelined care, where patients were given advice as to specific drugs and doses.
Data extraction and quality assessment
Usable raw data were obtained from 29 of the 31 eligible trials, including a total of 17,922 patients from the US, UK, Germany, Spain and Sweden. For one trial, the study database had become corrupted; in another case, the statisticians involved in the trial failed to respond to repeated enquiries despite approval for data sharing being obtained from the principal investigator.
The 29 trials comprised 18 comparisons (14,597 patients) of acupuncture with no acupuncture control and 20 comparisons (5,230 patients) of acupuncture with sham acupuncture. Patients in all trials had access to analgesics and other standard treatments for pain. Four sham-controlled trials were determined to have an intermediate likelihood of bias from unblinding [13, 32, 36, 37]; the 16 remaining sham-controlled trials were graded as having a low risk of bias from unblinding. On average, drop-out rates were low (weighted mean 10%). Drop-out rates exceeded 25% in only four trials: Molsberger 2002 and 2010 (33% and 27%, but raw data not received and neither trial included in main analysis); Carlsson 2001 (46%, trial excluded in a sensitivity analysis for blinding); and Berman 2004 (31%). The Berman trial had a high drop-out rate amongst no acupuncture controls (43%); drop-out rates were close to 25% in the acupuncture and sham groups. The Kerr trial had a large difference in drop-out rates between groups (acupuncture 13%, control 33%) but was excluded in the sensitivity analysis for blinding.
Forest plots for acupuncture against sham acupuncture and against no acupuncture control are shown separately for each of the four pain conditions in figures 2 and 3. Meta-analytic statistics are shown in table 2. Acupuncture was statistically superior to control for all analyses (p<0.001). Effect sizes are larger for the comparison between acupuncture and no acupuncture control than for the comparison between acupuncture and sham: 0.37, 0.26 and 0.15 in comparison with sham versus 0.55, 0.57 and 0.42 in comparison with no acupuncture control for musculoskeletal pain, osteoarthritis and chronic headache respectively.
For five of the seven analyses, the test for heterogeneity was statistically significant. In the case of comparisons with sham acupuncture, the trials by Vas et al. are clear outliers. For example, the effect size of the Vas trial for neck pain is about 5 times greater than the meta-analytic estimate. One effect of excluding these trials in a sensitivity analysis (table 3) is that there is no significant heterogeneity in the comparisons between acupuncture and sham. Moreover, the effect size for acupuncture becomes relatively similar for the different pain conditions: 0.23, 0.16 and 0.15 against sham, and 0.55, 0.57 and 0.42 against no acupuncture control for back and neck pain, osteoarthritis, and chronic headache respectively (fixed effects; results similar for the random effects analysis).
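To make the role of an outlying trial concrete, the sketch below computes Cochran's Q and a random-effects estimate for a set of hypothetical trials, with and without one outlier. It assumes the DerSimonian-Laird estimator of between-trial variance, which the text does not specify; the function name and all input values are illustrative and are not data from the review.

```python
def heterogeneity_and_random_effects(effects, std_errors):
    """Return (Q, tau2, fixed_estimate, random_estimate).

    Q    -- Cochran's heterogeneity statistic (k - 1 degrees of freedom)
    tau2 -- DerSimonian-Laird between-trial variance (assumed estimator)
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * e for w, e in zip(weights, effects)) / sum_w
    # Weighted squared deviations of each trial from the fixed-effect estimate.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each trial's own variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    return q, tau2, fixed, random

# Hypothetical trials: three similar results and one outlier (illustration only).
effects = [0.2, 0.2, 0.2, 1.0]
std_errors = [0.1, 0.1, 0.1, 0.1]
q_all, tau2_all, fixed_all, random_all = heterogeneity_and_random_effects(effects, std_errors)
q_trim, tau2_trim, fixed_trim, random_trim = heterogeneity_and_random_effects(
    effects[:3], std_errors[:3]
)
print(f"all trials:      Q = {q_all:.1f}, tau2 = {tau2_all:.3f}, fixed = {fixed_all:.2f}")
print(f"outlier removed: Q = {q_trim:.1f}, tau2 = {tau2_trim:.3f}, fixed = {fixed_trim:.2f}")
```

In this toy example, Q falls from 48 to 0 once the outlier is dropped. A formal test would compare Q against a chi-squared distribution with k − 1 degrees of freedom.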
To give an example of what these effect sizes mean in real terms, baseline pain score on a 0 – 100 scale for a typical trial might be 60. Given a standard deviation of 25, follow-up scores might be 43 in a no acupuncture group, 35 in sham acupuncture and 30 in patients receiving true acupuncture. If response were defined in terms of a pain reduction of 50% or more, response rates would be approximately 30%, 42.5% and 50%, respectively.
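The arithmetic behind this example can be checked with a short calculation. The sketch below assumes that follow-up scores in each group are normally distributed with the stated standard deviation of 25, and that a response means a follow-up score at or below half the baseline of 60. The normality assumption is ours, not stated in the text, but it gives response rates close to those quoted.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def response_rate(followup_mean, sd=25.0, baseline=60.0, reduction=0.5):
    """Share of patients whose follow-up score is at or below the
    response threshold, assuming normally distributed scores."""
    threshold = baseline * (1.0 - reduction)  # 30 on the 0-100 scale
    return normal_cdf((threshold - followup_mean) / sd)

groups = [("no acupuncture", 43.0), ("sham", 35.0), ("acupuncture", 30.0)]
rates = {name: response_rate(mean) for name, mean in groups}
for name, rate in rates.items():
    print(f"{name}: {100 * rate:.1f}% responders")

# Standardized differences implied by the example means.
effect_vs_no_acupuncture = (43.0 - 30.0) / 25.0  # 0.52 SD
effect_vs_sham = (35.0 - 30.0) / 25.0            # 0.20 SD
```

Under these assumptions the implied effect sizes are 0.52 SD against no acupuncture and 0.20 SD against sham, in line with the meta-analytic estimates reported above.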
The comparisons with no acupuncture control show evidence of heterogeneity. This appears largely explicable in terms of differences between the control groups used. In the case of osteoarthritis, the largest effect is for Witt 2005, where patients in the waiting list control received only rescue pain medication, and the smallest for Foster 2007 [25], which involved a program of exercise and advice led by physical therapists. For the musculoskeletal analyses, heterogeneity is driven by two very large trials [19, 20] (n=2565 and n=3118) for back and neck pain. If only back pain is considered (table 3), heterogeneity is dramatically reduced and is again driven by one trial, Brinkhaus 2006, with waiting list control. In the headache meta-analysis, Diener 2006 had much smaller differences between groups. This trial involved providing drug therapy according to national guidelines in the no acupuncture group, including initiation of beta-blockers as migraine prophylaxis. There was disagreement within the collaboration about whether this constituted active control. Excluding this trial reduced evidence of heterogeneity (p=0.04) but had little effect on the effect size (0.42 to 0.45).
Table 3 shows several pre-specified sensitivity analyses. Neither restricting the sham control trials to those with low likelihood of unblinding nor adjustment for missing data had any substantive effect on our main estimates. Inclusion of summary data from trials for which raw data were not obtained (2 trials) or which were published recently (4 trials) also had little impact on either the primary analysis (table 3) or the analysis with the outlying Vas trials excluded (data not shown).
To estimate the potential impact of publication bias, we entered all trials into a single analysis and compared the effect sizes from small and large studies. We saw some evidence that small studies had larger effect sizes for the comparison with sham (p=0.023) but not for the comparison with no acupuncture control (p=0.7). However, these analyses are influenced by the outlying Vas trials, which were smaller than average, and by indication, as the shoulder pain trials were small and had large effect sizes. Tests for asymmetry were non-significant when we excluded Vas and shoulder pain studies (n=15; p=0.065) and when small studies were also excluded (n<100, n=12; p=0.3). Nonetheless, we repeated our meta-analyses excluding trials with a sample size less than 100. This had essentially no effect on our results. As a further test of publication bias, we considered the possible effect on our analysis if we had failed to include high-quality, unpublished studies. Only if there were 47 unpublished trials with n=100 showing an advantage to sham of 0.25 standard deviations would the difference between acupuncture and sham lose significance.
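The logic of this last check can be sketched as follows. The code adds hypothetical unpublished trials one at a time to a pooled fixed-effect estimate until the z statistic drops below 1.96. It is a rough illustration only: the pooled effect and standard error passed in are made-up values, the variance of each added trial uses the common approximation 4/n for a standardized difference with two equal arms, and the result is not expected to reproduce the figure of 47 reported above.

```python
import math

def unpublished_trials_to_lose_significance(pooled, pooled_se, null_effect=-0.25,
                                            n_per_trial=100, z_crit=1.96,
                                            max_trials=10000):
    """Smallest number of added trials, each with effect `null_effect`,
    that brings the combined fixed-effect z statistic below `z_crit`."""
    trial_var = 4.0 / n_per_trial        # approximate variance of each added trial
    w_published = 1.0 / pooled_se ** 2   # total weight of the published evidence
    w_new = 1.0 / trial_var
    for k in range(max_trials + 1):
        total_w = w_published + k * w_new
        combined = (w_published * pooled + k * w_new * null_effect) / total_w
        z = combined * math.sqrt(total_w)  # estimate divided by its standard error
        if z < z_crit:
            return k
    return None  # significance not lost within max_trials

# Hypothetical pooled estimate and standard error (not values from the review).
needed = unpublished_trials_to_lose_significance(pooled=0.20, pooled_se=0.03)
print(f"Unpublished trials needed: {needed}")
```

The answer is sensitive to the precision assumed for the published evidence and for each unpublished trial, which is why the illustrative inputs here yield a different count from the paper's analysis.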
A final sensitivity analysis examined the effect of pooling different endpoints measured at different periods of follow-up. We repeated our analyses including only pain endpoints measured at 2 – 3 months after randomization. There was no material effect on results: effect sizes increased by 0.05 to 0.09 SD for musculoskeletal and osteoarthritis trials and were stable otherwise.
As an exploratory analysis, we compared sham to no acupuncture control. In a meta-analysis of 9 trials [11–13, 15–18, 25, 33], the effect size for sham was 0.33 (95% C.I. 0.27, 0.40) and 0.38 (95% C.I. 0.20, 0.56) for fixed and random effects models respectively (p<0.001 for tests of both effect and heterogeneity).
Overview of findings
In an analysis of patient-level data from 29 high-quality randomized trials, including 17,922 patients, we found statistically significant differences both between acupuncture and sham and between acupuncture and no acupuncture control for all pain types studied. After excluding an outlying set of studies, meta-analytic effect sizes were similar across pain conditions.
The effect size for individual trials comparing acupuncture to no acupuncture control did vary, an effect that appears at least partly explicable in terms of the type of control used. As might be expected, acupuncture had a smaller benefit in patients who received a program of ancillary care – such as physical therapist led exercise – than in patients who continued on usual care. Nonetheless, the average effect, as expressed in the meta-analytic estimate of approximately 0.5 standard deviations, is of clear clinical relevance whether considered as a standardized difference or when converted back to a pain scale. The difference between acupuncture and sham is of lesser magnitude, 0.15 to 0.23 standard deviations.
Neither study quality nor sample size appears to be a problem for this meta-analysis, on the grounds that only high-quality studies were eligible and the total sample size is large. Moreover, we saw no evidence that publication bias, or failure to identify published eligible studies, could affect our conclusions.
As the comparisons between acupuncture and no acupuncture cannot be blinded, both performance and response bias are possible. Similarly, while we considered the risk of bias from unblinding low in most studies comparing acupuncture and sham acupuncture, providers obviously were aware of the treatment provided and, as such, a certain degree of bias in our effect estimate for specific effects cannot be entirely ruled out. However, it should be kept in mind that this problem applies to almost all studies on non-drug interventions. We would argue that the risk of bias in the comparison between acupuncture and sham acupuncture is low compared to other non-drug treatments for chronic pain, such as cognitive therapies, exercise or manipulation, which are rarely subject to placebo control.
Another possible critique is that the meta-analyses combined different endpoints, such as pain and function, measured at different times. However, results did not change when we restricted the analysis to pain endpoints measured at a specific follow-up time, 2 – 3 months after randomization.
Comparison with other studies
Many prior systematic reviews of acupuncture for chronic pain have had liberal eligibility criteria, accordingly included trials of low methodologic quality, and then came to the circular conclusion that weaknesses in the data did not allow conclusions to be drawn. [40, 41] Other reviews have not included meta-analyses, apparently due to variation in study endpoints. [42, 43] We have avoided both problems by including only high quality trials and obtaining raw data for individual patient data meta-analysis. Some more recent systematic reviews have published meta-analyses [44–47] and reported findings that are broadly comparable to ours with clear differences between acupuncture and no treatment control and smaller differences between true and sham acupuncture. Our findings have greater precision: all prior reviews have analyzed summary data, an approach of reduced statistical precision when compared to individual patient data meta-analysis [6, 48]. In particular, we have demonstrated a robust difference between acupuncture and sham control that can be distinguished from bias. This is a novel finding that moves beyond the prior literature.
We believe that our findings are both clinically and scientifically important. They suggest that the total effects of acupuncture, as experienced by the patient in routine clinical practice, are clinically relevant, but that an important part of these total effects is not due to issues considered to be crucial by most acupuncturists, such as the correct location of points and depth of needling. Several lines of argument suggest that acupuncture (whether real or sham) is associated with more potent placebo or context effects than other interventions [49–52]. Yet many clinicians would feel uncomfortable in providing or referring patients to acupuncture if it were merely a potent placebo. Similarly, it is questionable whether national or private health insurance should reimburse therapies that do not have specific effects. Our finding that acupuncture has effects over and above sham acupuncture is therefore of major importance for clinical practice. Even though on average these effects are small, the clinical decision made by doctors and patients is not between true and sham acupuncture, but between a referral to an acupuncturist or avoiding such a referral. The total effects of acupuncture, as experienced by the patient in routine practice, include the specific effects associated with correct needle insertion according to acupuncture theory, non-specific physiologic effects of needling, and non-specific psychological (placebo) effects related to the patient’s belief that treatment will be effective.
We found acupuncture to be superior to both no acupuncture control and sham acupuncture for the treatment of chronic pain. Although the data indicate that acupuncture is more than a placebo, the differences between true and sham acupuncture are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to therapeutic effects. Our results from individual patient data meta-analyses of nearly 18,000 randomized patients in high-quality trials provide the most robust evidence to date that acupuncture is a reasonable referral option for patients with chronic pain.
The Acupuncture Trialists’ Collaboration is funded by an R21 (AT004189I from the National Center for Complementary and Alternative Medicine (NCCAM) at the National Institutes of Health (NIH) to Dr Vickers) and by a grant from the Samueli Institute. Dr MacPherson’s work has been supported in part by the UK National Institute for Health Research (NIHR) under its Programme Grants for Applied Research scheme (RP-PG-0707-10186). Eric Manheimer’s work on the Acupuncture Trialists’ Collaboration was supported by grant number R24 AT001293 from NCCAM. The views expressed in this publication are those of the author(s) and not necessarily those of the NCCAM, the NHS, the NIHR or the Department of Health in England. No sponsor had any role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
The study was conceived by AV, GL, CW, and KL. AV was responsible for the overall study design with input from AC for the statistical analysis; AM for the systematic review; GL and HM with respect to acupuncture analyses; NV, CW, NF, KS and KL with respect to clinical trial methodology and meta-analysis. Statistical analyses were conducted by AV, AC and AM. The first draft of the manuscript was written by AV and AM. All authors gave comments on early drafts and approved the final version of the manuscript. AV had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.