J Manipulative Physiol Ther 2001 (Sept); 24 (7): 457–466
Gert Bronfort, DC, PhD, Willem J.J. Assendelft, MD, PhD,
Roni Evans, DC, Mitchell Haas, DC, Lex Bouter, PhD
Department of Research,
Wolfe-Harris Center for Clinical Studies,
Northwestern Health Sciences University,
Bloomington, MN 55431, USA.
Background: Chronic headache is a prevalent condition with substantial socioeconomic impact. Complementary or alternative therapies are increasingly being used by patients to treat headache pain, and spinal manipulative therapy (SMT) is among the most common of these.
Objective: To assess the efficacy/effectiveness of SMT for chronic headache through a systematic review of randomized clinical trials.
Study Selection: Randomized clinical trials on chronic headache (tension, migraine and cervicogenic) were included in the review if they compared SMT with other interventions or placebo. The trials had to have at least 1 patient-rated outcome measure such as pain severity, frequency, duration, improvement, use of analgesics, disability, or quality of life. Studies were identified through a comprehensive search of MEDLINE (1966-1998) and EMBASE (1974-1998). Additionally, all available data from the Cumulative Index of Nursing and Allied Health Literature, the Chiropractic Research Archives Collection, and the Manual, Alternative, and Natural Therapies Information System were used, as well as material gathered through citation tracking and hand searching of non-indexed chiropractic, osteopathic, and manual medicine journals.
Data Extraction: Information about outcome measures, interventions and effect sizes was used to evaluate treatment efficacy. Levels of evidence were determined by a classification system incorporating study validity and statistical significance of study results. Two authors independently extracted data and performed methodological scoring of selected trials.
Data Synthesis: Nine trials involving 683 patients with chronic headache were included. The methodological quality (validity) scores ranged from 21 to 87 (100-point scale). The trials were too heterogeneous in terms of patient clinical characteristics, control groups, and outcome measures to warrant statistical pooling. Based on predefined criteria, there is moderate evidence that SMT has short-term efficacy similar to amitriptyline in the prophylactic treatment of chronic tension-type headache and migraine. SMT does not appear to improve outcomes when added to soft-tissue massage for episodic tension-type headache. There is moderate evidence that SMT is more efficacious than massage for cervicogenic headache. Sensitivity analyses showed that the results and the overall study conclusions remained the same even when substantial changes in the prespecified assumptions/rules regarding the evidence determination were applied.
Conclusions: SMT appears to have a better effect than massage for cervicogenic headache. It also appears that SMT has an effect comparable to commonly used first-line prophylactic prescription medications for tension-type headache and migraine headache. This conclusion rests upon a few trials of adequate methodological quality. Before any firm conclusions can be drawn, further testing should be done in rigorously designed, executed, and analyzed trials with follow-up periods of sufficient length.
From the FULL TEXT Article
A previous systematic review assessing the effect of SMT
on chronic headaches has suggested that SMT may be a
worthwhile therapy for tension-type headache. [7] The findings
of our review, which includes 3 additional relatively high-quality
RCTs, provide a basis for considering SMT in the
therapeutic management of migraine, chronic tension-type
and cervicogenic headaches. Although migraine, cervicogenic
headache and tension-type headache generally are
considered to be separate conditions, there is some support
in the literature for the notion that they represent a continuum
with several common underlying mechanisms, including
cervical spine dysfunction. [46,47] One possible explanation of
the apparent effect of SMT in chronic headache comes from
the results of several studies that have demonstrated that
headache can be induced experimentally by noxiously stimulating
tissues, including joint capsules, ligaments, and
paraspinal muscles, innervated by the cervical nerve roots
(C1-C3). Headache pain caused by such stimulation may
be possible because of the common neurological pathways
shared by the trigeminal nucleus and the C1-C3 nerves.
Different methodologies have been advocated for the systematic
review of studies addressing therapeutic efficacy.
[15,18,49-52] Given the nature of RCTs available for this
review, we chose to evaluate the strength of the evidence
based on the best-evidence synthesis method rather than a
formal meta-analysis. [9,53] A number of meta-analytical methods
have been advocated for combining results of RCTs. [15,54] It
is recognized by international experts that one of the most
important limitations of published meta-analyses is inadequate
control for clinical heterogeneity among synthesized
studies. [8,55,56] There is currently little consensus on decision
rules regarding statistical pooling of study results. The
clinical heterogeneity of the trials, in terms of headache
type, patient characteristics, interventions, comparison therapies,
and outcome measures, prevented statistical pooling in
this review.
A possible limitation of the current review is publication
bias, of which there are several potential sources. No effort
was made to identify unpublished research, which is more
likely to have negative outcomes. However, it is recognized
that attempts to retrieve unpublished trial data may
also bias studies.  The search strategy may have missed
important studies not currently indexed, but by including
citation tracking of non-indexed journals it is unlikely that
many were overlooked. Optimally, reviews should include
all trials regardless of language. [61-63] However, this review
was initially restricted to the languages we spoke: English,
German, French, Dutch, and the Scandinavian languages.
Although an attempt was made to identify trials in other languages,
this approach was not fully systematic; the possibility
that some relevant trials may have been overlooked must
be acknowledged.
The evidence for efficacy or inefficacy rests primarily on
the results of a small number of RCTs of acceptable methodological
quality. A few additional high-quality RCTs in
the future could easily change the conclusions of our
review. [62,64] Little research has been done to determine what
constitutes a minimal clinically-important difference in
headache outcomes. The chosen cut-point of a medium
effect-size (0.5) difference to determine inferiority/superiority
of an intervention is somewhat arbitrary but similar to
other reported estimates. [65,66] Also, sensitivity analyses
showed that the results and the overall study conclusions
remained the same even when substantial changes in the prespecified
assumptions/rules regarding the evidence determination
were applied.
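As a rough illustration of how a 0.5 effect-size cut-point can be applied, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) and compares it with the cut-point. The excerpt does not state which effect-size formula the review used, so the formula, the function names `cohens_d` and `classify`, the handling of values exactly at the cut-point, and the sample numbers are all assumptions made for illustration.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (A minus B) using a pooled SD.

    Assumption: this is one common effect-size definition; the review's
    own formula is not given in this excerpt.
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def classify(d, cut_point=0.5):
    """Label intervention A relative to B, assuming higher scores mean more improvement.

    Assumption: values at or beyond the cut-point count as superior/inferior.
    """
    if d >= cut_point:
        return "A superior"
    if d <= -cut_point:
        return "A inferior"
    return "similar"


# Hypothetical improvement scores (not data from any reviewed trial).
d = cohens_d(mean_a=6.5, sd_a=2.0, n_a=50, mean_b=5.0, sd_b=2.0, n_b=50)
print(round(d, 2), classify(d))
```

Under these assumptions, a between-group difference smaller than half a pooled standard deviation in either direction would be read as "similar" rather than as superiority or inferiority.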
The reliability with which different reviewers use similar
methodological scoring systems is a source of uncertainty. 
Conclusions regarding the weight of evidence are largely
dependent on the exact definition of the evidence classification
system used. An additional methodological assessment
of the studies included in this review was performed
by using a 5-point scoring system developed by Jadad et
al. This scale addresses 3 areas (randomization, double
blinding, and description of dropouts) that, if not
addressed adequately, may be important sources of bias.
Studies that scored highly with our system also scored relatively
high with the Jadad scale (correlation coefficient of
.62). It is important to note that none of the studies could
achieve higher than a 3-point score with the Jadad scale
because none of them were double-blinded.
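The 3-point ceiling follows from how the Jadad scale allocates its 5 points. The sketch below is a minimal version based on the commonly cited form of the scale (1 point each for being described as randomized, being described as double blind, and describing withdrawals, with 1 point added or removed for the appropriateness of the randomization and blinding methods). The excerpt names only the three areas, so the item wording and the function name `jadad_score` are assumptions.

```python
def jadad_score(randomized, randomization_method, double_blind,
                blinding_method, dropouts_described):
    """Return a 0-5 Jadad-style score.

    The two *_method arguments take "appropriate", "inappropriate",
    or None (method not described).
    """
    adjust = {"appropriate": 1, "inappropriate": -1, None: 0}
    score = 0
    if randomized:
        score += 1 + adjust[randomization_method]
    if double_blind:
        score += 1 + adjust[blinding_method]
    if dropouts_described:
        score += 1
    return max(score, 0)


# Best case for a trial that is not double-blinded: the two blinding points are unavailable.
print(jadad_score(True, "appropriate", False, None, True))  # 3
```

Because 2 of the 5 points depend on double blinding, a trial of a manual therapy that cannot be double-blinded tops out at 3 regardless of how well the other areas are handled.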
Another possible limitation of this review is that the reviewers who
performed the methodological scoring were not blinded to
the authors and results of the individual RCTs because of
our familiarity with the SMT literature. Some maintain that
blinding yields significantly lower methodological scores,
whereas others contend that it does not make a difference.
Berlin et al have demonstrated that the overall results of
meta-analyses are uninfluenced by blinding.
Limitations of the Individual Trials
Most of the headache trials, including those of acceptable
quality, have substantial methodological limitations. In the
trials by Boline et al and Nelson et al, withdrawal of
amitriptyline at the end of treatment is inconsistent with normal
clinical practice. The return of these patients to near baseline
values could be largely due to a medication rebound
effect, making the apparent advantage of the SMT group less
impressive. Longer periods of observation after treatment are
necessary to adequately judge the value of SMT as a potential
first line of therapy for tension-type headache.
In the trial by Nelson et al, it appears that SMT has a
magnitude of effect similar to the commonly used prophylactic
medication amitriptyline. However, the trial was not
designed to assess equivalence and did not have sufficient
power to do so. Thus, whether the 2 therapies are equivalent
is still unknown. Another concern regarding this
study is the substantial loss of patients to follow up
(28%). Although the study investigators performed missing
data analyses, these can never fully compensate for
the loss of data.
The authors of the trial by Bove and Nilsson conclude
that, as an isolated intervention, SMT does not have a positive
effect on episodic tension-type headache. However, by
its design the Bove and Nilsson trial did not assess the isolated
effect of SMT; rather it looked at the combined effect
of SMT with soft tissue massage. Whether there is an interaction
that results from combining SMT with soft tissue
massage is unknown. A more appropriate conclusion would
have been that SMT, when combined with soft tissue massage,
is no better than soft tissue therapy alone for episodic
tension-type headache. This conclusion neither supports
nor refutes the efficacy of SMT as a separate therapy.
In the trial by Parker et al, [38, 42] there is no description of
the dropouts, increasing the likelihood of bias. The extended
trial by Nilsson et al on cervicogenic headache is somewhat
unorthodox in that the decision to recruit more patients
was made after the original analyses of the data. No prespecifications
were made regarding separate analyses of the
data, and one must be concerned about the possibility of a
Type I error.
The results of the remainder of the trials, which were of
lower methodological quality, all tend to suggest that SMT
was better than the comparison therapies. This is consistent
with studies in other fields that have shown that those of
lower methodological quality tend to have positive outcomes.
[52, 64, 70] Thus, one must interpret the results of these
trials with caution.
None of the studies reviewed evaluated the cost-effectiveness
of SMT for chronic headaches. Trials are needed to establish
the cost-effectiveness of SMT relative to other commonly
used therapies, and are particularly needed to address the
potential for long-term effects. Finally, caution should be
exercised when extrapolating from studies of SMT, because
there is substantial diversity in terms of training and technique
SMT appears to have a better effect than massage for cervicogenic
headache. It also appears that SMT has an effect
comparable with commonly used first-line prophylactic prescription
medications for tension-type headache and
migraine headache. This conclusion rests on a few trials of
adequate methodological quality. Before any firm conclusions
can be drawn, further testing should be done in rigorously
designed, executed, and analyzed trials with follow-up
periods of sufficient length.
Evaluation list for scoring: descriptions
Scoring: A YES score (+) is only used when all described
individual item criteria are met. A NO score (-) is only used
when it is clear from the article that none of the described
individual item criteria are met. UNCLEAR/PARTLY (p) is
used when the documentation or description is insufficient
to answer yes or no to whether any or all of the described
individual item criteria are met. The validity score (VS) is
the percentage score of the applicable validity items (maximum
of 14). (+) = 1, (p) = 1/2, and (-) = 0.
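The scoring rule above can be expressed as a short calculation. The sketch below assumes each item is recorded as "+", "p", "-", or "na", and that "na" items are dropped from the denominator, as the item descriptions indicate. The function name `validity_score` and the sample ratings are hypothetical.

```python
from fractions import Fraction

# Point values stated in the scoring rule: (+) = 1, (p) = 1/2, (-) = 0.
POINTS = {"+": Fraction(1), "p": Fraction(1, 2), "-": Fraction(0)}


def validity_score(ratings):
    """Return the validity score as a percentage of applicable items.

    `ratings` maps an item label to "+", "p", "-", or "na".
    Items rated "na" are excluded from the denominator.
    """
    applicable = [r for r in ratings.values() if r != "na"]
    if not applicable:
        raise ValueError("no applicable items to score")
    earned = sum(POINTS[r] for r in applicable)
    return float(100 * earned / len(applicable))


# Hypothetical ratings for one trial (illustrative only, not taken from the review).
example = {"item1": "+", "item2": "p", "item3": "-", "item4": "na", "item5": "+"}
print(validity_score(example))  # 62.5
```

In this example, four items are applicable and earn 2.5 points in total, giving a score of 62.5 on the 100-point scale.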
Are the inclusion and exclusion criteria clearly
defined? They must be stated explicitly. If a more detailed
description was needed, or only inclusion or exclusion criteria
were clearly defined, the score is UNCLEAR/PARTLY.
Is it established that the groups are comparable at baseline?
If different, are appropriate adjustments made during
the statistical analysis? Comparability should be present
especially for main outcomes, but also for important clinical
and demographic variables, such as age, gender, duration
and severity of condition, and known prognostic indicators.
Is the randomization procedure adequately described
and appropriate? If it was only noted that randomization
was used, the score is NO. To receive a YES score, the randomization
process must be described (ie, randomly generated
list, opaque envelopes), the method used (simple,
block, stratification, minimization) must be appropriate, and
the concealment of randomization must be described explicitly.
If only one or two of these criteria are met, a score of
UNCLEAR/PARTLY is the highest possible.
Is it established that at least one main outcome measure
was relevant to the condition under study, and were the
reliability and validity documented? This must be explicitly
established by investigation, appropriately referenced, or
generally accepted (eg, VAS scales, Oswestry, or Roland-
Morris disability scales). If all of the above conditions are
not met the score is NO.
Are patients blinded to the degree possible, and did the
blinding procedure work? This may not apply to study (na)
(eg, a comparison of a drug and physical therapy) and is therefore
not included in % scores. If the presence of either “optimal
blinding” or “effectiveness of blinding” is not documented,
a score of UNCLEAR/PARTLY is the highest attainable. If
at least one study involves a “blindable” intervention, then the
effectiveness of the blinding must be documented; otherwise a
score of UNCLEAR/PARTLY is the highest attainable.
Is it established that treatment providers were blinded
to the degree possible, and did the blinding procedure work?
This may not apply to study (na) and is therefore not included
in % scores.
Is it established that assessment of the primary outcomes
was unbiased? If assessment of outcomes could be
blinded, was it done? Was the effectiveness of blinding documented?
Was there documentation that patients were not
influenced by providers or investigators on how they scored
their own outcomes?
Is the postintervention follow-up period adequate and
consistent with the nature of the condition under study? This
may not apply to study (na) (eg, crossover designs) and is
therefore not included in % scores. The minimum follow-up
period is 1 month for acute conditions and 3 months for
chronic conditions in order to receive a YES score. A minimum
of 2 weeks for acute conditions and 1 month for
chronic conditions must be met for an UNCLEAR/PARTLY score.
Are the interventions described adequately? Did all
interventions follow a defined protocol? Is it possible from
the description in the article or reference to prescribe or
apply the same treatment in a clinical setting? If not, YES is
not an appropriate score.
Were differences in attention bias between groups controlled
for and explicitly described? Were time, provider
enthusiasm, and number of intervention sessions equivalent
among study groups?
Is comparison made to existing efficacious or commonly
practiced treatment option(s)? If a placebo controlled
study, has a comparison to existing efficacious standard
therapy been made previously?
Is the primary study objective (hypothesis) clearly
defined in terms of group contrasts, outcomes, and time points
a priori? (Many studies present biased post hoc conclusions.)
Is the choice of statistical test(s) of the main results
appropriate? Is the main analysis consistent with the design
and the type of the outcome variables?
Was it established at randomization that there was adequate
statistical power (β = 0.2 with α = 0.05) to detect an a
priori determined clinically important between-group difference
of the primary outcome(s) including adjustment for
multiple tests and/or outcome measures?
Are confidence intervals (CI), or data allowing CI to be
calculated, presented?
Are all dropouts described for each study group separately
and accounted for in the analysis of the main outcomes? Look
for analysis of impact of dropouts or worst/best case analysis.
Almost all studies with appropriate follow-up periods that evaluated
the effects of therapeutic management of a condition will
have some attrition (>5%). If no dropouts, this item does not
apply to study (eg, studies with one intervention and outcomes
collected in same session) and is not included in % scores.
Are all missing data described for each study group
separately and accounted for in the analysis of the main outcomes?
Look for analysis of impact of missing data. Almost
all studies that evaluated the effects of therapeutic management
of a condition will have missing data (>5%). If no
missing data, this item will not apply to study (na) and is not
included in % scores.
If indicated, was an intention-to-treat analysis used? In
studies with documented full compliance with allocated
treatments, and no differential co-intervention between
groups, a YES score can apply. In single session studies (eg,
studies with one intervention and outcomes collected in
same session) this item does not apply (na) and is therefore
not included in % scores.
Were adjustments made for the number of statistical
tests (2 or more) when establishing cut-off point of P-level
for each test? If applicable (avoidance of increasing risk of
Type I errors), was it documented that this was an issue that
could have influenced the outcome of the study, and were
adjustments made (eg, Bonferroni’s or similar type of
adjustment)? If indicated adjustment(s) were incapable of
changing main result/outcome of study, or if study involved
only one test at one point in time, a score of ‘na’ applies.
Are the conclusions directly related to the primary
objectives of the study, and are they valid? Were the a priori
testable hypotheses tested and prioritized appropriately in
the conclusions (see also item L)?
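The statistical power item in the list above (β = 0.2 with α = 0.05, adjusted for multiple tests) can be made concrete with a sample-size sketch. It uses the standard normal-approximation formula for a two-sided comparison of two equal-sized groups, with a Bonferroni division of α across tests. The function name `n_per_group` is hypothetical, and the effect size of 0.5 is borrowed from the review's cut-point purely as an illustrative input, not as a requirement stated in the list.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, beta=0.2, n_tests=1):
    """Approximate patients needed per group for a two-sided, two-group comparison.

    Uses n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2, where d is the
    standardized between-group difference. alpha is divided by n_tests
    (Bonferroni adjustment) before the calculation.
    """
    adjusted_alpha = alpha / n_tests
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - adjusted_alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))             # single primary outcome: 63 per group
print(n_per_group(0.5, n_tests=3))  # more patients are needed once alpha is split across 3 tests
```

The approximation slightly understates what an exact t-test calculation would require, so it is best read as a lower bound when judging whether a trial's stated sample size was plausible.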