From: "Lon Morgan, DC"
To: chirosci-list@silcom.com
Date: Mon, 9 Sep 1996 12:01:15 +0000
Subject: Guides to the Medical Literature - Part I

Several months ago Dr. Smith posted several excellent work sheets on critical assessment of diagnostic tests and clinical trials. As further background, I am posting summaries from JAMA's Users' Guides to the Medical Literature. I believe they can be useful in assessing the value and relevance of research studies we may encounter.

=============================================

"How to Use an Article about Therapy or Prevention: Are the Study Results Valid?"

Summarized from: Guyatt GH, et al., Users' Guides to the Medical Literature, JAMA, 1993; 270(21):2598-2601. Copyrighted material

THE FRAMEWORK

One can usefully pose three questions about an article on therapy:

1. Are the Results of the Study Valid? This question concerns the validity or accuracy of the results and whether the treatment effect reported represents the true direction and magnitude of the treatment effect. In other words: Do these results represent an unbiased estimate of treatment effect, or have they been influenced in some systematic fashion to lead to a false conclusion?

2. What Were the Results? If the results are valid and the study is an unbiased assessment of treatment effect, then the results are worth examining. This second question considers the size and precision of the treatment's effect. The best estimate of that effect will be the study findings themselves, and the estimate is more precise in larger studies.

3. Will the Results Help Me In Caring for My Patients? This question has two parts: First, are the results applicable to my patient? You should hesitate to institute the treatment if your patient is too dissimilar from those in the trial, or if the outcome isn't important to your patient. Second, if the results are applicable, what is the impact of treatment?
The impact depends on benefits and risks (side effects) of treatment, and the consequences of withholding treatment. Thus an effective therapy might be withheld if a patient's prognosis is already good without treatment, especially if the treatment is accompanied by important side effects.

Doctors need an approach that is both efficient and comprehensive. We have therefore labeled validity criteria as "primary" - those few that can quickly be applied by readers with limited time - and "secondary" - those that, though still important, can be reserved for articles that pass the initial guides and for readers who have both the need and the time for a deeper review.

ARE THE RESULTS OF THIS ARTICLE VALID?

Primary Guides

Was the Assignment of Patients to Treatment Randomized? - During the 1970s and early 1980s surgeons increasingly undertook extracranial-intracranial bypass. They believed it prevented strokes in patients whose symptomatic cerebrovascular disease was otherwise surgically inaccessible. This conviction was based on clinical outcomes among nonrandomized cohorts of patients who had and had not undergone this operation, for the former appeared to fare much better than the latter. To the surprise of many and the indignation of a few, a large multicenter randomized trial, in which patients were randomly allocated to receive or forgo the operation, demonstrated that the only effect of surgery was to make patients worse off in the immediate postsurgical period; long-term outcome was unaffected.

Other surprises generated by randomized trials that contradicted the results of less rigorous trials include the demonstration that steroids may increase (rather than reduce) mortality in patients with sepsis, that steroid injections do not ameliorate facet-joint back pain, and that plasmapheresis does not benefit patients with polymyositis. These surprises demonstrate the value of assigning treatments by random allocation, rather than by the conscious decisions of clinicians and patients.
In short, clinical outcomes result from many causes, and treatment is just one of them: underlying severity of illness, the presence of comorbid conditions, and a host of other prognostic factors (unknown as well as known) often swamp any effect of therapy. Because these other features also influence the clinician's decision to offer the treatment at issue, nonrandomized studies of efficacy are inevitably limited in their ability to distinguish useful from useless or even harmful therapy. As confirmation of this fact, it turns out that studies in which treatment is allocated by any method other than randomization tend to show larger (and frequently false-positive) treatment effects than do randomized trials.

The beauty of randomization is that it assures, if sample size is sufficiently large, that both known and unknown determinants of outcome are evenly distributed between treatment and control groups. If a randomized trial has not been done, the clinician still has to make a treatment decision and must rely on nonrandomized studies, which provide much weaker evidence than do randomized trials.

Were All Patients Who Entered the Trial Properly Accounted for at Its Conclusion? This question has two components: was follow-up complete, and were patients analyzed in the groups to which they were randomized?

Was Follow-up Complete?- Every patient who entered the trial should be accounted for at its conclusion. If this is not done, or if substantial numbers of patients are reported as "lost to follow-up," the validity of the study is open to question. The greater the number of subjects who are lost, the more the trial may be subject to bias, because patients who are lost often have different prognoses from those who are retained. Readers can assume, in positive trials, that all patients lost from the treatment group did badly, and all lost from the control group did well, and then recalculate the outcomes under these assumptions.
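The recalculation just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and all patient counts are hypothetical, "events" means bad outcomes, and the sketch compares raw event rates only (it does not test statistical significance).

```python
def lost_to_followup_check(treat_events, treat_followed, treat_lost,
                           ctrl_events, ctrl_followed, ctrl_lost):
    """Compare observed event rates with worst-case rates for a positive trial.

    "Events" are bad outcomes (for example, deaths). All inputs are counts.
    """
    # Observed rates use only the patients who were actually followed.
    observed = (treat_events / treat_followed, ctrl_events / ctrl_followed)
    # Worst case for a positive trial: every lost treatment patient had the
    # event, and no lost control patient had it.
    worst_case = (
        (treat_events + treat_lost) / (treat_followed + treat_lost),
        ctrl_events / (ctrl_followed + ctrl_lost),
    )
    return observed, worst_case


# Hypothetical trial: 200 patients randomized to each arm,
# 10 treatment patients and 8 control patients lost to follow-up.
observed, worst_case = lost_to_followup_check(20, 190, 10, 40, 192, 8)
for label, (treat_rate, ctrl_rate) in (("observed", observed),
                                       ("worst case", worst_case)):
    if treat_rate < ctrl_rate:
        verdict = "treatment still looks better"
    else:
        verdict = "conclusion changes"
    print(f"{label}: treatment {treat_rate:.1%}, control {ctrl_rate:.1%} -> {verdict}")
```

With these made-up counts the treatment arm keeps a lower event rate even under the worst-case assumption.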
If the conclusions of the trial do not change, then the loss to follow-up was not excessive. If the conclusions would change, the strength of inference is weakened. The extent to which the inference is weakened will depend on how likely it is that treatment patients lost to follow-up all did badly, while control patients lost to follow-up all did well.

Were Patients Analyzed in the Groups to which They Were Randomized?- Patients in randomized trials sometimes forget to take their medicine or even refuse their treatment altogether. Readers might think that patients who never received their assigned treatment should be excluded from analyses for efficacy. Not so. The reasons people don't take their medication are often related to prognosis. In a number of randomized trials, noncompliant patients have fared worse than those who took their medications, even when their medications were placebos! Excluding noncompliant patients from the analysis leaves behind those who may be destined to have a better outcome and destroys the unbiased comparison provided by randomization. Analyzing all patients in the groups to which they were randomized (an "intention-to-treat" analysis) preserves the value of randomization: prognostic factors that we know about, and those we don't know about, will be, on average, equally distributed in the two groups, and the effect we see will be just that due to the treatment assigned.

Secondary Guides

Were Patients, Their Clinicians, and Study Personnel "Blind" to Treatment?- Patients who know that they are on a new, experimental treatment are likely to have an opinion about its efficacy, as are their clinicians or the other study personnel who are measuring responses to therapy. These opinions, whether optimistic or pessimistic, can systematically distort both the treatment and the reporting of treatment outcomes, thereby reducing our confidence in the study's results. In addition, unblinded study personnel who are measuring outcomes may provide different interpretations of marginal findings.
The best way of avoiding bias is double-blinding, which is achieved in drug trials by administering a placebo. When you read reports on treatments in which patients and treating clinicians cannot be kept blind, you should note whether investigators have minimized bias by blinding those who assess clinical outcomes.

Were the Groups Similar at the Start of the Trial?- For reassurance about a study's validity, readers would like to be informed if treatment and control groups were similar for all factors except one: whether they received the experimental therapy. Investigators provide this reassurance when they display the baseline prognostic features of the treatment and control patients. Randomization doesn't always produce groups balanced for known prognostic factors. When the groups are small, chance may place those with apparently better prognoses in one group. As sample size increases, this is less and less likely (one wouldn't be too surprised to see seven heads out of 10 coin flips, but one would be very surprised to see 70 heads out of 100 coin flips).

The issue here is not whether there are statistically significant differences in known prognostic factors between treatment groups, but rather the magnitude of these differences. If they are large, the validity of the study may be compromised. The stronger the relationship between the prognostic factors and outcome, and the smaller the trial, the more the differences between groups will weaken the strength of any inference about efficacy.

All is not lost if the treatment groups are not similar at baseline. Statistical techniques permit adjustment of the study result for baseline differences. Accordingly, readers should look for documentation of similarity for relevant baseline characteristics. When both unadjusted and adjusted analyses reach the same conclusion, readers justifiably gain confidence in the validity of the study result.

Aside From the Experimental Intervention, Were the Groups Treated Equally?
- Care for experimental and control groups can differ in a number of ways besides the test therapy, and differences in care other than that under study can weaken or distort the results. If one group received closer follow-up, events might be more likely to be reported, and patients may be treated more intensively with nonstudy therapies. For example, in trials of new forms of therapy for resistant rheumatoid arthritis, ancillary treatment with systemic steroids, if administered more frequently to the control group than to the treatment group, could obscure an experimental drug's true treatment effect. Interventions other than the treatment under study, when differentially applied to the treatment and control groups, are called "cointerventions." Cointervention is a more serious problem when double-blinding is absent.

The foregoing five guides (two primary and three secondary), applied in sequence, will help the reader determine whether the results of an article on therapy are likely to be valid. If the results are valid, then the reader can proceed to consider the magnitude of the effect and the applicability to patients.

Lon Morgan, DC
lmorgan@primenet.com

"The trouble with the world is that the stupid are cocksure, and the intelligent are full of doubt."
Bertrand Russell

--------------------

From: "Lon Morgan, DC"
To: chirosci-list@silcom.com
Date: Mon, 9 Sep 1996 15:22:36 +0000
Subject: Guides to the Medical Literature - Part II

"What are the Study Results and will they help me in caring for my patients?"

Summarized from: Guyatt GH, et al., Users' Guides to the Medical Literature, JAMA, 1994; 271(1):59-63. Copyrighted material - for personal use.

INTRODUCTION

Part I in this series dealt with whether a study of effectiveness of therapy was valid. Part II will consider how to understand and use the results of valid studies of therapeutic interventions.

What Were the Results?
How Large Was the Treatment Effect?- Most frequently, randomized clinical trials monitor how often patients experience some adverse outcome. Examples of these dichotomous outcomes (yes or no outcomes that either happen or don't happen) include cancer recurrence, myocardial infarction, and death. Patients either do or do not suffer an event, and the article reports the proportion of patients who develop such events.

There are several ways to express the size of the treatment effect. One is the absolute difference between the proportion who died in the control group and the proportion who died in the treatment group. Another is the relative risk (RR): the risk of events among patients receiving the new treatment, relative to that among controls. The most commonly reported measure of dichotomous treatment effects is the complement of this RR (that is, 1 - RR), called the relative risk reduction (RRR). It is expressed as a percent. An RRR of 25% means that the new treatment reduced the risk of death by 25% relative to that occurring among control patients; the greater the RRR, the more effective the therapy.

How Precise Was the Estimate of Treatment Effect?- The true risk reduction can never be known; all we have is the estimate provided by rigorous controlled trials, and the best estimate of the true treatment effect is that observed in the trial. This estimate is called a "point estimate" in order to remind us that although the true value lies somewhere in its neighborhood, it is unlikely to be precisely correct. Investigators tell us the neighborhood within which the true effect likely lies by the statistical strategy of calculating confidence intervals (CIs). We arbitrarily use the 95% CI, which can be simply interpreted as defining the range that includes the true RRR 95% of the time.
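To make the effect measures above concrete, here is a minimal sketch that computes them from raw event counts. The function name and the counts are hypothetical; the counts were chosen so that the RRR matches the 25% figure used in the text.

```python
def effect_measures(treat_events, treat_total, ctrl_events, ctrl_total):
    """Return the usual effect measures for a dichotomous outcome."""
    treat_risk = treat_events / treat_total   # risk of the event on treatment
    ctrl_risk = ctrl_events / ctrl_total      # risk of the event among controls
    relative_risk = treat_risk / ctrl_risk
    return {
        "control_risk": ctrl_risk,
        "treatment_risk": treat_risk,
        "absolute_difference": ctrl_risk - treat_risk,
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,  # complement of the RR
    }


# Hypothetical trial: 20 of 100 controls die, 15 of 100 treated patients die.
measures = effect_measures(15, 100, 20, 100)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Here the absolute difference is 5 percentage points while the RRR is 25%, which shows how the same result can look larger or smaller depending on the measure reported.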
You'll seldom find the true RRR toward the extremes of this interval, and you'll find the true RRR beyond these extremes only 5% of the time, a property of the CI that relates closely to the conventional level of "statistical significance" of P < .05.

The larger the sample size of a trial, the larger the number of outcome events and the greater our confidence that the true RRR (or any other measure of efficacy) is close to what we have observed. The point estimate is the one value most likely to represent the true RRR. As one considers values farther and farther from the point estimate, they become less and less consistent with the observed RRR. By the time one crosses the upper or lower boundaries of the 95% CI, the values are extremely unlikely to represent the true RRR, given the point estimate.

When is the sample size big enough? In a "positive" study, a study in which the authors conclude that the treatment is effective, one can look at the lower boundary of the CI. If this risk reduction is still important, or "clinically significant" (that is, it is large enough for you to want to offer it to your patient), then the investigators have enrolled sufficient patients.

The CI also helps us interpret "negative" studies in which the authors have concluded that the experimental treatment is no better than control therapy. All we need do is look at the upper boundary of the CI. If the RRR at this upper boundary would, if true, be clinically important, the study has failed to exclude an important treatment effect. The clinician must bear in mind the proviso about the arbitrariness of the choice of 95% boundaries for the CI. A reasonable alternative, a 90% CI, would be somewhat narrower.

What can the clinician do if the CI around the RRR is not reported in the article? There are three approaches, and we present them in order of increasing complexity. The easiest approach is to examine the P value.
If the P value is exactly .05, then the lower bound of the 95% CI for the RRR has to lie exactly at 0 (an RR of 1), and you cannot exclude the possibility that the treatment has no effect. As the P value decreases below .05, the lower bound of the 95% CI for the RRR rises above 0.

A second approach, involving some quick mental arithmetic or a pencil and paper, can be used when the article includes the value for the standard error (SE) of the RRR (or of the RR). This is because the upper and lower boundaries of the 95% CI for an RRR are the point estimate plus and minus twice this SE.

The third approach involves calculating the CIs yourself. Once you obtain the CIs, you know how high and low the RRR might be (that is, you know the precision of the estimate of the treatment effect) and can interpret the results as described above.

Not all randomized trials have dichotomous outcomes, nor should they. For example, a new treatment for patients with chronic lung disease may focus on increasing their exercise capacity. Here too you should look for the 95% CIs around any difference in changes in exercise capacity and consider their implications.

Having determined the magnitude and precision of the treatment effect, readers now can turn to the final question of how to apply the article's results to their patients and clinical practice.

Will the Results Help Me in Caring for My Patients?

The first issue to address is how confident you are that you can apply the results to a particular patient or patients in your practice. If the patient would have been enrolled in the study had she been there - that is, she meets all the inclusion criteria, and doesn't violate any of the exclusion criteria - there is little question that the results are applicable. If this is not the case, and she would not have been eligible for the study, judgement is required.
The study result probably applies even if, for example, she was 2 years too old for the study, had more severe disease, had previously been treated with a competing therapy, or had a comorbid condition. A better approach than rigidly applying the study's inclusion and exclusion criteria is to ask whether there is some compelling reason why the results should not be applied to the patient. A compelling reason usually won't be found, and most often you can generalize the results to your patient with confidence.

A final issue arises when our patient fits the features of a subgroup of patients in the trial report. In articles reporting the results of a trial (especially when the treatment doesn't appear to be efficacious for the average patient), the authors may have examined a large number of subgroups of patients at different stages of their illness, with different comorbid conditions, with different ages at entry, etc. Quite often these subgroup analyses were not planned ahead of time, and the data are simply "dredged" to see what might turn up. Investigators may sometimes overinterpret these "data-dependent" analyses as demonstrating that the treatment really has a different effect in a subgroup of patients. Those who are older or sicker, for instance, may be held up as benefitting substantially more or less than other subgroups of patients in the trial.

Guides for deciding whether to believe these subgroup analyses can be summarized as follows: the treatment is really likely to benefit the subgroup more or less than the other patients if the difference in the effects of treatment in the subgroups: (1) is large; (2) is very unlikely to occur by chance; (3) results from an analysis specified as a hypothesis before the study began; (4) was one of only a very few subgroup analyses that were carried out; and (5) is replicated in other studies.
To the extent that the subgroup analysis fails these criteria, clinicians should be skeptical about applying its results to their patients.

Were All Clinically Important Outcomes Considered?- Treatments are indicated when they provide important benefits. Demonstrating that a bronchodilator produces small increments in forced expired volume in patients with chronic airflow limitation, that a vasodilator improves cardiac output in heart failure patients, or that a lipid-lowering agent improves lipid profiles does not necessarily provide a sufficient reason for administering these drugs. What is required is evidence that the treatments improve outcomes that are important to patients, such as reducing shortness of breath during the activities required for daily living. We can consider measures such as forced expired volume in 1 second "substitute end points." That is, the authors have substituted these physiologic measures for the important outcomes (shortness of breath), usually because to confirm benefit on the latter they would have had to enroll many more patients and follow them for far longer periods of time.

A dramatic recent example of the danger of substitute end points was found in the evaluation of the usefulness of antiarrhythmic drugs following myocardial infarction. Because such drugs had been shown to reduce abnormal ventricular depolarizations (the substitute end points) in the short run, it made sense that they should reduce life-threatening arrhythmias in the long run. Randomized trials on three drugs previously shown to be effective in reducing the substitute end points of arrhythmia had to be discontinued when the researchers discovered substantially higher mortality in patients receiving them.

Are the Likely Treatment Benefits Worth the Potential Harm and Costs? Answering this question introduces the idea of the "number needed to treat" (NNT). The impact of treatment is related to its RRR and also to the risk of the adverse outcome the treatment is supposed to prevent.
Before treating, we must consider our patient's risk of the adverse event if left untreated. For a given RRR, the higher the probability that a patient will experience an adverse outcome if we don't treat, the more likely the patient will benefit from treatment, and the fewer such patients we need to treat to prevent one event. Thus the lower the NNT, the greater the clinical impact of the treatment.

CONCLUSION

Hopefully you're developing a sense of how to use the medical literature to resolve a treatment decision. First, define the problem clearly, and use one of a number of search strategies to obtain the best available evidence. Having found an article relevant to the therapeutic issue, assess the quality of the evidence. To the extent that the quality of the evidence is poor, any subsequent inference (and the clinical decision it generates) will be weakened. If the quality of the evidence is adequate, determine the range within which the true treatment effect likely falls. Then, consider the extent to which the results are generalizable to the patient at hand, and whether the outcomes that have been measured are important. If the generalizability is in doubt, or the importance of the outcomes questionable, support for a treatment recommendation will be weakened.

Finally, by taking into account the patient's risk of adverse events, assess the likely results of the intervention. This involves a balance sheet looking at the probability of benefit and the associated costs (including monetary costs, and issues such as inconvenience) and risks. The bottom line of the balance sheet will guide your treatment decision.

While this may sound like a challenging route to deciding on treatment, it is what clinicians implicitly do each time they administer therapy. Making the process explicit and being able to apply guidelines to help assess the strength of evidence will, we think, result in better patient care.
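As a rough illustration of the arithmetic in Part II, the sketch below applies the "point estimate plus and minus twice the SE" rule for an approximate 95% CI, and then computes the NNT for patients at different baseline risks. The summary above does not spell out an NNT formula; the sketch assumes the standard definition, NNT = 1 / (baseline risk x RRR). The function names and all numbers are hypothetical.

```python
def approx_ci95(point_estimate, standard_error):
    """Approximate 95% CI: point estimate plus and minus twice the SE."""
    half_width = 2 * standard_error
    return point_estimate - half_width, point_estimate + half_width


def number_needed_to_treat(baseline_risk, rrr):
    """NNT = 1 / absolute risk reduction, where ARR = baseline risk * RRR."""
    absolute_risk_reduction = baseline_risk * rrr
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is undefined without a positive risk reduction")
    return 1 / absolute_risk_reduction


# Hypothetical trial result: RRR of 25% with a standard error of 10%.
rrr, se = 0.25, 0.10
low, high = approx_ci95(rrr, se)
print(f"RRR {rrr:.0%}, approximate 95% CI {low:.0%} to {high:.0%}")

# The same RRR applied to a low-risk and a high-risk patient (hypothetical risks).
for baseline_risk in (0.02, 0.20):
    nnt = number_needed_to_treat(baseline_risk, rrr)
    print(f"baseline risk {baseline_risk:.0%}: NNT about {nnt:.0f}")
```

The same RRR gives a tenfold difference in NNT between the two patients, which is the point made above about considering the patient's untreated risk.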
-------------------------

From: "Lon Morgan, DC"
To: chirosci-list@silcom.com
Date: Tue, 10 Sep 1996 22:25:29 +0000
Subject: Guides to the Medical Literature - Part III

USING AN ARTICLE ABOUT DIAGNOSTIC TESTING

Summarized from: Jaeschke R, et al., How to use an article about a diagnostic test: Are the results valid? JAMA, 1994; 271(5):389-391.

Doctors may confront dilemmas when ordering and interpreting diagnostic tests, particularly considering the ever expanding technology. This article will review the principles of assessing articles about diagnostic tests and using the information they provide. Once a doctor has chosen a potential article, the following criteria may be applied:

1. Are the Results of the Study Valid? Whether one can believe the results of a study is determined by the methods used to carry it out. To say that the results are valid implies that the accuracy of the diagnostic test is close enough to the truth to render the further examination of the study worthwhile. First, you must determine if you can believe the results of the study by considering how the authors assembled their patients and how they applied the test and an appropriate reference (or "gold") standard to the patients.

2. What Are the Results of the Study? If you decide that the study results are valid, the next step is to determine the diagnostic test's accuracy. This is done by examining the test's likelihood ratios or "properties."

3. Will the Results Help Me in Caring for My Patients? The third step is to decide how to use the test. Are the results of the study generalizable - i.e., can you apply them to your particular patient? How often are the test results likely to yield valuable information? Does the test provide additional information above and beyond the history and physical examination? Is it less expensive or more easily available than other diagnostic tests for the same target disorder?
Ultimately, are patients better off if the test is used?

ARE THE RESULTS OF THE STUDY VALID?

Primary Guides

Was There an Independent, Blind Comparison With a Reference Standard?- The accuracy of a diagnostic test is best determined by comparing it with the "truth." Accordingly, readers must assure themselves that an appropriate reference standard has been applied to every patient, along with the test under investigation. If you do accept the reference standard, the next question is whether the test results and the reference standard were assessed independently of each other. The more likely it is that the interpretation of a new test could be influenced by knowledge of the reference standard result (or vice versa), the greater the importance of the independent interpretation of both.

Did the Patient Sample Include an Appropriate Spectrum of Patients to Whom the Diagnostic Test Will Be Applied in Clinical Practice?- A diagnostic test is really useful only to the extent it distinguishes between disorders that might otherwise be confused. Almost any test can distinguish the healthy from the severely afflicted; this ability tells us nothing about the clinical utility of a test. The true pragmatic value of a test is therefore established only in a study that closely resembles clinical practice.

A vivid example of how the hopes raised with the introduction of a diagnostic test can be dashed by subsequent investigations comes from the study of carcinoembryonic antigen (CEA) in colorectal cancer. CEA levels, when measured in 36 people with known advanced cancer of the colon or rectum, were elevated in 35 of them. At the same time, much lower levels were found in normal people and in a variety of other conditions. The results suggested that measurement of CEA levels might be useful in diagnosing colorectal cancer or even in screening for the disease.
In subsequent studies of patients with less advanced stages of colorectal cancer and patients with other cancers or other gastrointestinal disorders, the accuracy of CEA measurements plummeted, and the use of CEA levels for cancer diagnosis and screening was abandoned. CEA is now recommended only as one element in the follow-up of patients with known colorectal cancer.

Secondary Guides

Once you are convinced that the article is describing an appropriate spectrum of patients who underwent the independent, blind comparison of a diagnostic test and a reference standard, most likely its results represent an unbiased estimate of the real accuracy of the test. However, you can further reduce your chances of being misled by considering a number of other issues.

Did the Results of the Test Being Evaluated Influence the Decision to Perform the Reference Standard? - The properties of a diagnostic test will be distorted if its result influences whether patients undergo confirmation by the reference standard. This situation, sometimes called "verification bias" or "work-up bias", would apply, for example, when patients with suspected coronary artery disease and positive exercise tests were more likely to undergo coronary angiography (the reference standard) than those with negative exercise tests.

Were the Methods for Performing the Test Described in Sufficient Detail to Permit Replication? - If the authors have concluded that you should use a diagnostic test, they must tell you how to use it. This description should cover all issues that are important in the preparation of the patient (diet, drugs to be avoided, precautions after the test), the performance of the test (technique, possibility of pain), and the analysis and interpretation of its results.

Once the reader is confident that the article's results constitute an unbiased estimate of the test properties, she can determine exactly what (and how helpful) those test properties are.
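The summary mentions likelihood ratios as the test "properties" but does not define them. As a minimal sketch, assuming the standard textbook definitions (sensitivity, specificity, and the likelihood ratios derived from them), the properties can be computed from a 2 x 2 comparison of the test against the reference standard. The function name and the counts below are hypothetical and are not taken from the CEA studies.

```python
def diagnostic_properties(true_pos, false_pos, false_neg, true_neg):
    """Compute test properties from a 2 x 2 table against the reference standard."""
    sensitivity = true_pos / (true_pos + false_neg)   # positives among the diseased
    specificity = true_neg / (true_neg + false_pos)   # negatives among the non-diseased
    # Likelihood ratio: how much more often a given result occurs in patients
    # with the disorder than in patients without it.
    lr_positive = (sensitivity / (1 - specificity)
                   if specificity < 1 else float("inf"))
    lr_negative = ((1 - sensitivity) / specificity
                   if specificity > 0 else float("inf"))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Hypothetical study: 100 patients with the disorder, 100 without.
props = diagnostic_properties(true_pos=90, false_pos=20, false_neg=10, true_neg=80)
for name, value in props.items():
    print(f"{name}: {value:.3f}")
```

These figures are only as trustworthy as the study behind them: if the patient spectrum is unrepresentative or verification bias is present, the counts in the table, and therefore every property computed from them, will be distorted.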
Lon Morgan, DC lmorgan@primenet.com "The trouble with the world is that the stupid are cocksure, and the intelligent are full of doubt." Bertrand Russell