How Evidence-Based Medicine Went Wrong on Hormone Replacement

This is a case study in how “evidence-based medicine” can harm the very patients it is supposed to help.

In the late 1990s, the generally accepted opinion was that postmenopausal hormone replacement therapy (HRT) increased the risk of breast and endometrial cancer, but decreased the risks of coronary heart disease and hip fracture. On balance, it was thought that the gain from lowering heart disease and hip fracture outweighed the increased risk of breast cancer in all patients except those with the lowest risk for heart disease and the highest risk for breast cancer (e.g., see Col et al., JAMA, 1997 and Barrett-Connor, BMJ, 1998).

Then, in what looked like a clear example of the benefits of evidence-based medicine, the National Heart, Lung, and Blood Institute stopped the estrogen-plus-progestin arm of the Women’s Health Initiative (WHI) in 2002. The reason: the women crossed a preset threshold for breast cancer risk that was equivalent to an increase of less than a tenth of a percent per woman per year. The WHI was a large randomized controlled trial (RCT) designed to study various aspects of women’s health, including HRT using continuous combined estrogen plus progestin for most participants and estrogen alone for those without a uterus. The estrogen-only arm was also ended prematurely, in 2004, when nearly 7 years of follow-up suggested that although estrogen decreased hip fracture, it did not affect heart disease, and it increased the risk of stroke by 8 additional cases per year for every 10,000 women.

The WHI results were widely publicized in the media before outside experts had a chance to examine and discuss them. Because the WHI was billed as a “gold-standard” RCT, by 2009, U.S. women had decreased their use of HRT by more than 70 percent.

Guidelines were hastily developed. The U.S. Preventive Services Task Force, citing the WHI as the “only trial powered to evaluate the effectiveness of hormone therapy for the primary prevention of multiple chronic conditions,” still recommends against “the use of combined estrogen and progestin for the prevention of chronic conditions in postmenopausal women” and finds that “the chronic disease prevention benefits of estrogen are unlikely to outweigh the harms in most postmenopausal women who have had a hysterectomy.”

But subsequent debate, and some new evidence on the importance of HRT timing, suggest that there were serious problems with the design of the WHI. The narrow viewpoint that elevates RCTs over all other kinds of knowledge may have led to the rejection of a therapy that, when properly individualized, has a real possibility of preventing disease and injury for millions of women in old age.

According to John Studd, past professor of Gynecology at Imperial College, London, the problems with the WHI were simple — it “used a treatment we don’t use on a group of patients we don’t treat.”

Using RCT results to guide treatment for individuals is generally problematic for two reasons. The first is that the sample of people participating in an RCT often does not represent the population of interest. The second, as Goldman explains, is that the RCT focus is on “average effects in heterogeneous groups, rather than on variation in those effects by clinically relevant characteristics of participants.”

Writers for the WHI group consistently claimed that its results applied to “healthy” women. Klaiber et al. (2005) note that at enrollment, 1,906 of these healthy women were taking medication to lower cholesterol, 5,988 were on medication to lower blood pressure, 296 had had prior heart attacks, 472 had a history of angina, and 138 had a history of prior strokes.

Klaiber et al. also note that the WHI screened for heart disease using a normal electrocardiogram, a test that is “no longer regarded as a definitive diagnostic measure of coronary artery disease.” They believe that an unknown number of participants likely had subclinical coronary artery atherosclerosis and were therefore predisposed to having a heart attack when HRT was administered.

Undetected atherosclerosis would be a major problem because evidence from human and animal studies suggests that HRT benefits depend upon how soon it is prescribed after menopause. At the earliest stages of atherosclerosis, exogenous estrogen may lower coronary risk. At more advanced stages, it may result in clotting, or rupture of arterial plaque. An estimated 70 percent of the women in the WHI were in an age group that would be expected to experience negative cardiovascular effects of HRT. Just 10 percent were in an age group likely to benefit from it.

In the general population, women with undesirable symptoms begin HRT within several years of menopause. Goldman reports that data from the Nurses’ Health Study suggest that 80 percent of HRT users initiate it within 2 or 3 years of the start of menopause, which begins at age 51, on average, in the U.S. But women in the WHI had to be a year past their last period. As a result, Klaiber et al. calculate that a large majority of the women in the study had already experienced 10 to 29 years of estrogen deficiency. Only 33 percent of the sample was less than 60 years old, and most of the heart attacks in the WHI were in the first year. Two European RCTs with mean ages of 53.5 years and similar HRT regimens had no first-year heart attacks (Lobo, 2004).

The WHI was also unlikely to have detected a benefit from HRT even if one existed. Naftolin et al. concluded that it “could not reasonably be expected to provide useful information regarding the cardio protective effects of [HRT]…in moderately to severely symptomatic women who were 50-54 years old at the start of the trial.” There were only 574 women in the trial who met those conditions. The incidence of cardiac events in this age group is 53 per 100,000 women per year. Even if HRT were so dangerous that it caused twice as many cardiac events, they calculate that more than 4,000 women would have had to have been enrolled in order to detect such a difference. Given that the WHI dropout rate was 42 percent, they estimate that the number needed rises to almost 9,000.
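To see why so small a subgroup can say little about rare events, it helps to count how many cardiac events one would expect in it at all. The sketch below is a rough illustration, not the calculation Naftolin et al. performed: it uses the subgroup size and baseline incidence quoted above, while the follow-up length (`FOLLOW_UP_YEARS`) is an assumed round number, and it ignores both dropout and the split of the subgroup between treatment and placebo arms.

```python
def expected_events(n_women: int, rate_per_100k: float, years: float = 1.0) -> float:
    """Expected number of events for n_women followed for `years`,
    assuming a constant annual incidence rate per 100,000 women."""
    return n_women * rate_per_100k / 100_000 * years


SUBGROUP_SIZE = 574     # symptomatic women aged 50-54 in the trial (from the text)
BASELINE_RATE = 53.0    # cardiac events per 100,000 women per year (from the text)
FOLLOW_UP_YEARS = 5.0   # ASSUMPTION: illustrative follow-up length, not from the text

# Expected events at the baseline rate, and if HRT doubled that rate.
baseline_events = expected_events(SUBGROUP_SIZE, BASELINE_RATE, FOLLOW_UP_YEARS)
doubled_events = expected_events(SUBGROUP_SIZE, 2 * BASELINE_RATE, FOLLOW_UP_YEARS)

print(f"Expected events at baseline rate: {baseline_events:.2f}")
print(f"Expected events if the rate doubled: {doubled_events:.2f}")
```

Under these assumptions the whole subgroup would be expected to produce only about one or two events at the baseline rate, and about three if the rate doubled. A gap that small is hard to tell apart from chance, which is the intuition behind the much larger enrollment the authors say would have been needed.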

In a 2004 statement, Barbara Alving, the director of the WHI, said that the study was stopped due to “an increased risk of breast cancer and because the risk of breast cancer, coronary heart disease, stroke, and blood clots outweighed the benefits on hip fracture and colorectal cancer.” Speroff reports that an updated report on WHI adjudication of diagnoses produced a 10 percent disagreement for myocardial infarction and a 3 percent disagreement for death due to coronary heart disease, a difference that could, by itself, erase the statistical significance of the WHI results on coronary heart disease.

Still, it isn’t clear how the WHI group judgment about average risks applies to individuals. Speroff explains that 60 years of inconsistent results from studies on breast cancer and HRT suggest that any increase in risk is small, less than that of not exercising or of being overweight after menopause. Furthermore, it is still unknown whether HRT increases new breast cancers or accelerates the growth of existing tumors. On the other hand, there is general agreement that HRT use increases bone mineral density and reduces fracture risk by up to a third. Hip fracture rates in U.S. women were an estimated 793.5 per 100,000 in 2005. Breast cancer incidence rates are 124.3 per 100,000 women, with about 34 percent of cases afflicting women younger than the WHI sample.
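The population rates quoted above can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the "up to a third" reduction in fracture risk applies directly to the hip fracture rate, treats that figure as an upper bound, and ignores the fact that the two rates may describe different age groups. The variable names are illustrative.

```python
HIP_FRACTURE_RATE = 793.5      # per 100,000 U.S. women, 2005 (from the text)
BREAST_CANCER_RATE = 124.3     # per 100,000 women (from the text)
MAX_FRACTURE_REDUCTION = 1 / 3  # ASSUMPTION: "up to a third" applied to hip fractures


def per_100k_to_percent(rate_per_100k: float) -> float:
    """Convert a rate per 100,000 into a percentage."""
    return rate_per_100k / 100_000 * 100


# How many times more common hip fracture is than breast cancer, by these rates.
rate_ratio = HIP_FRACTURE_RATE / BREAST_CANCER_RATE

# Upper bound on hip fractures avoided per 100,000 women under the assumption above.
max_fractures_avoided = HIP_FRACTURE_RATE * MAX_FRACTURE_REDUCTION

print(f"Hip fracture rate: {per_100k_to_percent(HIP_FRACTURE_RATE):.2f}%")
print(f"Breast cancer rate: {per_100k_to_percent(BREAST_CANCER_RATE):.2f}%")
print(f"Hip fracture / breast cancer ratio: {rate_ratio:.1f}")
print(f"Upper bound on fractures avoided per 100,000: {max_fractures_avoided:.0f}")
```

By these figures hip fracture is roughly six times as common as breast cancer, and the upper-bound reduction in hip fractures is larger than the entire breast cancer incidence rate. That is a comparison of population averages only, and it says nothing about any individual woman's balance of risks.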

In 2008, the European Menopause and Andropause Society updated its conclusions about the benefits and risks of HRT, stating that “in 50-59 year old women a ‘window of opportunity’ for a benefit [of HRT] in cardiovascular disease displays a high plausibility,” that HRT treatment significantly decreases bone loss and risk of osteoporotic fractures, and that the risk of stroke remains of “low clinical impact” in women under 70 years old.

American women are still waiting.

They will continue to wait if evidence-based medicine advocates have their way, because only another equally big, expensive RCT will be able to undo the results of the WHI.

Although evidence-based medicine advocates insist that they promote the “conscientious, explicit and judicious use of current best evidence in making decisions about the care of the individual patients” and integrate “individual clinical expertise with the best available external clinical evidence from systematic research,” they instruct readers that “if you find that [a] study was not randomized…we’d suggest that you stop reading it and go on to the next article.” They also promote information hierarchies as shown by this pyramid from the University of Washington’s online evidence-based toolkit.

[Image: Evidence-Based Practice Tools Summary]

The hierarchical approach offers a rationale for centralized control. It can be used to justify getting rid of individual control over medical decisions by demonizing practice variations as going against “the evidence.” It implies that because medicine as traditionally practiced doesn’t rely on “scientific” evidence from RCTs, those who run and interpret them should be able to develop guidelines that override the authority of individual physicians treating individual patients in clinical practice.

While no one could possibly have a problem with using “current best evidence in making decisions,” there is no evidence that the current best evidence always comes from RCTs. Without knowledge of biological mechanisms and careful observations of patients that lead to high-quality cohort and observational studies, no one would have questioned the WHI results or been able to develop insights into the biological mechanisms that change the effects of HRT as a woman ages.

In the real world, science is a messy social endeavor that goes far beyond a sterile listing of acceptable results from underpowered RCTs. It mixes observation, experiment, controlled trials, and seemingly farfetched proposals into an ongoing conversation that takes place in papers, meetings, talks, informal conversations, formal education, tiny experiments and big randomized studies. Progress occurs when the good ideas are separated from the bad ones as other possible explanations are slowly ruled out. It is important to remember that RCTs are a small part of the overall scientific process, and that many of the medical miracles that we now take for granted developed over time in a messy mixture of informed judgment, acrimonious debate, careful observation, repeated trials, elegant experiments, inelegant experiments, and pure serendipity.

Comments (10)

  1. Andrew says:

    Well, this post doesn’t really offer a better research-based alternative. So far, I think RCTs through evidence-based medicine have produced far more unbiased data than the alternatives. You need to come up with a better mechanism if you’re going to attack the entire methodological process.

  2. Sam says:

    At the core of it all, without this “messy” process of research, we would have nothing. Granted, physical research vs research with social implications is an important distinction to make. However, the intention of evidence-based research is to assure better standards of criteria are evolving and implemented and followed during the research process.

  3. Tommy Beyer says:

    “Using RCT results to guide treatment for individuals is generally problematic for two reasons. The first is that the sample of people participating in an RCT often does not represent the population of interest. The second, as Goldman explains, is that the RCT focus is on ‘average effects in heterogeneous groups, rather than on variation in those effects by clinically relevant characteristics of participants.’”

    I think it is more of an unbiased measure, and we need proof that it most often does not represent the population of interest. Of course, anything at random has its risk, but biased subjects are another big risk that RCTs are designed to prevent. I agree that in the end, it’s a multidimensional approach to the research process that should be followed before serious conclusions are made.

  4. Travis Murphy says:

    I’m not sure that Linda needs to provide another alternative to point out flaws. You’re assuming that there are no other alternatives.

  5. Jacob Druisdael says:

    I think it’s unfair to claim that all RCTs lack validity. There are some groups that produce quality data, but I certainly wouldn’t accept UW as the end-all-be-all.

  6. Gabriel Odom says:

    We all need to hark back to the scientific method. Hypotheses are never made into actionable law without multiple repetitions of the experiment. I firmly believe in evidence-based medicine. However, these conclusions should only be implemented in the form of Best Practice Advisories until the positive efficacy of the treatment is well-known and easily demonstrable.

  7. Maria Jimenez-Herrera says:

    I’m confused, should I be taking hormones or not?

  8. Bruce says:

    Evidenced-based medicine went wrong? No surprise here.

  9. David Hogberg says:

    I am a believer in evidence-based medicine–as long as I am free to reject the evidence. If I think the evidence is wrong–as Linda shows I would with HRT if I was a woman–or I think I am an exception, I should be free to go with a treatment that is not supported by the evidence. Evidence should be a guide, not a dictate.

    It’s also worth noting the work of Dr. John Ioannidis who is finding that a lot of the evidence relied on in medicine is flawed: http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/308269/

  10. Billy Scar says:

    @Andrew,

    Ms. Gorman provides clear evidence here that supports her statement. She may not be offering clear alternatives as to what can be changed, but she is clearly giving us readers a good number of reasons to stop believing that evidence-based medicine is of any good to anybody. It’s a good start to point out the flaws of a specific program while providing evidence that sustains your claim…and perhaps from there experts will come up with better approaches. Instead of looking at the glass half empty, look at it half full.