When Correlation Does Not Imply Causation: Why your gut microbes may not (yet) be a silver bullet for all your problems

by Dawn Chen
figures by Daniel Utter

Did you know that the divorce rate in Maine strongly correlates with the per capita consumption of margarine? Wow, maybe abstaining from margarine prevents divorce! I can definitely imagine a pop-media article with this eye-catching title. Before throwing out all margarine to save your marriage, an intelligent reader like you would probably think to yourself: “what absurdity, it’s probably a coincidence that the trends match, and there is no causal relationship between them after all.” 

Figure 1: The divorce rate in Maine is correlated with per capita consumption of margarine. Is this a relationship that occurred randomly, or is there something here worth digging into further? (Source: Spurious Correlations)

Such possibly coincidental, unverified associations can be found in scientific research too, especially in recent news coverage of the microbiome field, which studies the microbes (also known as microorganisms) that inhabit our bodies. “Good” microbes in our gut can help us absorb nutrients better and protect us against infections, while “bad” microbes can make us sick. As researchers dig deeper into our microbiome, they have found that microbes in our body are linked to a wide range of health outcomes and diseases, including obesity, diabetes, Alzheimer’s disease, depression, multiple sclerosis, ALS, and autism.

Ostensibly, these results suggest that a new series of therapeutics is on the horizon; if we change our diet, eat more probiotics like yogurt, or replace our microbiome with “good” microbes, we are on track to alleviate these diseases, right? However, the truth is more complicated than it seems. Most of these studies only suggest that a relationship exists between the microbiome and disease. We don’t yet know for certain exactly how the microbes caused the patients to be sick, or whether the microbes caused the illness at all. Very often, news about the microbiome field falls into the “correlation does not imply causation” trap, treating a relationship between two variables as if it were proof of direct cause and effect.

Correlation does not imply causation

To critically evaluate existing scientific findings, we must first understand the difference between correlation and causation. Correlation means that there is a relationship, or pattern, between two different variables, but it does not tell us the nature of the relationship between them. 

In contrast, causation implies that beyond there being a relationship between two events, one event causes another event to occur. For example, if we don’t sleep, we will feel sleepy. The former (not sleeping) directly causes the latter (feeling sleepy).

The distinction between correlation and causation seems straightforward, but it’s easy to wrongly assume causation from correlation, especially when there is a complex interplay of variables. Here are some common ways that causation is wrongly inferred from correlation, or why “correlation does not imply causation”:

Figure 2: Common ways correlation is mistaken for causation. (1) The relationship between two events may be coincidental. (2) The cause and effect between two events may be reversed. (3) There may be a third, unknown variable that confounds the relationship.

  1. The relationship between both variables is coincidental

A correlation between unrelated variables can occur by chance. One example is the “Redskins Rule”: the result of the Washington Football Team’s last NFL game before each US presidential election accurately predicted every election result from 1936 to 2000. Intuitively, we know that the outcome of a football game has nothing to do with presidential elections; this observation is merely a coincidence. The more variables we examine, the more likely we are to find unrelated variables that are correlated by random chance.

  2. Reverse causality

Reverse causality means that there is a causal relationship between events A and B, but not in the direction you would expect: the cause and effect are reversed. For example, if we observe that the faster windmills rotate, the more wind there is, we might falsely conclude that the windmills’ rotation causes the wind. However, we know that it is the wind that causes the windmills to rotate.

  3. A common (third) confounding variable causes both events

In some cases, there may be a hidden, underlying variable that causes events that appear to be correlated. We might assume that event A causes event B when in reality, there is another event C that causes both events A and B. For example, many researchers have previously found that alcohol consumption is associated with an increased risk for lung cancer. However, smoking was later shown to be a confounding factor. Individuals who consume more alcohol also happen to smoke more, which increases their risk for lung cancer. 
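
To make the first and third pitfalls concrete, here is a small simulation sketch. Everything in it is made up for illustration: the random series, the sample sizes, and the assumption that smoking pushes alcohol consumption and cancer risk up by the same amount. It uses no real data, and the variable names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the made-up data is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing its straight-line dependence on zs."""
    n = len(ys)
    mean_y, mean_z = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


# Pitfall 1 (coincidence): compare one random 10-point "trend" against
# 2,000 other random series that have nothing to do with it.
target = [rng.gauss(0, 1) for _ in range(10)]
chance_rs = [abs(pearson(target, [rng.gauss(0, 1) for _ in range(10)]))
             for _ in range(2000)]
# Strongest correlation found after examining the first k unrelated series.
strongest_by_count = {k: max(chance_rs[:k]) for k in (5, 50, 500, 2000)}

# Pitfall 3 (confounding): smoking drives BOTH alcohol use and cancer risk;
# by construction, alcohol has no direct effect on cancer risk here.
n_people = 5000
smoking = [rng.gauss(0, 1) for _ in range(n_people)]
alcohol = [s + rng.gauss(0, 1) for s in smoking]
cancer_risk = [s + rng.gauss(0, 1) for s in smoking]

raw_r = pearson(alcohol, cancer_risk)
# Remove the part of each variable explained by smoking, then correlate again.
adjusted_r = pearson(residuals(alcohol, smoking),
                     residuals(cancer_risk, smoking))

print("strongest chance correlation by number of series examined:",
      strongest_by_count)
print("alcohol vs. cancer risk, ignoring smoking:", round(raw_r, 2))
print("alcohol vs. cancer risk, adjusted for smoking:", round(adjusted_r, 2))
```

The strongest chance correlation can only grow as more unrelated series are examined, and the alcohol link should shrink toward zero once smoking is accounted for. Reverse causality is left out on purpose: a correlation coefficient is symmetric, so the number alone cannot say which variable is the cause.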

Observational studies can’t prove causation

While correlation is easily observable, determining causation is much more complicated and requires an appropriate experimental design. Ideally, we would conduct experiments in the lab, where we tightly control all variables except for the one we are interested in. However, this is nearly impossible in human studies. To conduct the most rigorous randomized controlled experiment, we would probably need participants to live in the same place, eat the same food, and exercise and sleep at the same times, to name just a few variables. As a result, human microbiome research has been largely observational.

Figure 3: Workflow in most microbiome population studies. Researchers collect stool samples from healthy and sick participants, find the composition of these samples by sequencing, then analyze the data to find pattern differences between samples from healthy and sick participants. 

In most large-scale human microbiome studies like the Human Microbiome Project or American Gut Project, researchers recruit a group of participants, collect and sequence their stool samples, and simultaneously gather information on participants’ lifestyle, diet, and health status. By analyzing differences in the microbiome between individuals suffering from disease and healthy individuals, we can find correlations between microbiome composition and the disease of interest (Figure 3).

It’s worth noting that the direction of causality in these relationships is often ambiguous. Specifically, scientists have found that patients, such as those suffering from inflammatory bowel disease, have different gut bacteria compared to healthy individuals. Did differences in the gut microbiome make the patient sick, or did the patient’s disease state itself (e.g. more diarrhea or inflammation) lead to differences in the gut microbiome? We are often quick to assume the former, that the bacteria have caused the disease, though the direction of this causal relationship is not so easily determined. Researchers tend to call this the “chicken and egg” problem. Furthermore, lifestyle is a big confounding factor. Patients who suffer from diseases often change their diet upon diagnosis or take drugs for treatment, which can change their gut microbiome composition. 

Figure 4: Removing confounding variables to find a true relationship in population studies. A one-to-one matching method, where each diseased patient is matched to a healthy control with a similar lifestyle, can help us better understand relationships between the gut microbiome and human disease.

In an attempt to solve the problem of confounding variables, a recent publication in Nature by Ivan Vujkovic-Cvijin and co-workers picked out lifestyle differences that might be associated with microbiome composition. They found that gender, age, body mass index, and levels of alcohol consumption are the biggest confounders associated with both microbiome composition and disease status. To remove the effects of these confounders, the researchers used one-to-one matching, where each sick individual was matched with a healthy individual who had the same age, gender, and lifestyle habits. This is a common technique in observational studies, where researchers cannot control all variables as they would under ideal experimental conditions (Figure 4). Using this technique, the researchers discovered that many previously reported associations between gut bacteria abundance and disease status were no longer statistically significant, suggesting that some gut microbiome changes attributed to disease might be a result of underlying confounders.
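
The idea behind one-to-one matching can be sketched in a few lines. This is an illustration on simulated people, not the procedure or data from the study: the `profile` function, the three age groups, the alcohol levels, and the `microbe` abundance values are all hypothetical. In this made-up world, alcohol raises both the chance of being sick and the abundance of one microbe, while the disease itself has no effect on the microbe.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # fixed seed so the made-up cohort is reproducible


def make_person():
    """One simulated participant; every number here is an assumption."""
    alcohol = rng.choice([0, 1, 2])  # 0 = none, 1 = moderate, 2 = heavy
    return {
        "age_group": rng.choice(["young", "middle", "older"]),
        "sex": rng.choice(["F", "M"]),
        "alcohol": alcohol,
        # Alcohol raises the chance of disease...
        "sick": rng.random() < 0.1 + 0.2 * alcohol,
        # ...and, separately, the abundance of one hypothetical microbe.
        "microbe": alcohol + rng.gauss(0, 1),
    }


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def profile(person):
    """The confounders we insist on matching exactly."""
    return (person["age_group"], person["sex"], person["alcohol"])


people = [make_person() for _ in range(20000)]
cases = [p for p in people if p["sick"]]
controls = [p for p in people if not p["sick"]]

# Naive comparison: every sick person versus every healthy person.
unmatched_gap = (mean(p["microbe"] for p in cases)
                 - mean(p["microbe"] for p in controls))

# One-to-one matching: pair each case with an unused control that has the
# same profile; cases with no available control are left out.
available = defaultdict(list)
for control in controls:
    available[profile(control)].append(control)

pairs = []
for case in cases:
    pool = available[profile(case)]
    if pool:
        pairs.append((case, pool.pop()))  # pop() so no control is reused

matched_gap = mean(case["microbe"] - control["microbe"]
                   for case, control in pairs)

print("pairs formed:", len(pairs))
print("microbe gap, all sick vs. all healthy:", round(unmatched_gap, 2))
print("microbe gap, matched pairs only:", round(matched_gap, 2))
```

In this toy cohort the naive comparison should show a clear gap, because sick people tend to drink more, while the gap within matched pairs should sit close to zero. Exact matching is the simplest variant; with many confounders, exact matches become rare, and real studies often have to match on similar rather than identical values.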

Stay healthy, stay skeptical

Despite the ambiguity surrounding causation, a growing number of commercial companies like Viome, uBiome (which was raided by the FBI last year over its insurance billing practices), and DayTwo have started marketing interventions for the microbiome. Customers mail in a stool sample for sequencing; then, based on the types of bacteria present in the sample, the companies offer personalized nutritional advice or provide customers with risk scores for different diseases. While these companies have good intentions of helping consumers understand their bodies, we need to critically evaluate their claims.

The microbiome is undoubtedly important for our health. However, despite the hype surrounding largely correlative studies, we are still not sure exactly how the microbiome influences health or how it fits into disease progression. To determine whether the microbiome causes disease, some researchers are exploring the molecular mechanisms of individual bacterial strains. Other researchers are working on designing experimental studies with larger sample sizes and more rigorous methodology. With better analysis tools and datasets, we will soon be able to uncover the complex roles these tiny living organisms play in our bodies. In the meantime, grab a cup of yogurt, just because it’s tasty.


Dawn Chen is a first-year Ph.D. student in Systems, Synthetic and Quantitative Biology at Harvard University. 

Daniel Utter is a sixth-year Ph.D. student in Organismic and Evolutionary Biology at Harvard University.

Cover Image: “silver bullet” by eschipul is licensed under CC BY-SA 2.0

For More Information: