When Correlation Does Not Imply Causation: Why your gut microbes may not (yet) be a silver bullet to all your problems
by Dawn Chen
figures by Daniel Utter
Did you know that the divorce rate in Maine strongly correlates with the per capita consumption of margarine? Wow, maybe abstaining from margarine prevents divorce! I can definitely imagine a pop-media article with this eye-catching title. Before throwing out all margarine to save your marriage, an intelligent reader like you would probably think to yourself: “what absurdity, it’s probably a coincidence that the trends match, and there is no causal relationship between them after all.”
Such coincidental-but-unverified associations can be found in scientific research too, especially in recent news covering the microbiome field, which explores the microbes (also known as microorganisms) that inhabit our bodies. “Good” microbes in our gut can help us absorb nutrients better and protect us against infections, while “bad” microbes can make us sick. As researchers dig deeper into our microbiome, they have found that microbes in our body are linked to a wide range of health outcomes and diseases, including obesity, diabetes, Alzheimer’s disease, depression, multiple sclerosis, ALS, and autism.
Ostensibly, these results suggest that a new series of therapeutics is on the horizon; if we change our diet, eat more probiotic foods like yogurt, or replace our microbiome with “good” microbes, we are on track to alleviate these diseases, right? However, the truth is more complicated than it seems. Most of these studies only suggest that a relationship exists between the microbiome and disease. We don’t yet know for certain how exactly the microbes caused the patients to be sick, or if the microbes caused the illness at all. Very often, news on the microbiome field falls into the “correlation does not imply causation” trap, where a relationship between two variables is mistaken for direct cause and effect.
Correlation does not imply causation
To critically evaluate existing scientific findings, we must first understand the difference between correlation and causation. Correlation means that there is a relationship, or pattern, between two different variables, but it does not tell us the nature of the relationship between them.
In contrast, causation implies that beyond there being a relationship between two events, one event causes another event to occur. For example, if we don’t sleep, we will feel sleepy. The former (not sleeping) directly causes the latter (feeling sleepy).
The distinction between correlation and causation seems straightforward, but it’s easy to wrongly assume causation from correlation, especially when there is a complex interplay of variables. Here are some common ways that causation is wrongly inferred from correlation, each a reason why “correlation does not imply causation”:
- The relationship between both variables is coincidental
A correlation between unrelated variables can occur by chance. One example is the “Redskins Rule”, where the result of the Washington Football Team’s last home game before the US presidential election accurately predicted every election result from 1936 to 2000. Intuitively, we know that the outcome of a football game has nothing to do with presidential elections; this observation is merely a coincidence. The more variables we examine, the more likely we are to find unrelated variables that are correlated by random chance.
- Reverse causality
Reverse causality means that there is a causative relationship between events A and B, but not in the order that you would expect: the cause and effect are reversed. For example, if we observe that the faster a windmill rotates, the more wind there is, we might falsely conclude that the rotating windmill causes the wind. However, we know that it is the wind that causes the windmill to rotate.
- A common (third) confounding variable causes both events
In some cases, there may be a hidden, underlying variable that causes events that appear to be correlated. We might assume that event A causes event B when in reality, there is another event C that causes both events A and B. For example, many researchers have previously found that alcohol consumption is associated with an increased risk for lung cancer. However, smoking was later shown to be a confounding factor. Individuals who consume more alcohol also happen to smoke more, which increases their risk for lung cancer.
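The first and third themes above can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: every number in it is randomly generated rather than taken from any study, and the function names (pearson, strongest_chance_correlation, confounded_pair) are made up for this example. It shows that searching through more unrelated variables turns up stronger chance correlations, and that two variables driven by a shared hidden variable C look correlated until C is accounted for.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables, n_points=10, seed=0):
    """Largest |r| between a random target and n_variables unrelated series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


def confounded_pair(n=5000, seed=0):
    """Return A, B, C where A and B each depend on hidden C, not on each other."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [ci + rng.gauss(0, 1) for ci in c]  # A = C + independent noise
    b = [ci + rng.gauss(0, 1) for ci in c]  # B = C + independent noise
    return a, b, c


if __name__ == "__main__":
    # Theme 1: the more unrelated variables we search, the stronger the best match.
    for k in (10, 100, 1000):
        print(k, "variables, strongest chance |r|:", strongest_chance_correlation(k))

    # Theme 3: A and B correlate only through C.
    a, b, c = confounded_pair()
    print("r(A, B) ignoring C:", pearson(a, b))
    # Simplification: we know C's exact contribution here, so we can subtract it.
    a_adjusted = [ai - ci for ai, ci in zip(a, c)]
    b_adjusted = [bi - ci for bi, ci in zip(b, c)]
    print("r(A, B) after removing C:", pearson(a_adjusted, b_adjusted))
```

In real data the contribution of a confounder is not known in advance and has to be estimated, which is part of why untangling confounders is harder than this toy example suggests.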
Observational studies can’t prove causation
While correlation is easily observable, determining causation is much more complicated and requires an appropriate experimental design. Ideally, we would want to conduct experiments in the lab, where we tightly control all variables except for the one that we are interested in. However, this is nearly impossible in human studies. To conduct the most rigorous randomized controlled experiment, we would probably need participants to live in the same place, eat the same food, and exercise and sleep at the same time, just to name a few variables. As a result, human microbiome research has been largely observational.
In most large-scale human microbiome studies like the Human Microbiome Project or American Gut Project, researchers recruit a group of participants, collect and sequence their fecal samples, and simultaneously gather information on participants’ lifestyle, diet, and health status. By analyzing differences in the microbiome between individuals suffering from disease and healthy individuals, we can find correlations between microbiome composition and the disease of interest (Figure 3).
It’s worth noting that the direction of causality in these relationships is often ambiguous. Specifically, scientists have found that patients, such as those suffering from inflammatory bowel disease, have different gut bacteria compared to healthy individuals. Did differences in the gut microbiome make the patient sick, or did the patient’s disease state itself (e.g. more diarrhea or inflammation) lead to differences in the gut microbiome? We are often quick to assume the former, that the bacteria have caused the disease, though the direction of this causal relationship is not so easily determined. Researchers tend to call this the “chicken and egg” problem. Furthermore, lifestyle is a big confounding factor. Patients who suffer from diseases often change their diet upon diagnosis or take drugs for treatment, which can change their gut microbiome composition.
In an attempt to solve the problem of confounding variables, a recent publication in Nature by Ivan Vujkovic-Cvijin and co-workers picked out lifestyle differences that might be associated with microbiome composition. They found that gender, age, body mass index, and levels of alcohol consumption are the biggest confounders associated with both microbiome composition and disease status. To remove the effects of these confounders, the researchers used the approach of one-to-one matching, where a sick individual was matched with a healthy individual who had the same age, gender, and lifestyle habits. This is a common technique used in observational studies, where researchers cannot control for all variables under perfect experimental conditions (Figure 4). Using this technique, the researchers discovered that many associations found previously between gut bacteria abundance and disease status are no longer statistically significant, suggesting that some gut microbiome changes attributed to disease might be a result of underlying confounders.
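To make the idea of one-to-one matching concrete, here is a minimal sketch on simulated data. It is not the method or data from the publication: the cohort is invented, age is the only confounder (the study matched on several), and the names simulate_cohort, naive_difference, and matched_difference are hypothetical. In this toy world, age raises both the chance of being sick and the abundance of a microbe, while the microbe has no effect on disease at all.

```python
import random
from collections import defaultdict


def simulate_cohort(n=4000, seed=1):
    """Invented cohort: age drives both disease risk and microbe abundance."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        age = rng.randint(20, 79)
        # Assumption: disease risk rises with age (10% at age 20, 69% at age 79).
        sick = rng.random() < (age - 10) / 100
        # Assumption: abundance rises with age and does NOT depend on disease.
        abundance = 0.05 * age + rng.gauss(0, 1)
        cohort.append({"age": age, "sick": sick, "abundance": abundance})
    return cohort


def mean(values):
    return sum(values) / len(values)


def naive_difference(cohort):
    """Mean abundance in sick minus healthy people, ignoring age."""
    sick = [p["abundance"] for p in cohort if p["sick"]]
    healthy = [p["abundance"] for p in cohort if not p["sick"]]
    return mean(sick) - mean(healthy)


def matched_difference(cohort):
    """Pair each sick person with an unused healthy person of the same age."""
    healthy_by_age = defaultdict(list)
    for person in cohort:
        if not person["sick"]:
            healthy_by_age[person["age"]].append(person)
    differences = []
    for person in cohort:
        controls = healthy_by_age[person["age"]]
        if person["sick"] and controls:
            control = controls.pop()  # each control is used at most once
            differences.append(person["abundance"] - control["abundance"])
    return mean(differences)


if __name__ == "__main__":
    cohort = simulate_cohort()
    print("naive sick-vs-healthy difference:", naive_difference(cohort))
    print("age-matched difference:", matched_difference(cohort))
```

The naive comparison suggests the microbe is more abundant in sick people, but the gap shrinks toward zero once each sick person is compared with a healthy person of the same age. Sick individuals without an available match are simply dropped here, which is one of the trade-offs of matching.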
Stay healthy, stay skeptical
Despite the ambiguity surrounding causation, a growing number of commercial companies like Viome, uBiome (which was raided by the FBI last year over its insurance billing practices) or DayTwo have started marketing interventions for the microbiome. Customers mail in a fecal sample for sequencing; then, based on the types of bacteria present in the sample, the companies offer personalized nutritional recommendations or provide customers with risk scores for different diseases. While these companies have good intentions of helping consumers understand their bodies, we need to critically evaluate their claims.
The microbiome is undoubtedly important for our health. However, despite the hype surrounding largely correlative studies, we are still not sure exactly how the microbiome influences our health or where it fits into disease progression. To determine if the microbiome causes disease, some researchers are exploring the molecular mechanisms of individual bacterial strains. Other researchers are working on designing experimental studies with larger sample sizes and more rigorous methodology. With better analysis tools and datasets, we will soon be able to uncover the complex functions these tiny living organisms hold in our bodies. In the meantime, grab a cup of yogurt, just because it’s tasty.
Dawn Chen is a first-year Ph.D. student in Systems, Synthetic and Quantitative Biology at Harvard University.
Daniel Utter is a sixth-year Ph.D. student in Organismic and Evolutionary Biology at Harvard University.
Cover Image: “silver bullet” by eschipul is licensed under CC BY-SA 2.0
For More Information:
- Explore other fun correlations at Spurious Correlations.
- Microbiologist Brett Finlay and aging researcher Jessica Finlay discuss the ways to harness microbes in their book, The Whole-Body Microbiome.
- A book by microbiologist Martin Blaser dives into how the overuse of antibiotics might be fueling our modern plagues.
- Read more about the microbes within us in this book by science writer Ed Yong.
- Learn more about the need to establish causation and mechanism in microbiome studies in this research paper.