Thursday, November 4, 2010

Correlation and Causation: NOT THE SAME

I've been meaning to write about this issue for a long time but didn't have a nice test case until now (I wasn't really looking for one).  I was listening to a podcast, the Performance Nutrition Show (9/26 episode if you care), which is mildly entertaining.  The host mentioned a study (no, I won't look up the reference and quote the abstract, there's no point).  The study looked at Vitamin D levels in people and found a correlation between Vitamin D levels and body fat (an inverse correlation, meaning people with higher Vitamin D levels tend to have lower bodyfat) and fitness (I didn't listen enough to remember how they measured fitness).

Sounds great, right?  Take a Vitamin D supplement, which is cheap, and watch the pounds melt away.  Except not really.  Why do researchers do studies like this if they don't really show anything?  Because they're easy.  Get a bunch of people to come in and do a bodyfat analysis and donate some blood, check their blood levels for a bunch of things, see if there's a match anywhere.  Quick and easy.

What's the problem?  Basically, a correlation between two variables does not mean that either one causes the other.  And if we don't understand the causation, then there's no take-home message.

Why do people with higher Vitamin D levels have less bodyfat?  Maybe Vitamin D improves body composition through some biochemical mechanism.  If so, fantastic.  But maybe people who spend more time in the sun, outside - being active, presumably - tend to be in better shape.  That is, some third variable - exercise outside - causes both lowered bodyfat and higher Vitamin D.  Maybe bodyfat somehow inhibits Vitamin D formation or removes Vitamin D from the blood.  That would be bad, and interesting to know about, but it wouldn't entail that adding dietary Vitamin D would lower bodyfat.  Maybe there's some genetic link between the ability to synthesize or absorb Vitamin D and the tendency to put on fat - people who tend to absorb Vitamin D well also tend to stay lean.  I'm not arguing for any of these explanations - the point is that a statistical correlation doesn't tell you anything about which of them is correct.  And whether fat people should start popping Vitamin D gelcaps and expect to lose weight does depend on which explanation is correct.
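You can watch the "third variable" story play out in a toy simulation.  Every number below is invented for illustration - hours outside, the sun-to-D relationship, the exercise-to-bodyfat relationship - but the punchline is real: in this model Vitamin D never touches bodyfat, yet the two come out strongly correlated.

```python
import random

random.seed(0)

# Hypothetical toy model: outdoor exercise hours drive BOTH higher
# vitamin D (sun exposure) and lower bodyfat.  Vitamin D has no
# direct effect on bodyfat anywhere in this model.
n = 1000
outdoor = [random.uniform(0, 10) for _ in range(n)]           # hours/week outside
vit_d   = [20 + 3 * h + random.gauss(0, 5) for h in outdoor]  # ng/mL, rises with sun
bodyfat = [35 - 1.5 * h + random.gauss(0, 4) for h in outdoor]  # %, falls with exercise

def corr(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Strongly negative, even though D never caused a change in bodyfat:
print(corr(vit_d, bodyfat))
```

A correlational study that only measured Vitamin D and bodyfat would "discover" exactly the inverse relationship from the podcast, and a D supplement would do nothing.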

We see this in lots of situations, often to public detriment.  The classic examples are studies showing that vegans score better on some indicator of health, be it blood pressure, metabolic syndrome, rates of contracting cancer, whatever.  These are correlational studies.  They show that people who choose to be vegan tend to be healthier in some way.  Does that mean that you, the health-seeking human, should become vegan?  Absolutely not.  The fact is that in our society most people who choose to become vegan do so for health reasons.  How many of them do you think smoke?  Or shoot heroin?  How many of them exercise, do yoga, take the medications they're supposed to take on time?  The type of person who cares enough about health to choose to eat such a crappy diet under the misguided belief that it will make them healthier is very likely to make lots of other difficult choices for their health - and most of those choices will probably work.  Not smoking?  Good for you.  Meditation?  Good for you.  Avoiding other carcinogens?  Good.  Exercise?  Probably good.  Do you see the point?  Veganism correlates with health despite the diet, not because of it.  Well, that's only partly true - if you compare a vegan diet to the standard American diet of eating whatever crap you can find at a fast food joint or a 7-11, then the vegan diet probably really is healthier for you.

How should these things be studied?  Take Vitamin D.  A large number of people with low Vitamin D should be found and analyzed - bodyfat, health, etc.  Half should be given Vitamin D supplements and half a placebo.  Neither the people nor the researchers should know who has which - you give everyone a number, the person giving out the pills doesn't know if they're giving the real stuff or the placebo, and the doctor doing the measurements doesn't know which group his subject belongs to.  After 3 or 6 months or a year or whatever, do the same analysis over again.  Try to find some way to account for the people, placebo or not, who didn't take their pills.  Then see if the subjects getting the real pills do better than the ones who got the placebo.  That will give you some real insight into whether higher levels of Vitamin D really improve anything - really cause a change in any of the other variables you're looking into.
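The blinding scheme above can be sketched in a few lines.  The subject IDs, group names, and cohort size here are all made up for illustration - the point is just the structure: a random allocation table is generated once and held by a third party, and everyone else works only with opaque IDs.

```python
import random

random.seed(7)

# Hypothetical trial: 20 enrollees with low vitamin D get opaque IDs.
# The allocation table mapping ID -> treatment is held by a third party,
# so neither the pill dispenser nor the examiner knows who got what.
subjects = [f"S{i:03d}" for i in range(1, 21)]
shuffled = random.sample(subjects, len(subjects))  # random order, no bias
allocation = {sid: ("vitamin_d" if i < len(shuffled) // 2 else "placebo")
              for i, sid in enumerate(shuffled)}

# The dispenser only ever sees identical-looking bottles keyed by ID:
bottles = {sid: f"bottle-{sid}" for sid in subjects}

# Only at unblinding, after the follow-up measurements, is the table
# opened and the two arms compared.  Sanity check: the split is even.
print(sum(1 for g in allocation.values() if g == "vitamin_d"))  # 10 per arm
```

The same structure scales to any intervention - the key is that the person measuring outcomes can't (even unconsciously) nudge the numbers, because they don't know which arm they're looking at.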

Why don't researchers do this type of study more often?  To be honest, it's really expensive and really hard.  Instead of a one time interview and blood test you're asking people to stick to some plan for an extended period of time.  You have to wait a lot longer to publish your results - it might take a year to get your data, instead of a long weekend.  And since most people don't or won't understand the difference in value between the two studies (one type is worthless, the other priceless), you're not getting a lot of publicity bang for your buck.  You can still make headlines with the correlational study and tie up your lab assistants a lot less.  Which gives them more time to take in your dry cleaning or fetch you coffee.

Correlational studies are all but useless on their own.  They do show interesting stuff - if two variables correlate, that's worth knowing - as a guide for further research.  If Vitamin D levels inversely correlate with bodyfat, it means we should do another study - a double-blind study - to see if there's a causal relationship.  That's all.  It shouldn't guide public behavior because it doesn't really show anything of value.

Really, correlational studies should be published in a separate journal that is only read by research scientists and to which the general public doesn't have legal access.  Guys in lab coats should look at them and say, "Hmm, that's interesting, I'll design a study to test that for a causal relationship so I can get published in a real science journal!"  Only after the real research is done should journalists be allowed to write about the study and my mom be allowed to read about it in the paper or whatever (my mom is actually a pretty bad example of the general public, having a Ph.D. in chemistry and being a high-level radiochemist, but leave that alone for now).  We'd all be a lot better off not being bombarded by headlines ripped from poorly conducted studies with crappy conclusion sections.

Of course, that's not going to happen.  Science journalists aren't scientists - they're journalists.  That means they took journalism classes in college.  I'm sure some of them are super smart and talented, but frankly that's not exactly required to become a journalist.  The journalist's goal is to make their story exciting enough to be picked up by the media outlet, not to give you good information.  "Two things may be connected, but we're not sure how or if it really means anything" doesn't sell as well as "Meat causes cancer!" 

So be very, very wary of anything you read if it comes out of a correlational study.  This is especially relevant to anyone following a paleo or low carb diet!  There are a number of reasons for this, but one important reason is that health-conscious people in the United States have gravitated to a low fat, low animal product, high carb vegetarian/vegan diet for the last 30 years or so.  Why?  Because the government told them to.  Which means that if you survey 100 people who eat a lot of meat, a handful might be paleo dieters who eat only grass-fed meat, exercise, and generally take care of themselves, but most of them are the people who have totally given up on health.  They eat their 12 oz. steak (from a grain-fed, hormone- and antibiotic-laden cow) with bread covered in margarine, 3-4 bottles of beer, a piece of pie for dessert, and then step outside for a cigarette.  Followed up by more beer and potato chips cooked in vegetable oil while watching TV.  Who do you think will have more cancer, heart disease, and diabetes?  Do you think it's because of the meat?  Of course not - but the correlational study won't make that clear.  And that's why you regularly see headlines that claim that meat eating causes cancer, heart disease, diabetes, and erectile dysfunction.

You can glean information from scientific journals, but it's tricky.  Studies contrasting closely related populations with very different lifestyles - like people living a native lifestyle contrasted with their close relatives who have moved to another island and eat McDonald's three times a week - can be very illuminating.  Studies examining chemical mechanisms can be useful.  Common sense doesn't hurt (there's an excellent argument about animal fat intake made by Richard Nikoley - if eating lots of animal fat was bad, then anybody who lost a significant amount of fat would be harmed, since when you lose bodyfat it has to go through your bloodstream just as if you had eaten it.  Since people who lose a lot of fat are in all ways healthier, then eating a lot of fat can't be bad for you). 

Don't let the conventional wisdom about nutrition get you down or deter you from your plan.  If you're at all concerned about those headlines, you can always get some labwork done.  Get your A1C and CRP tested, get your metabolic panels done, learn enough to understand the results.  Be in charge of your own health - nobody cares about it more than you do.

And last, check this out.



  1. Another major cause of this sort of problem goes far beyond correlation and causation. Most studies in really good medical journals, like JAMA, are fatally flawed, even though the reviewers understand the flaw about which you're speaking.

    The reason is that someone does a big study and gathers a ton of data. Really, they collect a ton of data on a ton of factors. Then they run some t-tests to check which factors come out significant. Lo and behold, some of them are significant at the .05 level! Great, right?

    Well, no. They tested 40 factors, since they had all that data lying around. By random chance, about 2 of those tests will come back significant at the .05 level. Then, since no one publishes, or even wants to mention, negative results, they write their article with seemingly perfect methodology, neglecting to mention all of those other non-results, and it looks like good science.



  2. There are some hilarious examples of this. The China Study makes hundreds of comparisons and loudly trumpets any time it gets a p-value under .05. Like nobody explained to them that if you make 200 comparisons between unrelated variables you'll get about 10 instances of significance at .05. First week stats! Good point, Ian!
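That "200 comparisons, about 10 false positives" arithmetic is easy to check with a quick simulation - a minimal sketch, assuming independent normally distributed variables and a two-tailed z-approximation to the t-test (fine at this sample size), using only the standard library:

```python
import math
import random

random.seed(1)

def p_value_two_sample(n=50):
    """Two-tailed p-value for the difference in means of two samples of
    pure noise - by construction there is NO real effect to find.
    Uses a z-approximation so we only need math.erf."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / n + var_b / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Run 200 comparisons between variables that have nothing to do with
# each other; roughly 5% (about 10) will look "significant" anyway.
hits = sum(p_value_two_sample() < 0.05 for _ in range(200))
print(f"{hits} of 200 null comparisons were 'significant' at p < .05")
```

Run it with different seeds and you'll hover around 10 "discoveries" every time - which is exactly why a paper that quietly ran 200 comparisons and reported only the winners proves nothing.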