26 January 2016

Crusade Against Multiple Regression

A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging. ...

I hope that in the future, if I’m successful in communicating with people about this, there’ll be a kind of upfront warning in New York Times articles: These data are based on multiple regression analysis. This would be a sign that you probably shouldn’t read the article because you’re quite likely to get non-information or misinformation.

The thing I’m most interested in right now has become a kind of crusade against correlational statistical analysis—in particular, what’s called multiple regression analysis. Say you want to find out whether taking Vitamin E is associated with lower prostate cancer risk. You look at the correlational evidence and indeed it turns out that men who take Vitamin E have lower risk for prostate cancer. Then someone says, "Well, let’s see if we do the actual experiment, what happens." And what happens when you do the experiment is that Vitamin E contributes to the likelihood of prostate cancer. How could there be differences? These happen a lot. The correlational—the observational—evidence tells you one thing, the experimental evidence tells you something completely different.

In the case of health data, the big problem is something that’s come to be called the healthy user bias, because the guy who’s taking Vitamin E is also doing everything else right. A doctor or an article has told him to take Vitamin E, so he does that, but he’s also the guy who’s watching his weight and his cholesterol, gets plenty of exercise, drinks alcohol in moderation, doesn’t smoke, has a high level of education, and a high income. All of these things are likely to make you live longer, to make you less subject to morbidity and mortality risks of all kinds. You pull one thing out of that correlate and it’s going to look like Vitamin E is terrific because it’s dragging all these other good things along with it.
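The mechanism described above can be sketched as a small simulation. This is only an illustration under invented assumptions: the hidden "healthy" trait, the baseline risks (5% and 15%), the 2-point harm from the supplement, and the function name `simulate` are all hypothetical and are not taken from any real study.

```python
import random

def simulate(n=200_000, seed=0):
    """Compare disease rates for supplement takers vs. non-takers
    under self-selection and under random assignment."""
    rng = random.Random(seed)
    # counts[design][takes] = [cases, people]
    counts = {
        "observational": {True: [0, 0], False: [0, 0]},
        "randomized": {True: [0, 0], False: [0, 0]},
    }
    for _ in range(n):
        healthy = rng.random() < 0.5  # hidden lifestyle factor (assumed)
        assignment = {
            # Self-selection: health-conscious people mostly take the supplement.
            "observational": rng.random() < (0.8 if healthy else 0.2),
            # Experiment: a coin flip decides, independent of lifestyle.
            "randomized": rng.random() < 0.5,
        }
        for design, takes in assignment.items():
            risk = 0.05 if healthy else 0.15  # assumed baseline risks
            if takes:
                risk += 0.02  # assumed small harm from the supplement
            cell = counts[design][takes]
            cell[0] += rng.random() < risk  # True counts as 1 case
            cell[1] += 1
    return {
        design: {takes: cases / people for takes, (cases, people) in groups.items()}
        for design, groups in counts.items()
    }

if __name__ == "__main__":
    for design, rates in simulate().items():
        print(design, {k: round(v, 3) for k, v in rates.items()})
```

In this toy setup the supplement is harmful by construction, yet the self-selected takers show a lower disease rate than non-takers, because most of them carry the protective lifestyle along with them. Only the coin-flip assignment exposes the harm.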

This is not, by any means, limited to health issues. A while back, I read a government report in The New York Times on the safety of automobiles. The measure that they used was the deaths per million drivers of each of these autos. It turns out that, for example, there are enormously more deaths per million drivers who drive Ford F150 pickups than for people who drive Volvo station wagons. Most people’s reaction, and certainly my initial reaction to it was, "Well, it sort of figures—everybody knows that Volvos are safe."

Let’s describe two people and you tell me who you think is more likely to be driving the Volvo and who is more likely to be driving the pickup: a suburban matron in the New York area and a twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are not assigned their cars. We don’t say, "Billy, you’ll be driving a powder blue Volvo station wagon." Because of this self-selection problem, you simply can’t interpret data like that. You know virtually nothing about the relative safety of cars based on that study.
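A few lines of arithmetic show how self-selection alone can produce such a gap. The driver counts, the two driver types, and the risk figures below are made up for illustration and do not come from the government report; in this sketch the car has no effect on risk at all.

```python
# Hypothetical population: who chooses which car (invented numbers).
DRIVERS = {
    ("volvo", "cautious"): 900_000,
    ("volvo", "risky"): 100_000,
    ("pickup", "cautious"): 200_000,
    ("pickup", "risky"): 800_000,
}

# Assumed deaths per million drivers, determined by the driver only.
RISK_PER_MILLION = {"cautious": 50, "risky": 250}

def deaths_per_million(car):
    """Pooled death rate for one car model, ignoring driver type."""
    drivers = sum(n for (c, _), n in DRIVERS.items() if c == car)
    deaths = sum(
        n * RISK_PER_MILLION[kind] / 1_000_000
        for (c, kind), n in DRIVERS.items()
        if c == car
    )
    return deaths * 1_000_000 / drivers

if __name__ == "__main__":
    for car in ("volvo", "pickup"):
        print(car, deaths_per_million(car))
```

With these numbers the pooled rate is 70 deaths per million for the Volvo and 210 for the pickup, a threefold difference, even though within each driver type the two cars are equally safe.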

[...]

Continued here

RICHARD NISBETT is a professor of psychology and co-director of the Culture and Cognition Program at the University of Michigan. He is the author of Mindware: Tools for Smart Thinking and The Geography of Thought.