it’s not the end of the world as we know it, just yet

While hanging out at Columbia the past couple of years, i was comforted to realize that i wasn’t alone in my lack of enthusiasm for p-values. In fact, while some folks in some disciplines are completely infatuated with them, did you know there are others out there who dismiss them? i mean, entirely!? [removes tongue from cheek]

It might be an overstatement to say that the headlines accompanying the news of the HIV vaccine trial in Thailand claimed that “AIDS is now a disease of the past,” but it really wouldn’t be stretching the tone of some of those headlines too far. Some people really did seem to think this was the breakthrough we’d been waiting for. That is, those people who focused solely on the p-values. E.g., from one of the NYT pieces on it: “Although the difference was a mere 23 people…it was statistically significant.”

But what most people seemed to fail to pick up on was the very next part of that sentence: “…and meant that the vaccine was 31.2 percent effective.” And with that statement comes the nuisance of a realization: no vaccine would ever be seriously considered for scale-up before that number reached at least 80%, if not higher.

So, yes, this was the first such trial that showed any difference whatsoever. But was it worth all the hoopla that followed? Well, Fauci quickly re-interpreted the results for the public shortly thereafter, and some minor tweaks to the analytic strategy (tweaks, mind you, that follow more standard reporting practices) revealed that the “statistically significant” portion of the interpretation may not even be warranted. So i think maybe not.

Now, i’m not posting about this to quibble about the significance (or lack thereof) of their findings. What i want to bring up is the important difference this highlights between statistical and substantive significance. In practical terms, this particular study’s findings aren’t likely to have much of an impact at all on the way we approach HIV prevention (not even if the statistical significance were a slam dunk, i.e., p<0.001). Because while the effect was (potentially) statistically significant, it just wasn’t large enough to really have a meaningful impact. i say this as a disappointed HIV researcher who would be more than happy to welcome a vaccine to the available arsenal. However, i am not going to start doing cartwheels over small statistical differences that could have happened by chance* and that ultimately are substantively too small to be implemented in any helpful way (i.e., producing a vaccine that reaches the public and actually prevents new infections).
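To make the gap between the two kinds of significance concrete, here is a minimal sketch in Python. The infection counts and arm sizes are hypothetical, chosen only so that the two arms differ by 23 infections as in the NYT quote; they are assumptions for illustration, not figures reported in this post. The test is a plain two-proportion z-test, which may well differ from the analysis the trial actually used.

```python
import math

def two_proportion_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, p_value

# Hypothetical counts (assumption): a difference of 23 infections between arms
vaccine_infections, vaccine_n = 51, 8197
placebo_infections, placebo_n = 74, 8198

rate_vaccine, rate_placebo, p_value = two_proportion_test(
    vaccine_infections, vaccine_n, placebo_infections, placebo_n
)

# Statistical significance: is the difference distinguishable from chance?
print(f"p-value: {p_value:.3f}")

# Substantive significance: how big is the effect?
efficacy = 1 - rate_vaccine / rate_placebo        # relative reduction in risk
risk_difference = rate_placebo - rate_vaccine     # absolute reduction in risk
print(f"efficacy: {efficacy:.1%}")
print(f"infections averted per 1,000 people: {1000 * risk_difference:.1f}")
```

With these made-up counts the p-value lands a bit under 0.05, while the efficacy sits near 31% and the absolute difference is fewer than 3 infections per 1,000 people. The first number answers "is it chance?"; the other two answer "does it matter?".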

All that to say something that is likely obvious to many reading this, but is often worth repeating – “Statistical significance is not always all it’s cracked up to be.”

*Remember, p<0.05 still means that when there is no real effect, about 1 in 20 trials would clear that bar by chance alone. And i don’t have a hard count on how many of these trials we’ve seen at this point, but the numbers have got to be adding up.
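A rough sketch of how those numbers add up, assuming every trial tests a vaccine with no real effect and that the trials are independent of one another. Both are simplifying assumptions, and the trial counts in the loop are hypothetical since no hard count is given here.

```python
def prob_any_false_positive(n_trials, alpha=0.05):
    """Chance that at least one of n independent null trials reaches p < alpha."""
    return 1 - (1 - alpha) ** n_trials

# Hypothetical numbers of trials, for illustration only
for n_trials in (1, 5, 10, 20):
    chance = prob_any_false_positive(n_trials)
    print(f"{n_trials:>2} trials: {chance:.0%} chance of at least one 'significant' result")
```

Under these assumptions, it takes only 14 null trials before a spurious "significant" result somewhere in the pile becomes more likely than not.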

This entry was posted on Friday, October 9th, 2009 at 11:47 pm and is filed under Uncategorized.

4 Responses to it’s not the end of the world as we know it, just yet

Michael Bishop says:

October 13, 2009 at 10:33 pm

This is a very good point. In economics, McCloskey is known for writing on this. Those interested in much more on the topic might pursue it here:

http://www.stat.columbia.edu/~cook/movabletype/archives/2009/02/mccloskey_et_al.html

Bob Hanneman says:

October 17, 2009 at 2:35 pm

Thanks for the posting; this is a great example of the difference between substantive and statistical significance for teaching (leaving aside whether the result actually passes a test).

A couple other things that I wish would be pointed out more, too…

What was the null hypothesis that was rejected? In the case of a vaccine, if the standard is 80% effectiveness, wouldn’t it make more sense to use, say, 25% or 50% as a null?

Leads to the main point: does anyone know of a really good way to talk about the power of tests that can be understood by folks who don’t know anything about inference?

Other minor point — I just bet the (small sample) standard error was calculated with a standard formula rather than estimated from the data (permutations, bootstrap, jackknife), and there was no discussion of how they came to have this particular sample.

Concerned Onlooker says:

October 20, 2009 at 4:26 am

This blog looks so promising. But no posts in 10 days? Please tell me you’re not abandoning the project!

Michael Bishop says:

October 21, 2009 at 1:56 pm

@Concerned Onlooker, thanks for the encouragement. I plan on posting about twice a week, even if my posts are sometimes just links with very little commentary. I do think that blogs tend to be much more successful when they have at least one new post a day, so perhaps that is something to shoot for. If we can’t attain it with the current contributors perhaps we should add more.
