Multiple Testing



How Can We Test Multiple Hypotheses?

Let’s say we have a set of hypotheses that we want to test at the same time. Our first thought might be to test each hypothesis separately, using some level of significance α. Sounds like a decent enough idea.

But let’s consider a case where we have 15 hypotheses to test, and a significance level of 0.05. What’s the probability of observing at least one significant result just due to chance? 

P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.05)^15 ≈ 0.54.

So, with 15 tests being considered, we have about a 54% chance of observing at least one significant result, even if none of the effects we are testing for actually exist. That’s going to be a problem if we have many hypotheses to test. So how can we test multiple hypotheses without increasing our probability of observing a significant result just due to chance?
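As a quick sanity check, here is the same calculation in Python (a minimal sketch; the variable names are just for illustration and the numbers are the ones from the example above):

alpha = 0.05   # per-test significance level
n = 15         # number of hypotheses being tested

# P(at least one significant result) = 1 - P(no significant results)
p_any_false_positive = 1 - (1 - alpha) ** n
print(round(p_any_false_positive, 4))  # ~0.5367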

Bonferroni Correction

The Bonferroni correction is a method for correcting for this phenomenon. The significance cut-off is set at α/n, where n is the number of tests. In our previous example, with 15 tests and α = 0.05, you’d only reject a null hypothesis if its p-value were less than 0.05/15 ≈ 0.003333. Now if we recalculate the chance of observing a significant result purely by chance, we get P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.003333)^15 ≈ 0.04885. This is much closer to our desired level of 0.05; in fact it is slightly below it, so the correction is a bit conservative.
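As a concrete sketch, here is one way to apply the correction to a list of p-values in Python (the p-values below are made up purely for illustration):

alpha = 0.05
p_values = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]  # hypothetical test results

n = len(p_values)
cutoff = alpha / n  # Bonferroni-adjusted significance threshold

for i, p in enumerate(p_values):
    decision = "reject H0" if p < cutoff else "fail to reject H0"
    print(f"test {i + 1}: p = {p:.3f} -> {decision} (cutoff = {cutoff:.5f})")

# Equivalent view: multiply each p-value by n (capped at 1) and compare to alpha.
# adjusted = [min(p * n, 1.0) for p in p_values]

If you already use statsmodels, its multipletests function with method='bonferroni' performs the same adjustment.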

P-Hacking

Failing to correct for multiple comparisons, for example by skipping the Bonferroni correction, is one form of p-hacking.

P-hacking is the conscious or subconscious manipulation of data in a way that produces a desired p-value, typically in the form of a nominally significant result where no real effect exists. Assuming that we are honest researchers, we want to avoid p-hacking when performing analysis so that we don’t come to erroneous conclusions. As the saying goes, torture your data long enough and it will confess.
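To make the earlier numbers concrete, here is a small simulation (assuming numpy and scipy are available; the group sizes and number of repetitions are arbitrary choices for illustration). It runs 15 t-tests per experiment on pure noise, so every null hypothesis is true, and counts how often at least one uncorrected test comes out “significant”:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_tests, n_experiments = 0.05, 15, 2000

hits = 0
for _ in range(n_experiments):
    # 15 two-sample t-tests where both groups are drawn from the same
    # distribution, so every null hypothesis is true by construction.
    p_values = [
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(n_tests)
    ]
    if min(p_values) < alpha:  # at least one uncorrected "discovery"
        hits += 1

print(hits / n_experiments)  # roughly 0.54, matching 1 - (1 - 0.05)^15

Replacing alpha with alpha / n_tests in the comparison brings that fraction back down to roughly 0.05, which is the whole point of the correction.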