For any kind of comparative study, whether involving animals or humans, it is incumbent on the study designer to perform at least one sample size calculation. There should be few exceptions to this rule, and they mostly concern phase 2 trials in which safety is far more important than efficacy and the intention is to settle efficacy definitively in a larger phase 3 trial. When you read the medical literature, you’ll often see authors describe their work as a “pilot study” or a “convenience sample”; in reality, this usually means they just wanted to see what would happen, either because they had a limited budget or because they wanted a sense of whether their product had any worthwhile activity. But a study that is statistically underpowered not only may tell you very little, it can also mislead you into thinking something is efficacious or effective when it is not. Such a false positive is the classic type 1 statistical error, which, in my opinion, is the most egregious error to avoid. If you are conducting a pilot study, therefore, either do the calculation or have the decency to state that you are aware the study is underpowered!
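To put a number on "underpowered", here is a minimal sketch (not PASS output) that approximates the power of a two-sided comparison of two group means using the normal approximation. It assumes equal group sizes, a known common standard deviation, and a hypothetical standardized effect size of d = 0.5; the function name `approx_power` is ours, for illustration only.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumptions (illustrative only): normal approximation, known and equal
    standard deviations, equal group sizes. effect_size is the standardized
    difference d = (mu1 - mu2) / sigma. The tiny probability of rejecting
    in the "wrong" direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic when the true effect is d:
    # d / sqrt(2 / n) = d * sqrt(n / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)


# Hypothetical moderate effect (d = 0.5) at several group sizes
for n in (10, 20, 50, 85):
    print(f"n = {n:>3} per group: power ~ {approx_power(n, 0.5):.2f}")
```

Under these assumptions, 10 subjects per group gives power of only about 20%, so a "pilot" of that size would miss a real moderate effect roughly four times out of five.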
Sample size calculations are all about determining the statistical power for an endpoint in a study. They are typically done for the primary endpoint, but may also be useful for any secondary endpoint on which you are contemplating statistical testing. The two main errors one is trying to avoid are a type 1 error, in which one incorrectly rejects a true null hypothesis (in other words, a false positive), and a type 2 error, in which one fails to reject a null hypothesis that is in fact false (a false negative). Power is the probability of avoiding a type 2 error when a real effect of the assumed size exists. In science and medicine, one of the biggest problems we face is that many studies are statistically underpowered. Consequently, their conclusions should be considered uncertain at best. On the flip side, studies that are deliberately overpowered by a large margin may run into problems with clinical equipoise.
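The two error types can be seen directly by simulation. The sketch below uses only the Python standard library and hypothetical settings (50 subjects per group, unit standard deviation, a two-sided z-test at alpha = 0.05). It repeats a two-group trial many times: with no true difference, the rejection rate estimates the type 1 error rate; with a true difference of 0.5, it estimates the power, whose complement is the type 2 error rate. The function `rejection_rate` is an illustrative name, not part of any library.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff: float, n_per_group: int, sigma: float = 1.0,
                   alpha: float = 0.05, n_sims: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated two-group trials whose z-test rejects H0.

    Illustrative assumptions: normal outcomes with known, equal sigma,
    equal group sizes, two-sided z-test of the difference in means.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        z_stat = (fmean(treated) - fmean(control)) / se
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_sims


# No true difference: every rejection is a false positive (type 1 error)
type1 = rejection_rate(true_diff=0.0, n_per_group=50)
# True difference of 0.5 SD: rejections are correct; misses are type 2 errors
power = rejection_rate(true_diff=0.5, n_per_group=50)

print(f"estimated type 1 error rate: {type1:.3f}")
print(f"estimated power: {power:.3f} (type 2 error rate: {1 - power:.3f})")
```

Expect roughly 5% rejections under the null and roughly 70% under the alternative; the Monte Carlo error shrinks as `n_sims` grows.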
So, what counts as good statistical power? Before the turn of the millennium, 80% was considered adequate, but these days 90% is a far better target to aim for. Anything below 80% should be considered suspect.
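The practical cost of moving from 80% to 90% power can be illustrated with the textbook normal-approximation formula for comparing two means, `n = 2 * (z_a + z_b)**2 / d**2` per group, where `z_a` and `z_b` are the standard normal quantiles at 1 - alpha/2 and at the target power. This is a sketch assuming a two-sided test at alpha = 0.05 and a hypothetical standardized effect size of 0.5; it is not how PASS computes t-test sample sizes, which come out slightly larger.

```python
import math
from statistics import NormalDist


def n_per_group_means(effect_size: float, power: float, alpha: float = 0.05) -> int:
    """Per-group n for a two-sided two-sample comparison of means.

    Normal approximation with known, equal SDs and equal allocation;
    effect_size is the standardized difference d. Illustrative only.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the type 1 error rate
    z_beta = z.inv_cdf(power)           # controls the type 2 error rate (beta = 1 - power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)  # always round up to whole subjects


for target in (0.80, 0.90):
    print(f"power {target:.0%}: {n_per_group_means(0.5, target)} per group")
```

With these assumptions it gives 63 per group at 80% power and 85 per group at 90%, roughly a third more subjects for the higher target.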
For sample size calculations, we use PASS, which is one of the best pieces of software on the market. Although the calculation is sometimes very simple (e.g., for proportions), simulating a complex variable distribution in a calculation for means, or a complicated survival pattern, requires a lot of expertise, even with PASS. Other programs are available, and one can also write a program in R, but we’ve seen calculations that were incorrect or, at the very least, not transparent. So, if you need to do a calculation, do get some statistical advice.
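For the simple case of comparing two proportions, a common closed-form approximation (pooled variance under the null, unpooled variance under the alternative, no continuity correction) fits in a few lines. The response rates of 40% and 25%, the two-sided alpha of 0.05 and the 90% power target are hypothetical, and `n_per_group_proportions` is an illustrative name. Dedicated software such as PASS offers several test variants and corrections, so its answers may differ somewhat.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1: float, p2: float, power: float = 0.90,
                            alpha: float = 0.05) -> int:
    """Per-group n for a two-sided test of two independent proportions.

    Normal approximation: pooled variance under H0, unpooled under H1,
    no continuity correction, equal allocation. Illustrative only.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under H0 (equal group sizes)
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole subjects


# Hypothetical: 40% vs 25% response, two-sided alpha = 0.05, 90% power
print(n_per_group_proportions(0.40, 0.25))
```

Under these assumptions the sketch returns 203 per group; anything involving non-normal distributions or survival patterns quickly outgrows a formula like this.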
Statistical power determination is just one of the many issues we cover in our clinical trial design and analysis services, but if you need a quick conversation on the subject, we are happy to help; we’re only a phone call or email away.