Title: | Compute Sample Size that Meets Requirements for Average Power and FDR |
---|---|
Description: | Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance. |
Authors: | Stan Pounds <[email protected]> |
Maintainer: | Stan Pounds <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-11-22 05:21:52 UTC |
Source: | https://github.com/cran/FDRsampsize |
A general approach to performing power and sample size calculations for microarray studies has been developed in the literature. However, the software associated with those articles implements the approach only for studies that will perform the t-test or one-way ANOVA to compare gene expression across two or more groups. Here, we describe a set of R routines that implement the general method for power and sample size calculations for a wider variety of statistical tests. These routines accept the name of a function that computes the power for the statistical test of interest and thus have the flexibility to perform calculations for virtually any statistical test with a known power formula.
Package: | FDRsampsize |
Type: | Package |
Version: | 1.0 |
Date: | 2016-01-06 |
License: | GPL(>=2) |
Stan Pounds <[email protected]>
A Onar-Thomas, S Pounds. FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance, Manuscript 2016.
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
Compute the anticipated FDR for given sample size, p-value threshold, and effect size.
afdr (n, alpha, pow.func, eff.size, lam = 0.95, eps = 1e-04, ...)
n |
sample size (scalar) |
alpha |
p-value cut-off (scalar) |
pow.func |
an R function that computes statistical power |
eff.size |
effect size vector |
lam |
p-value at which to evaluate ensemble PDF (for pi.star) |
eps |
epsilon for numerical differentiation |
... |
additional arguments for the functions |
The aFDR is defined by Pounds and Cheng (2005) as the anticipated false discovery rate incurred by performing all tests with p-value threshold alpha, given the sample size, effect size, and power function.
the aFDR
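As a rough illustration of the quantities involved, the sketch below computes the ratio of expected false rejections to expected total rejections at a fixed threshold. It is a simplification that uses the true proportion of null tests. The afdr function also takes lam and eps for pi.star, which suggests it relies on the anticipated null-proportion estimate instead, so this is not a reimplementation of afdr. The names power.z and simple.fdr are hypothetical and not part of FDRsampsize.

```r
# Hypothetical normal-approximation power for a two-sided, two-group comparison
# (an assumption for illustration; not a function of FDRsampsize)
power.z <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n / 2)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

# Expected false rejections divided by expected total rejections
simple.fdr <- function(n, alpha, pow.func, eff.size, null.effect = 0, ...) {
  null.idx <- eff.size == null.effect
  # For a null test the rejection probability is alpha, so summing the power
  # over all tests gives the expected total number of rejections
  expected.rejections <- sum(pow.func(n, alpha, eff.size, ...))
  expected.false <- alpha * sum(null.idx)
  expected.false / expected.rejections
}

fdr.at.alpha <- simple.fdr(n = 50, alpha = 0.01, pow.func = power.z,
                           eff.size = rep(c(1, 0), c(100, 900)))
fdr.at.alpha
```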
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
afdr(n=50,alpha=0.01,pow.func=power.twosampt,eff.size=rep(c(1,0),c(100,900)))
Find the p-value threshold that satisfies an FDR requirement (if such a threshold exists)
alpha.fdr (fdr, n, pow.func, eff.size, null.effect, lam = 0.95, eps = 1e-04, tol = 1e-04, ...)
fdr |
Desired FDR, scalar |
n |
sample size |
pow.func |
an R function to compute statistical power |
eff.size |
effect size vector |
null.effect |
value of effect size that corresponds to the null hypothesis |
lam |
the lambda parameter in computing the pi0 (proportion of tests with a true null) estimate of Storey (2002) |
eps |
epsilon for numerical differentiation |
tol |
tolerance for bisection solution to alpha |
... |
additional arguments for the functions |
a list with the following components:
fdr |
the FDR at that alpha |
alpha |
the determined alpha |
OK |
indicates if the requirement is met |
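The bisection idea behind the tol argument can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it uses a hypothetical normal-approximation power function (power.z) and a simplified FDR ratio based on the true null proportion (simple.fdr), and it assumes that ratio increases with alpha.

```r
# Hypothetical normal-approximation power (not part of FDRsampsize)
power.z <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n / 2)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

# Simplified FDR: expected false rejections over expected total rejections
simple.fdr <- function(n, alpha, eff.size, null.effect = 0) {
  null.idx <- eff.size == null.effect
  pi0 <- mean(null.idx)
  ave.pow <- mean(power.z(n, alpha, eff.size[!null.idx]))
  pi0 * alpha / (pi0 * alpha + (1 - pi0) * ave.pow)
}

# Bisection on alpha in (0, 1); assumes simple.fdr increases with alpha
bisect.alpha <- function(fdr, n, eff.size, null.effect = 0, tol = 1e-8) {
  lo <- 0
  hi <- 1
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (simple.fdr(n, mid, eff.size, null.effect) > fdr) hi <- mid else lo <- mid
  }
  (lo + hi) / 2
}

eff.size <- rep(0:1, c(900, 100))
alpha.hat <- bisect.alpha(fdr = 0.1, n = 50, eff.size = eff.size)
c(alpha = alpha.hat, fdr = simple.fdr(50, alpha.hat, eff.size))
```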
A Onar-Thomas, S Pounds "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
alpha.fdr(fdr=0.1,n=50,pow.func=power.twosampt, eff.size=rep(0:1,c(900,100)),null.effect=0)
Find p-value cut-off that yields desired average power given n and effect size
alpha.power (ave.pow, n, pow.func, eff.size, null.effect, tol = 1e-06, ...)
ave.pow |
desired average power (scalar) |
n |
sample size |
pow.func |
name of R function to compute statistical power |
eff.size |
effect size vector |
null.effect |
value of effect size that corresponds to null hypothesis |
tol |
tolerance for bisection solution to alpha |
... |
additional arguments for the functions |
a list with the following components:
alpha |
desired value of alpha |
ave.pow |
average power at that alpha |
A Onar-Thomas, S Pounds. "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
alpha.power(ave.pow=0.8,n=50,pow.func=power.twosampt, eff.size=rep(0:1,c(900,100)),null.effect=0)
Compute average power for given sample size, effect size, and p-value threshold
average.power (n, alpha, pow.func, eff.size, null.effect, ...)
n |
sample size |
alpha |
p-value cut off (scalar) |
pow.func |
an R function to compute statistical power |
eff.size |
effect size vector |
null.effect |
value of effect size that corresponds to null hypothesis |
... |
additional arguments for the functions |
average power (scalar)
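A minimal sketch of the averaging step, assuming that average power means the mean power over the tests whose effect size differs from null.effect. The helper names power.z and ave.power.sketch are hypothetical, and extra arguments passed through ... are assumed to be scalars.

```r
# Hypothetical normal-approximation power (not part of FDRsampsize)
power.z <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n / 2)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

# Mean power over the tests with a false null hypothesis
ave.power.sketch <- function(n, alpha, pow.func, eff.size, null.effect, ...) {
  non.null <- eff.size != null.effect
  mean(pow.func(n, alpha, eff.size[non.null], ...))
}

ave.pow <- ave.power.sketch(n = 50, alpha = 0.01, pow.func = power.z,
                            eff.size = rep(0:1, c(900, 100)), null.effect = 0)
ave.pow
```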
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
average.power(n=50,alpha=0.01,pow.func=power.twosampt, eff.size=rep(0:1,c(900,100)),null.effect=0)
Compute the average power at a specific level of FDR control for a given effect size and sample size
fdr.power (fdr, n, pow.func, eff.size, null.effect, lam = 0.95, eps = 1e-04, tol = 1e-04, ...)
fdr |
Desired FDR, scalar |
n |
sample size |
pow.func |
name of R function to compute statistical power |
eff.size |
effect size vector; will be provided as the third argument of pow.func |
null.effect |
value of effect size that corresponds to null hypothesis |
lam |
the lambda parameter in computing the pi0 (proportion of tests with a true null) estimate of Storey (2002) |
eps |
epsilon for numerical differentiation |
tol |
tolerance for bisection solution to alpha |
... |
additional arguments for the functions |
average power (scalar) of the tests with a false null hypothesis
A Onar-Thomas, S Pounds "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2016.
Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.
Pounds S and Cheng C (2005) Sample size determination for the false discovery rate. Bioinformatics 21(23): 4263-71.
fdr.power(fdr=0.10,n=50,pow.func=power.twosampt, eff.size=rep(0:1,c(900,100)),null.effect=0)
Determines the sample size needed to achieve a desired average power while controlling the FDR at a specified level.
fdr.sampsize (fdr, ave.pow, pow.func, eff.size, null.effect, max.n = 500, min.n = 5, tol = 1e-05, eps = 1e-05, lam = 0.95, ...)
## S3 method for class 'fdr.sampsize'
print(x,...)
fdr |
Desired FDR (scalar numeric) |
ave.pow |
Desired average power (scalar numeric) |
pow.func |
Character string name of function to compute power; must accept n, alpha, and eff.size as its first three arguments. Other optional arguments may also be provided. |
eff.size |
Numeric vector of effect sizes; internally, this will be provided as the third argument of pow.func |
null.effect |
Scalar value of the effect size under the null hypothesis. This may be 0 or 1 for tests that respectively use differences or ratios for comparisons. |
max.n |
Maximum n to consider |
min.n |
Minimum n to consider |
tol |
Tolerance for bisection calculations |
eps |
Epsilon for numerical differentiation |
lam |
Lambda for computing anticipated pi0 estimate, see Storey 2002. |
x |
result of the fdr.sampsize function |
... |
additional arguments for pow.func |
This function checks the technical conditions regarding whether the desired FDR can be achieved by min.n or max.n before calling n.fdr. Thus, for most applications, fdr.sampsize should be used instead of n.fdr.
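The requirement that pow.func accept n, alpha, and the effect size as its first three arguments can be illustrated with a user-written power function. The function below, power.ztest1, is a hypothetical example for a two-sided one-sample z-test with known standard deviation; it is not part of FDRsampsize and is only a sketch of the expected interface.

```r
# Hypothetical power function: the first three arguments follow the
# n, alpha, effect-size convention; further arguments (sigma) are optional
power.ztest1 <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

eff.size <- rep(c(0.5, 0), c(10, 990))
pow <- power.ztest1(n = 30, alpha = 0.001, delta = eff.size)
length(pow)                 # one power value per hypothesis test
range(pow[eff.size == 0])   # at the null effect the power equals alpha
```

A function of this shape could presumably be supplied as pow.func with null.effect=0, in the same way the examples pass power.twosampt.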
fdr.sampsize returns an object of class 'FDRsampsize', which is a list with the following components:
OK |
indicates if the requirement is met |
n |
the computed sample size |
alpha |
the p-value threshold that gives the desired FDR |
fdr.hat |
anticipated value of the FDR estimate given n and effect size |
act.fdr |
actual expected FDR given n and effect size |
ave.pow |
average power |
act.pi |
actual value of pi0, the proportion of tests with a true null hypothesis. |
pi.hat |
expected value of the pi0 estimate |
eff.size |
input effect size vector |
A Onar-Thomas, S Pounds. "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
power.twosampt # show the power.twosampt function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.twosampt, eff.size=rep(c(1,0),c(10,990)), null.effect=0)
res
Find smallest sample size that meets requirements for average power and FDR
n.fdr (ave.pow, fdr, pow.func, eff.size, null.effect, lam = 0.95, eps = 1e-04, n0 = 5, n1 = 500, tol = 1e-06, ...)
ave.pow |
required average power (scalar) |
fdr |
required FDR (scalar) |
pow.func |
name of R function that computes statistical power |
eff.size |
effect size vector |
null.effect |
Value of effect size that indicates null |
lam |
p-value at which to evaluate ensemble PDF |
eps |
epsilon for numerical differentiation |
n0 |
smallest sample size to be considered for bisection |
n1 |
maximum sample size to be considered for bisection |
tol |
tolerance for solving for alpha in iterations |
... |
additional arguments for the functions |
This performs the sample size calculation without checking whether the minimum or maximum sample size satisfies the desired requirements. The fdr.sampsize function checks these conditions and then calls n.fdr. Thus, many users may prefer to use the fdr.sampsize function instead of n.fdr.
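The search over sample sizes can be sketched as an integer bisection between n0 and n1. The code below is a simplified illustration, not the package's algorithm: it assumes a hypothetical normal-approximation power function (power.z), a simplified FDR ratio based on the true null proportion, and, like n.fdr, it does not check that the requirement fails at n0 and is met at n1.

```r
# Hypothetical normal-approximation power (not part of FDRsampsize)
power.z <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n / 2)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

# Average power at the alpha whose simplified FDR equals the target
ave.power.at.fdr <- function(n, fdr, eff.size, null.effect = 0) {
  null.idx <- eff.size == null.effect
  pi0 <- mean(null.idx)
  ave.pow <- function(alpha) mean(power.z(n, alpha, eff.size[!null.idx]))
  gap <- function(alpha) {
    pi0 * alpha / (pi0 * alpha + (1 - pi0) * ave.pow(alpha)) - fdr
  }
  alpha <- uniroot(gap, c(1e-12, 1), tol = 1e-10)$root
  ave.pow(alpha)
}

# Integer bisection; assumes the requirement fails at n0 and holds at n1
smallest.n <- function(ave.pow, fdr, eff.size, null.effect = 0,
                       n0 = 5, n1 = 500) {
  while (n1 - n0 > 1) {
    mid <- floor((n0 + n1) / 2)
    if (ave.power.at.fdr(mid, fdr, eff.size, null.effect) >= ave.pow) {
      n1 <- mid
    } else {
      n0 <- mid
    }
  }
  n1
}

eff.size <- rep(0:1, c(900, 100))
n.hat <- smallest.n(ave.pow = 0.8, fdr = 0.1, eff.size = eff.size)
n.hat
```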
a list with the following components:
n |
a sample size estimate |
alpha |
the p-value cut-off |
fdr.hat |
an approximation to the expected value of the FDR estimate given n |
ave.pow |
the average power |
fdr.act |
the actual FDR given n |
pi.hat |
expected value of the pi.hat estimator given n |
act.pi |
actual pi0 |
A Onar-Thomas, S Pounds. "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
Compute an approximation of the expected value of the null proportion estimate given the sample size and effect size.
pi.star (n, pow.func, eff.size, lam = 0.95, eps = 1e-04, ...)
n |
sample size |
pow.func |
an R function to compute statistical power |
eff.size |
effect size vector |
lam |
p-value at which to numerically evaluate p-value pdf (scalar) |
eps |
epsilon for numerical differentiation |
... |
additional arguments for the functions |
scalar value for approximated E(pi.hat)
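One way to read the lam and eps arguments is as a numerical derivative of the ensemble p-value distribution function at lam. The sketch below assumes that reading and uses a central difference; the actual differencing scheme in pi.star is not stated here, and the names power.z and pi.star.sketch are hypothetical.

```r
# Hypothetical normal-approximation power (not part of FDRsampsize)
power.z <- function(n, alpha, delta, sigma = 1) {
  ncp <- abs(delta) / sigma * sqrt(n / 2)
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(ncp - z.crit) + pnorm(-ncp - z.crit)
}

# The probability that a p-value falls below a equals the power at threshold a,
# so the ensemble CDF is the mean power over all tests; its slope at lam is
# approximated by a central difference of width 2 * eps
pi.star.sketch <- function(n, pow.func, eff.size, lam = 0.95, eps = 1e-4, ...) {
  ens.cdf <- function(a) mean(pow.func(n, a, eff.size, ...))
  (ens.cdf(lam + eps) - ens.cdf(lam - eps)) / (2 * eps)
}

pi.star.sketch(n = 50, pow.func = power.z, eff.size = rep(0:1, c(900, 100)))
```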
Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Use the formula of Hsieh and Lavori (2000) to compute the power of a single-predictor Cox model.
power.cox (n, alpha, logHR, v)
n |
number of events (scalar) |
alpha |
p-value threshold (scalar) |
logHR |
log hazard ratio (vector) |
v |
variance of predictor variable (vector) |
vector of power estimates for two-sided test
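For orientation, the Hsieh and Lavori (2000) events formula, D = (z_{1-alpha/2} + z_{1-beta})^2 / (v * logHR^2), can be inverted to give power as a function of the number of events. The sketch below assumes that form and ignores the negligible opposite tail; power.cox.sketch is a hypothetical name, and the package's own power.cox may differ in detail.

```r
# Hypothetical inversion of the events formula (not the package's code)
power.cox.sketch <- function(n, alpha, logHR, v) {
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(sqrt(n * v) * abs(logHR) - z.crit)
}

# Number of events for 80% power at alpha = 0.05, hazard ratio 2, v = 1
events <- (qnorm(0.975) + qnorm(0.8))^2 / (1 * log(2)^2)
power.cox.sketch(events, alpha = 0.05, logHR = log(2), v = 1)
```

Plugging the computed number of events back in recovers the 80% power used to derive it.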
Hsieh, FY and Lavori, Philip W (2000) Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Controlled Clinical Trials 21(6):552-560.
power.cox # show the power.cox function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.cox, eff.size=rep(c(log(2),0),c(100,900)), null.effect=0, v=1)
res
Use the formula of Hart et al (2013) to compute power for comparing RNA-seq expression across two groups assuming a negative binomial distribution.
power.hart (n, alpha, log.fc, mu, sig)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
log.fc |
log fold-change (vector), usual null hypothesis is log.fc=0 |
mu |
read depth per gene (vector, same length as log.fc) |
sig |
coefficient of variation (CV) per gene (vector, same length as log.fc) |
This function is based on equation (1) of Hart et al (2013). It assumes a negative binomial model for RNA-seq read counts and equal sample size per group.
vector of power estimates for the set of two-sided tests
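The sample size expression used in the example for this function, n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (1/mu + sig^2) / log.fc^2, can be solved for power. The sketch below does only that, ignoring the negligible opposite tail; power.hart.sketch is a hypothetical name and not the package's implementation.

```r
# Hypothetical inversion of the sample size expression (not the package's code)
power.hart.sketch <- function(n, alpha, log.fc, mu, sig) {
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(sqrt(n / (2 * (1 / mu + sig^2))) * abs(log.fc) - z.crit)
}

# Per-group sample size for 90% power at alpha = 0.05, fold-change 2,
# depth 20, and CV 0.6
n.hart <- 2 * (qnorm(0.975) + qnorm(0.9))^2 * (1 / 20 + 0.6^2) / log(2)^2
power.hart.sketch(n.hart, alpha = 0.05, log.fc = log(2), mu = 20, sig = 0.6)
```

By construction, the call recovers the 90% power used to compute n.hart.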
SN Hart, TM Therneau, Y Zhang, GA Poland, and J-P Kocher (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of Computational Biology 20: 970-978.
power.hart # show the power function
n.hart=2*(qnorm(0.975)+qnorm(0.9))^2*(1/20+0.6^2)/(log(2)^2) # Equation 6 of Hart et al
power.hart(n.hart,0.05,log(2),20,0.6) # Recapitulate 90% power
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.hart, eff.size=rep(c(log(2),0),c(100,900)), null.effect=0,mu=5,sig=1)
res
Use the formula of Li et al (2013) to compute power for comparing RNA-seq expression across two groups assuming the Poisson distribution.
power.li (n, alpha, rho, mu0, w = 1, type = "w")
n |
per-group sample size |
alpha |
p-value threshold |
rho |
fold-change, usual null hypothesis is that rho=1 |
mu0 |
average count in control group |
w |
ratio of total number of |
type |
type of test: "w" for Wald, "s" for score, "lw" for log-transformed Wald, "ls" for log-transformed score. |
This function computes the power for each of a series of two-sided tests defined by the input parameters. The power is based on the sample size formulas in equations 10-13 of Li et al (2013). Also, note that the null.effect is set to 1 in the examples because the usual null hypothesis is that the fold-change = 1.
vector of power estimates for two-sided tests
C-I Li, P-F Su, Y Guo, and Y Shyr (2013). Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution. Int J Comput Biol Drug Des 6(4). doi:10.1504/IJCBDD.2013.056830
power.li # show the power function
power.li(88,0.05,1.25,5,0.5,"w") # recapitulate 80% power in Table 1 of Li et al (2013)
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.li, eff.size=rep(c(1.5,1),c(100,900)), null.effect=1, mu0=5,w=1,type="w")
res
Estimate power of the one-sample t-test. Uses the classical power formula for the one-sample t-test.
power.onesampt (n, alpha, delta, sigma = 1)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
delta |
difference of actual mean from null mean (vector) |
sigma |
standard deviation (vector or scalar, default=1) |
vector of power estimates for two-sided test
power.onesampt # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.onesampt, eff.size=rep(c(2,0),c(100,900)), null.effect=0, sigma=1)
res
Compute power of one-way ANOVA. Uses the classical power formula for ANOVA and assumes equal variance and sample size.
power.oneway (n, alpha, theta, k = 2)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
theta |
sum of ((group mean - overall mean)/stdev)^2 across all groups for each hypothesis test (vector) |
k |
the number of groups to be compared, default k=2 |
For many applications, the null effect is zero for the parameter theta described above.
vector of power estimates for test of equal means
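The classical one-way ANOVA power calculation is based on the noncentral F distribution. The sketch below assumes that, with theta defined as above, the noncentrality parameter is n * theta with k - 1 and k * (n - 1) degrees of freedom; power.oneway.sketch is a hypothetical name and may not match the package's code line for line.

```r
# Hypothetical noncentral-F power for balanced one-way ANOVA
power.oneway.sketch <- function(n, alpha, theta, k = 2) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  f.crit <- qf(alpha, df1, df2, lower.tail = FALSE)
  pf(f.crit, df1, df2, ncp = n * theta, lower.tail = FALSE)
}

# Cross-check against stats::power.anova.test, whose between.var is the
# variance of the group means, i.e. theta / (k - 1) when the within-group
# variance is 1
theta <- 0.5
k <- 3
sketch.power <- power.oneway.sketch(n = 10, alpha = 0.05, theta = theta, k = k)
stats.power <- power.anova.test(groups = k, n = 10,
                                between.var = theta / (k - 1),
                                within.var = 1, sig.level = 0.05)$power
c(sketch = sketch.power, stats = stats.power)
```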
power.oneway # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.oneway, eff.size=rep(c(2,0),c(100,900)), null.effect=0, k=3)
res
Compute power of the rank-sum test. Uses the formula of Noether (JASA 1987).
power.ranksum (n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr (Y>X), as in Noether (JASA 1987) |
In most applications, the null effect size will be designated by p = 0.5. Thus, in the example below, the argument null.effect=0.5 is specified in the call to fdr.sampsize.
vector of power estimates for two-sided tests
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
power.ranksum # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.ranksum, eff.size=rep(c(0.8,0.5),c(100,900)), null.effect=0.5)
res
Use the Noether (1987) formula to compute the power of the sign test
power.signtest (n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr (Y>X), as in Noether (JASA 1987) |
In most applications, the null effect size will be designated by p = 0.5 instead of p = 0. Thus, in the call to fdr.sampsize, we specify null.effect=0.5 in the example below.
vector of power estimates for two-sided tests
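As a rough guide to why p = 0.5 is the null value, the commonly quoted form of Noether's (1987) sign-test approximation, N = (z_{1-alpha/2} + z_{1-beta})^2 / (4 * (p - 0.5)^2), can be inverted for power. The sketch assumes that form and ignores the opposite tail; power.sign.sketch is a hypothetical name, not the package's function.

```r
# Hypothetical inversion of Noether's sign-test approximation
power.sign.sketch <- function(n, alpha, p) {
  z.crit <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(2 * sqrt(n) * abs(p - 0.5) - z.crit)
}

# Sample size for 80% power at alpha = 0.05 when Pr(Y > X) = 0.8
n.sign <- (qnorm(0.975) + qnorm(0.8))^2 / (4 * (0.8 - 0.5)^2)
power.sign.sketch(n.sign, alpha = 0.05, p = 0.8)
```

The power depends on p only through its distance from 0.5, so p = 0.5 carries no effect.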
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
power.signtest # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.signtest, eff.size=rep(c(0.8,0.5),c(100,900)), null.effect=0.5)
res
Estimate power of the t-test for non-zero correlation. Uses the classical power formula for the t-test.
power.tcorr (n, alpha, rho)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
rho |
population correlation coefficient (vector) |
For many applications, the null.effect is rho=0.
vector of power estimates for two-sided tests
power.tcorr # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.tcorr, eff.size=rep(c(0.3,0),c(100,900)), null.effect=0)
res
Estimate power of the two-sample t-test. Uses the classical power formula for the two-sample t-test and assumes equal variance and sample size.
power.twosampt (n, alpha, delta, sigma = 1)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
delta |
difference between population means (vector) |
sigma |
standard deviation (vector or scalar) |
For many applications, the null.effect is zero difference of means.
vector of power estimates for two-sided test
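The classical two-sample power calculation uses the noncentral t distribution with 2 * (n - 1) degrees of freedom and noncentrality delta / sigma * sqrt(n / 2). The sketch below assumes that form and checks it against stats::power.t.test; power.twosampt.sketch is a hypothetical name, and the package's own function may differ in detail.

```r
# Hypothetical noncentral-t power for a balanced two-sample, two-sided t-test
power.twosampt.sketch <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * (n - 1)
  ncp <- delta / sigma * sqrt(n / 2)
  t.crit <- qt(alpha / 2, df, lower.tail = FALSE)
  # Rejection probability in the upper tail plus the lower tail
  pt(t.crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t.crit, df, ncp = ncp)
}

sketch.power <- power.twosampt.sketch(n = 20, alpha = 0.05, delta = 1)
stats.power <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05,
                            type = "two.sample", alternative = "two.sided",
                            strict = TRUE)$power
c(sketch = sketch.power, stats = stats.power)
```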
power.twosampt # show the power function
res=fdr.sampsize(fdr=0.1, ave.pow=0.8, pow.func=power.twosampt, eff.size=rep(c(2,0),c(100,900)), null.effect=0, sigma=1)
res