Chapter 10 F-distribution

10.1 TBA How much finished

10%

10.2 Introduction

Because the F-distribution plays an important role in sample size estimation, it is helpful to examine its behavior. In the following, ndf = numerator degrees of freedom, ddf = denominator degrees of freedom, and ncp = non-centrality parameter (i.e., the \(\Delta\) appearing in Eqn. (11.6) of (Dev P. Chakraborty 2017)).

The use of three R functions is demonstrated.

qf(p, ndf, ddf) is the quantile function of the F-distribution for specified values of p, ndf and ddf, i.e., the value x such that fraction p of the area under the F-distribution lies to the left of x (equivalently, fraction 1 - p lies to the right of x). Since ncp is not passed as an argument, its default is used, which corresponds to ncp = 0. This is called the central F-distribution.
df(x, ndf, ddf, ncp) is the probability density function (pdf) of the F-distribution, as a function of x, for specified values of ndf, ddf and ncp.
pf(x, ndf, ddf, ncp) is the cumulative distribution function of the F-distribution, i.e., the probability that the F-statistic is less than or equal to x, for specified values of ndf, ddf and ncp. The probability of exceeding x is therefore 1 - pf(x, ndf, ddf, ncp).
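The short check below uses only base R's stats functions to show how the three functions relate. It confirms that qf() returns a lower-tail quantile, so pf() recovers p. It also confirms that integrating df() from 0 to x reproduces pf() in a non-central case. The values p = 0.95, ndf = 2, ddf = 10 and ncp = 5 are arbitrary choices made for this sketch.

```r
# illustrative parameter values (arbitrary choices for this sketch)
ndf <- 2; ddf <- 10; ncp <- 5
p <- 0.95

# qf() returns the lower-tail quantile: fraction p of the area lies to the LEFT of x
x <- qf(p, ndf, ddf)
pBack <- pf(x, ndf, ddf)   # pf() undoes qf(), recovering p

# the cumulative distribution is the integral of the density from 0 up to x
areaLeft <- integrate(df, lower = 0, upper = x,
                      df1 = ndf, df2 = ddf, ncp = ncp)$value
cdfLeft <- pf(x, ndf, ddf, ncp = ncp)

print(c(x = x, pBack = pBack, areaLeft = areaLeft, cdfLeft = cdfLeft))
```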

10.3 Effect of `ncp` for `ndf` = 2 and `ddf` = 10

Four values of ncp are considered (0, 2, 5, 10), with ndf = 2 and ddf = 10.
fCrit is the critical value of the central F-distribution, i.e., the value such that fraction \(\alpha\) of the area lies to the right of it. In statistical notation, fCrit is identical to \(F_{1-\alpha,ndf,ddf}\).
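The same critical value can be obtained in two equivalent ways in R, as the sketch below shows. One way asks qf() for the 1 - alpha lower-tail quantile. The other asks for the alpha upper-tail quantile via the standard lower.tail argument.

As an independent cross-check for ndf = 2 only, the sketch also uses a closed form. The central F survival function is then \((1 + 2x/ddf)^{-ddf/2}\), and solving it for alpha gives the expression in the code. The variable names fCritUpper and fCritClosedForm are illustrative.

```r
alpha <- 0.05
ndf <- 2; ddf <- 10

# two equivalent ways of obtaining F_{1-alpha, ndf, ddf}
fCrit <- qf(1 - alpha, ndf, ddf)                        # lower-tail quantile at 1 - alpha
fCritUpper <- qf(alpha, ndf, ddf, lower.tail = FALSE)   # upper-tail quantile at alpha

# valid for ndf = 2 only: P(F > x) = (1 + 2x/ddf)^(-ddf/2), inverted at alpha
fCritClosedForm <- (ddf / 2) * (alpha^(-2 / ddf) - 1)

print(c(fCrit = fCrit, fCritUpper = fCritUpper, fCritClosedForm = fCritClosedForm))
```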

library(ggplot2) # needed for ggplot(), geom_line() and ggtitle()

ndf <- 2; ddf <- 10; ncp <- c(0, 2, 5, 10)
alpha <- 0.05
# critical value of the central F-distribution (ncp omitted, so zero is assumed)
fCrit <- qf(1 - alpha, ndf, ddf)
x <- seq(1, 20, 0.1)
myLabel <- c("A", "B", "C", "D")
# probability that F exceeds fCrit for each ncp; pf() is vectorized over ncp
pFgtFCrit <- 1 - pf(fCrit, ndf, ddf, ncp = ncp)
for (i in seq_along(ncp))
{
  # density of the F-distribution for the i-th value of ncp
  curveData <- data.frame(x = x, pdf = df(x, ndf, ddf, ncp = ncp[i]))
  curvePlot <- ggplot(data = curveData, mapping = aes(x = x, y = pdf)) +
    geom_line() +
    ggtitle(myLabel[i])
  print(curvePlot)
}
fCrit_2_10 <- fCrit # convention fCrit_ndf_ddf

Fig.	ndf	ddf	fCrit	ncp	pFgtFCrit
A	2	10	4.102821	0	0.0500000
B	2	10	4.102821	2	0.1775840
C	2	10	4.102821	5	0.3876841
D	2	10	4.102821	10	0.6769776
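The code block above stores pFgtFCrit, but it does not assemble the table itself. One way to build the table under the same parameter values is sketched below. The name powerTable is hypothetical. The sketch uses pf(..., lower.tail = FALSE), which is numerically equivalent to 1 - pf(...) here.

```r
ndf <- 2; ddf <- 10; ncp <- c(0, 2, 5, 10)
alpha <- 0.05
fCrit <- qf(1 - alpha, ndf, ddf)

# pf() is vectorized over ncp, so all four tail probabilities come from one call;
# scalar columns (ndf, ddf, fCrit) are recycled to four rows
powerTable <- data.frame(
  ndf = ndf, ddf = ddf, fCrit = fCrit, ncp = ncp,
  pFgtFCrit = pf(fCrit, ndf, ddf, ncp = ncp, lower.tail = FALSE),
  row.names = c("A", "B", "C", "D")
)
print(powerTable)
```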

10.4 Comments

10.4.1 Fig. A

This corresponds to ncp = 0, i.e., the central F-distribution.
The integral under this distribution is unity (this is also true for every density plotted in this chapter).
The critical value, fCrit in the above code block, is the value of x such that the probability of exceeding x is \(\alpha\). The corresponding parameter alpha is set to 0.05 above.
In the current example fCrit = 4.102821. Notice that this value is obtained from the quantile function qf() evaluated at 1 - alpha, and that the default value of ncp, namely zero, is used; specifically, one does not pass a 4th argument to qf().
The decision rule for rejecting the null hypothesis (NH) uses the NH distribution of the F-statistic, i.e., reject the NH if F >= fCrit. As expected, prob > fCrit, the probability that F exceeds fCrit (pFgtFCrit in the code), equals 0.05, because this is how fCrit was defined.
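Two claims in this subsection can be checked numerically. The sketch below integrates each of the four densities over their full support (0 to infinity). This matters because the plots above show only x from 1 to 20. It then confirms that, under the NH, the rejection probability equals alpha. The variable names totalArea and typeIError are illustrative.

```r
ndf <- 2; ddf <- 10; ncp <- c(0, 2, 5, 10)
alpha <- 0.05
fCrit <- qf(1 - alpha, ndf, ddf)

# total area under each density, integrated over the full support (0, Inf)
totalArea <- sapply(ncp, function(lambda)
  integrate(df, lower = 0, upper = Inf, df1 = ndf, df2 = ddf, ncp = lambda)$value)

# under the NH (central F, ncp = 0) the rejection probability is alpha by construction
typeIError <- pf(fCrit, ndf, ddf, lower.tail = FALSE)

print(round(totalArea, 6))
print(typeIError)
```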

10.4.2 Fig. B

This corresponds to ncp = 2, ndf = 2 and ddf = 10.
The distribution is shifted slightly to the right compared to Fig. A, making it more likely that the observed value of the F-statistic will exceed the critical value determined from the NH distribution.
In fact, prob > fCrit = 0.177584, which is the statistical power (compare Fig. A, where prob > fCrit was 0.05).

10.4.3 Fig. C

This corresponds to ncp = 5, ndf = 2 and ddf = 10.
Now prob > fCrit = 0.3876841.
Power has increased compared to Fig. B.

10.4.4 Fig. D

This corresponds to ncp = 10, ndf = 2 and ddf = 10.
Now prob > fCrit is 0.6769776.
Power has increased compared to Fig. C.
The effect of the shift is most obvious in Fig. C and Fig. D.
Consider a vertical line at x = 4.102821: fraction 0.6769776 of the probability distribution in Fig. D lies to the right of this line.
Therefore, for ncp = 10, the NH is rejected with probability 0.6769776.
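The vertical-line argument can be made concrete with the sketch below. It redraws the Fig. D density and shades the area to the right of x = fCrit. Base R graphics are swapped in for ggplot2 only to keep the example free of add-on packages. The sketch then confirms by numerical integration that this area equals the tail probability from pf().

```r
ndf <- 2; ddf <- 10; ncp <- 10
fCrit <- qf(0.95, ndf, ddf)

# draw the Fig. D density and shade the region to the right of x = fCrit
x <- seq(0, 20, 0.05)
y <- df(x, ndf, ddf, ncp = ncp)
plot(x, y, type = "l", xlab = "x", ylab = "pdf", main = "D")
right <- x >= fCrit
polygon(c(fCrit, fCrit, x[right], max(x)),
        c(0, df(fCrit, ndf, ddf, ncp = ncp), y[right], 0),
        col = "grey80", border = NA)
lines(x, y)                  # redraw the curve on top of the shading
abline(v = fCrit, lty = 2)   # the vertical line at the critical value

# shaded area (integrated to Inf, not just to x = 20) equals the power
areaRight <- integrate(df, lower = fCrit, upper = Inf,
                       df1 = ndf, df2 = ddf, ncp = ncp)$value
power <- pf(fCrit, ndf, ddf, ncp = ncp, lower.tail = FALSE)
print(c(areaRight = areaRight, power = power))
```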

10.4.5 Summary

The larger the non-centrality parameter, the greater the shift to the right of the F-distribution, and the greater the statistical power.
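The sketch below traces power as a continuous function of ncp, with ndf = 2 and ddf = 10. It generalizes the four cases A to D. The grid from 0 to 20 in steps of 0.5 is an arbitrary choice, and base graphics are used to avoid add-on packages.

```r
ndf <- 2; ddf <- 10
fCrit <- qf(0.95, ndf, ddf)

# power as a function of the non-centrality parameter, on a fine grid
ncpGrid <- seq(0, 20, by = 0.5)
power <- pf(fCrit, ndf, ddf, ncp = ncpGrid, lower.tail = FALSE)

# power curve; the points mark the four cases of Figs. A to D, the dashed line alpha
plot(ncpGrid, power, type = "l", xlab = "ncp", ylab = "power", ylim = c(0, 1))
points(c(0, 2, 5, 10), pf(fCrit, ndf, ddf, ncp = c(0, 2, 5, 10), lower.tail = FALSE))
abline(h = 0.05, lty = 2)
```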

10.5 Effect of `ncp` for `ndf` = 2 and `ddf` = 100

Fig.	ndf	ddf	fCrit	ncp	pFgtFCrit
A	2	10	4.102821	0	0.0500000
B	2	10	4.102821	2	0.1775840
C	2	10	4.102821	5	0.3876841
D	2	10	4.102821	10	0.6769776
E	2	100	3.087296	0	0.0500000
F	2	100	3.087296	2	0.2199264
G	2	100	3.087296	5	0.4910802
H	2	100	3.087296	10	0.8029764
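The combined table can be generated in one pass by crossing the ncp and ddf values, as sketched below. expand.grid() varies its first argument fastest, so the rows come out in the order A to H. The name powerGrid is hypothetical.

```r
alpha <- 0.05
# all combinations of ncp and ddf used in Figs. A to H (ndf = 2 throughout)
powerGrid <- expand.grid(ncp = c(0, 2, 5, 10), ddf = c(10, 100))
powerGrid$ndf <- 2
# the critical value depends on ndf and ddf, not on ncp
powerGrid$fCrit <- qf(1 - alpha, powerGrid$ndf, powerGrid$ddf)
powerGrid$pFgtFCrit <- pf(powerGrid$fCrit, powerGrid$ndf, powerGrid$ddf,
                          ncp = powerGrid$ncp, lower.tail = FALSE)
rownames(powerGrid) <- LETTERS[1:8]
print(powerGrid[, c("ndf", "ddf", "fCrit", "ncp", "pFgtFCrit")])
```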

10.6 Comments

All comparisons in this section are made at the same values of ncp as above (with ndf = 2), between ddf = 100 and ddf = 10.

10.6.1 Fig. E

This corresponds to ncp = 0, ndf = 2 and ddf = 100.
The critical value is fCrit_2_100 = 3.0872959. Notice the decrease compared to the corresponding value for ddf = 10, i.e., fCrit_2_10 = 4.102821.
One therefore expects that, when ncp > 0, increasing ddf makes it more likely that the NH will be rejected, and this is confirmed below.
All else equal, statistical power increases with increasing ddf.
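The statement that power increases with ddf, all else equal, can be checked over a wider range than 10 and 100. The sketch below does this at ndf = 2 and ncp = 5. The ddf grid is an arbitrary choice made for illustration.

```r
ndf <- 2; ncp <- 5; alpha <- 0.05
ddfGrid <- c(5, 10, 20, 50, 100, 200, 500, 1000)

# qf() and pf() are vectorized over df2: one critical value and one power per ddf
fCrit <- qf(1 - alpha, ndf, ddfGrid)
power <- pf(fCrit, ndf, ddfGrid, ncp = ncp, lower.tail = FALSE)
print(data.frame(ddf = ddfGrid, fCrit = fCrit, power = power))
```

As ddf grows, the critical value falls and the power rises, consistent with the comparisons of Figs. E to H against Figs. A to D.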

10.6.2 Fig. F

This corresponds to ncp = 2, ndf = 2 and ddf = 100.
The probability of exceeding the critical value, prob > fCrit_2_100, is 0.2199264, greater than the previous value of 0.177584 for ddf = 10.

10.6.3 Fig. G

This corresponds to ncp = 5, ndf = 2 and ddf = 100.
The probability of exceeding the critical value, prob > fCrit_2_100, is 0.4910802.
This is greater than the previous value of 0.3876841 for ddf = 10.

10.6.4 Fig. H

This corresponds to ncp = 10, ndf = 2 and ddf = 100.
The probability of exceeding the critical value, prob > fCrit_2_100, is 0.8029764.
This is greater than the previous value of 0.6769776 for ddf = 10.

10.7 Effect of `ncp` for `ndf` = 1, `ddf` = 100

Fig.	ndf	ddf	fCrit	ncp	pFgtFCrit
A	2	10	4.102821	0	0.0500000
B	2	10	4.102821	2	0.1775840
C	2	10	4.102821	5	0.3876841
D	2	10	4.102821	10	0.6769776
E	2	100	3.087296	0	0.0500000
F	2	100	3.087296	2	0.2199264
G	2	100	3.087296	5	0.4910802
H	2	100	3.087296	10	0.8029764
I	1	100	3.936143	0	0.0500000
J	1	100	3.936143	2	0.2883607
K	1	100	3.936143	5	0.6004962
L	1	100	3.936143	10	0.8793619

10.8 Comments

All comparisons in this section are made at the same values of ncp as above and at ddf = 100, between ndf = 1 and ndf = 2.

10.8.1 Fig. I

This corresponds to ncp = 0, ndf = 1 and ddf = 100.
The critical value is fCrit_1_100 = 3.936143.
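Notice the increase in the critical value compared to the corresponding value for ndf = 2, i.e., 3.0872959.
One might therefore expect power to decrease. The following table shows that, more generally, the critical value fCrit decreases as ndf increases (at ddf = 100).
In significance testing, generally ndf = I - 1, where I is the number of treatments.
A lower critical value does not by itself make it more likely that the NH will be rejected with increasing numbers of treatments, because the distribution of the F-statistic also changes with ndf; the figures that follow compare the actual powers.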

ndf	ddf	fCrit
1	100	3.936143
2	100	3.087296
5	100	2.305318
10	100	1.926692
12	100	1.850255
15	100	1.767530
20	100	1.676434
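The code that produced this table is not shown. A minimal reconstruction is sketched below, assuming alpha = 0.05 as elsewhere in this chapter. The names ndfValues, fCritTable and tCritSquared are hypothetical.

The sketch also checks a special case. With ndf = 1, the F-statistic is the square of a t-statistic with ddf degrees of freedom. Hence fCrit_1_100 equals the squared two-sided t critical value.

```r
alpha <- 0.05; ddf <- 100
ndfValues <- c(1, 2, 5, 10, 12, 15, 20)

# qf() is vectorized over df1, so one call yields the whole column of critical values
fCritTable <- data.frame(ndf = ndfValues, ddf = ddf,
                         fCrit = qf(1 - alpha, ndfValues, ddf))
print(fCritTable, row.names = FALSE)

# ndf = 1: F(1, ddf) is the square of a t variate with ddf degrees of freedom
tCritSquared <- qt(1 - alpha / 2, ddf)^2
print(c(fCrit_1_100 = fCritTable$fCrit[1], tCritSquared = tCritSquared))
```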

10.8.2 Fig. J

This corresponds to ncp = 2, ndf = 1 and ddf = 100.
Now prob > fCrit_1_100 = 0.2883607, larger than the previous value of 0.2199264 for ndf = 2.
Despite the larger critical value, the power has actually increased.

10.8.3 Fig. K

This corresponds to ncp = 5, ndf = 1 and ddf = 100.
Now prob > fCrit_1_100 = 0.6004962, larger than the previous value of 0.4910802 for ndf = 2.
Again, the power has actually increased.

10.8.4 Fig. L

This corresponds to ncp = 10, ndf = 1 and ddf = 100.
Now prob > fCrit_1_100 is 0.8793619, larger than the previous value of 0.8029764 for ndf = 2.
The power has actually increased.
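Figs. I to L show that, at ddf = 100, ndf = 1 gives more power than ndf = 2 at every nonzero ncp, even though its critical value is larger. The sketch below extends this comparison to all the ndf values in the critical-value table, at ncp = 2 (an arbitrary choice). Each power is computed against the critical value belonging to its own ndf.

```r
alpha <- 0.05; ddf <- 100; ncp <- 2
ndfValues <- c(1, 2, 5, 10, 12, 15, 20)

# each power uses the critical value that belongs to its own ndf
fCrit <- qf(1 - alpha, ndfValues, ddf)
power <- pf(fCrit, ndfValues, ddf, ncp = ncp, lower.tail = FALSE)
print(data.frame(ndf = ndfValues, fCrit = fCrit, power = power))
```

At fixed ncp and ddf, the falling critical value is outweighed by the change in shape of the F-distribution, so power falls as ndf rises.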

10.9 Summary

Power increases with increasing ddf and ncp.
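The effect of increasing ncp is quite dramatic: at ndf = 1 and ddf = 100, for example, power rises from 0.05 at ncp = 0 to 0.8793619 at ncp = 10. Moreover, ncp itself depends on the square of the effect size, so power is very sensitive to the effect size.
As ndf increases, fCrit decreases. However, at fixed ncp and ddf this does not make it more likely that the NH will be rejected: power for ndf = 1 exceeds that for ndf = 2 at every nonzero ncp examined (Figs. J to L versus Figs. F to H), because the F-distribution itself changes with ndf.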

References

Chakraborty, Dev P. 2017. Observer Performance Methods for Diagnostic Imaging: Foundations, Modeling, and Applications with R-Based Examples. Boca Raton, FL: CRC Press.