`Ch11Vig1SampleSize.Rmd`

Since it plays an important role in sample size estimation, it is helpful to examine the behavior of the F-distribution. In the following, `ndf` = numerator degrees of freedom, `ddf` = denominator degrees of freedom and `ncp` = non-centrality parameter (i.e., the \(\Delta\) appearing in Eqn. (11.6) of (Chakraborty 2017)).

The use of three `R` functions is demonstrated.

- `qf(p,ndf,ddf)` is the *quantile* function of the F-distribution for specified values of `p`, `ndf` and `ddf`, i.e., the value `x` such that fraction `p` of the area under the F-distribution lies to the left of `x`. Since `ncp` is not included as a parameter, the default value, i.e., zero, is used. This is called the *central* F-distribution.
- `df(x,ndf,ddf,ncp)` is the probability density function (*pdf*) of the F-distribution, as a function of `x`, for specified values of `ndf`, `ddf` and `ncp`.
- `pf(x,ndf,ddf,ncp)` is the probability (or cumulative) distribution function of the F-distribution for specified values of `ndf`, `ddf` and `ncp`, i.e., the fraction of the area that lies to the left of `x`.
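The relationship between the three functions can be checked numerically. The following is a minimal sketch using only base `R`; the variable names `p`, `xq` and `area` are illustrative and do not appear elsewhere in this vignette.

```r
ndf <- 2
ddf <- 10
p <- 0.95

# quantile: the value xq with fraction p of the area to its left
xq <- qf(p, ndf, ddf)

# the cumulative distribution function inverts the quantile function
pLeft <- pf(xq, ndf, ddf)

# integrating the pdf from 0 to xq gives (approximately) the same area
area <- integrate(df, lower = 0, upper = xq, df1 = ndf, df2 = ddf)$value

print(c(xq = xq, pLeft = pLeft, area = area))
```

Here `pLeft` and `area` should both agree with `p`, the latter to within the tolerance of the numerical integration.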

## Effect of `ncp` for `ndf` = 2 and `ddf` = 10

- Four values of `ncp` are considered (0, 2, 5, 10) for `ddf` = 10.
- `fCrit` is the critical value of the F-distribution, i.e., that value such that fraction \(\alpha\) of the area is to the right of the critical value, i.e., `fCrit` is identical in statistical notation to \({{F}_{1-\alpha ,ndf,ddf}}\).

```r
library(ggplot2)

ndf <- 2; ddf <- 10; ncp <- c(0, 2, 5, 10)
alpha <- 0.05
# critical value of the central F-distribution (ncp defaults to zero)
fCrit <- qf(1 - alpha, ndf, ddf)
x <- seq(1, 20, 0.1)
myLabel <- c("A", "B", "C", "D")
myLabelIndx <- 1

# probability of exceeding fCrit for each value of ncp
pFgtFCrit <- NULL
for (i in seq_along(ncp)) {
  pFgtFCrit <- c(pFgtFCrit, 1 - pf(fCrit, ndf, ddf, ncp = ncp[i]))
}

# plot the pdf for each value of ncp, titled A through D
for (i in seq_along(ncp)) {
  y <- df(x, ndf, ddf, ncp = ncp[i])
  curveData <- data.frame(x = x, pdf = y)
  curvePlot <- ggplot(data = curveData, mapping = aes(x = x, y = pdf)) +
    geom_line() +
    ggtitle(myLabel[myLabelIndx])
  myLabelIndx <- myLabelIndx + 1
  print(curvePlot)
}
fCrit_2_10 <- fCrit # convention fCrit_ndf_ddf
```

| | ndf | ddf | fCrit | ncp | pFgtFCrit |
|---|---|---|---|---|---|
| A | 2 | 10 | 4.102821 | 0 | 0.0500000 |
| B | 2 | 10 | 4.102821 | 2 | 0.1775840 |
| C | 2 | 10 | 4.102821 | 5 | 0.3876841 |
| D | 2 | 10 | 4.102821 | 10 | 0.6769776 |

## Effect of `ncp` for `ndf` = 2 and `ddf` = 100

| | ndf | ddf | fCrit | ncp | pFgtFCrit |
|---|---|---|---|---|---|
| A | 2 | 10 | 4.102821 | 0 | 0.0500000 |
| B | 2 | 10 | 4.102821 | 2 | 0.1775840 |
| C | 2 | 10 | 4.102821 | 5 | 0.3876841 |
| D | 2 | 10 | 4.102821 | 10 | 0.6769776 |
| E | 2 | 100 | 3.087296 | 0 | 0.0500000 |
| F | 2 | 100 | 3.087296 | 2 | 0.2199264 |
| G | 2 | 100 | 3.087296 | 5 | 0.4910802 |
| H | 2 | 100 | 3.087296 | 10 | 0.8029764 |

- All comparisons in this section are at the same values of `ncp` defined above, and are between `ddf` = 100 and `ddf` = 10.

## Fig. E

- This corresponds to `ncp` = 0, `ndf` = 2 and `ddf` = 100.
- The critical value is `fCrit_2_100` = 3.0872959. Notice the decrease compared to the previous value for `ncp` = 0, i.e., 4.102821, for `ddf` = 10.
- One expects that increasing `ddf` will make it more likely that the null hypothesis (NH) will be rejected, and this is confirmed below.
- All else equal, statistical power increases with increasing `ddf`.

## Fig. F

- This corresponds to `ncp` = 2, `ndf` = 2 and `ddf` = 100.
- The probability of exceeding the critical value is `prob > fCrit_2_100` = 0.2199264, greater than the previous value, i.e., 0.177584 for `ddf` = 10.
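The comparison between `ddf` = 10 and `ddf` = 100 can be reproduced with a few lines of base `R`. This is a sketch under the same settings as above (`alpha` = 0.05, `ndf` = 2); the names `power_2_10` and `power_2_100` are illustrative and follow the `fCrit_ndf_ddf` naming convention.

```r
alpha <- 0.05
ndf <- 2
ncp <- c(0, 2, 5, 10)

# critical values of the central F-distribution for the two values of ddf
fCrit_2_10 <- qf(1 - alpha, ndf, 10)
fCrit_2_100 <- qf(1 - alpha, ndf, 100)

# probability of exceeding the critical value, vectorized over ncp
power_2_10 <- 1 - pf(fCrit_2_10, ndf, 10, ncp = ncp)
power_2_100 <- 1 - pf(fCrit_2_100, ndf, 100, ncp = ncp)

print(data.frame(ncp = ncp, power_2_10 = power_2_10, power_2_100 = power_2_100))
```

At `ncp` = 0 both probabilities equal `alpha` by construction; at every non-zero `ncp` the value for `ddf` = 100 is the larger of the two.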

## Effect of `ncp` for `ndf` = 1 and `ddf` = 100

| | ndf | ddf | fCrit | ncp | pFgtFCrit |
|---|---|---|---|---|---|
| A | 2 | 10 | 4.102821 | 0 | 0.0500000 |
| B | 2 | 10 | 4.102821 | 2 | 0.1775840 |
| C | 2 | 10 | 4.102821 | 5 | 0.3876841 |
| D | 2 | 10 | 4.102821 | 10 | 0.6769776 |
| E | 2 | 100 | 3.087296 | 0 | 0.0500000 |
| F | 2 | 100 | 3.087296 | 2 | 0.2199264 |
| G | 2 | 100 | 3.087296 | 5 | 0.4910802 |
| H | 2 | 100 | 3.087296 | 10 | 0.8029764 |
| I | 1 | 100 | 3.936143 | 0 | 0.0500000 |
| J | 1 | 100 | 3.936143 | 2 | 0.2883607 |
| K | 1 | 100 | 3.936143 | 5 | 0.6004962 |
| L | 1 | 100 | 3.936143 | 10 | 0.8793619 |

- All comparisons in this section are at the same values of `ncp` defined above and at `ddf` = 100, and are between `ndf` = 1 and `ndf` = 2.

## Fig. I

- This corresponds to `ncp` = 0, `ndf` = 1 and `ddf` = 100.
- The critical value is `fCrit_1_100` = 3.936143.
- Notice the increase in the critical value as compared to the corresponding value for `ndf = 2`, i.e., 3.0872959.
- One might expect power to decrease, **but see below**.

## Fig. J

- This corresponds to `ncp` = 2, `ndf` = 1 and `ddf` = 100.
- Now `prob > fCrit_1_100` = 0.2883607, larger than the previous value 0.2199264.
- The power has actually increased.
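The `ndf` = 1 versus `ndf` = 2 comparison at `ddf` = 100 can be sketched as follows. The names `power_1_100`, `power_2_100` and `tCritSquared` are illustrative. The last line uses the standard fact that an F-statistic with `ndf` = 1 is the square of a t-statistic with `ddf` degrees of freedom, so the `ndf` = 1 critical value equals the squared two-sided t critical value.

```r
alpha <- 0.05
ddf <- 100
ncp <- 2

# critical values for ndf = 1 and ndf = 2
fCrit_1_100 <- qf(1 - alpha, 1, ddf)
fCrit_2_100 <- qf(1 - alpha, 2, ddf)

# probability of exceeding the respective critical value at ncp = 2
power_1_100 <- 1 - pf(fCrit_1_100, 1, ddf, ncp = ncp)
power_2_100 <- 1 - pf(fCrit_2_100, 2, ddf, ncp = ncp)

# for ndf = 1, F is the square of t with ddf degrees of freedom
tCritSquared <- qt(1 - alpha / 2, ddf)^2

print(c(fCrit_1_100 = fCrit_1_100, fCrit_2_100 = fCrit_2_100,
        power_1_100 = power_1_100, power_2_100 = power_2_100,
        tCritSquared = tCritSquared))
```

The critical value is larger for `ndf` = 1, yet so is the probability of exceeding it.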

## Summary

- Power increases with increasing `ddf` and `ncp`.
- The effect of increasing `ncp` is quite dramatic: for `ndf` = 2 and `ddf` = 10, power rises from 0.05 at `ncp` = 0 to 0.6769776 at `ncp` = 10. Power depends on `ncp`, which in turn scales with the square of the effect size.
- Decreasing `ndf` also **increases** power. At first glance this may seem counterintuitive, as `fCrit` has gone up, but it is explained by the differing shapes of the two distributions: the pdf is broader for `ndf` = 1 as compared to `ndf` = 2 (compare Fig. L to H).
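As a cross-check, the full twelve-row table (A through L) can be rebuilt in one pass. This is a sketch in base `R`; the data frame name `powerTable` and the helper `pairs` are illustrative, while the column names follow the tables above.

```r
alpha <- 0.05
ncp <- c(0, 2, 5, 10)

# the three (ndf, ddf) combinations considered above
pairs <- data.frame(ndf = c(2, 2, 1), ddf = c(10, 100, 100))

# one row per combination and ncp value, in the order A through L
powerTable <- data.frame(
  ndf = rep(pairs$ndf, each = length(ncp)),
  ddf = rep(pairs$ddf, each = length(ncp)),
  ncp = rep(ncp, times = nrow(pairs))
)

# critical value from the central F-distribution, then the exceedance probability
powerTable$fCrit <- qf(1 - alpha, powerTable$ndf, powerTable$ddf)
powerTable$pFgtFCrit <- 1 - pf(powerTable$fCrit, powerTable$ndf,
                               powerTable$ddf, ncp = powerTable$ncp)
rownames(powerTable) <- LETTERS[1:12]

print(powerTable)
```

Because `qf()` and `pf()` are vectorized over their arguments, no explicit loop is needed.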

## Comments

## Fig. A

- `ncp = 0`, i.e., the *central* F-distribution.
- `fCrit`, in the above code block, is the value of `x` such that the probability of exceeding `x` is \(\alpha\). The corresponding parameter `alpha` is defined above as 0.05.
- `fCrit` = 4.102821. Notice the use of the quantile function `qf()` to determine this value; the default value of `ncp`, namely zero, is used, i.e., one does not pass a 4th argument to `qf()`.
- The decision rule for rejecting the NH uses the NH distribution of the F-statistic, i.e., reject the NH if F >= `fCrit`.
- As expected, `prob > fCrit` = 0.05 because this is how `fCrit` was defined.

## Fig. B

- `ncp = 2`, `ndf` = 2 and `ddf` = 10.
- `prob > fCrit` = 0.177584, i.e., the *statistical power* (compare this to Fig. A where `prob > fCrit` was 0.05).

## Fig. C

- `ncp = 5`, `ndf` = 2 and `ddf` = 10.
- `prob > fCrit` = 0.3876841.

## Fig. D

- `ncp = 10`, `ndf` = 2 and `ddf` = 10.
- `prob > fCrit` is 0.6769776.
- If a vertical line is drawn at `x` = 4.102821, fraction 0.6769776 of the probability distribution in Fig. D lies to the right of this line.

## Summary

The larger the non-centrality parameter, the greater the shift to the right of the F-distribution, and the greater the statistical power.
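The shift to the right can be quantified by tracking the median and the mean of the distribution as `ncp` grows. The sketch below uses `qf()` with an explicit `ncp` argument for the median and the standard closed-form mean of the non-central F-distribution (valid for `ddf` > 2); the names `medians` and `means` are illustrative.

```r
ndf <- 2
ddf <- 10
ncp <- c(0, 2, 5, 10)

# median of the (non-central) F-distribution for each ncp
medians <- qf(0.5, ndf, ddf, ncp = ncp)

# closed-form mean of the non-central F-distribution, valid for ddf > 2
means <- ddf * (ndf + ncp) / (ndf * (ddf - 2))

print(data.frame(ncp = ncp, median = medians, mean = means))
```

Both summaries increase with `ncp`, consistent with the rightward shift seen in Figs. A through D.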