Chapter 25 Empirical plots

25.1 TBA How much finished

70%

25.2 Introduction

Operating characteristics are visual depictions of performance. If properly defined, scalar quantities derived from operating characteristics can serve as quantitative measures of performance, termed figures of merit (FOMs). The previous chapter defined the FROC curve and suggested the area under this curve as a possible FOM. This chapter introduces mathematical expressions for the empirical operating characteristics (FROC and others) that are possible with FROC data, and for the associated FOMs.

A distinction between latent and actual marks is made, followed by a summary of FROC notation applicable to a single-modality, single-reader dataset. This is a key table, which will be used in later chapters. Following this, different empirical operating characteristics proposed for FROC data are described. Formulae are given for calculating each empirical operating characteristic.

The observed end-point of an operating characteristic is defined as that operating point achieved by cumulating all the ratings. For the FROC plot it is demonstrated that the observed FROC curve is not contained in the unit square, unlike the other operating characteristics, which are contained in the unit square.

25.3 Mark rating pairs

FROC data consists of mark-rating pairs. Each mark indicates the location of a region suspicious enough to warrant reporting and the rating is the associated confidence level. A mark is recorded as lesion localization (LL) if it is sufficiently close to a true lesion, according to the adopted proximity criterion, and otherwise it is recorded as non-lesion localization (NL).

In an FROC study the number of marks on an image is an a-priori unknown modality-reader-case dependent non-negative random integer. It is incorrect to estimate it by dividing the image area by the lesion area because not all regions of the image are equally likely to have lesions, lesions do not have the same size, and perhaps most important, clinicians don’t assign equal attention units to all areas of the image. The best insight into the number of marks per case is obtained from eye-tracking studies (Duchowski 2002), but even here the information is incomplete, as eye-tracking studies can only measure foveal gaze and not lesions found by peripheral vision, not to mention that such studies are very difficult to conduct in a clinical setting.

Experts tend to have fewer NL marks per case than non-experts while maintaining equal or more LL marks per case. As an example, in screening mammography, the number of marks per case (a case is defined as 4 views, two of each breast) that an expert will consider for marking is typically less than three. About 80% of non-diseased cases have no marks. The reason is that, because of the low disease prevalence, marking too many cases would result in unacceptably high recall rates.

25.3.1 Latent vs. actual marks

To distinguish between suspicious regions that were considered for marking and regions that were actually marked, it is necessary to introduce the distinction between latent marks and actual marks.

  • A latent mark is defined as a suspicious region, regardless of whether or not it was marked. A latent mark becomes an actual mark if it is marked.
  • A latent mark is a latent LL if it is close to a true lesion and otherwise it is a latent NL.
  • A non-diseased case can only have latent NLs. A diseased case can have latent NLs and latent LLs.
  • If marked, a latent NL is recorded as an actual NL.
  • If not marked, a latent NL is an unobservable event.
  • In contrast, unmarked lesions are observable events – one knows (trivially) which lesions were not marked.

25.3.2 Binning rule

Recall from Section 10.3 that ROC data modeling requires the existence of a case-dependent decision variable, or z-sample \(z\), and case-independent decision thresholds \(\zeta_r\), where \(r = 0, 1, ..., R_{ROC}-1\) and \(R_{ROC}\) is the number of ROC study bins, and the rule that if \(\zeta_r \leq z < \zeta_{r+1}\) the case is rated \(r + 1\). Dummy cutoffs are defined as \(\zeta_0 = -\infty\) and \(\zeta_{R_{ROC}} = \infty\). The z-sample applies to the whole case. To summarize:

\[\begin{equation} \left. \begin{aligned} \text{if } \left (\zeta_r \le z < \zeta_{r+1} \right )\Rightarrow \text {rating} = r+1\\ r = 0, 1, ..., R_{ROC}-1\\ \zeta_0 = -\infty\\ \zeta_{R_{ROC}} = \infty\\ \end{aligned} \right \} \tag{25.1} \end{equation}\]

Analogously, FROC data modeling requires the existence of a case- and location-dependent z-sample for each latent mark and case- and location-independent reporting thresholds \(\zeta_r\), where \(r = 1, ..., R_{FROC}\) and \(R_{FROC}\) is the number of FROC study bins, and the rule that a latent mark is marked and rated \(r\) if \(\zeta_r \leq z < \zeta_{r+1}\). Dummy cutoffs are defined as \(\zeta_0 = -\infty\) and \(\zeta_{R_{FROC}+1} = \infty\). For the same number of non-dummy cutoffs, the number of FROC bins is one less than the number of ROC bins. For example, 4 non-dummy cutoffs \(\zeta_1, \zeta_2, \zeta_3, \zeta_4\) can correspond to a 5-rating ROC study or to a 4-rating FROC study. To summarize:

\[\begin{equation} \left. \begin{aligned} \text{if } \left (\zeta_r \le z < \zeta_{r+1} \right )\Rightarrow \text {rating} = r\\ r = 1, 2, ..., R_{FROC}\\ \zeta_0 = -\infty\\ \zeta_{R_{FROC}+1} = \infty\\ \end{aligned} \right \} \tag{25.2} \end{equation}\]
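The two binning rules can be sketched in a few lines of Python. This is only a minimal illustration, not code from any FROC software: the function names `roc_rating` and `froc_rating` and the numerical cutoffs are hypothetical, and a latent mark that falls below \(\zeta_1\) (and is therefore not marked) is represented by `None`.

```python
import bisect


def roc_rating(z, cutoffs):
    """ROC rule (25.1): zeta_r <= z < zeta_{r+1} gives rating r + 1."""
    # bisect_right counts the non-dummy cutoffs that are <= z, which is r.
    return bisect.bisect_right(cutoffs, z) + 1


def froc_rating(z, cutoffs):
    """FROC rule (25.2): zeta_r <= z < zeta_{r+1} gives rating r.

    Returns None when z < zeta_1, i.e., the latent mark is not marked.
    """
    r = bisect.bisect_right(cutoffs, z)
    return r if r >= 1 else None


# Hypothetical non-dummy cutoffs zeta_1..zeta_4, in increasing order.
cutoffs = [-1.0, 0.0, 1.0, 2.0]

# The same z-samples binned under each rule.
roc_ratings = [roc_rating(z, cutoffs) for z in (-2.0, 0.5, 2.5)]
froc_ratings = [froc_rating(z, cutoffs) for z in (-2.0, 0.5, 2.5)]
```

With four non-dummy cutoffs the ROC rule can return ratings 1 through 5, whereas the FROC rule returns ratings 1 through 4 or no mark at all, which is the "one less bin" relationship described above.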

25.4 FROC notation

Clear notation is vital to understanding this paradigm. The notation needs to account for case and location dependencies of ratings and the distinction between case-level and location-level ground truth. For example, a diseased case can have several regions that are non-diseased and a few diseased regions (the lesions). The notation also has to account for cases with no marks.

FROC notation is summarized in Table 25.1, in which all marks are latent marks. The table is organized into three columns, the first column is the row number, the second column has the symbol(s), and the third column has the meaning(s) of the symbol(s).

TABLE 25.1: FROC notation; all marks refer to latent marks; see comments
Row | Symbol | Meaning
1 | \(t\) | Case-level truth: 1 for non-diseased and 2 for diseased
2 | \(K_t\) | Number of cases with case-level truth \(t\)
3 | \(k_t t\) | Case \(k_t\) in case-level truth \(t\)
4 | \(s\) | Mark-level truth: 1 for NL and 2 for LL
5 | \(l_s s\) | Mark \(l_s\) in mark-level truth \(s\)
6 | \(z_{k_t t l_1 1}\) | z-sample for case \(k_t t\) and mark \(l_1 1\)
7 | \(z_{k_2 2 l_2 2}\) | z-sample for case \(k_2 2\) and mark \(l_2 2\)
8 | \(R_{FROC}\) | Number of FROC bins
9 | \(\zeta_1\) | Lowest reporting threshold
10 | \(\zeta_r\) | Other non-dummy reporting thresholds
11 | \(\zeta_0, \zeta_{R_{FROC}+1}\) | Dummy thresholds
12 | \(N_{k_t t}\) | Number of NLs on case \(k_t t\)
13 | \(L_{k_2 2}\) | Number of lesions on case \(k_2 2\)
14 | \(W_{k_2 l_2}\) | Weight of lesion \(l_2 2\) on case \(k_2 2\)
15 | \(L_{max}\) | Maximum number of lesions per case in dataset
16 | \(L_T\) | Total number of lesions in dataset

25.4.1 Comments on Table 25.1

  • Row 1: The case-truth index \(t\) refers to the case (or patient), with \(t = 1\) for non-diseased and \(t = 2\) for diseased cases. As a useful mnemonic, \(t\) is for truth.

  • Row 2: \(K_t\) is the number of cases with truth state \(t\); specifically, \(K_1\) is the number of non-diseased cases and \(K_2\) the number of diseased cases.

  • Row 3: Two indices \(k_t t\) are needed to select case \(k_t\) in truth state \(t\). As a useful mnemonic, \(k\) is for case.

  • Rows 4 and 5: For a similar reason, two indices \(l_s s\) are needed to select latent mark \(l_s\) in location level truth state \(s\), where \(s = 1\) corresponds to a latent NL and \(s = 2\) corresponds to a latent LL. One can think of \(l_s\) as indexing the locations of different latent marks with location-level truth state \(s\). As a useful mnemonic, \(l\) is for location.

    • \(l_1 = \{1, 2, ..., N_{k_t t}\}\) indexes latent NL marks, provided the case has at least one NL mark, and otherwise \(N_{k_t t} = 0\) and \(l_1 = \varnothing\), the null set.

    • The possible values of \(l_1\) are \(l_1 = \left \{ \varnothing \right \}\oplus \left \{ 1,2,...N_{k_t t} \right \}\). The null set applies when the case has no latent NL marks and \(\oplus\) is the “exclusive-or” symbol (“exclusive-or” is used in the English sense: “one or the other, but not neither nor both”). In other words, \(l_1\) can either be the null set or take on values \(1,2,...N_{k_t t}\).

    • Likewise, \(l_2 = \left \{ 1,2,...,L_{k_2 2} \right \}\) indexes latent LL marks. Unmarked LLs are assigned negative infinity ratings. The null set notation is not needed for latent LLs.

  • Row 6: The z-sample for case \(k_t t\) and latent NL mark \(l_1 1\) is denoted \(z_{k_t t l_1 1}\). Latent NL marks are possible on non-diseased and diseased cases (both values of \(t\) are allowed). The range of a z-sample is \(-\infty < z_{k_t t l_1 1} < \infty\), provided \(l_1 \neq \varnothing\); otherwise, it is an unobservable event.

  • Row 7: The z-sample of a latent LL is \(z_{k_2 2 l_2 2}\). Unmarked lesions are assigned negative infinity ratings and are observable events. The null-set notation is unnecessary for them.

  • Row 8: \(R_{FROC}\) is the number of bins in the FROC study.

  • Rows 9, 10 and 11: The cutoffs in the FROC study. The lowest threshold is \(\zeta_1\). The other non-dummy thresholds are \(\zeta_r\) where \(r=2,3,...,R_{FROC}\). The dummy thresholds are \(\zeta_0 = -\infty\) and \(\zeta_{R_{FROC}+1} = \infty\).

  • Row 12: \(N_{k_t t}\) is the total number of latent NL marks on case \(k_t t\).

  • Row 13: \(L_{k_2 2}\) is the number of lesions in diseased case \(k_2 2\).

  • Row 14: \(W_{k_2 l_2}\) is the weight (i.e., clinical importance) of lesion \(l_2 2\) in diseased case \(k_2 2\). The weights of lesions on a case sum to one: \(\sum_{l_2 = 1}^{L_{k_2 2}}W_{k_2 l_2} = 1\).

  • Row 15: \(L_{max}\) is the maximum number of lesions per case in the dataset.

  • Row 16: \(L_T\) is the total number of lesions in the dataset.

25.4.2 Discussion: cases with zero latent NL marks

One aspect of FROC data, that some cases may have no NL marks no matter how low the reporting threshold, has created both conceptual and notational problems. Taking the conceptual issue first, my thinking (prior to 2004) was that as the reporting threshold \(\zeta_1\) is lowered, the number of NL marks per case increases almost indefinitely. I visualized this process as each case “filling up” with NL marks. In fact the first modeling of FROC data (Chakraborty 1989) predicts that, as the reporting threshold is lowered to \(\zeta_1 = -\infty\), the number of NL marks per case approaches \(\infty\). However, observed FROC curves end with a finite value of NLs per case.

This mismatch between observation and theory is one reason I introduced the radiological search model (RSM) (Chakraborty 2006b). I will have much more to say about this in a subsequent chapter, but for now I state one prediction (actually an assumption) of the RSM: the number of latent NL marks is a Poisson distributed random integer with a finite value for the mean parameter of the Poisson distribution. This means that the actual number of latent NL marks per case can be 0, 1, 2, ..., with an average (over cases) that is a finite number.

With this background, let us return to the conceptual issue: why does the observer not keep “filling up” the image with NL marks? The answer is that the observer can only mark regions that have a non-zero chance of being a lesion. For example, if the actual number of latent NLs on a particular case is 2, then, as the reporting threshold is lowered, the observer will make at most two NL marks. Having exhausted these two regions, the observer will not mark any more regions because there are no more regions to be marked: all other regions in the image have, in the perception of the observer, zero chance of being a lesion.

The notational issue is how to handle images with no latent NL marks. Basically it involves restricting summations over cases \(k_t t\) to those cases which have at least one latent NL mark, i.e., \(N_{k_t t} \neq 0\). This is illustrated in the next section.

25.5 The empirical FROC

The FROC was defined in Chapter 24 as the plot of LLF (along the ordinate) vs. NLF (along the abscissa).

Using the notation of Table 25.1 and assuming binned data, then, corresponding to the operating point determined by threshold \(\zeta_r\), the FROC abscissa is \(\text{NLF}_r \equiv \text{NLF}\left ( \zeta_r \right )\), the total number of NLs rated \(\geq\) threshold \(\zeta_r\) divided by the total number of cases, and the corresponding ordinate is \(\text{LLF}_r \equiv \text{LLF}\left ( \zeta_r \right )\), the total number of LLs rated \(\geq\) threshold \(\zeta_r\) divided by the total number of lesions:

\[\begin{equation} \text{NLF}_r = \frac{n\left ( \text{NLs rated} \geq \zeta_r\right )}{n\left ( \text{cases} \right )} \tag{25.3} \end{equation}\]

and

\[\begin{equation} \text{LLF}_r = \frac{n\left ( \text{LLs rated} \geq \zeta_r\right )}{n\left ( \text{lesions} \right )} \tag{25.4} \end{equation}\]

The observed operating points correspond to the following values of \(r\):

\[\begin{equation} r = 1, 2, ...,R_{FROC} \tag{25.5} \end{equation}\]

Due to the ordering of the thresholds, i.e., \(\zeta_1 < \zeta_2 < ... < \zeta_{R_{FROC}}\), higher values of \(r\) correspond to lower operating points. The uppermost operating point, i.e., that defined by \(r = 1\), is referred to as the observed end-point.

Equations (25.3) and (25.4) are equivalent to:

\[\begin{equation} \text{NLF}_r = \frac{1}{K_1+K_2} \sum_{t=1}^{2} \sum_{k_t=1}^{K_t} \mathbb{I} \left ( N_{k_t t} \neq 0 \right )\sum_{l_1=1}^{N_{k_t t}} \mathbb{I} \left ( z_{k_t t l_1 1} \geq \zeta_r \right ) \tag{25.6} \end{equation}\]

and

\[\begin{equation} \text{LLF}_r = \frac{1}{L_T} \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}} \mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \zeta_r \right ) \tag{25.7} \end{equation}\]

Each indicator function, \(\mathbb{I}()\), yields unity if the argument is true and zero otherwise.

In Eqn. (25.6) \(\mathbb{I} \left ( N_{k_t t} \neq 0 \right )\) ensures that only cases with at least one latent NL are counted. Recall that \(N_{k_t t}\) is the total number of latent NLs in case \(k_t t\). Not including this term would cause the summation over \(l_1\) to be undefined for cases with zero latent NLs. The term \(\mathbb{I} \left ( z_{k_t t l_1 1} \geq \zeta_r \right )\) counts over all NL marks with ratings \(\geq \zeta_r\). The three summations yield the total number of NLs in the dataset with z-samples \(\geq \zeta_r\) and dividing by the total number of cases yields \(\text{NLF}_r\). This equation also shows explicitly that NLs on both non-diseased (\(t=1\)) and diseased (\(t=2\)) cases contribute to NLF.

In Eqn. (25.7) a summation over \(t\) is not needed as only diseased cases contribute to LLF. Analogous to the first indicator function term in Eqn. (25.6), a term like \(\mathbb{I} \left ( L_{k_2 2} \neq 0 \right )\) would be superfluous since \(L_{k_2 2} > 0\), as each diseased case must have at least one lesion. The term \(\mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \zeta_r \right )\) counts over all LL marks with ratings \(\geq \zeta_r\). Dividing by \(L_T\), the total number of lesions in the dataset, yields \(\text{LLF}_r\).
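The counting in Eqns. (25.6) and (25.7) can be sketched as follows. This is a minimal illustration under stated assumptions: the toy dataset, the list-of-lists layout and the function name `froc_points` are all hypothetical, ratings are integers \(1, ..., R_{FROC}\) so that "rating \(\geq r\)" stands in for \(z \geq \zeta_r\), and unmarked lesions carry a \(-\infty\) rating. Unmarked latent NLs are unobservable and therefore simply do not appear in the data.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy dataset: one inner list per case.
nl_nondiseased = [[], [1], [2, 1], []]         # NL ratings, K_1 = 4 cases
nl_diseased = [[1], [], [3]]                   # NL ratings, K_2 = 3 cases
ll_diseased = [[4, NEG_INF], [2], [NEG_INF]]   # one rating per lesion, L_T = 4


def froc_points(nl_nondis, nl_dis, ll_dis, n_bins):
    """Return [(NLF_r, LLF_r)] for r = n_bins, ..., 1 (lowest point first)."""
    n_cases = len(nl_nondis) + len(nl_dis)           # K_1 + K_2
    n_lesions = sum(len(case) for case in ll_dis)    # L_T
    points = []
    for r in range(n_bins, 0, -1):
        # NLs on both non-diseased and diseased cases contribute to NLF.
        # Cases with an empty NL list contribute nothing, which plays the
        # role of the indicator I(N != 0) in Eqn. (25.6).
        n_nl = sum(x >= r for case in nl_nondis + nl_dis for x in case)
        # Only lesions (on diseased cases) contribute to LLF.
        n_ll = sum(x >= r for case in ll_dis for x in case)
        points.append((n_nl / n_cases, n_ll / n_lesions))
    return points


points = froc_points(nl_nondiseased, nl_diseased, ll_diseased, n_bins=4)
observed_end_point = points[-1]   # the r = 1 operating point
```

In this toy dataset five NL marks are spread over seven cases and two of the four lesions are marked, so the observed end-point is (5/7, 0.5).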

25.5.1 Definition

The empirical FROC plot connects adjacent operating points \(\left (\text{NLF}_r, \text{LLF}_r \right )\), including the origin (0,0) and the observed end-point, with straight lines. The area under this plot is the empirical FROC AUC, denoted \(A_{\text{FROC}}\).
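Because the empirical plot is piecewise linear, its area is a sum of trapezoids. The sketch below is illustrative only: the helper name `trapezoid_area` and the operating points are hypothetical, and the integration stops at the observed end-point, as in the definition above.

```python
def trapezoid_area(points):
    """Area under straight-line segments joining (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical empirical FROC: origin first, observed end-point last.
froc = [(0.0, 0.0), (0.0, 0.25), (1 / 7, 0.25), (2 / 7, 0.5), (5 / 7, 0.5)]

a_froc = trapezoid_area(froc)
```

Note that the value obtained depends on where the observed end-point happens to fall along the NLF axis, since nothing beyond that point is included.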

25.5.2 The origin, a trivial point

Since \(\zeta_{R_{FROC}+1} = \infty\), according to Eqn. (25.6) and Eqn. (25.7), \(r = R_{FROC}+1\) yields the trivial operating point (0,0).

25.5.3 The observed end-point and its semi-constrained property

The abscissa of the observed end-point, \(\text{NLF}_1\), is defined by:

\[\begin{equation} \text{NLF}_1 = \frac{1}{K_1+K_2} \sum_{t=1}^{2} \sum_{k_t=1}^{K_t} \mathbb{I} \left ( N_{k_t t} \neq 0 \right ) \sum_{l_1=1}^{N_{k_t t}} \mathbb{I} \left ( z_{k_t t l_1 1} \geq \zeta_1 \right ) \tag{25.8} \end{equation}\]

Since each case could have an arbitrary number of NLs, \(\text{NLF}_1\) need not equal unity (except fortuitously) and can exceed it.

The ordinate of the observed end-point, \(\text{LLF}_1\), is defined by:

\[\begin{equation} \left. \begin{aligned} \text{LLF}_1 =& \frac{ \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}} \mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \zeta_1 \right ) }{L_T}\\ \leq& 1 \end{aligned} \right \} \tag{25.9} \end{equation}\]

The numerator is the total number of lesions that were actually marked. The ratio is the fraction of lesions that are marked, which is \(\leq 1\).

This is the semi-constrained property of the observed end-point: while the observed end-point ordinate is constrained to the range [0,1], the corresponding abscissa is not so constrained.
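As a small worked illustration (the numbers are hypothetical, chosen only to make the arithmetic transparent), suppose the dataset has \(K_1 = K_2 = 1\), the diseased case contains two lesions of which one is marked, and the observer makes three NL marks in total, all rated \(\geq \zeta_1\). Then:

\[\begin{equation} \text{NLF}_1 = \frac{3}{1+1} = 1.5 > 1, \qquad \text{LLF}_1 = \frac{1}{2} = 0.5 \leq 1 \end{equation}\]

The end-point lies outside the unit square along the abscissa while remaining bounded along the ordinate.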

25.5.4 Futility of extrapolation outside the observed end-point

To understand this, consider the expression for \(\text{NLF}_0\), i.e., Eqn. (25.6) with \(r = 0\):

\[\begin{equation} \text{NLF}_0 = \frac{1}{K_1+K_2} \sum_{t=1}^{2} \sum_{k_t=1}^{K_t} \mathbb{I} \left ( N_{k_t t} \neq 0 \right ) \sum_{l_1=1}^{N_{k_t t}} \mathbb{I} \left ( z_{k_t t l_1 1} \geq -\infty \right ) \end{equation}\]

The right hand side of this equation can be separated into two terms, the contribution of latent NLs with z-samples in the range \(z \geq \zeta_1\) and those in the range \(-\infty \leq z < \zeta_1\). The first term yields the abscissa of the observed end-point, Eqn. (25.8). The 2nd term is:

\[\begin{equation} \left. \begin{aligned} \text{2nd term}=&\left (\frac{1}{K_1+K_2} \right )\sum_{t=1}^{2} \sum_{k_t=1}^{K_t} \mathbb{I} \left ( N_{k_t t} \neq 0 \right ) \sum_{l_1=1}^{N_{k_t t}} \mathbb{I} \left ( -\infty \leq z_{k_t t l_1 1} < \zeta_1 \right )\\ =&\frac{\text{unknown number}}{K_1+K_2} \end{aligned} \right \} \tag{25.10} \end{equation}\]

It represents the contribution of unmarked NLs, i.e., latent NLs whose z-samples were below \(\zeta_1\). It determines how much further to the right the observer’s NLF would have moved, relative to \(NLF_1\), if one could get the observer to lower the reporting criterion to \(-\infty\). Since the observer may not oblige, this term cannot, in general, be evaluated. Therefore \(NLF_0\) cannot be evaluated. The basic problem is that unmarked latent NLs represent unobservable events.

Turning our attention to \(LLF_0\):

\[\begin{equation} \left. \begin{aligned} \text{LLF}_0 =& \frac{ \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}} \mathbb{I} \left ( z_{k_2 2 l_2 2} \geq -\infty \right ) }{L_T}\\ =& 1 \end{aligned} \right \} \tag{25.11} \end{equation}\]

Unlike unmarked latent NLs, unmarked lesions can safely be assigned the \(-\infty\) rating, because an unmarked lesion is an observable event. The right hand side of Eqn. (25.11) evaluates to unity. However, since the corresponding abscissa \(NLF_0\) is undefined, one cannot plot this point. It follows that one cannot extrapolate outside the observed end-point.

The formalism should not obscure the fact that the futility of extrapolation outside the observed end-point of the FROC is a fairly obvious property: one does not know how far to the right the abscissa of the observed end-point might extend if one could get the observer to report every latent NL, no matter how low its z-sample.

25.6 The inferred ROC plot

By adopting a sensible rule for converting the zero or more mark-rating pairs on a case to a single rating per case (the highest-rating rule is commonly used), it is possible to infer ROC data from FROC mark-rating data.

25.6.1 Inferred-ROC rating

The rating of the highest rated mark on a case, or \(-\infty\) if the case has no marks, is defined as the inferred-ROC rating for the case. Inferred-ROC ratings on non-diseased cases are referred to as inferred-FP ratings and those on diseased cases as inferred-TP ratings.

When there is little possibility for confusion, the prefix “inferred” is suppressed. Using the by now familiar cumulation procedure, FP counts are cumulated to calculate FPF and likewise, TP counts are cumulated to calculate TPF.

Definitions:

  • \(FPF(\zeta)\) = cumulated inferred FP counts with z-sample \(\geq\) threshold \(\zeta\) divided by total number of non-diseased cases.
  • \(TPF(\zeta)\) = cumulated inferred TP counts with z-sample \(\geq\) threshold \(\zeta\) divided by total number of diseased cases.

Definition of ROC plot:

  • The ROC is the plot of inferred \(TPF(\zeta)\) vs. inferred \(FPF(\zeta)\).
  • The plot includes a straight line extension from the observed end-point to (1,1).

The mathematical definition of the ROC follows.

25.6.2 Inferred FPF

The highest z-sample ROC false positive (FP) rating for non-diseased case \(k_1 1\) is defined by:

\[\begin{equation} \left. \begin{aligned} FP_{k_1 1}=&\max_{l_1} \left ( z_{k_1 1 l_1 1 } \mid l_1 \neq \varnothing \right ) \\ =& -\infty \mid l_1 = \varnothing \end{aligned} \right \} \tag{25.12} \end{equation}\]

If the case has at least one latent NL mark, then \(l_1 \neq \varnothing\), where \(\varnothing\) is the null set, and the first definition applies. If the case has no latent NL marks, then \(l_1 = \varnothing\), and the second definition applies. \(FP_{k_1 1}\) is the maximum z-sample over all latent marks occurring on non-diseased case \(k_1 1\), or \(-\infty\) if the case has no latent marks. The corresponding false positive fraction is defined by:

\[\begin{equation} \text{FPF}_r \equiv \text{FPF} \left ( \zeta_r \right ) = \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq \zeta_r\right ) \tag{25.13} \end{equation}\]

25.6.3 Inferred TPF

The inferred true positive (TP) z-sample for diseased case \(k_2 2\) is defined by:

\[\begin{equation} TP_{k_2 2} = \textstyle\max_{l_1 l_2}\left ( \left (z_{k_2 2 l_1 1} ,z_{k_2 2 l_2 2} \right ) \mid l_1 \neq \varnothing \right ) \tag{25.14} \end{equation}\]

or

\[\begin{equation} TP_{k_2 2} = \textstyle\max_{l_2} \left ( z_{k_2 2 l_2 2} \right ) \mid\left ( l_1 = \varnothing \land \left (\textstyle\max_{l_2}{\left (z_{k_2 2 l_2 2} \right )} \neq -\infty \right ) \right ) \tag{25.15} \end{equation}\]

or

\[\begin{equation} TP_{k_2 2} = -\infty \mid \left ( l_1 = \varnothing \land\left ( \textstyle\max_{l_2}{\left (z_{k_2 2 l_2 2} \right )} = -\infty \right ) \right ) \tag{25.16} \end{equation}\]

Here \(\land\) is the logical AND operator.

  • If \(l_1 \neq \varnothing\) then Eqn. (25.14) applies, i.e., one takes the maximum over all ratings, NLs and LLs, whichever is higher, occurring on the diseased case.

  • If \(l_1 = \varnothing\) and at least one lesion is marked, then Eqn. (25.15) applies, i.e., one takes the maximum over all marked LLs.

  • If \(l_1 = \varnothing\) and no lesions are marked, then Eqn. (25.16) applies; this represents an unmarked diseased case; the \(-\infty\) rating assignment is justified because an unmarked diseased case is an observable event.

The inferred true positive fraction \(\text{TPF}_r\) is defined by:

\[\begin{equation} \text{TPF}_r \equiv \text{TPF}(\zeta_r) = \frac{1}{K_2}\sum_{k_2=1}^{K_2} \mathbb{I}\left ( TP_{k_2 2} \geq \zeta_r \right ) \tag{25.17} \end{equation}\]

25.6.4 Definition

The inferred empirical ROC plot connects adjacent points \(\left( \text{FPF}_r, \text{TPF}_r \right )\), including the origin (0,0), with straight lines plus a straight-line segment connecting the observed end-point to (1,1). Like a real ROC, this plot is constrained to lie within the unit square. The area under this plot is the empirical inferred ROC AUC, denoted \(A_{\text{ROC}}\).
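The highest-rating inference and the resulting empirical ROC can be sketched as follows. As before, this is an illustration under assumptions rather than a reference implementation: the toy dataset and the names `highest_rating`, `inferred_roc_points` and `trapezoid_area` are hypothetical, integer ratings stand in for binned z-samples, and unmarked lesions carry a \(-\infty\) rating.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy dataset: one inner list per case.
nl_nondiseased = [[], [1], [2, 1], []]         # NL ratings, non-diseased cases
nl_diseased = [[1], [], [3]]                   # NL ratings, diseased cases
ll_diseased = [[4, NEG_INF], [2], [NEG_INF]]   # one rating per lesion


def highest_rating(ratings):
    """Highest rating on a case, or -inf for a case with no marks."""
    return max(ratings, default=NEG_INF)


def inferred_roc_points(nl_nondis, nl_dis, ll_dis, n_bins):
    """(FPF_r, TPF_r) from the origin through r = n_bins..1, then (1, 1)."""
    fp = [highest_rating(case) for case in nl_nondis]
    # On diseased cases the maximum runs over NL and LL ratings together.
    tp = [highest_rating(nl + ll) for nl, ll in zip(nl_dis, ll_dis)]
    points = [(0.0, 0.0)]
    for r in range(n_bins, 0, -1):
        fpf = sum(x >= r for x in fp) / len(fp)
        tpf = sum(x >= r for x in tp) / len(tp)
        points.append((fpf, tpf))
    points.append((1.0, 1.0))   # straight-line extension from the end-point
    return points


def trapezoid_area(points):
    """Area under straight-line segments joining (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


roc = inferred_roc_points(nl_nondiseased, nl_diseased, ll_diseased, n_bins=4)
a_roc = trapezoid_area(roc)
```

The third diseased case shows why the maximum is taken over both mark types: its only lesion is unmarked, yet the case still receives an inferred TP rating of 3 from its NL mark.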

25.7 The alternative FROC (AFROC) plot

  • Fig. 4 in (Philip C Bunch et al. 1977) anticipated another way of visualizing FROC data. I subsequently termed this the alternative FROC (AFROC) plot (Chakraborty 1989).
  • The empirical AFROC is defined as the plot of \(\text{LLF}(\zeta_r)\) along the ordinate vs. \(\text{FPF}(\zeta_r)\) along the abscissa.
  • \(\text{LLF}_r \equiv \text{LLF}(\zeta_r)\) was defined in Eqn. (25.7).
  • \(\text{FPF}_r \equiv \text{FPF}(\zeta_r)\) was defined in Eqn. (25.13).

25.7.1 Definition

The empirical AFROC plot connects adjacent operating points \(\left( \text{FPF}_r, \text{LLF}_r \right )\), including the origin (0,0) and (1,1), with straight lines. The area under this plot is the empirical AFROC AUC, denoted \(A_{\text{AFROC}}\).

Key points:

  • The ordinates LLF of the FROC and AFROC are identical.
  • The abscissae (FPF) of the ROC and AFROC are identical.
  • The AFROC is, in this sense, a hybrid plot, incorporating aspects of both ROC and FROC plots.
  • Unlike the empirical FROC, whose observed end-point has the semi-constrained property, the AFROC end-point is constrained to within the unit square.
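The hybrid nature of the AFROC can be seen in a short sketch that pairs the inferred FPF with LLF. The toy dataset and the names `afroc_points` and `trapezoid_area` are hypothetical; integer ratings stand in for binned z-samples and unmarked lesions carry a \(-\infty\) rating. NL marks on diseased cases are deliberately absent from the inputs, since they enter neither FPF nor LLF.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy dataset: one inner list per case.
nl_nondiseased = [[], [1], [2, 1], []]         # NL ratings, non-diseased cases
ll_diseased = [[4, NEG_INF], [2], [NEG_INF]]   # one rating per lesion


def afroc_points(nl_nondis, ll_dis, n_bins):
    """(FPF_r, LLF_r) from the origin through r = n_bins..1, then (1, 1)."""
    # Abscissa: highest-rated NL on each non-diseased case (as for the ROC).
    fp = [max(case, default=NEG_INF) for case in nl_nondis]
    # Ordinate: lesion-level counting (as for the FROC).
    n_lesions = sum(len(case) for case in ll_dis)
    points = [(0.0, 0.0)]
    for r in range(n_bins, 0, -1):
        fpf = sum(x >= r for x in fp) / len(fp)
        llf = sum(x >= r for case in ll_dis for x in case) / n_lesions
        points.append((fpf, llf))
    points.append((1.0, 1.0))
    return points


def trapezoid_area(points):
    """Area under straight-line segments joining (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


afroc = afroc_points(nl_nondiseased, ll_diseased, n_bins=4)
a_afroc = trapezoid_area(afroc)
```

Every point returned lies within the unit square, in contrast to the FROC abscissa, which is unbounded.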

25.7.2 The constrained observed end-point of the AFROC

Since \(\zeta_{R_{FROC}+1} = \infty\), according to Eqn. (25.7) and Eqn. (25.13), \(r = R_{FROC}+1\) yields the trivial operating point (0,0). Likewise, since \(\zeta_0 = -\infty\), \(r = 0\) yields the trivial point (1,1):

\[\begin{equation} \left. \begin{aligned} \text{FPF}_{R_{FROC}+1} =& \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq \infty \right )\\ =& 0\\ \text{LLF}_{R_{FROC}+1} =& \frac{1}{L_T} \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}}\mathbb{I} \left ( z_{k_2 2 l_2 2} \geq \infty \right )\\ =& 0 \end{aligned} \right \} \tag{25.18} \end{equation}\]

and

\[\begin{equation} \left. \begin{aligned} \text{FPF}_0 =& \frac{1}{K_1} \sum_{k_1=1}^{K_1} \mathbb{I} \left ( FP_{k_1 1} \geq -\infty \right )\\ =& 1\\ \text{LLF}_0 =& \frac{1}{L_T} \sum_{k_2=1}^{K_2} \sum_{l_2=1}^{L_{k_2 2}}\mathbb{I} \left ( z_{k_2 2 l_2 2} \geq -\infty \right )\\ =& 1 \end{aligned} \right \} \tag{25.19} \end{equation}\]

Because every non-diseased case is assigned a rating, and is therefore counted, the right hand side of the first equation in (25.19) evaluates to unity. This is obvious for marked cases. Since each unmarked case also gets a rating, albeit a \(-\infty\) rating, it is also counted (the argument of the indicator function in Eqn. (25.19) is true even when the inferred FP rating is \(-\infty\)).

25.8 The weighted-AFROC (wAFROC) plot

The AFROC ordinate defined in Eqn. (25.7) gives equal importance to every lesion on a case. Therefore, a case with more lesions will have more influence on the AFROC (see TBA Chapter 14 for an explicit demonstration of this fact). This is undesirable since each case (i.e., patient) should get equal importance in the analysis. As with ROC analysis, one wishes to draw conclusions about the population of cases and each case is regarded as an equally valid sample from the population. In particular, one does not want the analysis to be skewed towards cases with greater than the average number of lesions.

Another issue is that the AFROC assigns equal clinical importance to each lesion in a case. Lesion weights were introduced (Chakraborty and Berbaum 2004) to allow for the possibility that the clinical importance of finding a lesion might be lesion-dependent (Chakraborty and Yoon 2009). For example, it is possible that a diseased case has lesions of two types with differing clinical importance; the figure of merit should give more credit to finding the more clinically important one. Clinical importance could be defined as the mortality associated with the specific lesion type; this can be obtained from epidemiological studies (DeSantis et al. 2011).

Let \(W_{k_2 l_2} \geq 0\) denote the weight (i.e., clinical importance) of lesion \(l_2\) in diseased case \(k_2\) (since weights are only applicable to diseased cases, one can, without ambiguity, drop the case-level and location-level truth subscripts, i.e., the notation \(W_{k_2 2 l_2 2}\) would be superfluous). For each diseased case \(k_2 2\) the weights are subject to the constraint:

\[\begin{equation} \sum_{l_2 =1 }^{L_{k_2 2}} W_{k_2 l_2} = 1 \tag{25.20} \end{equation}\]

The constraint ensures that each diseased case carries equal importance in determining the weighted-AFROC (wAFROC) operating characteristic, regardless of the number of lesions in it (see TBA Chapter 14 for a demonstration of this fact).

The weighted lesion localization fraction \(\text{wLLF}_r\) is defined by (Chakraborty and Zhai 2016):

\[\begin{equation} \text{wLLF}_r \equiv \text{wLLF}\left ( \zeta_r \right ) = \frac{1}{K_2}\sum_{k_2=1}^{K_2}\sum_{l_2=1}^{L_{k_2 2}}W_{k_2 l_2} \mathbb{I}\left ( z_{k_2 2 l_2 2} \geq \zeta_r \right ) \tag{25.21} \end{equation}\]

25.8.1 Definition

The empirical wAFROC plot connects adjacent operating points \(\left ( \text{FPF}_r, \text{wLLF}_r \right )\), including the origin (0,0), with straight lines plus a straight-line segment connecting the observed end-point to (1,1). The area under this plot is the empirical weighted-AFROC AUC, denoted \(A_{\text{wAFROC}}\).
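The weighted ordinate of Eqn. (25.21) can be sketched as below. The toy lesion ratings, the lesion weights and the function name `wllf` are hypothetical; the weights are chosen unequal on the first case purely to show their effect, and each case's weights sum to one as required by Eqn. (25.20).

```python
import math

NEG_INF = -math.inf

# Hypothetical toy data: one inner list per diseased case, one entry per lesion.
ll_diseased = [[4, NEG_INF], [2], [NEG_INF]]   # lesion ratings (-inf = unmarked)
weights = [[0.8, 0.2], [1.0], [1.0]]           # lesion weights, sum to 1 per case

# Constraint (25.20): the weights on each diseased case sum to unity.
assert all(math.isclose(sum(case_w), 1.0) for case_w in weights)


def wllf(ll_dis, w, r):
    """Weighted LLF at rating threshold r: (1/K_2) * sum of W * I(rating >= r)."""
    total = 0.0
    for ratings, case_w in zip(ll_dis, w):
        total += sum(wt for x, wt in zip(ratings, case_w) if x >= r)
    return total / len(ll_dis)   # divide by K_2, not by L_T


wllf_by_r = [wllf(ll_diseased, weights, r) for r in range(4, 0, -1)]
```

Dividing by the number of diseased cases rather than the number of lesions is what gives each case the same total influence: here the first case, with two lesions, can contribute at most 1/3 to wLLF, the same as each single-lesion case.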

25.9 The AFROC1 plot

Historically, the AFROC used a different definition of FPF; the resulting plot is retrospectively termed the AFROC1 plot. Since NLs can occur on diseased cases, it is possible to define an inferred “FP” rating on a diseased case as the maximum of all NL ratings on the case, or \(-\infty\) if the case has no NLs. The quotes emphasize that this is non-standard usage of ROC terminology: in an ROC study, a FP can only occur on a non-diseased case. Since both case-level truth states are allowed, the highest false positive (FP) z-sample for case \(k_t t\) is [the “1” superscript below is necessary to distinguish it from Eqn. (25.12)]:

\[\begin{equation} \left. \begin{aligned} FP_{k_t t}^1 =& \max_{l_1} \left ( z_{k_t t l_1 1 } \mid l_1 \neq \varnothing \right )\\ =& -\infty \mid l_1 = \varnothing \end{aligned} \right \} \tag{25.22} \end{equation}\]

\(FP_{k_t t}^1\) is the maximum over all latent NL marks, labeled by the location index \(l_1\), occurring on case \(k_t t\), or \(-\infty\) if \(l_1 = \varnothing\). The corresponding false positive fraction \(FPF_r^1\) is defined by [the “1” superscript is necessary to distinguish it from Eqn. (25.13)]:

\[\begin{equation} FPF_r^1 \equiv FPF^1\left ( \zeta_r \right ) = \frac{1}{K_1+K_2}\sum_{t=1}^{2}\sum_{k_t=1}^{K_t} \mathbb{I}\left ( FP_{k_t t}^1 \geq \zeta_r \right ) \tag{25.23} \end{equation}\]

Note the subtle differences between Eqn. (25.13) and Eqn. (25.23). The latter counts “FPs” on non-diseased and diseased cases while Eqn. (25.13) counts FPs on non-diseased cases only, and for that reason the denominators in the two equations are different. The advisability of allowing a diseased case to be both a TP and a FP is questionable from both clinical and statistical considerations. However, this operating characteristic can be useful in applications where all cases contain lesions, for example lesion localization plus classification tasks (See Chapter TBA).

25.9.1 Definition

The empirical AFROC1 plot connects adjacent operating points \(\left ( FPF_r^1, \text{LLF}_r \right )\), including the origin (0,0) and (1,1), with straight lines. The only difference between AFROC1 and the AFROC plot is in the x-axis. The area under this plot is the empirical AFROC1 AUC, denoted \(A_{\text{AFROC1}}\).
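The difference between the two abscissae, Eqn. (25.13) and Eqn. (25.23), is easiest to see side by side. In the sketch below the toy NL ratings and the function names `fpf` and `fpf1` are hypothetical, and integer ratings stand in for binned z-samples.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy dataset: NL ratings, one inner list per case.
nl_nondiseased = [[], [1], [2, 1], []]   # K_1 = 4 non-diseased cases
nl_diseased = [[1], [], [3]]             # K_2 = 3 diseased cases


def fpf(nl_nondis, r):
    """AFROC abscissa, Eqn. (25.13): non-diseased cases only, denominator K_1."""
    fp = [max(case, default=NEG_INF) for case in nl_nondis]
    return sum(x >= r for x in fp) / len(fp)


def fpf1(nl_nondis, nl_dis, r):
    """AFROC1 abscissa, Eqn. (25.23): all cases, denominator K_1 + K_2."""
    fp1 = [max(case, default=NEG_INF) for case in nl_nondis + nl_dis]
    return sum(x >= r for x in fp1) / len(fp1)


fpf_by_r = [fpf(nl_nondiseased, r) for r in range(4, 0, -1)]
fpf1_by_r = [fpf1(nl_nondiseased, nl_diseased, r) for r in range(4, 0, -1)]
```

The two sequences differ both in numerator, because the highest-rated NL on each diseased case is now counted, and in denominator, because all seven cases are now eligible.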

25.10 The weighted-AFROC1 (wAFROC1) plot

25.10.1 Definition

The empirical weighted-AFROC1 (wAFROC1) plot connects adjacent operating points \(\left ( FPF_r^1, \text{wLLF}_r \right )\), including the origin (0,0) and (1,1), with straight lines. The only difference between it and the wAFROC plot is in the x-axis. The area under this plot is the empirical weighted-AFROC1 AUC, denoted \(A_{\text{wAFROC1}}\).

25.11 The EFROC plot

An exponentially transformed FROC (EFROC) plot has been proposed (Popescu 2011) that, like the AFROC, is contained within the unit square. The EFROC inferred FPF is defined by (this represents another way of inferring ROC data, albeit only FPF, from FROC data):

\[\begin{equation} FPF_r= 1 - \exp\left ( -NLF\left ( \zeta_r \right ) \right ) \tag{25.24} \end{equation}\]

In other words, one computes \(NLF_r\) using NLs rated \(\geq \zeta_r\) on all cases and then transforms it to \(FPF_r\) using the exponential transformation shown. Since \(NLF_r \geq 0\), \(FPF_r\) so defined lies in the range [0,1).
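A small numerical sketch of this transformation follows. It assumes that NLF is the number of NLs rated \(\geq \zeta_r\) on all cases divided by the total number of cases, and it uses the negative exponent, \(1 - \exp(-NLF)\), which is what keeps the result below one for any non-negative NLF. The function names and the ratings are illustrative.

```python
import math

def nlf(nl_ratings_by_case, zeta):
    """NLs rated >= zeta, counted on all cases, per case."""
    n_marks = sum(z >= zeta for case in nl_ratings_by_case for z in case)
    return n_marks / len(nl_ratings_by_case)

def efroc_fpf(nl_ratings_by_case, zeta):
    """EFROC abscissa: exponential transform of NLF."""
    return 1.0 - math.exp(-nlf(nl_ratings_by_case, zeta))

# Hypothetical NL ratings on 5 cases (diseased and non-diseased pooled).
all_cases = [[1.0, 3.0], [], [2.0], [4.0], []]
for zeta in (5.0, 2.0, 1.0):
    print(zeta, nlf(all_cases, zeta), round(efroc_fpf(all_cases, zeta), 4))
```

When NLF is zero the transformed value is zero, and it approaches one only as NLF grows without bound, so the observed end-point generally falls short of an abscissa of one.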

25.11.1 Definition

The empirical EFROC plot connects adjacent operating points \(\left ( FPF_r, \text{LLF}_r \right )\), with \(FPF_r\) given by Eqn. (25.24), including the origin (0,0) and (1,1), with straight lines. The only difference between it and the AFROC plot is in the x-axis. The area under this plot is the empirical EFROC AUC, denoted \(A_{\text{EFROC}}\).

\(A_{\text{EFROC}}\) has the advantage, compared to \(A_{\text{FROC}}\), of being defined by points contained within the unit square. It has the advantage over the AFROC of using all NL ratings, not just the highest rated ones. In my opinion this is a mixed blessing. The effect on statistical power compared to \(A_{\text{AFROC}}\) has not been studied, but I expect the advantage to be minimal (because the highest rated NL contains more information than a randomly selected NL mark). A disadvantage is that cases with more LLs get more importance in the analysis; this can be corrected by replacing LLF with wLLF, essentially yielding a weighted version of the EFROC AUC. Another disadvantage is that inclusion of NLs on diseased cases causes the EFROC AUC to depend on disease prevalence. The EFROC represents the first recognition, by someone other than me, of significant limitations of the FROC curve and of the desirability of an operating characteristic for FROC data that is completely contained within the unit square.

25.12 Discussion

TBA This chapter started with the difference between latent and actual marks and the notation to describe FROC data. The notation is used in deriving formulae for FROC, inferred ROC, AFROC, wAFROC, AFROC1, wAFROC1 and EFROC operating characteristics. In each case an area measure was defined. With the exception of the FROC plot, all operating characteristics defined in this chapter are contained in the unit square. Discussion of the preferred operating characteristic is deferred to a subsequent chapter TBA.

25.13 References

Bunch, Philip C, John F Hamilton, Gary K Sanderson, and Arthur H Simmons. 1977. “A Free Response Approach to the Measurement and Characterization of Radiographic Observer Performance.” In Application of Optical Instrumentation in Medicine VI, 127:124–35. International Society for Optics and Photonics.

Chakraborty, Dev P. 1989. “Maximum Likelihood Analysis of Free-Response Receiver Operating Characteristic (FROC) Data.” Medical Physics 16 (4): 561–68.

Chakraborty, Dev P. 2006b. “A Search Model and Figure of Merit for Observer Data Acquired According to the Free-Response Paradigm.” Phys. Med. Biol. 51: 3449–62.

Chakraborty, Dev P., and K. S. Berbaum. 2004. “Observer Studies Involving Detection and Localization: Modeling, Analysis and Validation.” Med Phys 31 (8): 2313–30.

Chakraborty, Dev P., and H. J. Yoon. 2008. “Operating Characteristics Predicted by Models for Diagnostic Tasks Involving Lesion Localization.” Medical Physics 35 (2): 435–45.

Chakraborty, Dev P., and H. J. Yoon. 2009. “JAFROC Analysis Revisited: Figure-of-Merit Considerations for Human Observer Studies.” Proc. SPIE Medical Imaging: Image Perception, Observer Performance, and Technology Assessment 7263: 72630T.

Chakraborty, Dev P, and Xuetong Zhai. 2016. “On the Meaning of the Weighted Alternative Free-Response Operating Characteristic Figure of Merit.” Medical Physics 43 (5): 2548–57.

DeSantis, Carol, Rebecca Siegel, Priti Bandi, and Ahmedin Jemal. 2011. “Breast Cancer Statistics, 2011.” CA: A Cancer Journal for Clinicians 61 (6): 408–18.

Duchowski, A. T. 2002. Eye Tracking Methodology: Theory and Practice. Clemson, SC: Clemson University.

Popescu, Lucretiu M. 2011. “Nonparametric Signal Detectability Evaluation Using an Exponential Transformation of the FROC Curve.” Medical Physics 38 (10): 5690–5702.

Swensson, Richard G. 1996. “Unified Measurement of Observer Performance in Detecting and Localizing Target Objects on Images.” Medical Physics 23 (10): 1709–25.


  1. I expected the number of NL marks per image to be limited only by the ratio of image size to lesion size, i.e., larger values for smaller lesions.

  2. This is not a limiting assumption: if the data is continuous, for finite numbers of cases, no ordering information is lost if the number of ratings is chosen large enough. This is analogous to Bamber’s theorem in Chapter 05, where a proof, although given for binned data, is applicable to continuous data.

  3. The highest rating method was used in early FROC modeling (Bunch et al. 1977) and by Swensson (1996), the latter in the context of LROC paradigm modeling.

  4. The late Prof. Richard Swensson did not like my choice of the word “alternative” in naming this operating characteristic. I had no idea in 1989 how important this operating characteristic would later turn out to be, otherwise a more meaningful name might have been proposed.

  5. Historical note: I became aware of how serious this issue could be when a researcher contacted me about using FROC methodology for nuclear medicine bone scan images, where the number of lesions on diseased cases can vary from a few to a hundred!