Introduction

  • In the region-of-interest (ROI) paradigm (Obuchowski 1997, @RN55) each case is regarded as consisting of \({{Q}_{k}}\) (\({{Q}_{k}}\ge 1\)) “quadrants” or “regions-of-interest” or ROIs, where k is the case index (\(k=1,2,...,K\)) and \(K\) is the total number of cases (i.e., case-level non-diseased plus case-level diseased cases). Each ROI needs to be classified, by the investigator, as either ROI-level-non-diseased (i.e., it has no lesions) or ROI-level-diseased (i.e., it has at least one lesion). Note the distinction between case-level and ROI-level truth states. One can have ROI-level non-diseased regions in a case-level diseased case. A case-level diseased case must contain at least one ROI-level diseased region and a case-level non-diseased case cannot have any ROI-level diseased regions.

  • The observer gives a single rating (in fact an ordered label) to each ROI, denoted \({{R}_{kr}}\) (\(r\) = 1, 2, …, \({{Q}_{k}}\)). Here \(r\) is the ROI index and \(k\) is the case index. The rating can be an integer or quasi- continuous (e.g., 0 – 100), or a floating point value, as long as higher numbers represent greater confidence in presence of one or more lesions in the ROI.

  • The ROI paradigm is not restricted to 4 or even a constant number of ROIs per case. That is the reason for the k subscript in \({{Q}_{k}}\).

  • The ROI data structure is a special case of the FROC data structure, the essential difference being that the number of ratings per case is an a-priori known value, equal to \({{Q}_{k}}\).

  • ROI-level non-diseased region ratings are stored in the NL field and ROI-level diseased region ratings are stored in the LL field.

  • One can think of the ROI paradigm as similar to the FROC paradigm, but with localization accuracy restricted to belonging to a region (one cannot distinguish multiple lesions within a region). Unlike the FROC paradigm, a rating is required for every ROI.

An example ROI dataset

An example simulated ROI dataset is included as datasetROI.

str(datasetROI)
#> List of 3
#>  $ ratings     :List of 3
#>   ..$ NL   : num [1:2, 1:5, 1:90, 1:4] 0.95 0.927 0.556 0.805 1.421 ...
#>   ..$ LL   : num [1:2, 1:5, 1:40, 1:4] 1.57 2.31 2.3 2.34 2.34 ...
#>   ..$ LL_IL: logi NA
#>  $ lesions     :List of 3
#>   ..$ perCase: int [1:40] 2 3 2 2 3 3 1 2 3 3 ...
#>   ..$ IDs    : num [1:40, 1:4] 2 1 1 1 1 2 4 1 1 1 ...
#>   ..$ weights: num [1:40, 1:4] 0.5 0.333 0.5 0.5 0.333 ...
#>  $ descriptions:List of 7
#>   ..$ fileName     : chr "datasetROI"
#>   ..$ type         : chr "ROI"
#>   ..$ name         : chr "SIM-ROI"
#>   ..$ truthTableStr: logi NA
#>   ..$ design       : chr "FCTRL"
#>   ..$ modalityID   : Named chr [1:2] "1" "2"
#>   .. ..- attr(*, "names")= chr [1:2] "1" "2"
#>   ..$ readerID     : Named chr [1:5] "1" "2" "3" "4" ...
#>   .. ..- attr(*, "names")= chr [1:5] "1" "2" "3" "4" ...
datasetROI$ratings$NL[1,1,1,]
#> [1]  0.9498680 -0.0582497 -0.7763780  0.0120730
mean(datasetROI$ratings$NL[,,1:50,])
#> [1] 0.1014348
datasetROI$ratings$NL[1,1,51,]
#> [1] 1.01867 0.34710    -Inf    -Inf
datasetROI$lesions$perCase[1]
#> [1] 2
datasetROI$ratings$LL[1,1,1,]
#> [1] 1.56928 2.05945    -Inf    -Inf
x <- datasetROI$ratings$LL;mean(x[is.finite(x)])
#> [1] 1.815513

Examination of the output reveals that:

  • This is a 2-treatment 5-reader dataset, with 50 non-diseased cases and 40 diseased cases, and \({{Q}_{k}}=4\) for all k.

  • For treatment 1, reader 1, case 1 (the 1st non-diseased case) the 4 ratings are 0.949868, -0.0582497, -0.776378, 0.012073. The mean of all ratings on non-diseased cases is 0.1014348.

  • For treatment 1, reader 1, case 51 (the 1st diseased case) the NL ratings are 1.01867, 0.3471. There are only two finite values because this case has two ROI-level-diseased regions, and 2 plus 2 makes for the assumed 4-regions per case. The corresponding $lesionVector field is 2.

  • The ratings of the 2 ROI-level-diseased ROIs on this case are 1.56928, 2.05945. The mean rating over all ROI-level-diseased ROIs is 1.8155127.

The ROI Excel data file

An Excel file in JAFROC format containing simulated ROI data corresponding to datasetROI, is included with the distribution. The first command (below) finds the location of the file and the second command reads it and saves it to a dataset object ds. !!!DPC!!!

fileName <- system.file("extdata", "RoiData.xlsx", package = "RJafroc", mustWork = TRUE)
ds <- DfReadDataFile(fileName)
ds$descriptions$type
#> [1] "ROI"

The DfReadDataFile function automatically recognizes that this is an ROI dataset. Its structure is similar to the JAFROC format Excel file, with some important differences, noted below. It contains three worksheets:

  • The Truth worksheet - this indicates which cases are diseased and which are non-diseased and the number of ROI-level-diseased region on each case.
    • There are 50 non-diseased cases (labeled 1-50) under column CaseID and 40 diseased cases (labeled 51-90).
    • The LesionID field for each non-diseased case (e.g., CaseID = 1) is zero and there is one row per case. For diseased cases, this field has a variable number of entries, ranging from 1 to 4. As an example, there are two rows for CaseID = 51 in the Excel file: one with LesionID = and one with LesionID = .
    • The Weights field is always zero (this field is not used in ROI analysis).

  • The FP (or NL) worksheet - this lists the ratings of ROI-level-non-diseased regions.
    • For ReaderID = 1, ModalityID = 1 and CaseID = 1 there are 4 rows, corresponding to the 4 ROI-level-non-diseased regions in this case. The corresponding ratings are 0.949868, -0.0582497, -0.776378, 0.012073. The pattern repeats for other treatments and readers, but the rating are, of course, different.
    • Each CaseID is represented in the FP worksheet (a rare exception could occur if a case-level diseased case has 4 diseased regions).

  • The TP (or LL) worksheet - this lists the ratings of ROI-level-diseased regions.
    • Because non-diseased cases generate TPs, one does not find any entry with CaseID = 1-50 in the TP worksheet.
    • The lowest CaseID in the TP worksheet is 51, which corresponds to the first diseased case.
    • There are two entries for this case, corresponding to the two ROI-level-diseased regions present in this case. Recall that corresponding to this CaseID in the Truth worksheet there were two entries with LesionID = 2 and 3. These must match the LesionID’s listed for this case in the TP worksheet. Complementing these two entries, in the FP worksheet for CaseID = 51, there are 2 entries corresponding to the two ROI-level-non-diseased regions in this case.
    • One should be satisfied that for each diseased case the sum of the number of entries in the TP and FP worksheets is always 4.

References

Obuchowski, Nancy A. 1997. “Nonparametric Analysis of Clustered Roc Curve Data.” Journal Article. Biometrics 53: 567–78.

Obuchowski, Nancy A., Michael L. Lieber, and Kimerly A. Powell. 2000. “Data Analysis for Detection and Localization of Multiple Abnormalities with Application to Mammography.” Journal Article. Acad. Radiol. 7 (7): 516–25.