Ch00Vig2DataFormatFroc.Rmd
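This vignette explains the Excel data format for FROC datasets, the structure of the resulting dataset object, and the arrays returned by `UtilLesionDistr()` and `UtilLesionWeightsDistr()`.

In the FROC paradigm, each mark is classified as a lesion localization (`LL`) if it is close to a real lesion, or a non-lesion localization (`NL`) otherwise.

The Excel file has three worksheets, named `Truth`, `NL` or `FP`, and `LL` or `TP`.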
## The `Truth` worksheet

The `Truth` worksheet contains 6 columns: `CaseID`, `LesionID`, `Weight`, `ReaderID`, `ModalityID` and `Paradigm`.
* `CaseID`: unique integers, one per case, representing the cases in the dataset.
* `LesionID`: integers 0, 1, 2, etc., with each 0 representing a non-diseased case, 1 representing the first lesion on a diseased case, 2 representing the second lesion on a diseased case, if present, and so on.
* The non-diseased cases are labeled `1`, `2` and `3`, while the diseased cases are labeled `70`, `71`, `72`, `73` and `74`.
* There are 3 non-diseased cases in the dataset (the number of 0's in the `LesionID` column).
* There are 5 diseased cases in the dataset (the number of 1's in the `LesionID` column of the `Truth` worksheet).
* There are 3 readers in the dataset (each cell in the `ReaderID` column contains `0, 1, 2`).
* There are 2 modalities in the dataset (each cell in the `ModalityID` column contains `0, 1`).
* `Weight`: floating point; 0 for each non-diseased case, or values for each diseased case that add up to unity.
* Diseased case `70` has two lesions, with `LesionID`s 1 and 2 and weights 0.3 and 0.7. Diseased case `71` has one lesion, with `LesionID` = 1 and `Weight` = 1. Diseased case `72` has three lesions, with `LesionID`s 1, 2 and 3 and weights 1/3 each. Diseased case `73` has two lesions, with `LesionID`s 1 and 2 and weights 0.1 and 0.9. Diseased case `74` has one lesion, with `LesionID` = 1 and `Weight` = 1.
* `ReaderID`: a comma-separated listing of readers, each represented by a unique integer, that have interpreted the case. In the example shown below each cell has the value `0, 1, 2`. Each cell must be formatted as text; otherwise Excel will not accept it.
* `ModalityID`: a comma-separated listing of modalities (or treatments), each represented by a unique integer, that apply to each case. In the example each cell has the value `0, 1`. Each cell must be formatted as text.
* `Paradigm`: in the example shown below, the contents are `FROC` and `crossed`. This informs the software that this is an `FROC` dataset and that the design is “crossed”, as in Vignette #1.

The example shown above corresponds to the Excel file `inst/extdata/toyFiles/FROC/frocCr.xlsx` in the project directory.
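library(RJafroc)
# Locate the toy FROC Excel file shipped with the RJafroc package
frocCr <- system.file("extdata", "toyFiles/FROC/frocCr.xlsx",
                      package = "RJafroc", mustWork = TRUE)
# Read the file (new Excel format) and inspect the resulting list
x <- DfReadDataFile(frocCr, newExcelFileFormat = TRUE)
str(x)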
#> List of 12
#> $ NL : num [1:2, 1:3, 1:8, 1:2] 1.02 2.89 2.21 3.01 2.14 ...
#> $ LL : num [1:2, 1:3, 1:5, 1:3] 5.28 5.2 5.14 4.77 4.66 4.87 3.01 3.27 3.31 3.19 ...
#> $ lesionVector : int [1:5] 2 1 3 2 1
#> $ lesionID : num [1:5, 1:3] 1 1 1 1 1 ...
#> $ lesionWeight : num [1:5, 1:3] 0.3 1 0.333 0.1 1 ...
#> $ dataType : chr "FROC"
#> $ modalityID : Named chr [1:2] "0" "1"
#> ..- attr(*, "names")= chr [1:2] "0" "1"
#> $ readerID : Named chr [1:3] "0" "1" "2"
#> ..- attr(*, "names")= chr [1:3] "0" "1" "2"
#> $ design : chr "CROSSED"
#> $ normalCases : int [1:3] 1 2 3
#> $ abnormalCases: int [1:5] 70 71 72 73 74
#> $ truthTableStr: num [1:2, 1:3, 1:8, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
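The link between the `Truth` worksheet and members such as `lesionVector`, `normalCases` and `abnormalCases` can be illustrated in base R. The block below is only a sketch: the data frame `truth` is a hand transcription of the worksheet described above (it is not read from the Excel file), and the variable names are chosen for illustration. It does not reproduce what `DfReadDataFile()` does internally.

```r
# Hand-transcribed copy of the toy Truth worksheet (assumption: the rows
# follow the description in the text; this is not read from the file)
truth <- data.frame(
  CaseID   = c(1, 2, 3, 70, 70, 71, 72, 72, 72, 73, 73, 74),
  LesionID = c(0, 0, 0,  1,  2,  1,  1,  2,  3,  1,  2,  1),
  Weight   = c(0, 0, 0, 0.3, 0.7, 1, 1/3, 1/3, 1/3, 0.1, 0.9, 1),
  ReaderID = "0, 1, 2",   # text-formatted, comma-separated cell
  stringsAsFactors = FALSE
)

# Non-diseased cases have LesionID 0; every other row is one lesion
normalCases   <- truth$CaseID[truth$LesionID == 0]
diseased      <- truth[truth$LesionID > 0, ]
abnormalCases <- unique(diseased$CaseID)

# Number of lesions per diseased case, in worksheet order
lesionVector <- as.integer(
  table(factor(diseased$CaseID, levels = abnormalCases)))

# Weights of each diseased case should add up to unity
weightSums <- tapply(diseased$Weight, diseased$CaseID, sum)

# A comma-separated cell splits into the individual reader labels
readerID <- trimws(strsplit(truth$ReaderID[1], ",")[[1]])

print(lesionVector)                       # 2 1 3 2 1
print(all(abs(weightSums - 1) < 1e-8))    # TRUE
print(readerID)
```

The total number of lesions, `sum(lesionVector)`, is 9, which is the count of non-zero `LesionID` entries in the worksheet.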
* The `x$dataType` member indicates that this is an `FROC` dataset.
* The `x$lesionVector` member is a vector whose contents reflect the number of lesions in each diseased case, i.e., 2, 1, 3, 2, 1 in the current example.
* The `x$lesionID` member indicates the labeling of the lesions in each diseased case.

x$lesionID
#>      [,1] [,2] [,3]
#> [1,]    1    2 -Inf
#> [2,]    1 -Inf -Inf
#> [3,]    1    2    3
#> [4,]    1    2 -Inf
#> [5,]    1 -Inf -Inf
* The first diseased case has two lesions labeled 1 and 2; the `-Inf` is a filler used to denote a missing value. The second diseased case has one lesion labeled 1. The third diseased case has three lesions labeled 1, 2 and 3, etc.
* The `x$lesionWeight` member is the clinical importance of each lesion. Lacking specific clinical reasons, the lesions should be equally weighted; this is not true for this toy dataset.

x$lesionWeight
#>           [,1]      [,2]      [,3]
#> [1,] 0.3000000 0.7000000      -Inf
#> [2,] 1.0000000      -Inf      -Inf
#> [3,] 0.3333333 0.3333333 0.3333333
#> [4,] 0.1000000 0.9000000      -Inf
#> [5,] 1.0000000      -Inf      -Inf
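Because cases have different numbers of lesions, both matrices are padded with `-Inf`, and any summary over them has to skip the fillers. The following base-R sketch rebuilds the two padded matrices from the toy values quoted above and shows one way to ignore the padding; the construction loop and the variable names are illustrative assumptions, not the package's own code.

```r
# Toy values taken from the text above
lesionVector <- c(2L, 1L, 3L, 2L, 1L)
weights <- list(c(0.3, 0.7), 1, rep(1/3, 3), c(0.1, 0.9), 1)

# One row per diseased case, one column per possible lesion, -Inf as filler
maxLes       <- max(lesionVector)
lesionID     <- matrix(-Inf, nrow = length(lesionVector), ncol = maxLes)
lesionWeight <- matrix(-Inf, nrow = length(lesionVector), ncol = maxLes)
for (k in seq_along(lesionVector)) {
  n <- lesionVector[k]
  lesionID[k, seq_len(n)]     <- seq_len(n)
  lesionWeight[k, seq_len(n)] <- weights[[k]]
}

# Skip the -Inf fillers when summarising
lesionsPerCase <- rowSums(is.finite(lesionID))
rowWeightSums  <- apply(lesionWeight, 1, function(w) sum(w[is.finite(w)]))

print(lesionsPerCase)   # recovers lesionVector: 2 1 3 2 1
print(rowWeightSums)    # each diseased case sums to 1 (up to rounding)
```

Summing a row without the `is.finite()` filter would return `-Inf` for every case that has fewer than three lesions.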
The false positive (FP) ratings are found in the `FP` or `NL` worksheet, see below.
* `ReaderID`: the reader labels; these must be `0`, `1` or `2`, as declared in the `Truth` worksheet.
* `ModalityID`: the modality labels; these must be `0` or `1`, as declared in the `Truth` worksheet.
* `CaseID`: the labels of cases with `NL` marks. In the FROC paradigm, `NL` events can occur on non-diseased and diseased cases.
* `FP_Rating`: the floating point ratings of `NL` marks. Each row of this worksheet yields a rating corresponding to the values of `ReaderID`, `ModalityID` and `CaseID` for that row.
* For `ModalityID` 0, `ReaderID` 0 and `CaseID` 1 (the first non-diseased case declared in the `Truth` worksheet), there is a single `NL` mark that was rated 1.02, corresponding to row 2 of the `FP` worksheet.
* Diseased cases with `NL` marks are also declared in the `FP` worksheet. Some examples are seen at rows 15, 16 and 21-23 of the `FP` worksheet.
* `CaseID` = 71 got two `NL` marks, rated 2.24 and 4.01.
* Because this is the only case with two marks, it determines the length of the fourth dimension of the `x$NL` list member, 2 in the current example. Absent this case, the length would have been one.
* In general, the case with the most `NL` marks determines the length of the fourth dimension of the `x$NL` list member.
* The ratings in `x$NL` reflect the contents of the `FP` worksheet.

The true positive (TP) ratings are found in the `TP` or `LL` worksheet, see below.
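Both ratings worksheets are long-format tables that end up in four-dimensional arrays. Looking back at the `FP` worksheet, the sketch below shows one way such rows could be unpacked into an array shaped like `x$NL`. Only three rows are used: the rating 1.02 and the two ratings for case 71 come from the text, while the modality and reader assigned to the case 71 marks are assumptions made for illustration. The loop is not the package's implementation.

```r
# Labels declared in the Truth worksheet
allCases   <- c(1, 2, 3, 70, 71, 72, 73, 74)
modalityID <- c("0", "1")
readerID   <- c("0", "1", "2")

# Three FP rows; modality/reader of the case 71 marks are assumed
fp <- data.frame(
  ReaderID   = c("0", "2", "2"),
  ModalityID = c("0", "1", "1"),
  CaseID     = c(1, 71, 71),
  FP_Rating  = c(1.02, 2.24, 4.01),
  stringsAsFactors = FALSE
)

# Fourth dimension: most NL marks on any modality-reader-case combination
key   <- paste(fp$ModalityID, fp$ReaderID, fp$CaseID)
maxNL <- max(table(key))

# Fill with -Inf, then drop each rating into the first unused slot
NL <- array(-Inf, dim = c(length(modalityID), length(readerID),
                          length(allCases), maxNL))
for (r in seq_len(nrow(fp))) {
  i <- match(fp$ModalityID[r], modalityID)
  j <- match(fp$ReaderID[r], readerID)
  k <- match(fp$CaseID[r], allCases)
  slot <- which(NL[i, j, k, ] == -Inf)[1]
  NL[i, j, k, slot] <- fp$FP_Rating[r]
}

print(dim(NL))          # 2 3 8 2
print(NL[1, 1, 1, ])    # 1.02 followed by the -Inf filler
```

Removing the second case 71 row would shrink the fourth dimension to one, which is the behaviour described in the list above.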
* Given the structure of the `Truth` worksheet for this dataset, the maximum number of rows in the `TP` worksheet would be 9 times 2 times 3, i.e., 54, assuming every lesion is marked for each modality, reader and diseased case. The 9 comes from the total number of non-zero entries in the `LesionID` column of the `Truth` worksheet.
* As an example, the first lesion in `CaseID` equal to `70` was marked (and rated 5.28) in `ModalityID` `0` and `ReaderID` `0`.
* The length of the fourth dimension of the `x$LL` list member, 3 in the present example, is determined by the diseased case with the most lesions in the `Truth` worksheet.
* The ratings in `x$LL` reflect the contents of the `TP` worksheet.

Consider a larger dataset, `dataset11`, with structure as shown below:

x <- dataset11
str(x)
#> List of 8
#> $ NL : num [1:4, 1:5, 1:158, 1:4] -Inf -Inf -Inf -Inf -Inf ...
#> $ LL : num [1:4, 1:5, 1:115, 1:20] -Inf -Inf -Inf -Inf -Inf ...
#> $ lesionVector: int [1:115] 6 4 7 1 3 3 3 8 11 2 ...
#> $ lesionID : num [1:115, 1:20] 1 1 1 1 1 1 1 1 1 1 ...
#> $ lesionWeight: num [1:115, 1:20] 0.167 0.25 0.143 1 0.333 ...
#> $ dataType : chr "FROC"
#> $ modalityID : Named chr [1:4] "1" "2" "3" "4"
#> ..- attr(*, "names")= chr [1:4] "1" "2" "3" "4"
#> $ readerID : Named chr [1:5] "1" "2" "3" "4" ...
#> ..- attr(*, "names")= chr [1:5] "1" "2" "3" "4" ...
The number of lesions in each abnormal case is contained in `x$lesionVector`.

x$lesionVector
#> [1] 6 4 7 1 3 3 3 8 11 2 4 6 2 16 5 2 8 3 4 7 11 1 4 3 4
#> [26] 4 7 3 2 5 2 2 7 6 6 4 10 20 12 6 4 7 12 5 1 1 5 1 2 8
#> [51] 3 1 2 2 3 2 8 16 10 1 2 2 6 3 2 2 4 6 10 11 1 2 6 2 4
#> [76] 5 2 9 6 6 8 3 8 7 1 1 6 3 2 1 9 8 8 2 2 12 1 1 1 1
#> [101] 1 3 1 2 2 1 1 1 1 3 1 1 1 2 1
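The distribution of the number of lesions per abnormal case can be obtained by interrogating this vector with the `which()` function:

for (el in 1:max(x$lesionVector)) cat(
  "abnormal cases with", el, "lesions = ",
  length(which(x$lesionVector == el)), "\n")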
#> abnormal cases with 1 lesions = 25
#> abnormal cases with 2 lesions = 23
#> abnormal cases with 3 lesions = 13
#> abnormal cases with 4 lesions = 10
#> abnormal cases with 5 lesions = 5
#> abnormal cases with 6 lesions = 11
#> abnormal cases with 7 lesions = 6
#> abnormal cases with 8 lesions = 8
#> abnormal cases with 9 lesions = 2
#> abnormal cases with 10 lesions = 3
#> abnormal cases with 11 lesions = 3
#> abnormal cases with 12 lesions = 3
#> abnormal cases with 13 lesions = 0
#> abnormal cases with 14 lesions = 0
#> abnormal cases with 15 lesions = 0
#> abnormal cases with 16 lesions = 2
#> abnormal cases with 17 lesions = 0
#> abnormal cases with 18 lesions = 0
#> abnormal cases with 19 lesions = 0
#> abnormal cases with 20 lesions = 1
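The same tally can be obtained without an explicit loop. The sketch below uses base R's `tabulate()` on a copy of the vector printed above (pasted in so that the block runs without the RJafroc package); the variable names are illustrative.

```r
# Copy of x$lesionVector for dataset11, as printed above
lesionVector <- c(
  6, 4, 7, 1, 3, 3, 3, 8, 11, 2, 4, 6, 2, 16, 5, 2, 8, 3, 4, 7, 11, 1, 4, 3, 4,
  4, 7, 3, 2, 5, 2, 2, 7, 6, 6, 4, 10, 20, 12, 6, 4, 7, 12, 5, 1, 1, 5, 1, 2, 8,
  3, 1, 2, 2, 3, 2, 8, 16, 10, 1, 2, 2, 6, 3, 2, 2, 4, 6, 10, 11, 1, 2, 6, 2, 4,
  5, 2, 9, 6, 6, 8, 3, 8, 7, 1, 1, 6, 3, 2, 1, 9, 8, 8, 2, 2, 12, 1, 1, 1, 1,
  1, 3, 1, 2, 2, 1, 1, 1, 1, 3, 1, 1, 1, 2, 1)

# counts[n] is the number of abnormal cases with exactly n lesions
counts <- tabulate(lesionVector, nbins = max(lesionVector))
names(counts) <- seq_along(counts)

# Show only the lesion counts that actually occur in the dataset
print(counts[counts > 0])
```

Unlike `table()`, `tabulate()` keeps the zero entries (13, 14, 15, 17, 18 and 19 lesions), so the result lines up one-to-one with the loop output.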
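## The `lesDistr` array

The fraction of abnormal cases with 1 lesion, 2 lesions, etc., is obtained by dividing each count by the number of abnormal cases:

for (el in 1:max(x$lesionVector)) cat(
  "fraction of abnormal cases with", el, "lesions = ",
  length(which(x$lesionVector == el))/length(x$LL[1,1,,1]), "\n")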
#> fraction of abnormal cases with 1 lesions = 0.2173913
#> fraction of abnormal cases with 2 lesions = 0.2
#> fraction of abnormal cases with 3 lesions = 0.1130435
#> fraction of abnormal cases with 4 lesions = 0.08695652
#> fraction of abnormal cases with 5 lesions = 0.04347826
#> fraction of abnormal cases with 6 lesions = 0.09565217
#> fraction of abnormal cases with 7 lesions = 0.05217391
#> fraction of abnormal cases with 8 lesions = 0.06956522
#> fraction of abnormal cases with 9 lesions = 0.0173913
#> fraction of abnormal cases with 10 lesions = 0.02608696
#> fraction of abnormal cases with 11 lesions = 0.02608696
#> fraction of abnormal cases with 12 lesions = 0.02608696
#> fraction of abnormal cases with 13 lesions = 0
#> fraction of abnormal cases with 14 lesions = 0
#> fraction of abnormal cases with 15 lesions = 0
#> fraction of abnormal cases with 16 lesions = 0.0173913
#> fraction of abnormal cases with 17 lesions = 0
#> fraction of abnormal cases with 18 lesions = 0
#> fraction of abnormal cases with 19 lesions = 0
#> fraction of abnormal cases with 20 lesions = 0.008695652
This information is contained in the `lesDistr` array, which is computed by the utility function `UtilLesionDistr()`:

lesDistr <- UtilLesionDistr(x)
lesDistr
#>       [,1]        [,2]
#>  [1,]    1 0.217391304
#>  [2,]    2 0.200000000
#>  [3,]    3 0.113043478
#>  [4,]    4 0.086956522
#>  [5,]    5 0.043478261
#>  [6,]    6 0.095652174
#>  [7,]    7 0.052173913
#>  [8,]    8 0.069565217
#>  [9,]    9 0.017391304
#> [10,]   10 0.026086957
#> [11,]   11 0.026086957
#> [12,]   12 0.026086957
#> [13,]   16 0.017391304
#> [14,]   20 0.008695652
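The `UtilLesionDistr()` function returns an array with two columns and number of rows equal to the number of distinct values of lesions per case, 14 in the current example. The first column contains the number of lesions per case and the second column contains the fraction of diseased cases with that number of lesions. The second column must sum to unity:

sum(UtilLesionDistr(x)[,2])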
#> [1] 1
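To make the layout of this array concrete, here is a minimal base-R sketch that builds a two-column array of the same shape from a vector of lesion counts. The function name `lesDistrSketch` is hypothetical and the code is not the RJafroc implementation of `UtilLesionDistr()`; it is applied to the toy `lesionVector` (2, 1, 3, 2, 1) from earlier in this vignette.

```r
# Hypothetical helper: distinct lesion counts and their relative frequencies
lesDistrSketch <- function(lesionVector) {
  values    <- sort(unique(lesionVector))
  fractions <- vapply(values, function(v) mean(lesionVector == v), numeric(1))
  cbind(values, fractions, deparse.level = 0)  # unnamed two-column matrix
}

toy <- lesDistrSketch(c(2, 1, 3, 2, 1))
print(toy)
print(sum(toy[, 2]))   # fractions add up to 1
```

For the toy dataset, two of five diseased cases have one lesion, two have two lesions and one has three, giving fractions 0.4, 0.4 and 0.2.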
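## The `lesWghtDistr` array

* This is returned by `UtilLesionWeightsDistr()`.
* It contains the same number of rows as `lesDistr`.
* Its first column is the same as the first column of `lesDistr`, i.e., the number of lesions per case.
* The remaining columns contain the lesion weights; missing values are filled with `-Inf`.

lesWghtDistr <- UtilLesionWeightsDistr(x)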
cat("dim(lesDistr) =", dim(lesDistr),"\n")
#> dim(lesDistr) = 14 2
cat("dim(lesWghtDistr) =", dim(lesWghtDistr),"\n")
#> dim(lesWghtDistr) = 14 21
cat("lesWghtDistr = \n\n")
#> lesWghtDistr =
lesWghtDistr
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 1.00000000 -Inf -Inf -Inf -Inf -Inf
#> [2,] 2 0.50000000 0.50000000 -Inf -Inf -Inf -Inf
#> [3,] 3 0.33333333 0.33333333 0.33333333 -Inf -Inf -Inf
#> [4,] 4 0.25000000 0.25000000 0.25000000 0.25000000 -Inf -Inf
#> [5,] 5 0.20000000 0.20000000 0.20000000 0.20000000 0.20000000 -Inf
#> [6,] 6 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667
#> [7,] 7 0.14285714 0.14285714 0.14285714 0.14285714 0.14285714 0.14285714
#> [8,] 8 0.12500000 0.12500000 0.12500000 0.12500000 0.12500000 0.12500000
#> [9,] 9 0.11111111 0.11111111 0.11111111 0.11111111 0.11111111 0.11111111
#> [10,] 10 0.10000000 0.10000000 0.10000000 0.10000000 0.10000000 0.10000000
#> [11,] 11 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909
#> [12,] 12 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
#> [13,] 16 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000
#> [14,] 20 0.05000000 0.05000000 0.05000000 0.05000000 0.05000000 0.05000000
#> [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [2,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [3,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [4,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [5,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [6,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [7,] 0.14285714 -Inf -Inf -Inf -Inf -Inf -Inf
#> [8,] 0.12500000 0.12500000 -Inf -Inf -Inf -Inf -Inf
#> [9,] 0.11111111 0.11111111 0.11111111 -Inf -Inf -Inf -Inf
#> [10,] 0.10000000 0.10000000 0.10000000 0.10000000 -Inf -Inf -Inf
#> [11,] 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909 -Inf -Inf
#> [12,] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 -Inf
#> [13,] 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.0625
#> [14,] 0.05000000 0.05000000 0.05000000 0.05000000 0.05000000 0.05000000 0.0500
#> [,15] [,16] [,17] [,18] [,19] [,20] [,21]
#> [1,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [2,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [3,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [4,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [5,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [6,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [7,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [8,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [9,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [10,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [11,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [12,] -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [13,] 0.0625 0.0625 0.0625 -Inf -Inf -Inf -Inf
#> [14,] 0.0500 0.0500 0.0500 0.05 0.05 0.05 0.05
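The shape of this array (14 rows, 21 columns, i.e., one column for the lesion count plus one per possible lesion up to the maximum of 20) can be mimicked with a short base-R sketch. The function name `lesWghtDistrSketch` is hypothetical, and the code assumes equal weights 1/n within a case, which matches the output shown above but is not necessarily how `UtilLesionWeightsDistr()` derives its values. The toy `lesionVector` is used to keep the result small.

```r
# Hypothetical helper: equal lesion weights, padded with -Inf
lesWghtDistrSketch <- function(lesionVector) {
  values <- sort(unique(lesionVector))
  out <- matrix(-Inf, nrow = length(values), ncol = 1 + max(values))
  out[, 1] <- values                      # column 1: lesions per case
  for (r in seq_along(values)) {
    n <- values[r]
    out[r, 1 + seq_len(n)] <- 1 / n       # columns 2..(n + 1): weights
  }
  out
}

w <- lesWghtDistrSketch(c(2, 1, 3, 2, 1))
print(dim(w))   # 3 4
print(w)
```

Note that the toy dataset itself uses unequal weights (for example 0.3 and 0.7), so this equal-weight sketch describes the layout only.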
* The length of the first dimension of either the `x$NL` or `x$LL` list member is the total number of modalities, 2 in the toy `frocCr.xlsx` example.
* The length of the second dimension of either the `x$NL` or `x$LL` list member is the total number of readers, 3 in the toy example.
* The length of the third dimension of `x$NL` is the total number of cases, 8 in the toy example. The first three positions account for `NL` marks on non-diseased cases and the remaining 5 positions account for `NL` marks on diseased cases.
* The length of the third dimension of `x$LL` is the total number of diseased cases, 5 in the toy example.
* The length of the fourth dimension of `x$NL` is determined by the case (diseased or non-diseased) with the most `NL` marks, 2 in the toy example.
* The length of the fourth dimension of `x$LL` is determined by the diseased case with the most lesions, 3 in the toy example.

Bunch, PC, JF Hamilton, GK Sanderson, and AH Simmons. 1978. “Free Response Approach to Measurement and Characterization of Radiographic Observer Performance.” American Journal of Roentgenology 130: 382.

Chakraborty, Dev P. 2017. Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples. Boca Raton, FL: CRC Press.