RJafroc documentation
Preface
TBA How much finished
The pdf file of the book
A note on the online distribution mechanism of the book
Structure of the book
Contributing to this book
Is this book relevant to you and what are the alternatives?
ToDos TBA
Chapters needing heavy edits
Shelved vs. removed vs. parked folders needing heavy edits
Coding aids
Quick Start
1 Help
1.1 TBA How much finished
1.2 Getting help on the software
1.3 References
2 JAFROC data format
2.1 TBA How much finished
2.2 Introduction
2.3 Note to existing users
2.4 Contents of Excel file
2.5 The Truth worksheet
2.6 The false positive (FP) ratings
2.7 The true positive (TP) ratings
2.8 A single reader dataset
2.9 References
3 Reading the Excel data file
3.1 TBA How much finished
3.2 Introduction
3.3 The structure of an ROC dataset
3.4 Correspondence between NL member of dataset and the FP worksheet
3.5 Case-index vs. caseID
3.6 Correspondence between LL member of dataset and the TP worksheet
3.7 References
4 Data format and reading FROC data
4.1 TBA How much finished
4.2 Introduction
4.3 The Truth worksheet
4.4 Reading the FROC dataset
4.5 The false positive (FP) ratings
4.6 The true positive (TP) ratings
4.7 On the distribution of numbers of lesions in diseased cases
4.7.1 Definition of lesDistr array
4.8 Definition of lesWghtDistr array
4.9 References
5 DBM analysis text output
5.1 TBA How much finished
5.2 Introduction
5.3 Analyzing the ROC dataset
5.4 Explanation of the output
5.5 References
6 OR analysis text output
6.1 TBA How much finished
6.2 Introduction
6.3 Analyzing the ROC dataset
6.4 Explanation of the output
6.5 References
7 OR analysis Excel output
7.1 TBA How much finished
7.2 Introduction
7.3 Generating the Excel output file
7.4 References
ROC paradigm
8 Preliminaries
8.1 TBA How much finished
8.2 Introduction
8.3 Clinical tasks
8.3.1 Workflow in an imaging study
8.3.2 The screening and diagnostic workup tasks
8.4 Imaging device development and its clinical deployment
8.4.1 Physical measurements
8.4.2 Quality control and image quality optimization
8.5 Image quality vs. task performance
8.6 Why physical measures of image quality are not enough
8.7 Model observers
8.8 Measuring observer performance: four paradigms
8.8.1 Basic approach to the analysis
8.8.2 Historical notes
8.9 Hierarchy of assessment methods
8.10 Overview of the book and how to use it
8.10.1 Overview of the book
8.10.2 How to use the book
8.11 Summary
8.12 Discussion
8.13 References
9 The Binary Task
9.1 TBA How much finished
9.2 Introduction
9.3 The fundamental 2x2 table
9.4 Sensitivity and specificity
9.4.1 Reasons for the names sensitivity and specificity
9.4.2 Estimating sensitivity and specificity
9.5 Disease prevalence
9.6 Accuracy
9.7 Negative and positive predictive values
9.7.1 Example calculation of PPV, NPV and accuracy
9.7.2 Comments
9.7.3 PPV and NPV are irrelevant to laboratory tasks
9.8 Summary
9.9 Discussion
9.10 References
10 Modeling the Binary Task
10.1 TBA How much finished
10.2 Introduction
10.3 Decision variable and decision threshold
10.3.1 Existence of a decision variable
10.3.2 Existence of a decision threshold
10.3.3 Adequacy of the training session
10.4 Changing the decision threshold: Example I
10.5 Changing the decision threshold: Example II
10.6 The equal-variance binormal model
10.7 The normal distribution
10.8 Analytic expressions for specificity and sensitivity
10.9 Demonstration of the concepts of sensitivity and specificity
10.9.1 Estimating mu from a finite sample
10.9.2 Changing the seed variable: case-sampling variability
10.9.3 Increasing the numbers of cases
10.10 Inverse variation of sensitivity and specificity and the need for a single FOM
10.11 The ROC curve
10.11.1 The chance diagonal
10.11.2 The guessing observer
10.11.3 Symmetry with respect to the negative diagonal
10.11.4 Area under the ROC curve
10.11.5 Properties of the equal-variance binormal model ROC curve
10.11.6 Comments
10.11.7 Physical interpretation of the mu-parameter
10.12 Assigning confidence intervals to an operating point
10.13 Variability in sensitivity and specificity: the Beam et al. study
10.14 Summary
10.15 References
11 Ratings Paradigm
11.1 TBA How much finished
11.2 Introduction
11.3 The ROC counts table
11.4 Operating points from counts table
11.4.1 Labeling the points
11.4.2 Examples
11.5 Automating all this
11.6 Relation between the ratings paradigm and the binary paradigm
11.7 Ratings are not numerical values
11.8 A single “clinical” operating point from ratings data
11.9 The forced choice paradigm
11.10 Observer performance studies as laboratory simulations of clinical tasks
11.11 Discrete vs. continuous ratings: the Miller study
11.12 The BI-RADS ratings scale and ROC studies
11.13 The controversy
11.14 Discussion
11.15 References
12 Empirical AUC
12.1 TBA How much finished
12.2 Introduction
12.3 The empirical ROC plot
12.3.1 Notation for cases
12.3.2 An empirical operating point
12.4 Empirical operating points from ratings data
12.5 AUC under the empirical ROC plot
12.6 The Wilcoxon statistic
12.7 Bamber’s equivalence theorem
12.8 Importance of Bamber’s theorem
12.9 Discussion / Summary
12.10 Appendix 12.A: Details of the Wilcoxon theorem
12.10.1 Upper triangle
12.10.2 Lowest trapezoid
12.11 References
13 Binormal model
13.1 TBA How much finished
13.2 TBA Introduction
13.3 Binormal model
13.3.1 The basic model
13.3.2 Additional parameters for binned data
13.3.3 Sensitivity and specificity
13.3.4 Binormal model in conventional notation
13.4 Binormal ROC curve
13.5 Scalar threshold-independent measure
13.5.1 Partial AUC
13.5.2 Full AUC
13.5.3 The d’ measure
13.6 Partial AUC vs. true performance
13.7 Illustrative plots
13.8 Geometrical argument
13.9 Optimal operating point on ROC
13.10 Discussion
13.11 Appendix I: Density functions
13.12 Appendix II: Area under binormal ROC
13.12.1 General case (partial-area)
13.12.2 Special case (total area)
13.13 Appendix III: Invariance property of pdfs
13.14 Appendix IV: Fitting an ROC curve
13.14.1 JAVA fitted ROC curve
13.14.2 Simplistic straight line fit to the ROC curve
13.14.3 Maximum likelihood estimation (MLE)
13.14.4 Code implementing MLE
13.15 Appendix V: Validating fitting model
13.15.1 Estimating the covariance matrix
13.15.2 Estimating the variance of Az
13.16 References
14 Sources of AUC variability
14.1 TBA How much finished
14.2 Introduction
14.3 Three sources of variability
14.4 Dependence of AUC on the case sample
14.4.1 Case sampling variability of AUC
14.5 DeLong method
14.6 Bootstrap method
14.6.1 Demonstration of the bootstrap method
14.7 Jackknife method
14.8 Calibrated simulator
14.8.1 The need for a calibrated simulator
14.8.2 Implementation of a simple calibrated simulator
14.9 Discussion
14.10 References
Significance Testing
15 Hypothesis Testing
15.1 TBA How much finished
15.2 Introduction
15.3 Single-modality single-reader ROC study
15.4 Type-I errors
15.5 One- vs. two-sided tests
15.6 Statistical power
15.6.1 Factors affecting statistical power
15.7 Comments
15.8 Why alpha is chosen as 5%
15.9 Discussion
15.10 References
16 DBM method background
16.1 TBA How much finished
16.2 Introduction
16.2.1 Historical background
16.2.2 The Wagner analogy
16.2.3 The shortage of numbers to analyze and a pivotal breakthrough
16.2.4 Organization of chapter
16.3 Random and fixed factors
16.4 Reader and case populations
16.5 Three types of analyses
16.6 General approach
16.7 Summary TBA
16.8 References
17 Significance Testing using the DBM Method
17.1 TBA How much finished
17.2 The DBM sampling model
17.2.1 Explanation of terms in the model
17.2.2 Meanings of variance components in the DBM model (TBA this section can be improved)
17.2.3 Definitions of mean-squares
17.3 Expected values of mean squares
17.4 Random-reader random-case (RRRC) analysis
17.4.1 Calculation of mean squares: an example
17.4.2 Significance testing
17.4.3 The Satterthwaite approximation
17.4.4 Decision rules, p-value and confidence intervals
17.5 Sample size estimation for random-reader random-case generalization
17.5.1 The non-centrality parameter
17.5.2 The denominator degrees of freedom
17.5.3 Example of sample size estimation, RRRC generalization
17.6 Significance testing and sample size estimation for fixed-reader random-case generalization
17.7 Significance testing and sample size estimation for random-reader fixed-case generalization
17.8 Summary TBA
17.9 Things for me to think about
17.9.1 Expected values of mean squares
17.10 References
18 DBM method special cases
18.1 TBA How much finished
18.2 Fixed-reader random-case (FRRC) analysis
18.2.1 Single-reader multiple-treatment analysis
18.3 Random-reader fixed-case (RRFC) analysis
18.4 References
19 Introduction to the Obuchowski-Rockette method
19.1 TBA How much finished
19.2 Locations of helper functions
19.3 Introduction
19.4 Single-reader multiple-treatment
19.4.1 Overview
19.4.2 Significance testing
19.4.3 p-value and confidence interval
19.4.4 Null hypothesis validation
19.4.5 Application 1
19.4.6 Application 2
19.5 Single-treatment multiple-reader
19.5.1 Overview
19.5.2 Significance testing
19.5.3 Special case
19.6 Multiple-reader multiple-treatment
19.6.1 Structure of the covariance matrix
19.6.2 Physical meanings of the covariance terms
19.7 Summary
19.8 Discussion
19.9 Appendix: Covariance and correlation
19.9.1 Relation: chi-square and F with infinite ddf
19.9.2 Definitions of covariance and correlation
19.9.3 Special case when variables have equal variances
19.9.4 Estimating the variance-covariance matrix
19.9.5 The variance inflation factor
19.9.6 Meaning of the covariance matrix
19.9.7 Code illustrating the covariance matrix
19.9.8 Comparing DBM to Obuchowski and Rockette for single-reader multiple-treatments
19.10 References
20 Obuchowski-Rockette (OR) Analysis
20.1 TBA How much finished
20.2 Introduction
20.3 Random-reader random-case
20.3.1 Two anecdotes
20.3.2 Hillis ddf
20.3.3 Decision rule, p-value and confidence interval
20.4 Fixed-reader random-case
20.5 Random-reader fixed-case
20.6 Single treatment analysis
21 Obuchowski-Rockette Applications
21.1 TBA How much finished
21.2 Introduction
21.3 Hand calculation
21.3.1 Random-Reader Random-Case (RRRC) analysis
21.3.2 Fixed-Reader Random-Case (FRRC) analysis
21.3.3 Random-Reader Fixed-Case (RRFC) analysis
21.4 RJafroc: dataset02
21.4.1 Random-Reader Random-Case (RRRC) analysis
21.4.2 Fixed-Reader Random-Case (FRRC) analysis
21.4.3 Random-Reader Fixed-Case (RRFC) analysis
21.5 RJafroc: dataset04
21.5.1 Random-Reader Random-Case (RRRC) analysis
21.5.2 Fixed-Reader Random-Case (FRRC) analysis
21.5.3 Random-Reader Fixed-Case (RRFC) analysis
21.6 RJafroc: dataset04, FROC
21.6.1 Random-Reader Random-Case (RRRC) analysis
21.6.2 Fixed-Reader Random-Case (FRRC) analysis
21.6.3 Random-Reader Fixed-Case (RRFC) analysis
21.7 RJafroc: dataset04, FROC/DBM
21.7.1 Random-Reader Random-Case (RRRC) analysis
21.7.2 Fixed-Reader Random-Case (FRRC) analysis
21.7.3 Random-Reader Fixed-Case (RRFC) analysis
21.8 Summary
21.9 Discussion
21.10 Tentative
21.11 References
22 Sample size estimation for ROC studies: DBM method
22.1 TBA How much finished
22.2 Introduction
22.3 Statistical power
22.3.1 Observed vs. anticipated effect-size
22.3.2 Dependence of statistical power on estimates of model parameters
22.3.3 Formulae for random-reader random-case (RRRC) sample size estimation
22.3.4 Significance testing
22.3.5 p-value and confidence interval
22.3.6 Comparing DBM to Obuchowski and Rockette for single-reader multiple-treatments
22.4 Formulae for fixed-reader random-case (FRRC) sample size estimation
22.4.1 Formulae for random-reader fixed-case (RRFC) sample size estimation
22.4.2 Fixed-reader random-case (FRRC) analysis TBA
22.4.3 Random-reader fixed-case (RRFC) analysis
22.4.4 Single-treatment multiple-reader analysis
22.5 Discussion/Summary
22.6 References
23 Sample size estimation for ROC studies: OR method
23.1 TBA How much finished
23.2 Introduction
23.3 Statistical power
23.3.1 Sample size estimation for random-reader random-cases
23.3.2 Dependence of statistical power on estimates of model parameters
23.3.3 Formulae for random-reader random-case (RRRC) sample size estimation
23.3.4 Significance testing
23.3.5 p-value and confidence interval
23.3.6 Comparing DBM to Obuchowski and Rockette for single-reader multiple-treatments
23.4 Formulae for fixed-reader random-case (FRRC) sample size estimation
23.4.1 Formulae for random-reader fixed-case (RRFC) sample size estimation
23.4.2 Example 1
23.4.3 Fixed-reader random-case (FRRC) analysis
23.4.4 Random-reader fixed-case (RRFC) analysis
23.4.5 Single-treatment multiple-reader analysis
23.5 Discussion/Summary
23.6 References
FROC paradigm
24 The FROC paradigm
24.1 TBA How much finished
24.2 Introduction
24.3 Location-specific paradigms
24.4 Visual search
24.4.1 Proximity criterion and scoring the data
24.4.2 Multiple marks in the same vicinity
24.4.3 Historical context
24.5 A pioneering FROC study in medical imaging
24.5.1 Image preparation
24.5.2 Image interpretation and the 1-rating
24.5.3 Scoring the data
24.6 The free-response receiver operating characteristic (FROC) plot
24.7 Preview of the RSM data simulator
24.8 Population and binned FROC plots
24.9 Perceptual SNR
24.10 The “solar” analogy: search vs. classification performance
24.11 Discussion and suggestions
24.12 References
25 Empirical plots
25.1 TBA How much finished
25.2 Introduction
25.3 Mark-rating pairs
25.3.1 Latent vs. actual marks
25.3.2 Binning rule
25.4 FROC notation
25.4.1 Comments on Table @ref(tab:froc-empirical-notation)
25.4.2 Discussion: cases with zero latent NL marks
25.5 The empirical FROC
25.5.1 Definition
25.5.2 The origin, a trivial point
25.5.3 The observed end-point and its semi-constrained property
25.5.4 Futility of extrapolation outside the observed end-point
25.6 The inferred ROC plot
25.6.1 Inferred-ROC rating
25.6.2 Inferred FPF
25.6.3 Inferred TPF
25.6.4 Definition
25.7 The alternative FROC (AFROC) plot
25.7.1 Definition
25.7.2 The constrained observed end-point of the AFROC
25.8 The weighted-AFROC (wAFROC) plot
25.8.1 Definition
25.9 The AFROC1 plot
25.9.1 Definition
25.10 The weighted-AFROC1 (wAFROC1) plot
25.10.1 Definition
25.11 The EFROC plot
25.11.1 Definition
25.12 Discussion
25.13 References
26 Empirical plot examples
26.1 TBA How much finished
26.2 Introduction
26.3 Raw FROC/AFROC/ROC plots
26.3.1 Code for raw plots
26.3.2 Explanation of the code
26.3.3 Key differences from the ROC paradigm
26.4 The chance-level FROC and AFROC
26.5 Location-level “true negatives”
26.6 Binned FROC/AFROC/ROC plots
26.6.1 Code for binned plots
26.6.2 Effect of seed on binned plots
26.7 Structure of the binned data
26.8 Summary
26.9 Discussion
26.10 References
27 FROC vs. wAFROC
27.1 TBA How much finished
27.2 Introduction
27.3 FROC vs. wAFROC
27.3.1 Moderate difference in performance
27.3.2 Large difference in performance
27.3.3 Small difference in performance and identical thresholds
27.4 Summary of simulations
27.4.1 Summary of R1 simulations
27.4.2 Summary of R2 simulations
27.4.3 Comments
27.5 Effect size comparison
27.6 Performance depends on \(\zeta_1\)
27.7 Discussion
27.8 References
28 Meanings of FROC figures of merit
28.1 TBA How much finished
28.2 Introduction
28.3 Empirical AFROC FOM-statistic
28.3.1 Upper limit for AFROC FOM-statistic
28.3.2 Range of AFROC FOM-statistic
28.4 Empirical weighted-AFROC FOM-statistic
28.5 Two theorems
28.5.1 Theorem 1
28.5.2 Theorem 2
28.6 Numerical illustrations
28.7 Summary tables of ratings
28.8 AFROC plot from first principles
28.9 wAFROC plot from first principles
28.10 Physical interpretations
28.10.1 Physical interpretation of area under AFROC
28.10.2 Physical interpretation of area under wAFROC
28.11 Discussion
28.12 References
29 Visual Search
29.1 TBA How much finished
29.2 Introduction
29.3 Grouping and labeling ROIs
29.4 Recognition vs. detection
29.5 Search vs. classification
29.6 Two visual search paradigms
29.6.1 The conventional paradigm
29.6.2 The medical imaging visual search paradigm
29.7 Determining where the radiologist looks
29.8 The Kundel-Nodine search model
29.8.1 Glancing / global impression
29.8.2 Scanning / local feature analysis
29.9 Kundel-Nodine model and CAD algorithms
29.10 Simultaneously acquired eye-tracking and FROC data
29.10.1 FROC and eye-tracking data collection
29.10.2 Measures of visual attention
29.10.3 Generalized ratings
29.11 Discussion / Summary
29.12 References
30 The radiological search model
30.1 TBA How much finished
30.2 Introduction
30.3 The radiological search model
30.4 RSM assumptions
30.5 Summary of RSM
30.6 Physical interpretation of RSM parameters
30.6.1 The \(\mu\) parameter
30.6.2 The \(\lambda'\) parameter
30.6.3 The \(\nu'\) parameter
30.7 Model re-parameterization
30.8 Discussion / Summary
30.9 References
31 Radiological search model predictions
31.1 TBA How much finished
31.2 Introduction
31.3 Inferred integer ROC ratings
31.3.1 Comments
31.4 Constrained end-point property
31.4.1 The abscissa of the ROC end-point
31.4.2 The ordinate of the ROC end-point
31.4.3 Variable number of lesions per case
31.5 The RSM-predicted ROC curve
31.5.1 Derivation of FPF
31.5.2 Derivation of TPF
31.5.3 Extension to varying numbers of lesions
31.5.4 “Proper” property of the RSM-predicted ROC curve
31.5.5 The pdfs for the ROC decision variable
31.5.6 RSM-predicted ROC-AUC and AFROC-AUC
31.5.7 RSM-predicted ROC and pdf curves
31.6 The RSM-predicted FROC curve
31.7 The RSM-predicted AFROC curve
31.7.1 Chance level performance on AFROC
31.7.2 The reader who does not yield any marks
31.8 Discussion / Summary
31.8.1 The Wagner review
31.9 References
32 Search and classification performances
32.1 TBA How much finished
32.2 Introduction
32.3 Quantifying search performance
32.4 Quantifying classification performance
32.4.1 Lesion-classification performance and the 2AFC LKE task
32.4.2 Significance of measuring search and lesion-classification performance
32.5 Discussion / Summary
32.5.1 The Wagner review
32.6 References
33 The FROC should not be used to measure performance
33.1 TBA How much finished
33.2 Introduction
33.3 The FROC curve is a poor descriptor of search performance
33.3.1 Clinically relevant portion of an operating characteristic
33.4 Discussion / Summary
33.4.1 The Wagner review
33.5 References
34 Analyzing FROC data
34.1 TBA How much finished
34.2 Introduction
34.3 Example 1
34.4 Plotting wAFROC and ROC curves
34.5 Reporting an FROC study
34.6 Crossed-treatment analysis
34.7 Discussion / Summary
34.8 References
35 FROC sample size
35.1 TBA How much finished
35.2 Introduction
35.3 Example 1
35.4 Plotting wAFROC and ROC curves
35.5 FitRsmROC usage example
35.6 Discussion / Summary
35.7 References
36 RSM fitting
36.1 TBA How much finished
36.2 Introduction
36.3 FROC likelihood function
36.3.1 Contribution of NLs
36.3.2 Contribution of LLs
36.3.3 Degeneracy problems
36.4 IDCA likelihood function
36.5 ROC likelihood function
36.6 FitRsmROC implementation
36.7 FitRsmROC usage example
36.8 Discussion / Summary
36.9 References
37 Three proper ROC fits
37.1 TBA How much finished
37.2 Introduction
37.3 Applications
37.3.1 Application to two datasets
37.4 Displaying composite plots
37.5 Displaying RSM parameters
37.6 Displaying CBM parameters
37.7 Displaying PROPROC parameters
37.8 Overview of findings
37.8.1 Slopes
37.8.2 Confidence intervals
37.8.3 Summary of slopes and confidence intervals
37.9 Discussion / Summary
37.10 Appendices
37.11 Datasets
37.12 Location of PROPROC files
37.13 Location of pre-analyzed results
37.14 Plots for Van Dyke dataset
37.15 References
CAD
38 Standalone CAD vs. Radiologists
38.1 TBA How much finished
38.2 Abstract
38.3 Keywords
38.4 Introduction
38.5 Methods
38.5.1 Studies assessing performance of CAD vs. radiologists
38.5.2 The 1T-RRFC analysis model
38.5.3 The 2T-RRRC analysis model
38.5.4 The 1T-RRRC analysis model
38.6 Software implementation
38.7 Results
38.8 Discussion
38.9 Appendix
38.9.1 Example 1
38.9.2 Example 2
38.9.3 Example 3
38.10 References
39 Optimal operating point on FROC
39.1 TBA How much finished
39.2 Introduction
39.3 Methods
39.3.1 Functions to be maximized
39.3.2 Vary lambda
39.3.3 Vary nu
39.3.4 Vary mu
39.4 Using the method
39.5 An application
39.6 Discussion
39.7 References
REFERENCES
40 Localization-classification tasks
40.1 TBA How much finished
40.2 Introduction
40.3 Abbreviations
40.4 History and basic idea
40.5 First example, File1.xlsx
40.5.1 Truth sheet
40.5.2 TP sheet
40.5.3 FP sheet
40.5.4 The two ratings arrays
40.6 Second example, File2.xlsx
40.7 Third example, File3.xlsx
40.8 Fourth example, File4.xlsx
40.9 Fifth example, File5.xlsx
40.10 Precautions
40.11 Discussion
40.12 References
41 Split Plot Study Design
41.1 TBA How much finished
41.2 Mean Square R(T)
41.3 References