Analysis of rating-method ROC data in which the data from patients and readers are pooled has several advantages over analysis of the data from individual readers: (1) it measures the performance of a diagnostic system rather than that of an average reader, (2) it is more appropriate for cost/benefit analysis, and (3) it matches the commonly used visual display of pooled results.
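To make "pooling" concrete, the sketch below concatenates every reader's ratings into one data set and traces a single empirical ROC curve from it. It assumes a 5-point rating scale (5 = definitely abnormal) and stores each reader's ratings in a dictionary with hypothetical "pos" (actually abnormal cases) and "neg" (actually normal cases) keys. The helper names `pool_ratings`, `roc_points`, and `trapezoid_area` are illustrative only, and the trapezoidal area is a nonparametric stand-in for a fitted (e.g., maximum-likelihood) curve, not the method used in the source.

```python
def pool_ratings(readers):
    """Concatenate every reader's ratings into one pooled data set."""
    pos = [r for reader in readers for r in reader["pos"]]
    neg = [r for reader in readers for r in reader["neg"]]
    return pos, neg


def roc_points(pos, neg, scale=(1, 2, 3, 4, 5)):
    """Empirical (FPF, TPF) operating points from rating-scale data.

    A case is called positive when its rating >= t. Stepping t from the
    strictest category down to the most lenient traces the curve from
    (0, 0) to (1, 1), assuming every rating lies on the scale.
    """
    points = [(0.0, 0.0)]
    for t in sorted(scale, reverse=True):
        fpf = sum(r >= t for r in neg) / len(neg)  # false-positive fraction
        tpf = sum(r >= t for r in pos) / len(pos)  # true-positive fraction
        points.append((fpf, tpf))
    return points


def trapezoid_area(points):
    """Area under the piecewise-linear empirical ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ratings from two readers, pooled into one curve.
readers = [
    {"pos": [5, 4, 4, 3], "neg": [1, 2, 2, 3]},
    {"pos": [5, 5, 3, 2], "neg": [1, 1, 2, 4]},
]
pos, neg = pool_ratings(readers)
print(trapezoid_area(roc_points(pos, neg)))
```

Because the pooled curve is computed once from all ratings, it describes the reader-plus-modality system as a whole rather than any single reader.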
The choice of design and statistical technique for an experiment depends on whether statistical generalization to readers or to patients is more fundamental, which in turn depends on the nature of the experimental question. In studies of physical image characteristics, investigators often assume that there are no important differences among readers. Such studies are concerned with whether a physical manipulation influences the interpretability of images from a population of patients; thus the appropriate error term for testing differences is based on variation among patients. Research designs in psychology, on the other hand, tend to treat the images within an experimental condition as a fixed factor and readers as a random factor. An experimental factor that might affect the reader's perception of a radiograph, but does not change the radiograph, is a psychological factor. The basic problem in investigations of psychological variables is to establish consistency of effect across readers so that results can be generalized to new readers. The appropriate error term for testing differences related to psychological variables is therefore based on reader variation. Because our research involves manipulation of the perceptual and/or cognitive behavior of the reader, our application of maximum-likelihood and jackknife methods is designed to allow experimental results to be generalized to the population of radiologists from which the sample was selected.
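The practical meaning of "an error term based on reader variation" can be shown with a deliberately simplified paired comparison. The sketch assumes each reader has already produced one accuracy value (for example, an ROC area) under each of two conditions; the function name `reader_paired_t` is hypothetical, and this is an illustration of the principle rather than the full analysis used in the source. The spread of the within-reader differences serves as the error term, so patient variation enters only through each reader's own estimate.

```python
import math
import statistics


def reader_paired_t(acc_a, acc_b):
    """Paired t statistic for a condition effect, treating readers as random.

    acc_a[i] and acc_b[i] are reader i's accuracy under conditions A and B.
    The standard error comes from reader-to-reader variation in the
    differences, so the test addresses generalization to new readers.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need paired values from at least two readers")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # Degrees of freedom depend on the number of readers, not patients.
    return mean_diff / std_error, len(diffs) - 1


# Hypothetical ROC areas for five readers under two conditions.
t, df = reader_paired_t([0.86, 0.81, 0.90, 0.78, 0.84],
                        [0.80, 0.79, 0.83, 0.77, 0.79])
print(f"t = {t:.2f} with {df} df")
```

The resulting statistic would be referred to a t distribution with (readers - 1) degrees of freedom; adding more patients sharpens each reader's estimate but does not add degrees of freedom for generalizing to readers.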
The perceptual accuracy represented by the pooled ROC curve can be analyzed with the jackknife method. In previous investigations using a relatively small number of patients, conclusions derived from pseudovalues of the jackknife method agreed with conclusions derived from estimates of the maximum-likelihood method (Berbaum et al. 1986; Berbaum, Franken, et al. 1988; Berbaum, El Khoury, et al. 1988). When the maximum-likelihood method does not converge to a solution for an individual reader's ROC curve, the most conservative approach is simply to exclude that reader's data from further analysis. With smaller patient samples, conclusions from pseudovalues and from maximum-likelihood estimates might diverge because of degenerate individual-reader ROC data or strong statistical bias in the maximum-likelihood estimates. Maximum-likelihood estimates may be biased in small samples, whereas the jackknife is a bias-reducing method of estimation.
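The jackknife step can be sketched as follows. For n deletion units, the i-th pseudovalue is n times the estimate from all data minus (n - 1) times the estimate with unit i removed, and the mean of the pseudovalues is the bias-reduced jackknife estimate. The source does not specify the deletion unit here, so the sketch leaves it open: pass a list of readers or a list of patients. The "pos"/"neg" dictionary layout and the names `wilcoxon_auc`, `pooled_auc`, and `jackknife` are hypothetical, and the Wilcoxon-type nonparametric area substitutes for a maximum-likelihood fit, which conveniently never fails to converge.

```python
import math
import statistics


def wilcoxon_auc(pos, neg):
    """Nonparametric ROC area: P(pos rating > neg rating), ties count one half."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pooled_auc(units):
    """ROC area of the ratings pooled over every unit (reader or patient)."""
    pos = [r for u in units for r in u["pos"]]
    neg = [r for u in units for r in u["neg"]]
    return wilcoxon_auc(pos, neg)


def jackknife(units, estimator):
    """Leave-one-out pseudovalues: n*theta_all - (n-1)*theta_without_i."""
    n = len(units)
    if n < 2:
        raise ValueError("jackknife needs at least two units")
    theta_all = estimator(units)
    pseudo = [n * theta_all - (n - 1) * estimator(units[:i] + units[i + 1:])
              for i in range(n)]
    estimate = statistics.mean(pseudo)  # bias-reduced jackknife estimate
    std_error = statistics.stdev(pseudo) / math.sqrt(n)
    return theta_all, pseudo, estimate, std_error


# Hypothetical ratings, here using readers as the deletion unit.
readers = [
    {"pos": [5, 4, 4, 3], "neg": [1, 2, 2, 3]},
    {"pos": [5, 5, 3, 2], "neg": [1, 1, 2, 4]},
    {"pos": [4, 4, 3, 3], "neg": [2, 2, 3, 1]},
]
theta, pseudo, est, se = jackknife(readers, pooled_auc)
print(theta, pseudo, est, se)
```

Individual pseudovalues can fall outside the 0 to 1 range of an ROC area; they are not areas themselves but are treated as approximately independent observations, one per deleted unit, that can be carried into a conventional analysis.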