For each sample size, 500 datasets were simulated, and for each dataset, a model was developed using the same method. Predictive performance was then tested in a large external dataset. Long horizontal lines represent performance values that would be observed had the model been developed in an extremely large sample (with almost zero epistemic uncertainty). Horizontal spread within each sample size grouping reflects random jitter for visualisation. The c-statistic is not shown for heuristic or bootstrap shrinkage, as these methods do not change the ranking of predictions compared with that using maximum likelihood estimation, and thus the c-statistic is the same. R² is the Nagelkerke R² value (which denotes the proportion of explained variation), and CITL is calibration-in-the-large. Reproduced from Riley et al,20 with permission under a CC-BY license.
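
The point about the c-statistic can be checked numerically. The sketch below is illustrative only and is not the simulation code behind the figure: it assumes shrinkage acts as a single positive multiplicative factor on the linear predictor together with an adjusted intercept, and the data, the factor 0.8, and the intercept shift 0.1 are all hypothetical. Because that transformation is strictly increasing, the ranking of predicted risks, and hence the c-statistic, should be unchanged.

```python
import math
import random


def expit(x):
    """Inverse logit: map a linear predictor to a predicted risk."""
    return 1.0 / (1.0 + math.exp(-x))


def c_statistic(y, risks):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as one half."""
    events = [r for r, yi in zip(risks, y) if yi == 1]
    nonevents = [r for r, yi in zip(risks, y) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


random.seed(1)

# Hypothetical linear predictors standing in for a maximum likelihood fit.
lp_ml = [random.gauss(-1.0, 1.5) for _ in range(300)]
# Hypothetical outcomes drawn from the corresponding risks.
y = [1 if random.random() < expit(v) else 0 for v in lp_ml]

# Assumed form of uniform shrinkage: scale the linear predictor by a
# positive factor and shift the intercept (both values are made up).
shrinkage_factor = 0.8
intercept_shift = 0.1
lp_shrunk = [intercept_shift + shrinkage_factor * v for v in lp_ml]

c_ml = c_statistic(y, [expit(v) for v in lp_ml])
c_shrunk = c_statistic(y, [expit(v) for v in lp_shrunk])

print("c-statistic, maximum likelihood:", round(c_ml, 4))
print("c-statistic, after shrinkage:   ", round(c_shrunk, 4))
```

Calibration measures such as CITL would generally differ between the two sets of predictions, since shrinkage changes the predicted risks themselves even though their order is preserved.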