Comparison of high-risk sensitivity (for basal cell carcinoma, melanoma and SCC/SCCIS) versus fairness gap with regard to sex in dermatology across different baselines. We report results in-distribution (left) and out-of-distribution (OOD, right) for OOD 1, in both the less skewed (top) and more skewed (bottom) settings. The baseline, Pretrained on JFT, is marked in black. Label conditioning and Label and property conditioning denote the models trained with synthetic images sampled from a diffusion model conditioned on the label only, and on both the label and the sensitive attribute, respectively. We further compared against other strong contenders: a BiT-ResNet model pretrained on ImageNet-21K (Pretrained on IN-21K), a model pretrained on JFT using RandAugment heuristic augmentations (RandAugment), a model trained with RandAugment on top of standard ImageNet augmentations (RandAugment + IN Augms), a model trained on a resampled version of the training dataset that is more balanced with regard to the sensitive attribute (Oversampling) and a model trained with focal loss (Focal loss). To ensure a fair comparison, all methods were trained and finetuned for the same number of steps and with the same batch size. For the fairness gap, smaller values are preferable. There are n = 1,349 samples in the in-distribution dataset and n = 6,639 samples in the OOD dataset. Data are presented as the mean ± s.d. across five technical replicates.
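As a rough illustration of the two quantities plotted, the sketch below computes a high-risk sensitivity and a fairness gap with regard to sex, then summarizes replicate values by their mean and s.d. The definitions are assumptions, since the caption does not spell them out: high-risk sensitivity is taken as the fraction of truly high-risk cases predicted as any high-risk class, the fairness gap as the largest difference in that sensitivity between groups, and the s.d. as the sample standard deviation. The function names and the toy labels are hypothetical.

```python
import numpy as np

# High-risk conditions named in the caption.
HIGH_RISK = {"basal cell carcinoma", "melanoma", "SCC/SCCIS"}


def high_risk_sensitivity(y_true, y_pred):
    """Assumed definition: share of truly high-risk cases predicted as high-risk."""
    true_hr = np.array([label in HIGH_RISK for label in y_true], dtype=bool)
    pred_hr = np.array([label in HIGH_RISK for label in y_pred], dtype=bool)
    if true_hr.sum() == 0:
        return float("nan")  # undefined when the subset has no high-risk cases
    return float((true_hr & pred_hr).sum() / true_hr.sum())


def fairness_gap(y_true, y_pred, groups):
    """Assumed definition: max minus min per-group high-risk sensitivity."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    per_group = [
        high_risk_sensitivity(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    ]
    return float(max(per_group) - min(per_group))


def summarize(replicate_values):
    """Mean and sample s.d. (ddof=1, an assumption) across replicates."""
    values = np.asarray(replicate_values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))


# Toy data, invented purely for illustration.
y_true = ["melanoma", "melanoma", "nevus", "melanoma", "basal cell carcinoma", "nevus"]
y_pred = ["melanoma", "nevus", "nevus", "melanoma", "melanoma", "melanoma"]
sex = ["F", "F", "F", "M", "M", "M"]

overall = high_risk_sensitivity(y_true, y_pred)
gap = fairness_gap(y_true, y_pred, sex)
```

Each point in such a plot would then correspond to one method, with `summarize` applied to the per-replicate values of each metric.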