Precision-recall curve for images in each cohort. The images in each cohort were sorted based on their concept presence scores for redness and then compared with their malignancy labels. Precision was defined as the proportion of malignant images above a certain threshold out of all images above that threshold, whereas recall rate was defined as the proportion of malignant images above the threshold out of all malignant images. The top 500 and top 1,000 red images from Hospital Barcelona still contain more malignant than benign samples. The figures have been adapted from refs. 11 , 12 with permission and from ref. 16 under CC-BY license.