Mean hit ratio @ k of the trained XGBoost model under leave-one-group-out cross-validation (LOGOCV) at decreasing K-locus identity thresholds for grouping (blue-green curves), compared with an informed microbiologist approach (red). At the 100% grouping threshold, identical K-locus sequences are grouped together, so they all fall in either the training set or the test set, never in both.
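
The two quantities named in this caption can be illustrated with a minimal sketch. It rests on assumptions not stated here: hit ratio @ k is taken to be the fraction of query bacteria with at least one known positive among their k highest-scoring candidates, and queries without any known positive are skipped. The names `hit_ratio_at_k` and `leave_one_group_out`, the toy scores, and the group labels are hypothetical and are not the authors' code.

```python
from collections import defaultdict


def hit_ratio_at_k(scores, labels, k):
    """Fraction of queries with at least one positive in their top-k candidates.

    scores, labels: dicts mapping a query id to equally ordered lists of
    prediction scores and 0/1 ground-truth labels (assumed layout).
    """
    hits = 0
    evaluated = 0
    for query, s in scores.items():
        y = labels[query]
        if not any(y):
            continue  # assumption: queries with no known positive are skipped
        evaluated += 1
        # indices of the k highest-scoring candidates for this query
        top = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        if any(y[i] for i in top):
            hits += 1
    return hits / evaluated if evaluated else float("nan")


def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) so each group is held out exactly once.

    Samples sharing a group label (e.g. a K-locus cluster at a given
    identity threshold) never appear on both sides of a split.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for test_idx in by_group.values():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield train_idx, test_idx


# Toy data: three bacteria, three candidate phages each (hypothetical values).
scores = {"b1": [0.9, 0.2, 0.4], "b2": [0.1, 0.8, 0.3], "b3": [0.5, 0.6, 0.7]}
labels = {"b1": [1, 0, 0], "b2": [0, 0, 1], "b3": [0, 0, 0]}
hr1 = hit_ratio_at_k(scores, labels, 1)
hr2 = hit_ratio_at_k(scores, labels, 2)

# Hypothetical K-locus group labels for four bacterial samples.
splits = list(leave_one_group_out(["KL1", "KL1", "KL2", "KL3"]))
```

Lowering the identity threshold merges more samples into the same group, so each held-out fold is less similar to the training data. In practice, scikit-learn's `LeaveOneGroupOut` provides the same splitting behavior.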