Heatmap of Spearman correlation coefficients between measured indel frequencies and predicted efficiencies for twelve models (columns) across multiple datasets (rows). Row labels give abbreviated information about each dataset: year of publication, cell type, and number of gRNAs included. Column labels give the names of the computational models included in this benchmark. Color encodes the Spearman correlation coefficient between the reported or measured indel frequencies and the predicted efficiencies. Gray blocks indicate dataset-model pairs in which the model was evaluated against its own training dataset. The previously reported models included DeepSpCas9 13, DeepWt_U6 14, DeepWt_T7 14, DeepCRISPR 17, the Doench score 6, the Wang score 2, the Xu score 16, ChariRank 5, the Doench score 18, the wuCrispr/Wong score 20, and CRISPRscan 1. For each dataset, the statistical significance of the difference between the two best-performing models, determined by Steiger’s test, is shown (* P < 0.05, ** P < 1e-10; n.s., not significant).
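As a rough illustration of the two statistics named in this caption, the sketch below computes a Spearman correlation (Pearson correlation of tie-averaged ranks) and a Steiger-type Z test for two dependent correlations that share one variable, here the measured indel frequencies. The caption does not say which variant of Steiger's test was used, so the formula (Steiger's 1980 Z with the pooled correlation) is an assumption, as is the use of Spearman coefficients inside it. All function names and the toy numbers are hypothetical and are not data from the benchmark.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def steiger_z(r_ya, r_yb, r_ab, n):
    """Z test for two dependent correlations sharing variable y.

    r_ya, r_yb: correlations of each model's predictions with y.
    r_ab: correlation between the two models' predictions.
    Assumed variant: Steiger (1980) Z using the pooled correlation.
    Returns (z, two-sided p-value); requires |r| < 1 and n > 3.
    """
    r_bar = (r_ya + r_yb) / 2
    psi = r_ab * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_ab**2)
    c = psi / (1 - r_bar**2) ** 2
    z = (math.atanh(r_ya) - math.atanh(r_yb)) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Toy values for illustration only (not taken from any dataset in the figure).
measured = [5, 12, 30, 41, 58, 73, 80, 95]
model_a = [0.1, 0.3, 0.2, 0.5, 0.6, 0.7, 0.9, 0.8]
model_b = [0.4, 0.1, 0.6, 0.2, 0.8, 0.3, 0.9, 0.5]

rho_a = spearman(measured, model_a)
rho_b = spearman(measured, model_b)
rho_ab = spearman(model_a, model_b)
z, p = steiger_z(rho_a, rho_b, rho_ab, len(measured))
```

Each heatmap cell would correspond to one `spearman` call for a dataset-model pair, and the significance marker to one `steiger_z` call comparing the two highest-scoring models on that dataset.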