Performance of the different methods on each dataset. Red and blue cells indicate better and worse performance, respectively. The columns show six metrics covering three evaluation tasks: median absolute imputation error ‘median’, mean absolute imputation error ‘mean’, ARI score ‘ARI’, silhouette score based on the ground truth ‘sil(g)’, silhouette score based on SC3 clustering results ‘sil(s)’, and silhouette score based on marker genes ‘marker’. Scores in each column were normalized by subtracting the baseline score (data before imputation) and then dividing by the difference between the maximum and minimum scores. For ‘median’ and ‘mean’, the scores were negated, because lower imputation errors indicate better performance. The methods were categorized as ‘Statistical’ or ‘Non-statistical’ according to their underlying principles.
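
The column-wise normalization described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalize_column` and its parameters are hypothetical, the maximum and minimum are assumed to be taken over the method scores in one column, and negation for the error metrics is assumed to be applied to both the scores and the baseline before scaling.

```python
def normalize_column(scores, baseline, lower_is_better=False):
    """Baseline-subtracted, range-scaled scores for one metric column.

    scores: raw scores of all methods for one metric on one dataset.
    baseline: score of the data before imputation for the same metric.
    lower_is_better: True for error metrics such as 'median' and 'mean'.
    """
    values = [float(s) for s in scores]
    base = float(baseline)
    if lower_is_better:
        # Negate error metrics so that a larger value always means better.
        values = [-v for v in values]
        base = -base
    span = max(values) - min(values)
    if span == 0:
        # Assumption: if all methods tie, return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    # Positive values beat the baseline; negative values fall below it.
    return [(v - base) / span for v in values]


# Hypothetical numbers, for illustration only.
ari_normalized = normalize_column([0.2, 0.5, 0.8], baseline=0.4)
error_normalized = normalize_column([1.0, 2.0, 3.0], baseline=2.0, lower_is_better=True)
```

Under this convention a method that matches the baseline scores exactly 0, so the sign of a normalized score maps directly onto the red and blue colouring.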