Workflow of scCDC. The theoretical entropy-expression curves of endogenous RNAs are simulated, and the divergence between observed and expected entropy is calculated. Genes with significant entropy divergence are selected in each cluster, and the genes common to the clusters are defined as GCGs. For contamination correction, the clusters of cells that do or do not express endogenous GCGs are first defined (eGCG+ and eGCG− cells, respectively). A Youden index-based method is then used to correct the contaminated counts, based on the count distribution of the eGCG+ cluster with the lowest expression and the count distribution of all eGCG− cells. The contamination ratio of a GCG is calculated based on the proportion of the GCG's total UMI count among the total UMI count of all genes in eGCG− cells (details in Methods).