Relative performance of MIGs and non-MIGs in clustering gene expression data from different stages of PCa progression. Gene expression data were taken from GTEx, TCGA, and SU2C, which represent progressively more advanced stages of tumor progression (from healthy tissue in GTEx to advanced prostate tumors in SU2C). Clustering was performed with the number of clusters (n_cluster) set to K = 3. Each box in this plot corresponds to a simulation of 1,000 random draws, in each of which non-MIGs are sampled at random from among all non-MIGs in the genome. The plot shows a negative relationship between the relative abundance of non-MIGs and clustering quality, as quantified by the silhouette coefficient. The p value is from a two-sided t test in which each sample value is the difference between the silhouette coefficient of one of the 1,000 100% non-MIG samples and the corresponding silhouette coefficient obtained when only MIGs are used for clustering (i.e., the silhouette coefficient at 0% non-MIGs).
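The resampling and test described in this caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the caption does not name the clustering algorithm, so scikit-learn's KMeans is swapped in; the gene-set size is assumed to stay fixed while a fraction of MIGs is replaced by randomly drawn non-MIGs; the t test on the differences is assumed to be a one-sample test against zero; and the expression matrix is synthetic. All function names (`make_toy_data`, `silhouette_for_genes`, `non_mig_sweep`) are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def make_toy_data(n_per_group=30, n_mig=20, n_non_mig=200, seed=0):
    """Synthetic samples-by-genes matrix: 'MIG' columns carry group signal, the rest are noise."""
    rng = np.random.default_rng(seed)
    groups = np.repeat([0.0, 3.0, 6.0], n_per_group)           # three stages (toy stand-ins)
    mig = groups[:, None] + rng.normal(size=(groups.size, n_mig))
    non_mig = rng.normal(size=(groups.size, n_non_mig))        # no stage signal at all
    expr = np.hstack([mig, non_mig])
    return expr, np.arange(n_mig), np.arange(n_mig, n_mig + n_non_mig)


def silhouette_for_genes(expr, gene_idx, k=3, seed=0):
    """Cluster samples on the chosen gene columns (KMeans is an assumption) and score the result."""
    sub = expr[:, gene_idx]
    labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(sub)
    return silhouette_score(sub, labels)


def non_mig_sweep(expr, mig_idx, non_mig_idx, fraction, n_draws=1000, k=3, seed=0):
    """Silhouette coefficients for n_draws gene sets with `fraction` of MIGs swapped for non-MIGs."""
    rng = np.random.default_rng(seed)
    n_total = len(mig_idx)
    n_non = int(round(fraction * n_total))
    scores = np.empty(n_draws)
    for i in range(n_draws):
        keep = rng.permutation(mig_idx)[: n_total - n_non]     # MIGs retained in this draw
        draw = rng.permutation(non_mig_idx)[:n_non]            # random non-MIGs, no replacement
        scores[i] = silhouette_for_genes(expr, np.concatenate([keep, draw]), k, seed)
    return scores


def mig_vs_non_mig_test(expr, mig_idx, non_mig_idx, n_draws=1000):
    """Two-sided t test on (100% non-MIG silhouette) minus (0% non-MIG silhouette)."""
    baseline = silhouette_for_genes(expr, mig_idx)             # MIGs only
    scores_100 = non_mig_sweep(expr, mig_idx, non_mig_idx, 1.0, n_draws=n_draws)
    result = stats.ttest_1samp(scores_100 - baseline, popmean=0.0)
    return baseline, scores_100, result.pvalue


if __name__ == "__main__":
    expr, mig_idx, non_mig_idx = make_toy_data()
    # 30 draws keeps the demo fast; the caption describes 1,000 per box.
    baseline, scores_100, p = mig_vs_non_mig_test(expr, mig_idx, non_mig_idx, n_draws=30)
    print(f"MIG-only silhouette: {baseline:.3f}")
    print(f"mean 100% non-MIG silhouette: {scores_100.mean():.3f}, p = {p:.3g}")
```

Calling `non_mig_sweep` at intermediate fractions (for example 0.25, 0.5, 0.75) would give one array of scores per box in a plot of this kind. Because the toy data place all of the group signal in the "MIG" columns by construction, the output only illustrates the mechanics and says nothing about real expression data.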