Comparison of gene clustering for P. lacertae and Blastocystis. Predicted gene sets for P. lacertae (P; n = 26,100) and Blastocystis ST7 (B; n = 6,020) were each clustered using OrthoMCL. The pie charts show the proportion of genes falling into six categories. Gene clusters present in both B and P as well as in stramenopile outgroups (OG) were categorized as conserved, as were clusters present in B or P (as appropriate) and OG. Clusters found in both B and P but not OG, species-specific clusters, and unclustered genes (also assumed to be species-specific) are shown separately. Together, these five categories account for all genes found in each genome. The sixth category, sister-lineage loss, is shown in the same pie chart to emphasize the scale of gene loss relative to contemporary genome size. This category comprises genes assumed to have been lost from the P. lacertae or Blastocystis ST7 genome since the two lineages separated. For example, the P. lacertae genome contains 3,161 genes that are conserved in other stramenopiles but absent from Blastocystis, and that are therefore assumed to have been lost from Blastocystis after it separated from P. lacertae. When combined with the contemporary Blastocystis gene set, these losses represent 36% of all genes, or 51% when Blastocystis-specific genes are excluded. Five KEGG orthology (KO) terms that are significantly enriched among conserved genes (right) or sister-lineage losses (left) in each organism are tabulated beside the pie charts. For each KO term, a hypergeometric test assesses the significance of the difference between the observed (O) and expected (E) incidences, with the p value adjusted for multiple tests using Bonferroni correction. Terms that are over-represented relative to their genomic frequency are shaded red, while under-represented terms are shaded blue.
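
The KO-term test described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and arguments are hypothetical, the choice of a one-sided tail (upper tail when O is at least E, lower tail otherwise) is an assumption, and the number of tests used for the Bonferroni correction must be supplied by the caller.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` carry the KO term."""
    top = min(successes, draws)
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / total


def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k) for the same hypergeometric distribution."""
    low = max(0, draws - (population - successes))
    top = min(k, successes, draws)
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(low, top + 1))
    return hits / total


def ko_term_test(observed, term_total, subset_size, genome_size, n_tests):
    """Return (expected, direction, adjusted_p) for one KO term.

    observed    : genes with the term in the subset (O)
    term_total  : genes with the term in the whole genome
    subset_size : genes in the subset (e.g. conserved genes)
    genome_size : annotated genes in the whole genome
    n_tests     : number of KO terms tested (Bonferroni factor)
    """
    # Expected incidence (E) if the subset mirrored the genomic frequency.
    expected = subset_size * term_total / genome_size
    if observed >= expected:
        direction = "over"   # assumption: one-sided upper tail
        p = hypergeom_upper_tail(observed, genome_size, term_total, subset_size)
    else:
        direction = "under"  # assumption: one-sided lower tail
        p = hypergeom_lower_tail(observed, genome_size, term_total, subset_size)
    # Bonferroni correction: scale by the number of tests, capped at 1.
    return expected, direction, min(1.0, p * n_tests)
```

With toy values that are not taken from the figure, `ko_term_test(2, 4, 3, 10, 2)` tests a term carried by 4 of 10 genes and observed in 2 of the 3 genes in a subset, corrected for 2 tests. The exact summation is adequate for illustration; for genome-scale counts, a library routine such as `scipy.stats.hypergeom` would be the more practical choice.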