To evaluate its performance in taxonomic profiling, MetaPhlAn 4 was applied to synthetic metagenomes representing host-associated communities from the CAMI 2 taxonomic profiling challenge 60 (n = 128 samples) and to the SynPhlAn-nonhuman dataset (n = 5 samples), which represents more diverse environments from previous evaluations 4. Species-level evaluation using the OPAL framework 61 shows that MetaPhlAn 4 is more accurate than the available alternatives both in detecting which taxa are present (measured by the F1 score, the harmonic mean of the precision and recall of detection) and in estimating their abundances (measured by the Bray-Curtis (BC) beta-diversity between the estimated profiles and the gold-standard abundances). Additional evaluations performed using genomes within the SGB organization (labeled ‘SGB evaluation’; see Methods) show that MetaPhlAn 4 further improves accuracy at this more refined taxonomic level. See Supplementary Tables 5 and 7 for more details (GI, gastrointestinal; UT, urogenital tract). Box plots in a and b show the median (center), 25th/75th percentiles (lower/upper hinges), 1.5× the interquartile range (whiskers) and outliers (points).
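The two metrics named in the caption can be illustrated with a minimal sketch. This is not the OPAL implementation; the function names `f1_score` and `bray_curtis`, the dictionary-based profile format, and the toy taxa below are assumptions made for illustration only, and OPAL may normalize or filter profiles differently before scoring.

```python
# Illustrative sketch only: hypothetical helpers, not OPAL's actual code.

def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for taxon detection."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def bray_curtis(estimated, gold):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Profiles are dicts mapping taxon name -> abundance; a taxon missing
    from one profile is treated as having abundance 0 in that profile.
    """
    taxa = set(estimated) | set(gold)
    numerator = sum(abs(estimated.get(t, 0.0) - gold.get(t, 0.0)) for t in taxa)
    denominator = sum(estimated.get(t, 0.0) + gold.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Toy example (made-up taxa and abundances, not data from the paper).
gold_profile = {"species_A": 0.5, "species_B": 0.3, "species_C": 0.2}
estimated_profile = {"species_A": 0.6, "species_B": 0.3, "species_D": 0.1}

detection_f1 = f1_score(estimated_profile, gold_profile)
abundance_bc = bray_curtis(estimated_profile, gold_profile)
```

In this toy case one taxon is missed and one is falsely detected, so precision and recall are both 2/3; higher F1 and lower BC dissimilarity both indicate a profile closer to the gold standard.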