MPS-Sampling workflow: Final clustering of the genomes.

Step 4-1: Computation of the similarity matrix. The similarity between each pair of Lin-combinations is computed and stored in a square matrix, called the similarity matrix (Additional file 9). Similarity is measured by the Dice index [24], which corresponds to the proportion of Lin-clusters shared by two Lin-combinations, missing values being omitted (Additional file 8). Here, Comb6 and Comb7 share two Lin-clusters out of five (step 4-1, Fig. 3), so their Dice index in the matrix is 2/5.

Step 4-2: Construction of MPS-clusters. The Lin-combinations (and the corresponding genomes) are grouped into MPS-clusters by hierarchical clustering with complete linkage, stopped at a minimum similarity threshold, so that every pair of Lin-combinations within an MPS-cluster has a similarity of at least this threshold. In the example, a minimum similarity of 0.5 yields five MPS-clusters. The first MPS-cluster gathers two Lin-combinations, Comb1 (gF, gE, gG) and Comb2 (gA), corresponding to four genomes (gF, gE, gG, and gA). The second MPS-cluster gathers two Lin-combinations, Comb5 (gH) and Comb6 (gI), corresponding to two genomes (gH and gI). The third MPS-cluster contains a single Lin-combination, Comb3 (gB, gC), corresponding to two genomes (gB and gC). The last two genomes, corresponding to Comb4 (gD) and Comb7 (gJ), are isolated and form singleton MPS-clusters.

Step 5: Selection of MPS-representatives. One MPS-representative genome is selected per MPS-cluster, according to predefined priority rules (Additional file 10). Here, gG, gC, gD, gH, and gJ are selected, each representing one MPS-cluster.

Output: MPS-Sampling returns the list of MPS-representative genomes, together with the correspondence between each input genome and its MPS-representative genome.
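Steps 4-1 and 4-2 can be illustrated with a minimal Python sketch. It assumes that each Lin-combination is a tuple holding one Lin-cluster label per clustering level (None for a missing value), and that the Dice index is computed on the sets of non-missing (level, label) pairs; with five labels on each side and two shared, this gives 2·2/(5+5) = 2/5, as in the Comb6/Comb7 example. The function names `dice_index`, `similarity_matrix`, and `complete_linkage` are hypothetical, the toy data are not those of Fig. 3, and the actual MPS-Sampling implementation may differ.

```python
from itertools import combinations
from typing import Hashable, Optional, Sequence

# Hypothetical model: one Lin-cluster label per clustering level, None = missing.
LinCombination = Sequence[Optional[Hashable]]


def dice_index(a: LinCombination, b: LinCombination) -> float:
    """Dice index between two Lin-combinations, omitting missing values.

    Labels are paired with their level so that identical labels at two
    different levels are not counted as shared Lin-clusters.
    """
    set_a = {(level, label) for level, label in enumerate(a) if label is not None}
    set_b = {(level, label) for level, label in enumerate(b) if label is not None}
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def similarity_matrix(combs: Sequence[LinCombination]) -> list[list[float]]:
    """Square, symmetric matrix of pairwise Dice indices (diagonal set to 1)."""
    n = len(combs)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dice_index(combs[i], combs[j])
    return matrix


def complete_linkage(matrix: list[list[float]], min_similarity: float) -> list[list[int]]:
    """Agglomerative clustering with complete linkage, stopped at min_similarity.

    The similarity between two clusters is the lowest pairwise similarity
    between their members; the most similar pair of clusters is merged
    repeatedly while that value is at least min_similarity.
    """
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for x, y in combinations(range(len(clusters)), 2):
            sim = min(matrix[i][j] for i in clusters[x] for j in clusters[y])
            if sim > best_sim:
                best_pair, best_sim = (x, y), sim
        if best_sim < min_similarity:
            break  # no remaining merge satisfies the threshold
        x, y = best_pair  # x < y, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Hypothetical Lin-combinations over five clustering levels.
    combs = [
        ("a", "b", "c", "d", "e"),
        ("a", "b", "c", "d", "f"),
        ("a", "b", "x", "y", "z"),
        ("p", "q", "r", "s", None),
    ]
    sim = similarity_matrix(combs)
    print(sim[0][2])  # 0.4
    print(complete_linkage(sim, min_similarity=0.5))  # [[0, 1], [2], [3]]
```

The naive merge loop is roughly cubic in the number of Lin-combinations, which is acceptable for a sketch. For large inputs, SciPy's `scipy.cluster.hierarchy.linkage(..., method="complete")` on the distances 1 - similarity, followed by `fcluster(Z, t=1 - min_similarity, criterion="distance")`, is a common alternative.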