Workflow from Scientific Research

Open access visualization of Workflow, Flowchart, MPS-Sampling, Protein Sequences, Genomes
CC-BY

MPS-Sampling workflow.

Input. MPS-Sampling uses families of homologous single-copy protein sequences as input. In this example, ten genomes (gA, gB, …, gJ) and four protein families (uL1, uL2, uL3, and uL4) are considered. Dashes indicate the absence of a sequence in a genome. Here, uL2 is missing in gB, gC, gI, and gJ, and uL4 is missing in gI.

Step 1–1: Construction of Lin-clusters. For each protein family, sequence clusters, called Lin-clusters, are built using Linclust [17] of the MMseqs2 suite. Because Linclust is applied to each protein family separately, the clustering, and thus the Lin-clusters, may differ from one protein family to another. Here, the uL1 sequence of gA is clustered with the uL1 sequences of gE, gF, and gG, while the uL4 sequences of gA and gD are grouped together.

Step 1–2: Labeling of protein sequences. For each protein family, protein sequences are labeled according to the Lin-cluster to which they belong. Sequences from the largest Lin-clusters are labeled first. For instance, uL1 sequences from the largest Lin-cluster (gA, gE, gF, and gG) are labeled as 1, sequences from the second largest Lin-cluster (gB, gC, and gD) as 2, and sequences from the third largest Lin-cluster (gH, gI, and gJ) as 3.
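The labeling rule of Step 1–2 can be illustrated with a minimal Python sketch. This is not the MPS-Sampling implementation: the function name `label_by_cluster_size`, the dictionary layout, and the cluster identifiers are hypothetical, and the tie-breaking rule (equal-sized clusters ordered by cluster identifier) is an assumption, since the caption does not say how ties are resolved. The uL1 cluster memberships are taken from the example above.

```python
from collections import Counter


def label_by_cluster_size(cluster_of):
    """Map each genome to an integer label, where 1 is the largest cluster.

    cluster_of: dict mapping genome name -> cluster identifier for ONE
    protein family. Genomes lacking a sequence are simply absent.
    """
    sizes = Counter(cluster_of.values())
    # Largest clusters first. Tie-break on cluster identifier is an
    # assumption made here only to keep the output deterministic.
    ranked = sorted(sizes, key=lambda c: (-sizes[c], c))
    rank = {cluster: i + 1 for i, cluster in enumerate(ranked)}
    return {genome: rank[cluster] for genome, cluster in cluster_of.items()}


# uL1 Lin-clusters from the caption; cluster identifiers are hypothetical.
uL1_clusters = {
    "gA": "c_gA", "gE": "c_gA", "gF": "c_gA", "gG": "c_gA",
    "gB": "c_gB", "gC": "c_gB", "gD": "c_gB",
    "gH": "c_gH", "gI": "c_gH", "gJ": "c_gH",
}

uL1_labels = label_by_cluster_size(uL1_clusters)
for genome in sorted(uL1_labels):
    print(genome, uL1_labels[genome])
```

Because labels are assigned per protein family, the same function would be called once for each of uL1, uL2, uL3, and uL4, and a given genome can receive different labels in different families.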
