Workflow from Scientific Research

Citation
A 650-million-parameter unsupervised deep learning model, developed by Meta AI scientists, was trained on 250 million protein sequences. This model was used to predict the effects of all ~450 million possible missense variants (i.e., a single-nucleotide change that substitutes one amino acid for another in the protein produced by a gene) across >40,000 protein isoforms in the human genome. During model training, random positions in the input sequences are masked, and the model is trained to recover the hidden amino acids. Such protein language models implicitly learn how one-dimensional amino acid sequences give rise to two-dimensional and three-dimensional features of protein structure and function, including ligand-receptor binding sites. They can provide high-quality predictions for any amino acid sequence and for different kinds of coding variants. Reproduced with permission from Brandes et al. (ref. 29).
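The masking step described in the caption can be illustrated with a minimal Python sketch. It is not the published training code: the `<mask>` token, the 15% masking fraction, the example sequence, and the function names `mask_sequence` and `recovery_accuracy` are all assumptions made for illustration, and a real model would be trained with a cross-entropy loss over predicted amino acid probabilities rather than the simple accuracy shown here.

```python
import random

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns (tokens, targets), where tokens is the sequence with some
    residues replaced by MASK and targets maps each masked position to
    the amino acid that was hidden there.
    The 15% default is an assumption, not a value from the source.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)
    # Mask at least one position, and never more than the sequence length.
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK
    return tokens, targets


def recovery_accuracy(predictions, targets):
    """Fraction of masked positions whose predicted residue matches the hidden one."""
    correct = sum(predictions.get(pos) == aa for pos, aa in targets.items())
    return correct / len(targets)


# Arbitrary example sequence (22 residues), not taken from the source.
example = "MKTAYIAKQRQISFVKSHFSRQ"
tokens, targets = mask_sequence(example)
```

A model trained this way receives `tokens` as input and is scored on how well it recovers the residues in `targets`; passing the true residues back in as predictions, `recovery_accuracy(targets, targets)`, gives the upper bound of 1.0.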