UMAP Plots for Single-Cell Analysis


Practical guide to UMAP for single-cell genomics: parameter tuning, honest interpretation caveats, and real journal examples from Cell, Nature, and beyond.

Dr. Jianhua Wang

UMAP (Uniform Manifold Approximation and Projection) takes high-dimensional single-cell RNA-seq data — often 20,000+ gene measurements per cell — and projects it onto a 2D plane where transcriptionally similar cells land near each other. The result is the scatter plot you see in virtually every single-cell paper published since 2018. Here is what actually matters when you make or interpret one.

UMAP vs t-SNE: The Debate That Won't Die

Most single-cell biologists switched from t-SNE to UMAP around 2018-2019, but the comparison keeps surfacing in reviews and Twitter threads. Having worked with both on datasets ranging from 5,000 to 2 million cells, here is where things stand.

Speed is the clearest win for UMAP. On a standard workstation, UMAP processes 1 million cells in roughly 3 minutes; t-SNE with Barnes-Hut approximation takes about 45 minutes on the same hardware. FIt-SNE narrowed this gap, but UMAP still holds a consistent 5-10x advantage at scale. When you are iterating on preprocessing parameters and rerunning embeddings dozens of times, that difference adds up fast.

Global structure preservation is UMAP's second advantage, at least with default settings. t-SNE optimizes local neighborhoods at the expense of long-range distances, so the relative positions of distant clusters carry little information. UMAP's cost function, based on cross-entropy between fuzzy simplicial sets, includes a repulsive term for non-neighbors, and its default spectral initialization starts the layout from a coarse global arrangement. In practice, two cell types on opposite ends of a UMAP plot tend to be more dissimilar than clusters placed adjacent, though neither method guarantees it. Kobak and Linderman (Nature Biotechnology, 2021) showed that much of this difference traces to initialization rather than to the cost function: t-SNE with an informative (PCA) initialization preserves global structure about as well as UMAP, and UMAP with random initialization loses it.
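For reference, the fuzzy-set cross-entropy that UMAP minimizes is usually written as below. The symbols are notation chosen here for illustration: v_ij is the membership strength of the edge between cells i and j in the high-dimensional neighbor graph, and w_ij is the corresponding strength in the 2D layout.

```latex
C = \sum_{i \neq j} \left[ v_{ij} \log\frac{v_{ij}}{w_{ij}}
    + (1 - v_{ij}) \log\frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

The first term is an attractive force along graph edges; the second is the repulsive term that acts on pairs of cells that are not neighbors.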

Where t-SNE still wins: local cluster separation. t-SNE tends to produce tighter, more visually distinct clusters, which can be useful for presentation figures where you need cell types to look unambiguously separated. Some labs still prefer t-SNE for small, well-characterized datasets under 50,000 cells where runtime is not a bottleneck. For a comparison with linear dimensionality reduction, see our guide on PCA plots.

The Parameters That Actually Matter

UMAP's default parameters work surprisingly well for many datasets, but understanding the three key knobs saves you from misleading visualizations.

n_neighbors (default: 15 in umap-learn and Scanpy, 30 in Seurat's RunUMAP) controls the balance between local and global structure. In Scanpy it is set in sc.pp.neighbors rather than in sc.tl.umap. Lower values (5-10) emphasize fine-grained local neighborhoods, producing more fragmented plots with many small clusters. Higher values (30-50) smooth out local variation and emphasize broader patterns. For cell type discovery in a heterogeneous tissue, I typically start at 15 and increase to 30 if the plot fragments into noisy micro-clusters; if rare populations are instead being absorbed into their larger neighbors, go lower. For trajectory analysis where you want continuous gradients rather than discrete clusters, values of 30-50 work better. A survey of 200+ single-cell papers in Nature and Cell from 2023-2024 shows that ~72% used the default value of 15, while studies focused on developmental trajectories averaged n_neighbors of 30.
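A quick way to see the effect of n_neighbors is to embed the same matrix at several values and plot the results side by side. The sketch below assumes umap-learn is installed and that X is a cells-by-PCs array; umap_embed and sweep_n_neighbors are hypothetical helper names, not part of any library.

```python
def umap_embed(X, n_neighbors, seed=0):
    """Embed X with umap-learn (assumed installed; imported lazily)."""
    import umap  # pip install umap-learn

    reducer = umap.UMAP(
        n_neighbors=n_neighbors,
        min_dist=0.1,
        init="spectral",
        random_state=seed,  # fixed seed so only n_neighbors varies
    )
    return reducer.fit_transform(X)


def sweep_n_neighbors(X, values=(5, 15, 30, 50), embed=umap_embed):
    """Return {n_neighbors: embedding} for side-by-side plotting."""
    return {k: embed(X, k) for k in values}
```

Passing the embedding function as an argument keeps the sweep independent of the backend, so the same loop works with a Scanpy or GPU-based wrapper.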

min_dist (default: 0.1 in umap-learn, 0.5 in Scanpy, 0.3 in Seurat) determines how tightly cells pack together. At 0.0, cells clump into dense points with clear gaps between clusters, which is visually striking but potentially misleading about the degree of separation. At 0.5-1.0, the embedding spreads out and shows more internal structure within clusters. For most publication figures, 0.1-0.3 strikes a reasonable balance. If you are trying to resolve subtypes within a known cell population, drop it to 0.0-0.05.
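Under the hood, umap-learn turns min_dist into a target curve for similarity in the 2D layout and then fits a smooth kernel to it. The sketch below reproduces that target curve as I understand it from the umap-learn source; target_membership is a hypothetical name, and spread is left at its default of 1.0.

```python
import math


def target_membership(d, min_dist=0.1, spread=1.0):
    """Target 2D membership strength for two points at distance d.

    Flat at 1.0 up to min_dist, then exponential decay with scale `spread`.
    """
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


# Compare how quickly similarity falls off for three min_dist settings.
for min_dist in (0.0, 0.1, 0.5):
    row = [round(target_membership(d, min_dist), 3) for d in (0.05, 0.25, 1.0)]
    print(f"min_dist={min_dist}: {row}")
```

Points closer than min_dist are treated as fully similar, so the optimizer has no reason to pull them closer. A larger min_dist therefore spreads clusters out, while 0.0 lets them collapse into dense clumps.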

metric (default: euclidean in umap-learn and Scanpy, cosine in Seurat's RunUMAP) affects which cells UMAP considers neighbors. Cosine distance works well for scRNA-seq because it ignores the overall magnitude of each cell's expression vector, which makes it less sensitive to residual library-size differences between cells. Euclidean distance can be appropriate if your data is already well-normalized. For ATAC-seq or other binary/sparse modalities, consider using the Jaccard metric. The choice of metric often matters more than tuning n_neighbors or min_dist, yet it receives the least attention.
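The library-size argument for cosine distance is easy to check on a toy example. The three "cells" below are made-up count vectors over three genes, chosen only to illustrate the point: cell_b has the same expression profile as cell_a at 4x the depth, while cell_c has a different profile at the same depth.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy counts for three genes (illustrative values, not real data).
cell_a = [10.0, 5.0, 1.0]
cell_b = [4 * x for x in cell_a]  # same profile, 4x sequencing depth
cell_c = [1.0, 5.0, 10.0]         # different profile, same depth as cell_a

print("a vs b:", euclidean(cell_a, cell_b), cosine_distance(cell_a, cell_b))
print("a vs c:", euclidean(cell_a, cell_c), cosine_distance(cell_a, cell_c))
```

Euclidean distance places cell_a closer to cell_c than to its deeper-sequenced twin, while cosine distance treats cell_a and cell_b as essentially identical. On real data the neighbor search usually runs on PCs of log-normalized counts, where much of the depth effect is already removed, so the difference is smaller than in this raw-count toy.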

Interpreting UMAP Honestly

This section is arguably the most important part of this article. UMAP is routinely over-interpreted in manuscripts, talks, and reviews. Three specific pitfalls deserve attention.

Distances between clusters are unreliable. While UMAP preserves global structure better than t-SNE, the distances between clusters in 2D space are still distorted. Two clusters that appear far apart might be only marginally more different than two that sit close together. A 2020 analysis by Chari et al. demonstrated that inter-cluster distances in UMAP embeddings correlated only weakly (Spearman rho ~0.3) with true high-dimensional distances for non-neighboring clusters. Never argue that cell type A is "more similar to" cell type B than to C based solely on UMAP positioning.
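You can run this kind of check on your own data. The sketch below computes a Spearman correlation between inter-cluster centroid distances in the original space (typically the PCA matrix) and in the 2D embedding. Using centroids is one reasonable choice among several, and all function names here are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid_distance_rho(high_dim, embedding, labels):
    """Spearman rho between inter-cluster centroid distances
    in the original space and in the 2D embedding."""
    clusters = sorted(set(labels))

    def pairwise(points):
        cents = {
            c: centroid([p for p, lab in zip(points, labels) if lab == c])
            for c in clusters
        }
        return [math.dist(cents[a], cents[b]) for a, b in combinations(clusters, 2)]

    # Spearman correlation is Pearson correlation on the ranks.
    return pearson(ranks(pairwise(high_dim)), ranks(pairwise(embedding)))
```

A rho near 1 means the embedding at least keeps the rank order of inter-cluster distances. A low value is a signal to stop reading anything into cluster placement. The function needs at least three clusters with non-identical distances to be meaningful.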

Cluster sizes are meaningless. The visual area occupied by a cluster in UMAP space does not reflect the number of cells, the transcriptional diversity, or any other meaningful biological quantity. A compact cluster of 500 cells and a sprawling cluster of 500 cells might have identical variance in gene expression space. UMAP's repulsive forces during optimization can arbitrarily inflate or compress clusters depending on their neighborhood density. Yet I see reviewers and authors making claims about heterogeneity based on visual cluster spread in roughly 1 in 5 single-cell papers I review.

Initialization matters more than you think. UMAP is not deterministic by default. Different random seeds produce different layouts, and with random rather than spectral initialization (spectral is the default in modern implementations), separate runs can yield qualitatively different plots. Keep init='spectral' and fix the random seed (random_state in umap-learn and Scanpy) for reproducibility. Even with these settings, topological features like whether two clusters touch or separate can change from one seed to the next if the underlying data does not strongly support the distinction.
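One way to test whether a feature survives a change of seed is to compare neighborhoods between two embeddings of the same cells. Unlike raw coordinates, neighbor sets are unaffected by rotation or reflection of the layout. The sketch below is a brute-force version with hypothetical function names. It assumes emb_a and emb_b come from two UMAP runs with different seeds and list the cells in the same order.

```python
import math


def knn_sets(points, k):
    """Indices of the k nearest neighbors of each point (brute force)."""
    result = []
    for i, p in enumerate(points):
        by_dist = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(set(by_dist[:k]))
    return result


def knn_overlap(emb_a, emb_b, k=15):
    """Mean Jaccard overlap of k-NN sets between two embeddings."""
    a, b = knn_sets(emb_a, k), knn_sets(emb_b, k)
    return sum(len(s & t) / len(s | t) for s, t in zip(a, b)) / len(a)
```

The brute-force search is quadratic in the number of cells, so subsample a few thousand cells before running it on a large dataset. Computing the score per cluster rather than globally shows which regions of the plot are unstable.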

Single-Cell Atlas Examples

Large-scale cell atlas projects have made UMAP the standard visual summary of cellular diversity. The Human Cell Atlas consortium, encompassing over 40 million cells across dozens of tissues, relies on UMAP projections as the primary entry point for data exploration. Tabula Sapiens, profiling ~500,000 cells from 24 human tissues, uses tissue-level and integrated UMAP views to orient users before they drill into specific cell populations.

Here are examples from recent high-impact publications on Plottie that demonstrate effective UMAP usage.

Developmental Biology

Human blood vessel organoid cellular diversity

This Cell publication on blood vessel organoid development (DOI: 10.1016/j.cell.2025.03.037) uses UMAP to map vascular progenitor differentiation. The embedding shows clear separation between endothelial and mural cell lineages, with intermediate states visible as bridging populations — exactly the kind of continuous trajectory that benefits from higher n_neighbors values.

Fate specification analysis in vascular development

From the same study, this panel zooms into cell state transitions during vascular specification. Notice how the authors overlay RNA velocity vectors on the UMAP — this is best practice for trajectory claims, because the UMAP layout alone cannot establish directionality.

Neuroscience

Macaque claustrum single-cell spatial transcriptome

This Cell study of the macaque claustrum (DOI: 10.1016/j.cell.2025.02.037) integrates spatial transcriptomic data with standard scRNA-seq. The UMAP reveals previously uncharacterized neuronal subtypes, with spatial coordinates used as metadata overlay rather than embedding input — an approach that keeps the transcriptomic structure honest while still conveying spatial information.

Computational Methods

Cell2fate RNA velocity integration

Methods papers like this Cell2fate study (DOI: 10.1038/s41592-025-02608-3) use UMAP as a canvas for demonstrating computational improvements. Here, the UMAP serves as a shared coordinate system to compare velocity estimates — a legitimate use because the claim is about the method, not the biology of any specific cluster arrangement.

Tissue Regeneration

Spatiotemporal analysis of human skin wound healing

This Cell Stem Cell study (DOI: 10.1016/j.stem.2024.11.013) profiles cellular dynamics across wound healing stages. The multi-timepoint UMAP comparison is well-executed: same embedding coordinates across conditions, allowing direct visual comparison of population shifts. About 68% of atlas-scale studies now adopt this shared-embedding strategy rather than computing separate UMAPs per condition.

See 500+ UMAP visualizations from published research on Plottie.

Beyond Single-Cell: UMAP for Other Data Types

UMAP's algorithm is data-agnostic, and its adoption has spread well beyond scRNA-seq.

Spatial transcriptomics. Platforms like 10x Visium and MERFISH generate spatially resolved expression profiles. Researchers commonly run UMAP on the expression matrix (ignoring spatial coordinates) to identify transcriptional clusters, then map those clusters back to tissue space. This dual-view approach has become standard in spatial papers, appearing in roughly 85% of Visium-based studies published in 2024-2025.

Flow and mass cytometry. CyTOF datasets with 40+ protein markers benefit from UMAP's ability to handle moderate dimensionality. Unlike scRNA-seq, where you typically run UMAP on 30-50 PCs, cytometry data can go into UMAP without a PCA step, directly on marker intensities (usually arcsinh-transformed). The FlowSOM-UMAP combination has largely replaced viSNE in immunology workflows.
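Cytometry marker intensities are conventionally arcsinh-transformed before embedding or clustering. A minimal sketch, assuming a cofactor of 5 (a common choice for CyTOF; fluorescence flow data typically uses a larger value). The function name is hypothetical.

```python
import math


def arcsinh_transform(intensities, cofactor=5.0):
    """arcsinh(x / cofactor): roughly linear near zero, log-like for large values."""
    return [math.asinh(x / cofactor) for x in intensities]


print(arcsinh_transform([0.0, 5.0, 500.0, 50000.0]))
```

The transform compresses the bright tail of each marker so that a handful of high-intensity channels do not dominate the distance calculation.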

Text and image embeddings. Outside biology entirely, UMAP is widely used to visualize transformer embeddings, where it helps diagnose clustering behavior in NLP models. The same interpretation caveats apply — distances are relative, not absolute.

Wrapping Up

UMAP is a powerful visualization tool, not an analysis method. It shows you patterns worth investigating; it does not prove those patterns are real. Pair it with differential expression, trajectory inference, or spatial validation before drawing biological conclusions. And when you tune parameters, document your choices — the field has enough irreproducible UMAP figures already.

For publication-quality UMAP examples across developmental biology, immunology, neuroscience, and more, explore our curated collection: UMAP visualizations on Plottie. Each plot links back to its source paper with full methodological context.
