Summary of scores by criteria of coverage. Adequacy scores are weighted aggregates of scores across the coverage, consistency and robustness criteria. Each violin shows the density of scores (y axis) within a criteria group and document type (x axis); wider areas indicate higher density (that is, more documents with that score). Each violin is overlaid with a swarm plot in which every data point represents one document; points are separated randomly (jittered) to reduce overplotting and make the distribution easier to see. Scores for each criterion are scaled to 0–1 for comparison (Methods). Two-sided Wilcoxon rank-sum tests with continuity correction show significant differences between NDCs and NAPs in adequacy and coverage scores (P < 0.001) and in consistency and robustness scores (P < 0.005), with NDC scores significantly lower than NAP scores. Detailed results of the statistical tests are provided in Extended Data Table 3.
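For readers unfamiliar with the test named in the legend, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with continuity correction and a tie correction. It is not the authors' code. The function name `rank_sum_test` and the example scores are hypothetical and chosen only for illustration; they are not data from the study.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation,
    continuity correction, tie-corrected variance).

    Returns (U statistic for sample x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    # Continuity correction: move |U - mean| half a unit toward zero.
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / sqrt(var_u)
    p_two_sided = erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))
    return u_x, p_two_sided


# Hypothetical scaled scores (0-1) for two document types; illustrative only.
ndc_scores = [0.1, 0.2, 0.3]
nap_scores = [0.4, 0.5, 0.6]
u_stat, p_value = rank_sum_test(ndc_scores, nap_scores)
```

A small U statistic for the first sample indicates that its scores tend to rank below those of the second sample. With only a few observations per group, an exact test would usually be preferred over the normal approximation sketched here.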