We start by fine-tuning GPT-3 on a sample of the QMugs dataset and then query this model with around 1,000 gap values drawn from a normal distribution with a shifted mean (mean 4.0 eV, s.d. 0.2 eV). We then iteratively select the high-gap molecules among the generated candidates and fine-tune the model on these data (that is, from the second generation onward, the model is fine-tuned on molecules it generated itself). Smooth curves show kernel-density estimates; the plot is truncated at 10 eV, but the models also generate some molecules with larger HOMO–LUMO gaps. We chose a comparatively large number of evaluations for this figure to increase the clarity of the visualization. For the initialization, we evaluated 2,162 compounds using xTB, followed by 1,670, 250 and 1,572 in the subsequent generations. If we limit the number of quantum chemistry evaluations to 100 or fewer, we can still successfully shift the distribution, as shown in Supplementary Fig. 83.
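The generate-evaluate-select-retrain loop described above can be summarized in a short sketch. This is a minimal illustration, not the actual pipeline: `fine_tune`, `generate_molecule`, and `gap_xtb` are dummy stand-ins for the GPT-3 fine-tuning API and an xTB calculation (their names and signatures are assumptions), and the top-20% selection cutoff is also an assumption, since the caption only states that high-gap samples are selected.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Dummy stand-ins so the sketch runs end to end; in practice these would wrap
# the GPT-3 fine-tuning API and an xTB calculation (names are assumptions).
def fine_tune(model, prompts, completions):
    """Fine-tune `model` on (prompt, completion) pairs; return the new model."""
    return model  # placeholder

def generate_molecule(model, target_gap):
    """Prompt the model with a target HOMO-LUMO gap; return a SMILES string."""
    return f"mol_{hash(round(target_gap, 3)) % 10_000}"  # placeholder token

def gap_xtb(smiles):
    """Compute the HOMO-LUMO gap (eV) of `smiles` with xTB."""
    return rng.normal(3.7, 0.5)  # placeholder value

# Generation 0: fine-tune on a sample of QMugs (placeholder pairs).
qmugs_sample = [("3.50 eV", "CCO"), ("4.10 eV", "c1ccccc1")]
model = fine_tune("gpt-3", *zip(*qmugs_sample))

for generation in range(4):
    # Query with ~1,000 target gaps drawn from the shifted normal distribution.
    targets = rng.normal(loc=4.0, scale=0.2, size=1_000)
    candidates = {generate_molecule(model, t) for t in targets}

    # Evaluate the generated molecules with xTB.
    gaps = {smi: gap_xtb(smi) for smi in candidates}

    # Select the high-gap samples; a top-20% cutoff is an assumption here.
    cutoff = np.quantile(list(gaps.values()), 0.8)
    selected = {smi: g for smi, g in gaps.items() if g >= cutoff}

    # From the second generation onward, the model is fine-tuned on
    # molecules it generated itself.
    model = fine_tune(model,
                      prompts=[f"{g:.2f} eV" for g in selected.values()],
                      completions=list(selected.keys()))
```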