The tree map (TMAP) algorithm builds a nearest-neighbour graph and then embeds it in two dimensions, so similar molecules are connected by an edge. We colour the points according to whether they are part of the original dataset of ref. 43 but were not generated (green) or are part of the dataset and were generated by our model (orange). Our models can also generate molecules that are not part of the photoswitch dataset (note that the model was trained on only 92 molecules from this database). Some of those molecules have been reported before and are part of the PubChem database (blue); others are not part of the PubChem database (pink). From this figure, we see that the generated molecules are sometimes substituted variants of molecules in the dataset. In other cases, newly generated molecules introduce a completely new scaffold. For this visualization, we used the TMAP algorithm (ref. 65) on photoswitch molecules described using MinHash fingerprints with 2,048 permutations (ref. 66).
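The idea behind the graph and the colour scheme can be illustrated with a minimal, standard-library sketch. It is not the TMAP implementation: the `shingles` featurisation (character 3-grams of a SMILES string) is a hypothetical stand-in for a molecular fingerprint, the nearest-neighbour search is brute force rather than an LSH forest, and the two-dimensional layout step is omitted. The function names and the example SMILES strings are assumptions made for illustration only.

```python
import hashlib
import random

N_PERM = 2048  # number of permutations stated in the text


def shingles(smiles, k=3):
    """Hypothetical featurisation: the set of character k-grams of a SMILES string."""
    return {smiles[i:i + k] for i in range(max(1, len(smiles) - k + 1))}


def _hash64(token):
    """Stable 64-bit hash of a token (independent of Python's hash seed)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_minhash(n_perm=N_PERM, seed=0):
    """Return a function that maps a non-empty token set to a MinHash signature."""
    rng = random.Random(seed)
    prime = (1 << 61) - 1
    # Each (a, b) pair defines one random linear hash, standing in for a permutation.
    params = [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(n_perm)]

    def signature(tokens):
        hashed = [_hash64(t) % prime for t in tokens]
        return [min((a * h + b) % prime for h in hashed) for a, b in params]

    return signature


def similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature entries."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def knn_edges(signatures, k=1):
    """Brute-force k-nearest-neighbour graph, returned as a set of undirected edges."""
    edges = set()
    for i, sig in enumerate(signatures):
        others = [j for j in range(len(signatures)) if j != i]
        others.sort(key=lambda j: similarity(sig, signatures[j]), reverse=True)
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def colour(in_dataset, generated, in_pubchem):
    """Colour scheme described in the caption."""
    if in_dataset:
        return "orange" if generated else "green"
    return "blue" if in_pubchem else "pink"


# Illustrative input only: two azobenzene-like SMILES strings and ethanol.
molecules = ["c1ccccc1N=Nc1ccccc1", "c1ccccc1N=Nc1ccc(C)cc1", "CCO"]
minhash = make_minhash()
signatures = [minhash(shingles(s)) for s in molecules]
print(sorted(knn_edges(signatures, k=1)))
```

In this sketch, an edge joins each molecule to its most similar neighbour under the estimated Jaccard similarity, which is the sense in which connected points in the figure are similar molecules.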