The figure compares the model’s accuracy as a function of the number of training points. The dashed horizontal line indicates the performance reported in ref. 24 using random forest (RF) with a dataset of 1,252 points and 10-fold cross-validation, which corresponds to a training set size of around 1,126 points. The dotted line shows the performance of a simple rule-based baseline: ‘if present in the composition, classify as single phase, else multi-phase’. The yellow line was obtained using Automatminer (ref. 25), which takes the chemical composition as input and returns the best featurization and model among those implemented, selected by automated machine learning with genetic programming (as implemented in the TPOT package, ref. 61). We additionally tested a neural network, CrabNet (red line, default settings; ref. 62), which performs well using compositions as input. The blue line is the performance of our GPT-3 model (with error bands showing s.e.m.). The figure shows that we reach accuracy similar to that of the model of ref. 24 with as few as around 50 data points. In addition, we investigated a separate training and test set, for which the learning curve is shown in green. In this case, we tested only on compounds for which we could not find an exact match with a Google search. The learning curves for other metrics can be found in Supplementary Note 6.13.
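The two quantities this caption relies on, the training set size implied by 10-fold cross-validation and the s.e.m. error bands, can be illustrated with a minimal sketch. The function names `kfold_train_size` and `mean_and_sem` are hypothetical, and the accuracy values are placeholders invented for illustration; they are not results from this work, and the sketch assumes the s.e.m. is taken over repeated runs at each training set size.

```python
import math
import statistics


def kfold_train_size(n_points: int, n_folds: int) -> float:
    """Average number of training points per fold in k-fold cross-validation."""
    return n_points * (n_folds - 1) / n_folds


def mean_and_sem(values):
    """Return (mean, standard error of the mean) over repeated runs."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        # The s.e.m. is undefined for a single run.
        return mean, float("nan")
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# 1,252 points with 10-fold cross-validation leave about 1,126.8 points for training.
train_size = kfold_train_size(1252, 10)

# Placeholder accuracies, invented for illustration only (NOT from the paper):
# keys are training set sizes, values are accuracies from repeated runs.
runs_by_train_size = {
    10: [0.60, 0.70, 0.65],
    50: [0.90, 0.92, 0.94],
}

# One (mean, s.e.m.) pair per training set size gives the curve and its error band.
learning_curve = {n: mean_and_sem(acc) for n, acc in runs_by_train_size.items()}
```

Plotting the means against the training set size, with a band of plus or minus one s.e.m., would give a curve of the kind described for the blue line.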