Regression performance of the best natural gradient boosting decision tree (nGBDT) model compared with the best random forest (RF) model. Scatterplots show the true log reduction values (x axis) against the corresponding predicted log reduction values (y axis) for nGBDT (left) and RF (right). The black line in each plot is the 45° line, where x axis values equal y axis values. Predictions under all pH conditions (3, 7, and 10), referred to as pan pH, are shown. The Pearson correlation and R² between predicted and true log reduction values were calculated. In addition, nGBDT quantifies prediction uncertainty (i.e., the standard deviation of the prediction, indicated by the error bars), and we also calculated the correlation between uncertainty and prediction error (see also Figures S3–S5).
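
The metrics named in this caption can be illustrated with a minimal sketch. It rests on assumptions the caption does not confirm: R² is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), prediction error is taken as the absolute error, and the uncertainty-error relationship is measured with a Pearson correlation. The function name `regression_metrics` and its arguments are hypothetical and are not taken from the original analysis.

```python
import numpy as np


def regression_metrics(y_true, y_pred, y_std=None):
    """Illustrative metrics for predicted vs. true log reduction values.

    y_std is an optional per-sample predictive standard deviation
    (e.g. from a probabilistic model); all names here are hypothetical.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Pearson correlation between true and predicted values.
    pearson_r = np.corrcoef(y_true, y_pred)[0, 1]

    # Assumption: R^2 as the coefficient of determination.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    metrics = {"pearson_r": pearson_r, "r2": r2}

    if y_std is not None:
        # Assumption: prediction error measured as absolute error,
        # correlated with the predictive standard deviation.
        abs_err = np.abs(y_true - y_pred)
        y_std = np.asarray(y_std, dtype=float)
        metrics["uncertainty_error_r"] = np.corrcoef(y_std, abs_err)[0, 1]

    return metrics
```

A positive `uncertainty_error_r` would indicate that larger predicted standard deviations tend to coincide with larger errors, which is the property the error bars are meant to convey.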