We trained logistic regression models to predict the presence of specific virus-like sequences from host gene expression at single-cell resolution. Shown is the average accuracy of logistic regression models trained on all macaque genes, with donor animal and EVD time point as covariates, for the known virus EBOV (u10) and five novel virus IDs. The presence of virus-like sequences with high cell-type specificity could be predicted with >70% accuracy, whereas the presence of virus-like sequences with low cell-type specificity could not be predicted above random chance (50%, red dashed line). As a negative control, viral presence and absence labels in the training data were scrambled at random, which, as expected, dropped prediction accuracy to random chance (50%). Error bars indicate the standard deviation across models initialized with six different random seeds. The bottom bar plots show the numbers of training and testing cells for each virus (see also Extended Data Fig. 9c).
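A minimal sketch of this kind of setup and its scrambled-label control, assuming NumPy and entirely synthetic data. It is not the authors' pipeline and uses none of their data. The hypothetical helpers `make_synthetic_cells`, `fit_logistic_regression` and `evaluate` fit a plain gradient-descent logistic regression on simulated expression values plus one-hot donor and time-point covariates, then repeat the fit with training labels shuffled. We assume here that each seed controls the train/test split and the label shuffle, and that presence and absence labels are balanced, which is what makes 50% the chance level.

```python
import numpy as np

# Minimal sketch on SYNTHETIC data; helper names and parameters are hypothetical.


def fit_logistic_regression(X, y, lr=0.1, n_iter=500, l2=1e-3):
    """Fit an L2-regularized logistic regression by batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)      # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))         # predicted P(virus present)
        err = p - y
        w -= lr * (X.T @ err / n_samples + l2 * w)
        b -= lr * err.mean()
    return w, b


def accuracy(X, y, w, b):
    """Fraction of cells whose predicted label (p > 0.5) matches y."""
    return np.mean((X @ w + b > 0).astype(int) == y)


def make_synthetic_cells(rng, n_cells=600, n_genes=50, signal=1.5):
    """Simulated cells: virus-positive cells express the first 5 genes higher."""
    y = rng.permutation(np.repeat([0, 1], n_cells // 2))  # balanced labels
    expr = rng.normal(size=(n_cells, n_genes))
    expr[:, :5] += signal * y[:, None]
    donor = np.eye(3)[rng.integers(0, 3, n_cells)]       # assumed 3 donors, one-hot
    timepoint = np.eye(4)[rng.integers(0, 4, n_cells)]   # assumed 4 time points, one-hot
    return np.hstack([expr, donor, timepoint]), y


def evaluate(X, y, seed, scramble, n_train=400):
    """Held-out accuracy for one seed; the seed sets the split and the shuffle."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, test = idx[:n_train], idx[n_train:]
    y_train = rng.permutation(y[train]) if scramble else y[train]
    w, b = fit_logistic_regression(X[train], y_train)
    return accuracy(X[test], y[test], w, b)   # always scored on TRUE test labels


X, y = make_synthetic_cells(np.random.default_rng(0))
seeds = range(6)
real = np.array([evaluate(X, y, s, scramble=False) for s in seeds])
ctrl = np.array([evaluate(X, y, s, scramble=True) for s in seeds])
print(f"true labels:      {real.mean():.2f} ± {real.std(ddof=1):.2f} (SD over 6 seeds)")
print(f"scrambled labels: {ctrl.mean():.2f} ± {ctrl.std(ddof=1):.2f} (SD over 6 seeds)")
```

In this toy setting, held-out accuracy with true labels should sit well above 50%, while the scrambled-label control should hover near 50%: shuffling breaks any link between expression and labels during training, but the test cells are still scored against their true labels.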