Baseline using related subjects as prediction. For each subject s_a, we select an age- and sex-matched individual s_b of a given relatedness (unrelated, sibling, dizygotic (DZ) twin, monozygotic (MZ) twin, or self-retest) and treat the measured connectome from s_b as the prediction for s_a. For each relatedness level, we compare the set of pseudo-predictions to the measured connectomes using each performance metric, as though they were predictions from our model. This shows the reliability and utility of each flavor and performance metric, and provides multiple baselines for evaluating model performance. We also show performance using the population mean for each flavor (‘pop. mean’) as the pseudo-prediction, which exceeds self-retest for the avgcorr metric (bottom) but is zero or at chance for all other metrics.
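The pseudo-prediction baseline described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names (`avg_corr`, `pseudo_predictions`, `pop_mean_predictions`), the vectorized connectome layout (one row per subject), and the particular definition of the average-correlation score are all assumptions made for the example, and the synthetic data are not intended to reproduce the reported ordering of baselines.

```python
import numpy as np

def avg_corr(pred, meas):
    """Assumed stand-in for an average-correlation metric: the mean, over
    subjects, of the Pearson correlation between each predicted connectome
    (row of pred) and the measured connectome of the same subject."""
    p = pred - pred.mean(axis=1, keepdims=True)
    m = meas - meas.mean(axis=1, keepdims=True)
    r = (p * m).sum(axis=1) / (np.linalg.norm(p, axis=1) * np.linalg.norm(m, axis=1))
    return float(r.mean())

def pseudo_predictions(source, partner):
    """Use the connectome of matched individual s_b = partner[a], taken from
    `source`, as the prediction for subject s_a = a."""
    return source[partner]

def pop_mean_predictions(meas):
    """Use the population-mean connectome as the prediction for every subject."""
    return np.tile(meas.mean(axis=0), (meas.shape[0], 1))

# Synthetic data (hypothetical): shared group pattern + subject-specific
# pattern + independent session noise for a test and a retest session.
rng = np.random.default_rng(0)
n_subjects, n_edges = 20, 500
group = rng.normal(size=n_edges)
subject = rng.normal(size=(n_subjects, n_edges))
meas = group + subject + 0.3 * rng.normal(size=(n_subjects, n_edges))
retest = group + subject + 0.3 * rng.normal(size=(n_subjects, n_edges))

# Hypothetical matching: each subject is paired with a different subject
# ("unrelated"), or with themselves in the retest session ("self-retest").
unrelated_partner = np.roll(np.arange(n_subjects), 1)
self_partner = np.arange(n_subjects)

scores = {
    "unrelated": avg_corr(pseudo_predictions(meas, unrelated_partner), meas),
    "self-retest": avg_corr(pseudo_predictions(retest, self_partner), meas),
    "pop. mean": avg_corr(pop_mean_predictions(meas), meas),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In practice, the partner indices would come from the age- and sex-matching step for each relatedness level, and the same pseudo-predictions would be passed to every performance metric in place of model output.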