Box Plot from Scientific Research

Citation
We varied the in-distribution (ID) model selection criterion and compared each selected model against an oracle that chooses the model that is most fair out of distribution (OOD). We plotted the increase in the OOD fairness gap of the selected model over the oracle, averaged across 42 combinations of OOD dataset, task and attribute. We used non-parametric bootstrap sampling (n = 1,000) to define the bootstrap distribution for the metric. Selection criteria that choose the model with minimum attribute encoding achieve better OOD fairness than naively selecting on ID fairness or on other aggregate performance metrics ('Minimum Attribute Prediction Accuracy' versus 'Minimum Fairness Gap': P = 9.60 × 10⁻⁹⁴, one-tailed Wilcoxon rank-sum test; 'Minimum Attribute Prediction AUROC' versus 'Minimum Fairness Gap': P = 1.95 × 10⁻¹², one-tailed Wilcoxon rank-sum test).
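The procedure in the caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the bootstrap resamples the per-combination gap increases and averages them, and that the rank-sum test compares two such bootstrap distributions; the caption does not spell out either detail. All function names and the synthetic numbers are hypothetical, and the P value uses the normal approximation without tie or continuity correction.

```python
import math
import random
import statistics


def gap_increase_over_oracle(id_criterion, ood_gap):
    """Increase in OOD fairness gap of the ID-selected model over the oracle.

    id_criterion: one value per candidate model, lower is better
                  (e.g. attribute prediction accuracy measured ID).
    ood_gap:      OOD fairness gap per candidate model, lower is fairer.
    """
    selected = min(range(len(id_criterion)), key=id_criterion.__getitem__)
    return ood_gap[selected] - min(ood_gap)  # the oracle picks min(ood_gap)


def bootstrap_means(values, n_boot=1000, seed=0):
    """Non-parametric bootstrap distribution of the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_less(x, y):
    """One-tailed Wilcoxon rank-sum P value for H1: x tends to be smaller than y."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail normal probability


# Synthetic per-combination increases for two criteria (made-up numbers,
# 42 values each to mirror the 42 combinations; not the paper's data).
rng = random.Random(1)
encoding_based = [abs(rng.gauss(0.01, 0.005)) for _ in range(42)]
fairness_based = [abs(rng.gauss(0.02, 0.005)) for _ in range(42)]

p_value = rank_sum_less(
    bootstrap_means(encoding_based, seed=2),
    bootstrap_means(fairness_based, seed=3),
)
```

With 1,000 bootstrap replicates per criterion, even a modest difference in means yields an extremely small P value, so the effect size shown in the box plot is the more informative quantity.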