Training is accomplished by simulating data from randomly chosen narrow priors, $p_{\widetilde{\mathcal{M}}}(\mathcal{M})$, each parametrized by a reference chirp mass, $\widetilde{\mathcal{M}}$, on which the network is also conditioned. Prior conditioning enables prior-specific heterodyning based on $\widetilde{\mathcal{M}}$, followed by multibanding compression (Fig. 4a), effectively simplifying the data distribution that the model must learn and reducing its dimensionality.
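As an illustration of these two preprocessing steps, the following is a minimal sketch, assuming the leading-order (Newtonian) frequency-domain chirp phase as the heterodyne reference; the function names, band boundaries, and decimation factors are illustrative choices, not the specific implementation used here.

```python
import numpy as np

G = 6.674e-11          # gravitational constant [m^3 kg^-1 s^-2]
C = 299792458.0        # speed of light [m/s]
MSUN = 1.989e30        # solar mass [kg]

def newtonian_phase(freqs, chirp_mass_solar):
    """Leading-order (0PN) frequency-domain inspiral phase.

    freqs: array of frequencies [Hz]; chirp_mass_solar: chirp mass [M_sun].
    """
    mc_sec = chirp_mass_solar * MSUN * G / C**3  # chirp mass in seconds
    return (3.0 / 128.0) * (np.pi * mc_sec * freqs) ** (-5.0 / 3.0)

def heterodyne(data_fd, freqs, ref_chirp_mass):
    """Remove the rapid phase evolution predicted by the reference chirp
    mass, leaving a slowly varying residual in each frequency bin."""
    return data_fd * np.exp(-1j * newtonian_phase(freqs, ref_chirp_mass))

def multiband(x, decimations):
    """Compress the heterodyned array by averaging blocks of samples,
    with a per-band decimation factor (equal-width bands for simplicity).
    Because the heterodyned data vary slowly, block averages retain the
    relevant information at reduced dimensionality."""
    out, start = [], 0
    band_len = len(x) // len(decimations)
    for dec in decimations:
        band = x[start:start + band_len]
        band = band[: (len(band) // dec) * dec]
        out.append(band.reshape(-1, dec).mean(axis=1))
        start += band_len
    return np.concatenate(out)

# Example: a pure-phase signal whose chirp mass matches the reference
# heterodynes to a (nearly) constant array, which compresses well.
freqs = np.linspace(20.0, 1024.0, 4096)
signal_fd = np.exp(1j * newtonian_phase(freqs, 1.2))
residual = heterodyne(signal_fd, freqs, 1.2)
compressed = multiband(residual, decimations=[1, 2, 4, 8])
```

In this sketch a matched reference cancels the phase exactly; in practice the cancellation is only approximate across the narrow prior, which is precisely why conditioning the network on $\widetilde{\mathcal{M}}$ is needed.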