The processing pipeline of SeReNet with self-supervised training. Before network training, the data are preprocessed using TW-Net and preDAO. TW-Net corrects sample motion (details in Supplementary Fig. 13a), whereas preDAO estimates and corrects optical aberrations (details in Supplementary Fig. 14a). In the main modules of SeReNet, the depth-decomposition module first generates a focal stack by digitally refocusing the multiple angular images, and the deblurring and fusion module then gradually transforms the stack into a volume. Next, the 4D wave-optics PSFs are used to forward-project the 3D estimate. Finally, the loss between the projections and the raw measurements is iteratively reduced during training. The NLL-MPG loss was derived as the loss function (details in Supplementary Fig. 11a). After the model is trained, SeReNet can make rapid predictions without the forward projection process. Four representative angular views are shown for simplicity.
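The self-supervised step described in the caption (forward-project the 3D estimate with the PSFs, then compare with the raw angular measurements) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the array layout, the function names `forward_project` and `mse_loss`, the use of circular FFT convolution, and the plain mean-squared error standing in for the NLL-MPG loss are all assumptions made for illustration.

```python
import numpy as np


def forward_project(volume, psfs):
    """Project a 3D estimate into angular views (illustrative sketch only).

    volume: array of shape (D, H, W), the 3D estimate.
    psfs:   array of shape (A, D, H, W), one 2D PSF per angular view and
            depth, centred at (H // 2, W // 2). This layout is an assumption.
    Returns an array of shape (A, H, W): for each view, the sum over depth
    of each slice circularly convolved with its PSF.
    """
    vol_f = np.fft.fft2(volume)  # FFT over the last two axes, shape (D, H, W)
    # Move the PSF centre to index (0, 0) so the convolution adds no shift.
    psf_f = np.fft.fft2(np.fft.ifftshift(psfs, axes=(-2, -1)))
    proj_f = (psf_f * vol_f[None]).sum(axis=1)  # sum over depth, (A, H, W)
    return np.fft.ifft2(proj_f).real


def mse_loss(projections, measurements):
    """Stand-in loss (assumption): mean-squared error, not NLL-MPG."""
    return float(np.mean((projections - measurements) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.random((5, 16, 16))         # hypothetical 3D estimate
    psfs = rng.random((4, 5, 16, 16))        # hypothetical PSFs, 4 views
    psfs /= psfs.sum(axis=(-2, -1), keepdims=True)  # normalise each PSF
    measurements = rng.random((4, 16, 16))   # hypothetical raw angular views
    projections = forward_project(volume, psfs)
    print(projections.shape, mse_loss(projections, measurements))
```

In training, a loss of this kind would be backpropagated through the projection into the network that produced the volume. At inference the projection is skipped, as the caption notes.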