MIDAS assumes that each cell’s measured counts and batch ID are generated from biological state and technical noise latent variables and uses the VAE to implement model learning and latent variable inference. Self-supervised learning is used to align different modalities on latent space through joint posterior regularization, and information-theoretic approaches help disentangle the latent variables.