COCA as a general augmenter for multi-animal compositing from a small amount of manually labelled data. Behavioural video streams are decomposed into backgrounds (top left), trajectories (middle left), and manually labelled masks (bottom left). A self-training instance segmentation model predicts additional masks for unlabelled frames from the manually labelled ones. The predicted masks are then combined with the backgrounds and trajectories to generate new scenarios of two freely moving mice.
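A minimal sketch of the compositing step, assuming each animal instance is available as an RGB crop with a binary mask and that trajectories provide per-frame placement coordinates; the function and variable names below are illustrative and not taken from the COCA codebase.

```python
import numpy as np

def composite_frame(background, instances, positions):
    """Paste masked animal crops onto a background at trajectory positions.

    background : (H, W, 3) uint8 array, e.g. an empty-arena frame.
    instances  : list of (crop, mask) pairs; crop is (h, w, 3) uint8,
                 mask is (h, w) bool covering the animal pixels.
    positions  : list of (y, x) top-left coordinates, one per instance,
                 sampled from the extracted trajectories.
    """
    frame = background.copy()
    for (crop, mask), (y, x) in zip(instances, positions):
        h, w = mask.shape
        region = frame[y:y + h, x:x + w]
        # Copy only the animal pixels; background pixels stay untouched.
        region[mask] = crop[mask]
    return frame
```

In this sketch, varying which masks, backgrounds, and trajectory positions are combined yields new synthetic scenes of interacting animals from only a handful of labelled masks.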