Illustration of the desired performance of each agent on the tasks it is currently learning (left) and on all tasks (right). Agent 1 (A1, top) starts on task 1 (T1) and, thanks to a knowledge transfer (KT) operation, later continues learning T2 from the performance level previously reached by A2. Furthermore, when A1 takes on T3, it receives an already optimized policy from A2 and hence immediately attains maximal performance on that task. A2 and A3, which learn T3 simultaneously, engage in knowledge exchange (KE), leading to faster, synergistic learning. The right plots show the performance on all tasks: agents retain knowledge (LL) while both learning from their own data and acquiring knowledge from other agents.
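The caption describes three mechanisms: knowledge transfer (KT) of an existing policy to a new learner, knowledge exchange (KE) between agents learning the same task concurrently, and knowledge retention (LL). Below is a minimal, hypothetical sketch of KT and KE, assuming each policy is a plain parameter vector; the `Agent` class, `transfer_knowledge`, and `exchange_knowledge` are illustrative names, not the paper's actual method or API.

```python
# Hypothetical sketch of the KT/KE operations from the figure; all names
# here are assumptions for illustration, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A lifelong learner holding one policy (parameter vector) per task."""
    name: str
    policies: dict[str, list[float]] = field(default_factory=dict)

def transfer_knowledge(src: Agent, dst: Agent, task: str) -> None:
    """KT: dst starts `task` from the level src has already reached."""
    if task in src.policies:
        dst.policies[task] = list(src.policies[task])  # copy, don't alias

def exchange_knowledge(a: Agent, b: Agent, task: str) -> None:
    """KE: two agents learning `task` at the same time merge their policies.
    Parameter averaging is a simple stand-in for synergistic joint learning."""
    merged = [(x + y) / 2 for x, y in zip(a.policies[task], b.policies[task])]
    a.policies[task] = list(merged)
    b.policies[task] = list(merged)

# Usage mirroring the figure: A2 has already optimized T3, so A1 receives
# that policy outright and starts at A2's performance level.
a1, a2 = Agent("A1"), Agent("A2")
a2.policies["T3"] = [0.9, 0.8, 0.7]   # A2's already optimized T3 policy
transfer_knowledge(a2, a1, "T3")
print(a1.policies["T3"])
```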