Schematic summary of attMIL and the multi-input DL architecture: a WSI is tessellated into smaller tiles, which are then pre-processed and passed through the encoder to yield image feature vectors. In the multi-input case, each image feature vector is concatenated with a vector representing the patient's clinical data. The set of image feature vectors per WSI is then used as input to the attMIL model. In a first embedding block, the attMIL model reduces the dimension of each tile's initial feature vector to 256 (from 2,048 when using the Wang encoder, plus 4 if clinical data are included in the input). Next, an attention score is calculated for each tile. These scores weight the sum over all embedded feature vectors, which gives a 256-dimensional vector representing the entire WSI (green). Finally, this vector is passed through a classification block to obtain a biomarker prediction for the input WSI.
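The data flow in this schematic can be illustrated with a minimal sketch in plain Python. Only the dimensions 2,048, +4 and 256 come from the caption. Everything else is an assumption made for illustration: single linear layers for each block, the ReLU and tanh activations, the softmax normalization of attention scores across tiles, the attention hidden size `ATT_DIM`, the two-class output, the random weights, and the names `att_mil_forward` and `init_linear`. It is not the authors' implementation.

```python
import math
import random

IMG_DIM = 2048   # image feature size per tile (from the caption)
CLIN_DIM = 4     # clinical features appended in the multi-input case (from the caption)
EMB_DIM = 256    # embedding size (from the caption)
ATT_DIM = 128    # assumption: hidden size of the attention block
N_CLASSES = 2    # assumption: binary biomarker status


def linear(x, weight, bias):
    """Return weight @ x + bias, where weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_out, n_in):
    """Random weights and zero bias; stands in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def att_mil_forward(tile_features, clinical, params):
    # Multi-input case: append the same clinical vector to every tile vector.
    inputs = [features + clinical for features in tile_features]
    # Embedding block: 2,048 (+4) -> 256 per tile (ReLU is an assumption).
    embedded = [[max(0.0, h) for h in linear(x, *params["embed"])] for x in inputs]
    # Attention block: one scalar score per tile (tanh hidden layer is an assumption).
    raw_scores = []
    for e in embedded:
        hidden = [math.tanh(h) for h in linear(e, *params["att_hidden"])]
        raw_scores.append(linear(hidden, *params["att_out"])[0])
    # Assumption: scores are normalized across the tiles of one WSI.
    attention = softmax(raw_scores)
    # Attention-weighted sum: one 256-dimensional vector for the whole WSI.
    slide_vector = [sum(a * e[j] for a, e in zip(attention, embedded))
                    for j in range(EMB_DIM)]
    # Classification block: WSI vector -> class probabilities.
    probabilities = softmax(linear(slide_vector, *params["classify"]))
    return attention, slide_vector, probabilities


rng = random.Random(0)
params = {
    "embed": init_linear(rng, EMB_DIM, IMG_DIM + CLIN_DIM),
    "att_hidden": init_linear(rng, ATT_DIM, EMB_DIM),
    "att_out": init_linear(rng, 1, ATT_DIM),
    "classify": init_linear(rng, N_CLASSES, EMB_DIM),
}
# Three synthetic tiles and a made-up clinical vector, purely for illustration.
tiles = [[rng.random() for _ in range(IMG_DIM)] for _ in range(3)]
clinical = [0.5, 1.0, 0.0, 1.0]
attention, slide_vector, probabilities = att_mil_forward(tiles, clinical, params)
```

Because the attention weights in this sketch sum to one over the tiles of a WSI, the pooled vector has the same size regardless of how many tiles the slide contains, which is what lets a single classification block handle slides of different sizes.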