States visited by different algorithms in Slow-down and ATSC-Grid. The distance between points represents the difference between states [84]. Because it uses a model, our approach explores more efficiently than the other algorithms, which in turn improves sample efficiency. This improvement is visible in the more even and more extensive coverage of the space by the red points.