When controlling traffic lights in a traffic system, gathering information from many distant traffic lights in order to make decisions incurs power loss at each communication step, reducing the endurance of the traffic lights. In addition, frequent, large-scale communication increases the probability of signal interference and raises computational complexity. Our approach instead relies on local communication between topologically adjacent agents to reduce communication cost, power consumption and computational complexity. Each agent receives state observations from its neighbours and aggregates them with its own state and action to obtain the final decision-dependent information: s_0 and a_0 denote the agent's own state and action, s_1–s_4 denote the states of its four adjacent agents, and s_0^* denotes the agent's aggregated information state. Furthermore, our framework maps the real system to a model, allowing policies to be learned through safe and efficient interactions between agents and the model. On the right, we illustrate the relationships between agents and neighbours of different orders in our methodology, where a larger order implies a broader communication range. In our method, each agent accurately estimates the global value and policy gradient solely from its neighbours' information, which aids policy learning. Another advantage of our solution is its ability to handle heterogeneous agents and multiple types of systems. In our experiments, CACC is a linear-type system, Flow is a ring-type system, and ATSC, Power Grid, Real Power-Net and Pandemic Networks are grid-type systems. The agents in CACC, Flow and ATSC-Grid are homogeneous, while those in ATSC-Monaco, ATSC-New York, Real Power-Net and Pandemic Networks are heterogeneous.
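The local aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's actual aggregator: the function name `aggregate_local_info` and the choice of plain concatenation are assumptions, since the excerpt does not specify how s_0, a_0 and the neighbour states s_1–s_4 are combined into s_0^*.

```python
import numpy as np

def aggregate_local_info(s_0, a_0, neighbor_states):
    """Combine an agent's own state and action with its neighbours'
    states into one decision vector (plain concatenation here; the
    paper's real aggregation scheme may differ)."""
    parts = [np.asarray(s_0), np.asarray(a_0)]
    parts += [np.asarray(s) for s in neighbor_states]
    return np.concatenate(parts)

# Example: 3-dim states, 1-dim action, four adjacent agents (s_1..s_4)
s_0 = np.zeros(3)
a_0 = np.ones(1)
neighbors = [np.full(3, float(i)) for i in range(1, 5)]
s_0_star = aggregate_local_info(s_0, a_0, neighbors)
print(s_0_star.shape)  # (16,) = 3 + 1 + 4 * 3
```

With four 3-dimensional neighbour states, one 3-dimensional own state and a 1-dimensional action, the aggregated vector s_0^* has 16 entries.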
In addition, in the non-adjacent setting, some systems contain disconnected agents.
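The notion of communication order above can be made concrete as a k-hop neighbourhood in the communication graph; disconnected agents are simply those no breadth-first search from a given agent ever reaches. The sketch below is illustrative only; the adjacency-dict representation and the function name `k_hop_neighbors` are assumptions, not part of the method.

```python
from collections import deque

def k_hop_neighbors(adj, agent, k):
    """Return the set of agents within k hops of `agent` in the
    communication graph `adj` (dict: node -> list of neighbours).
    A larger k means a broader communication range; agents with no
    path to `agent` (disconnected) are never included."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(agent)
    return seen

# Example: a line graph 0-1-2-3-4 plus a disconnected agent 5
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
print(k_hop_neighbors(adj, 2, 1))  # {1, 3}
print(k_hop_neighbors(adj, 2, 2))  # {0, 1, 3, 4}; agent 5 is unreachable
```

Increasing k enlarges each agent's information set at the cost of more communication, which is the trade-off the order parameter controls.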