Abstract: An optimization method based on the Actor-Critic algorithm is proposed to minimize the total task completion time and total flight distance in drone swarm task allocation. The Actor network generates task allocation strategies from the current state, while the Critic network evaluates the value of those strategies. The strategy is updated using multi-step temporal difference errors, which combine rewards from multiple time steps; this improves learning efficiency and mitigates the effect of delayed rewards. Comparative simulations are conducted across various task scenarios. The results show that the proposed method significantly reduces both task completion time and flight distance, validating its effectiveness for task allocation problems.
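To make the multi-step temporal difference error concrete, the following is a minimal sketch in Python. It assumes the standard n-step form, delta_t = r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1} + gamma^n * V(s_{t+n}) - V(s_t), which may differ in detail from the formulation used in the paper. The function name `n_step_td_errors`, the discount factor `gamma`, the step count `n`, and the toy rewards and Critic values are all illustrative assumptions, not values from the paper.

```python
from typing import List, Sequence


def n_step_td_errors(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    n: int = 3,
) -> List[float]:
    """Compute n-step TD errors for one episode (illustrative sketch).

    rewards: r_0 .. r_{T-1}, one reward per time step.
    values:  Critic estimates V(s_0) .. V(s_T); values[T] is the bootstrap
             value of the final state (use 0.0 if that state is terminal).
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must contain exactly one more entry than rewards")
    if n < 1:
        raise ValueError("n must be at least 1")

    errors = []
    for t in range(T):
        # Truncate the look-ahead window at the end of the episode.
        horizon = min(n, T - t)
        # Discounted sum of the next `horizon` rewards.
        n_step_return = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        # Bootstrap from the Critic's estimate at the end of the window.
        n_step_return += gamma ** horizon * values[t + horizon]
        errors.append(n_step_return - values[t])
    return errors


# Toy episode (hypothetical): the only reward arrives at the last step,
# and the Critic is untrained (all value estimates are zero).
toy_rewards = [0.0, 0.0, 1.0]
toy_values = [0.0, 0.0, 0.0, 0.0]

one_step = n_step_td_errors(toy_rewards, toy_values, gamma=0.5, n=1)
three_step = n_step_td_errors(toy_rewards, toy_values, gamma=0.5, n=3)
print(one_step)    # [0.0, 0.0, 1.0]
print(three_step)  # [0.25, 0.5, 1.0]
```

In this toy episode the one-step error is nonzero only at the final step, whereas the three-step error already carries the delayed reward back to the first step. This is the intuition behind the claim that multi-step errors lessen the effect of delayed rewards. In an Actor-Critic setup these errors would typically serve as the learning signal for both networks, although the abstract does not specify the exact loss functions.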