Abstract: To address the low training efficiency and long grasping time of Deep Reinforcement Learning (DRL) in six-axis robotic arm target grasping, an improved Soft Actor-Critic (SAC) control method is proposed. The method combines Prioritized Experience Replay (PER) with n-step Temporal Difference (n-step TD) prediction to improve performance. After executing an action, the agent computes the n-step accumulated discounted return G_t^(n) and the corresponding TD error δ_t^(n), which are then stored in the experience replay buffer. Transitions are sampled from the buffer by priority to update the individual networks. In a simulation environment, comparative experiments were designed in which the robotic arm grasped multiple target points. Compared with the classic SAC algorithm and the SAC+PER algorithm, the improved SAC algorithm consistently converged faster and achieved shorter grasping times under the same number of training episodes.
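As a concrete illustration of the quantities named in the abstract, the sketch below shows one common way to compute an n-step discounted return G_t^(n) and its TD error δ_t^(n), and to turn that error into a sampling priority for a prioritized replay buffer. This is a minimal sketch assuming the standard n-step return with critic bootstrapping; the function and variable names (n_step_return_and_td_error, v_bootstrap, the priority exponent) are illustrative assumptions, not code from the paper.

```python
def n_step_return_and_td_error(rewards, v_t, v_bootstrap, gamma=0.99, done=False):
    """Compute the n-step discounted return and its TD error.

    rewards:     the n rewards r_t, ..., r_{t+n-1} collected after the action
    v_t:         critic value estimate for the current state s_t
    v_bootstrap: critic value estimate for the state s_{t+n} reached after n steps
                 (ignored if the episode terminated within the n steps)
    """
    n = len(rewards)
    # G_t^(n) = sum_{k=0}^{n-1} gamma^k * r_{t+k} (+ gamma^n * V(s_{t+n}) if not terminal)
    g_n = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        g_n += (gamma ** n) * v_bootstrap
    # delta_t^(n) = G_t^(n) - V(s_t)
    td_error = g_n - v_t
    return g_n, td_error


# Example: derive a proportional PER priority from the |TD error|
# (alpha and epsilon values are illustrative).
g, delta = n_step_return_and_td_error([0.1, 0.0, -0.2], v_t=0.5, v_bootstrap=0.4)
priority = (abs(delta) + 1e-6) ** 0.6
```

In this scheme, transitions with larger |δ_t^(n)| receive higher priority and are therefore sampled more often when updating the networks, which is the mechanism the abstract credits for the faster convergence of the improved SAC algorithm.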