Abstract: To improve the fault tolerance of distributed simulation systems and to meet the need for large-scale combat simulation over a wide area network (WAN), this paper analyzes the cloud computing concept of node failure normalization (treating node failures as normal events rather than exceptions) and presents a multi-server fault-tolerant network model for distributed simulation. It focuses on the error recovery strategy, which comprises lease-based client error recovery, heartbeat-based data server error recovery, and log-based master server error recovery. A prototype fault-tolerant distributed simulation system was designed and implemented, and the results show that it can effectively improve the fault tolerance of distributed simulation systems. This research offers a reference for combining cloud computing with the High Level Architecture (HLA) and can improve the robustness of simulation systems.