Abstract: To address the privacy leakage that occurs both during the clustering process of the traditional K-means algorithm and when its clustering results are released, an improved K-means algorithm with differential privacy protection is proposed. Building on the original K-means, a density measure is introduced to increase intra-cluster similarity and to ensure that the selected centers lie in relatively dense regions. A distance measure is introduced to reduce inter-cluster similarity and to keep the centers of different clusters well separated. The average maximum inter-cluster similarity is then used to determine dynamically the optimal number of clusters K and the optimal initial cluster centers. Laplace noise is added to provide differential privacy and protect the underlying data. Experimental results show that the proposed algorithm achieves higher clustering availability and data reliability than traditional algorithms.
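
The Laplace-noise step mentioned above can be illustrated with a minimal sketch of one differentially private centroid update. It is not the algorithm proposed here but a common textbook-style variant, given only for orientation. The function name `noisy_centroids`, the assumption that every feature is scaled to [0, 1], the sensitivity bound d + 1 for the (sum, count) pair, and the per-iteration budget `epsilon` are all assumptions introduced for this example.

```python
import numpy as np

def noisy_centroids(points, labels, k, epsilon, rng):
    """One centroid update with Laplace noise (illustrative sketch only).

    Assumes every feature of `points` lies in [0, 1], so adding or removing
    one record changes a cluster's (sum, count) by at most d + 1 in L1 norm.
    """
    n, d = points.shape
    scale = (d + 1) / epsilon  # Laplace scale = assumed sensitivity / epsilon
    centers = np.zeros((k, d))
    for j in range(k):
        members = points[labels == j]
        # Perturb the per-cluster coordinate sums and the cluster size.
        noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
        noisy_count = len(members) + rng.laplace(0.0, scale)
        # Guard against a tiny or negative noisy count before dividing.
        centers[j] = noisy_sum / max(noisy_count, 1.0)
    # Keep the released centers inside the assumed data domain.
    return np.clip(centers, 0.0, 1.0)

rng = np.random.default_rng(0)
points = rng.random((200, 2))            # synthetic data in [0, 1]^2
labels = np.repeat([0, 1], 100)          # a fixed, arbitrary assignment
centers = noisy_centroids(points, labels, k=2, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but noisier centers. Under this assumption, placing the initial centers in dense, well-separated regions would matter because larger clusters are less affected by a fixed amount of noise.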