Sensitive Word Recognition Model Based on Improved Weighted LDA Model

    Abstract:

    Topic recognition on Internet text currently suffers from complex data and low prediction accuracy. To address these defects, this paper proposes a sensitive word recognition model based on an improved weighted latent Dirichlet allocation (LDA) model. First, a corpus of sensitive words for a specific domain is established. Second, to improve the efficiency of identifying sensitive-information topics, coarse-grained text classification is applied to the corpus. Third, a weighting model raises the distribution weight of words that have a low co-occurrence frequency but clearly sensitive characteristics, so that more words with low-frequency implicit relations can be discovered. Finally, the proposed model is verified on data crawled from mainstream news websites. The results show that the model can identify and extract more detailed sensitive-information topics for each text category, and that it is effective and accurate.
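
The weighting idea in the abstract (raising the weight of words that co-occur rarely but are clearly sensitive) can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so everything below is an assumption made for illustration: the function names `term_weights` and `weighted_counts`, the `boost` parameter, the log-based rarity score, and the toy documents and lexicon are all hypothetical. The sketch only reweights term counts; the reweighted counts would then be passed to a topic model such as LDA, which is not implemented here.

```python
import math
from collections import Counter


def term_weights(docs, sensitive_lexicon, boost=2.0):
    """Return a weight for every word in the corpus (hypothetical scheme).

    Ordinary words get weight 1.0. Words in the sensitive lexicon get
    1.0 + boost * rarity, where rarity is 1.0 for a word found in a single
    document and 0.0 for a word found in every document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = {}
    for word, df in doc_freq.items():
        if word in sensitive_lexicon and n_docs > 1:
            rarity = math.log(n_docs / df) / math.log(n_docs)
            weights[word] = 1.0 + boost * rarity
        else:
            weights[word] = 1.0
    return weights


def weighted_counts(docs, weights):
    """Scale each document's raw term counts by the per-word weights."""
    return [
        {word: count * weights[word] for word, count in Counter(doc).items()}
        for doc in docs
    ]


# Toy tokenized documents and a toy sensitive lexicon (both made up).
docs = [
    ["market", "policy", "leak"],
    ["market", "policy"],
    ["market", "sports"],
    ["market", "policy", "sports"],
]
lexicon = {"leak", "policy"}

weights = term_weights(docs, lexicon)
counts = weighted_counts(docs, weights)
```

In this toy corpus the rare lexicon word `leak` receives a larger weight than the common lexicon word `policy`, while non-lexicon words such as `market` keep weight 1.0, so a downstream topic model would see the rare sensitive word as if it occurred more often.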

Cite this article

Zeng Ling (曾玲). Sensitive Word Recognition Model Based on Improved Weighted LDA Model[J].,2025,44(06).

History
  • Received: 2024-08-10
  • Last revised: 2024-09-17
  • Published online: 2025-07-04