图书情报知识 (Documentation, Information & Knowledge), 2024, Vol. 41, Issue 6: 141-154, 165. doi: 10.13366/j.dik.2024.06.141

• Intelligence, Information & Sharing •

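Health Information Portrait Construction and False Health Information Identification: Integrating Social Sensing Data and Publisher Prior Knowledge

ZHAO Youlin1,2, PANG Hangyuan1, SHI Yanqing3

  1. Business School, Hohai University, Nanjing, 211100;
    2. School of Information Management, Nanjing University, Nanjing, 210023;
    3. School of Information Management, Nanjing Agricultural University, Nanjing, 210095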
  • Publication date: 2024-11-10 Release date: 2025-01-04
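  • About the authors: PANG Hangyuan (ORCID: 0009-0005-7169-7807), master's student; research interests: data analysis and mining, knowledge organization; Email: hangyuanpang@hhu.edu.cn. SHI Yanqing (ORCID: 0000-0002-1091-7926), PhD, associate professor; research interests: complex networks, social media data mining; Email: yqs4869@njau.edu.cn.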
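  • Funding:
    This paper is one of the research outcomes of the China Postdoctoral Science Foundation Special Funded Project "Research on the Construction of a Spatiotemporal Data Semantic Model for Emergency Management and the Mechanisms of Innovative Applications" (2021T140311) and the China Postdoctoral Science Foundation General Project "Research on Spatiotemporal Data Mining and Collaborative Governance Mechanisms for Environmental Pollution Incidents" (2019M650108).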

Construction of Health Information Portrait and the Identification of False Health Information by Integrating Social Sensing Data with Publisher's Prior Knowledge

ZHAO Youlin1,2, PANG Hangyuan1, SHI Yanqing3

  1. Business School, Hohai University, Nanjing, 211100;
    2. School of Information Management, Nanjing University, Nanjing, 210023;
    3. School of Information Management, Nanjing Agricultural University, Nanjing, 210095
  • Online: 2024-11-10 Published: 2025-01-04
  • Supported by:
    This paper is an outcome of two projects supported by the China Postdoctoral Science Foundation: the Special Funded Project "Research on the Construction of a Spatiotemporal Data Semantic Model for Emergency Management and the Mechanisms of Innovative Applications" (2021T140311) and the project "Research on Spatiotemporal Data Mining and Collaborative Governance Mechanisms for Environmental Pollution Incidents" (2019M650108).

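Abstract: [Purpose/Significance] Integrating social sensing data, which carry rich information on individual emotions, behaviors, and interactions, with publisher prior knowledge helps improve the accuracy of false health information identification. [Design/Methodology] Based on social sensing data, historical information text is aggregated to describe each publisher's prior knowledge of the information under detection. With this prior knowledge fused in, health information features are extracted along three dimensions: publisher features, content features, and receiver behavior features. Health information portraits are then established, and the false health information recognition model FHIR_SSD&PPK is built on a Stacking ensemble learning model. [Findings/Conclusion] The FHIR_SSD&PPK model identifies false health information best, with an accuracy of 92.35%. Publisher features have the highest total share of feature importance, 51.59%, of which publisher prior knowledge features account for 44.01%. Compared with the model that does not consider publisher prior knowledge, the F1 score improves by 2.26%, which indicates that the publisher prior knowledge proposed in this paper is a key feature for building the recognition model. [Originality/Value] The FHIR_SSD&PPK model fuses social sensing data with publisher prior knowledge and identifies false health information with a Stacking ensemble learning model, refining research on false health information identification in both granularity and depth.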

Keywords: health information portrait, false health information, social sensing data, prior knowledge, Stacking ensemble learning
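
The feature importance shares reported in the abstract can be broken down with simple arithmetic. The short sketch below does only that; it assumes (this is not stated in the source) that the three feature dimensions together account for 100% of the importance, and the variable names are illustrative.

```python
# Shares of total feature importance reported in the abstract (percent).
publisher_total = 51.59   # all publisher features
prior_knowledge = 44.01   # publisher prior knowledge features alone

# Importance left for the remaining publisher features.
other_publisher = round(publisher_total - prior_knowledge, 2)

# Assumption: the three dimensions sum to 100%, so this is the combined
# share of content features and receiver behavior features.
content_plus_receiver = round(100 - publisher_total, 2)

# Prior knowledge as a percentage of the publisher dimension.
prior_share_of_publisher = round(100 * prior_knowledge / publisher_total, 1)

print(other_publisher, content_plus_receiver, prior_share_of_publisher)
# → 7.58 48.41 85.3
```

Under that assumption, prior knowledge alone carries most of the publisher dimension's weight, while the split between content and receiver behavior features cannot be recovered from the abstract.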

Abstract: [Purpose/Significance] This study explores how integrating social sensing data, which contain rich information on individual emotions, behaviors, and interactions, with publishers' prior knowledge can improve the accuracy of false health information recognition. [Design/Methodology] Based on social sensing data, historical information text is aggregated to describe each publisher's prior knowledge of the information under detection. With this prior knowledge integrated, health information features are extracted along three dimensions: publisher features, content features, and receiver behavior features. Health information portraits are then established, and a Stacking ensemble learning model is used to build FHIR_SSD&PPK, a false health information recognition model that integrates social sensing data and publisher's prior knowledge. [Findings/Conclusion] The FHIR_SSD&PPK model performs best at identifying false health information, with an accuracy of 92.35%. Publisher features have the highest total share of feature importance, 51.59%, of which the publisher's prior knowledge features account for 44.01%. The F-measure (F1) increases by 2.26% compared with the model that does not consider the publisher's prior knowledge, indicating that the publisher's prior knowledge proposed in this article is a key feature for building the recognition model. [Originality/Value] The FHIR_SSD&PPK model integrates social sensing data and publisher's prior knowledge and identifies false health information with a Stacking ensemble learning model, refining research on false health information recognition in both granularity and depth.

Keywords: health information portrait, false health information, social sensing data, prior knowledge, Stacking ensemble learning
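
The abstract names Stacking ensemble learning but does not list the base learners or the meta-learner, so the following is only a minimal pure-Python sketch of the general technique, not the FHIR_SSD&PPK model itself. Everything in it is an assumption made for illustration: the three toy features (one score per dimension), the one-feature threshold base learners, the logistic-regression meta-learner, and the synthetic data. In practice a library implementation such as scikit-learn's `StackingClassifier` would normally be used.

```python
import math

def fit_stump(X, y, j):
    """Base learner: threshold feature j at the midpoint of the two class means."""
    pos = [x[j] for x, t in zip(X, y) if t == 1]
    neg = [x[j] for x, t in zip(X, y) if t == 0]
    m1, m0 = sum(pos) / len(pos), sum(neg) / len(neg)
    thr, sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
    return lambda x: 1 if sign * (x[j] - thr) > 0 else 0

def fit_logistic(Z, y, steps=500, lr=0.5):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            p = 1 / (1 + math.exp(-logit))
            for i, zi in enumerate(z):
                gw[i] += (p - t) * zi / n
            gb += (p - t) / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return lambda z: 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0

def fit_stacking(X, y, k=5):
    """Stacking: out-of-fold base predictions become the meta-learner's inputs."""
    n, d = len(X), len(X[0])
    Z = [[0] * d for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for j in range(d):
            stump = fit_stump(Xt, yt, j)   # never sees the held-out fold
            for i in held:
                Z[i][j] = stump(X[i])
    meta = fit_logistic(Z, y)
    bases = [fit_stump(X, y, j) for j in range(d)]  # refit on all data
    return lambda x: meta([base(x) for base in bases])

# Hypothetical toy data: [publisher score, content score, receiver score].
# Label 1 stands for "false health information"; values are synthetic.
X, y = [], []
for i in range(100):
    label = i % 2
    X.append([
        label * 1.5 + ((i * 37) % 100) / 100,  # strongly informative
        label * 0.5 + ((i * 53) % 100) / 100,  # weakly informative
        ((i * 71) % 100) / 100,                # uninformative
    ])
    y.append(label)

model = fit_stacking(X, y)
accuracy = sum(model(x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

The key design point is that the meta-learner is trained only on out-of-fold predictions, so it learns how much to trust each base learner without seeing predictions made on that learner's own training data. The accuracy printed here is measured on the toy training set and says nothing about the figures reported in the abstract.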