Semimonthly

ISSN 1000-1026

CN 32-1180/TP

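Named Entity Recognition for Electric Power Industry Based on Enhanced Text Features
Authors:
Affiliations:

1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, Jiangsu Province, China; 2. China Realtime Database Co., Ltd., Nanjing 211106, Jiangsu Province, China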

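Abstract:

Electric power corpora are small in scale and contain nested and abbreviated entities. To address these characteristics, an entity recognition method based on text feature enhancement is proposed. First, a preset lexicon and low-granularity word segmentation are used to exploit the semantic information carried by Chinese words while reducing the impact of errors propagated from segmentation. Second, a word-level bidirectional gated recurrent unit (BiGRU) is designed to learn the construction features of Chinese words; these are fused with part-of-speech and word-length features and concatenated with the word vector to form an enhanced word vector. Then, the entity recognition model is built and trained on a BiGRU, attention mechanism, and conditional random field (CRF) architecture. Finally, the method is validated on an electric power corpus, achieving an F1 score of 87.02%, which confirms its effectiveness for electric power named entity recognition.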
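The construction of the enhanced word vector described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the GRU weights are random and untrained, using the final forward and backward hidden states as the word construction feature is an assumption, and the tag set `POS_TAGS`, the length cap `MAX_LEN`, and all dimensions are hypothetical values chosen for illustration.

```python
import math
import random
from typing import Dict, List

Vector = List[float]

POS_TAGS = ["n", "v", "other"]  # hypothetical part-of-speech tag set
MAX_LEN = 8                     # hypothetical cap used to scale word length


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell with randomly initialised (untrained) weights, no biases."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # Input-to-hidden (W) and hidden-to-hidden (U) weights for the
        # update gate z, reset gate r, and candidate state h.
        self.W = {g: mat(hid_dim, in_dim) for g in "zrh"}
        self.U = {g: mat(hid_dim, hid_dim) for g in "zrh"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, xs: List[Vector]) -> Vector:
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * self.hid_dim
        for x in xs:
            h = self.step(x, h)
        return h


def enhanced_word_vector(word: str, pos: str, word_vec: Vector,
                         char_vecs: Dict[str, Vector],
                         fwd: GRUCell, bwd: GRUCell) -> Vector:
    chars = [char_vecs[c] for c in word]
    # Word construction feature: characters read in both directions.
    construct = fwd.run(chars) + bwd.run(chars[::-1])
    tag = pos if pos in POS_TAGS else "other"
    pos_feat = [1.0 if tag == t else 0.0 for t in POS_TAGS]
    len_feat = [min(len(word), MAX_LEN) / MAX_LEN]
    # Concatenate word vector, construction, POS, and length features.
    return word_vec + construct + pos_feat + len_feat


rng = random.Random(0)
CHAR_DIM, HID_DIM, WORD_DIM = 4, 3, 5
word = "变压器"  # "transformer", a typical electric power term
char_vecs = {c: [rng.uniform(-1, 1) for _ in range(CHAR_DIM)] for c in word}
word_vec = [rng.uniform(-1, 1) for _ in range(WORD_DIM)]
fwd, bwd = GRUCell(CHAR_DIM, HID_DIM, rng), GRUCell(CHAR_DIM, HID_DIM, rng)
vec = enhanced_word_vector(word, "n", word_vec, char_vecs, fwd, bwd)
print(len(vec))  # 5 + 2*3 + 3 + 1 = 15
```

In a trained model the character and word embeddings and the GRU weights would be learned jointly with the downstream tagger; here they only demonstrate how the pieces are concatenated.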

Keywords:

Funding:

Supported by the National Key R&D Program of China (No. 2017YFB1001800).

Corresponding author: YU Jun

About the authors:

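LIU Wensong (born 1983), male, Ph.D., senior engineer. His main research interests include information fusion and the research, development, and application of artificial intelligence technology. E-mail: liuwensong1@sgepri.sgcc.com.cn
HU Zhuqing (born 1981), male, master's degree, senior engineer. His main research interests include system architecture design and electric power digitalization. E-mail: huzhuqing@sgepri.sgcc.com.cn
ZHANG Jinhui (born 1994), female, master's degree, engineer. Her main research interests include natural language processing and the application of artificial intelligence technology. E-mail: zhangjinhui@sgepri.sgcc.com.cn
YU Jun (born 1978), male, corresponding author, master's degree, professor-level senior engineer. His main research interests include the design and application of digital and intelligent architectures. E-mail: yujun@sgepri.sgcc.com.cn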


Named Entity Recognition for Electric Power Industry Based on Enhanced Text Features
Author:
Affiliation:

1.NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China;2.China Realtime Database Co., Ltd., Nanjing 211106, China

Abstract:

To address the small scale, nested entities, and abbreviated entities that characterize electric power corpora, a named entity recognition (NER) method based on enhanced text features is proposed. First, a preset dictionary and low-granularity word segmentation are used so that the semantic information carried by Chinese words is exploited while the errors propagated from word segmentation are reduced. Second, the structural features of individual Chinese words are learned by a word-level bidirectional gated recurrent unit (Word BiGRU); these are combined with part-of-speech and word-length features and concatenated with the word vector to form an enhanced word vector. Finally, the NER model is built from a BiGRU, an attention mechanism, and a conditional random field (CRF). The proposed method is verified on an electric power corpus, where the F1 score reaches 87.02%, demonstrating its effectiveness for NER in the electric power industry.
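For readers unfamiliar with how an F1 score is typically computed for NER, the sketch below shows one common convention: exact-match, entity-level F1 over flat BIO tag sequences. The abstract does not state the evaluation protocol, so this is an assumption, and flat BIO tags cannot represent the nested entities the abstract mentions. The labels `DEV` and `LOC` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction counts only if boundaries and type both match."""
    g, p = extract_spans(gold), extract_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-DEV", "I-DEV", "O", "B-LOC", "I-LOC", "O"]
pred = ["B-DEV", "I-DEV", "O", "B-LOC", "O", "O"]
score = f1_score(gold, pred)
print(score)  # 0.5: one of two predicted spans matches one of two gold spans
```

Under this convention a span with a wrong boundary, such as the truncated `LOC` entity above, counts as both a false positive and a false negative.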

Keywords:

Foundation:
This work is supported by the National Key R&D Program of China (No. 2017YFB1001800).
Cite this article
[1] LIU Wensong, HU Zhuqing, ZHANG Jinhui, et al. Named Entity Recognition for Electric Power Industry Based on Enhanced Text Features[J]. Automation of Electric Power Systems, 2022, 46(21):134-142. DOI:10.7500/AEPS20210323003.
History
  • Received: 2021-03-23
  • Last revised: 2021-05-13
  • Accepted:
  • Published online: 2022-11-01
  • Publication date: