Integrating Refined Clustering and Deep Neural Networks for Biomedical Named Entity Recognition in Textual Data
Keywords:
Biomedical Named Entity Recognition (Bio-NER), Deep Learning, Cluster Analysis, BiLSTM-CRF, Attention Mechanisms, Natural Language Processing, Information Extraction
Abstract
Accurate extraction of named entities from the large and growing body of biomedical literature is fundamental to accelerating research and discovery in the life sciences. However, biomedical text has characteristics that challenge traditional Named Entity Recognition (NER) systems: highly specialized terminology, widespread use of synonyms, and complex entity structures. This study introduces a methodology that combines an enhanced cluster merging strategy with a deep neural network architecture to improve the identification of biomedical entity names in text corpora. The approach first applies a cluster refinement process that semantically links and consolidates fragmented or varied mentions of the same biomedical entity across the corpus. The information derived from these refined clusters is then supplied as an auxiliary feature to a Bidirectional Long Short-Term Memory (BiLSTM) network, which is augmented with an attention mechanism and topped with a Conditional Random Field (CRF) layer. Experimental validation on the widely used GENIA corpus shows that this integrated framework outperforms existing state-of-the-art Bio-NER methods. Combining context-aware clustering with deep sequence labeling offers an effective way to handle the intricacies of biomedical text, and supports more precise and comprehensive information extraction for biological and clinical applications.
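To make the two-stage idea concrete, the following is a minimal sketch of how varied mentions might be merged into clusters and how a cluster ID could then be attached to each token as an auxiliary feature. It is not the authors' cluster refinement algorithm, which the abstract does not specify. The function names (`normalize`, `merge_mentions`, `cluster_features`), the character-level similarity measure, and the `threshold` value are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(mention):
    """Lowercase a mention and keep only letters and digits (hypothetical rule)."""
    return "".join(ch for ch in mention.lower() if ch.isalnum())


def merge_mentions(mentions, threshold=0.85):
    """Group surface variants with single-link merging over a union-find.

    Assumption: similarity is a character-level ratio on normalized strings,
    standing in for the semantic linking described in the abstract.
    Returns one cluster ID per mention, numbered in order of first appearance.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [normalize(m) for m in mentions]
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(mentions))]


def cluster_features(tokens, mentions, cluster_ids):
    """Map each token to its cluster ID, or -1 if it matches no known mention."""
    lookup = {normalize(m): c for m, c in zip(mentions, cluster_ids)}
    return [lookup.get(normalize(t), -1) for t in tokens]


mentions = ["IL-2", "il 2", "IL2", "NF-kappa B", "NF-kappaB", "p53"]
ids = merge_mentions(mentions)
features = cluster_features(["IL2", "activates", "p53"], mentions, ids)
```

In a full pipeline, each cluster ID would presumably be embedded and concatenated with the token's word embedding before the BiLSTM layer. That wiring is likewise an assumption, since the abstract states only that cluster information is used as an auxiliary feature.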
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their articles published in this journal. All articles are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly cited.