Exploring Different Annotation Schemes for Single and Consecutive Named Entity Recognition in the Arabic Biomedical Domain using Transformer Models and Contextual Semantic Embeddings
Received: 24 December 2024 | Revised: 17 February 2025 | Accepted: 21 February 2025 | Online: 3 April 2025
Corresponding author: Ismail Ait Talghalit
Abstract
Named Entity Recognition (NER) is an important Natural Language Processing (NLP) task in the Arabic biomedical field. However, most existing work on Arabic biomedical NER has limitations, notably an inability to capture the context and semantics within texts. Moreover, only a few studies have handled consecutive biomedical entities in Arabic effectively. To overcome these limitations, this study proposes an efficient method for building contextual models for biomedical NER that capture context and semantics in Arabic text using transformer models and semantic embeddings. The extracted embeddings are combined with machine learning methods, including Support Vector Machine (SVM), Decision Tree (DT), and AdaBoost, to identify both single and consecutive named entities accurately. Furthermore, the effect of seven annotation schemes, namely IO, IOB, IE, IOE, BI, BIES, and IOBES, was studied to determine the most suitable one for Arabic biomedical NER. The experimental results showed that the BERT and AraBERT models, when fine-tuned for Arabic biomedical NER, outperform well-known machine learning methods in terms of accuracy and F1 score. The findings across the annotation schemes highlight the effectiveness of the IO scheme for simple (single) entities, while the IOBES and BIES schemes are better suited to recognizing multi-token entities.
Keywords:
transformer models, deep learning, contextual embeddings, named entity recognition, natural language processing, Arabic biomedical domain, AraBERT, BERT
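To make the difference between the annotation schemes named in the abstract concrete, the following minimal sketch encodes the same entity spans under four of them (IO, IOB, IOE, and IOBES). It is an illustration only, not the authors' implementation: the function name `encode`, the span format, the `Disease` label, and the English placeholder tokens are all assumptions introduced here. IOB is taken in its common variant where every entity starts with a B tag, and the IE, BI, and BIES schemes are left out.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, entity_type) over token indices.
# This span format is an assumption made for the illustration.
Span = Tuple[int, int, str]

SUPPORTED = {"IO", "IOB", "IOE", "IOBES"}


def encode(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Return one tag per token under the given scheme.

    IO    : every entity token is I-<type>.
    IOB   : the first token of each entity is B-<type>, the rest I-<type>.
    IOE   : the last token of each entity is E-<type>, the rest I-<type>.
    IOBES : B/I/E for multi-token entities, S-<type> for single-token ones.
    """
    if scheme not in SUPPORTED:
        raise ValueError(f"unsupported scheme: {scheme}")
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            first, last = i == start, i == end - 1
            if scheme == "IOBES" and first and last:
                prefix = "S"
            elif scheme in ("IOB", "IOBES") and first:
                prefix = "B"
            elif scheme in ("IOE", "IOBES") and last:
                prefix = "E"
            else:
                prefix = "I"
            tags[i] = f"{prefix}-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical tokens: a two-token entity directly followed by a
    # single-token entity, i.e. two consecutive entities.
    tokens = ["patient", "has", "breast", "cancer", "diabetes", "today"]
    spans = [(2, 4, "Disease"), (4, 5, "Disease")]
    for scheme in ("IO", "IOB", "IOE", "IOBES"):
        print(scheme, list(zip(tokens, encode(len(tokens), spans, scheme))))
```

In this sketch the IO encoding gives the three entity tokens the same tag, so the boundary between the two consecutive entities cannot be recovered from the tags alone, whereas IOB, IOE, and IOBES each mark it explicitly.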
License
Copyright (c) 2025 Ismail Ait Talghalit, Hamza Alami, Said Ouatik El Alaoui

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.