Exploring the Impact of Annotation Schemes on Arabic Named Entity Recognition across General and Specific Domains

Taoufiq El Moussaoui; Chakir Loqman; Jaouad Boumhidi

doi:10.48084/etasr.10205

Authors

Taoufiq El Moussaoui, LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Chakir Loqman, LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Jaouad Boumhidi, LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco

Volume: 15 | Issue: 2 | Pages: 21918-21924 | April 2025 | https://doi.org/10.48084/etasr.10205

Received: 11 January 2025 | Revised: 11 February 2025, 15 February 2025, and 18 February 2025 | Accepted: 21 February 2025 | Online: 14 March 2025

Corresponding author: Taoufiq El Moussaoui

Abstract

Named Entity Recognition (NER) is a fundamental task in natural language processing (NLP) that involves identifying and classifying entities into predefined categories. Despite its importance, the impact of annotation schemes and their interaction with domain types on NER performance, particularly for Arabic, remains underexplored. This study examines the influence of seven annotation schemes (IO, BIO, IOE, BIOES, BI, IE, and BIES) on Arabic NER performance using the general-domain ANERCorp dataset and a domain-specific Moroccan legal corpus. Three models were evaluated: Logistic Regression (LR), Conditional Random Fields (CRF), and the transformer-based Arabic Bidirectional Encoder Representations from Transformers (AraBERT) model. Results show that the impact of annotation schemes on performance is independent of domain type. Traditional Machine Learning (ML) models such as LR and CRF perform best with simpler annotation schemes such as IO, which offer computational efficiency and balanced precision and recall. AraBERT, in contrast, excels with more complex schemes (BIOES, BIES), achieving superior performance in tasks requiring nuanced contextual understanding and intricate entity relationships, though at the cost of higher computational demands and execution time. These findings underscore the trade-offs between annotation scheme complexity and computational requirements, offering valuable insights for designing NER systems tailored to both general and domain-specific Arabic NLP applications.
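
To make the difference between annotation schemes concrete, the following minimal sketch tags one short sentence under four of the seven schemes named above (IO, BIO, IOE, and BIOES). It is an illustration only, not the authors' preprocessing code. The function name `tag_tokens`, the span format, the example sentence, and the entity labels `ORG` and `LOC` are assumptions made for this example. BIO is taken here in its IOB2 form (every entity starts with `B-`) and IOE in its IOE2 form (every entity ends with `E-`). The BI, IE, and BIES schemes are left out because the abstract does not define them.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def tag_tokens(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Return one tag per token for the given entity spans and scheme."""
    if scheme not in {"IO", "BIO", "IOE", "BIOES"}:
        raise ValueError(f"unsupported scheme: {scheme}")
    tags = ["O"] * n_tokens  # tokens outside any entity keep the O tag
    for start, end, etype in spans:
        for i in range(start, end):
            first = i == start
            last = i == end - 1
            if scheme == "IO":
                prefix = "I"  # no boundary information at all
            elif scheme == "BIO":
                prefix = "B" if first else "I"  # marks entity beginnings
            elif scheme == "IOE":
                prefix = "E" if last else "I"  # marks entity ends
            else:  # BIOES: marks beginnings, ends, and single-token entities
                if first and last:
                    prefix = "S"
                elif first:
                    prefix = "B"
                elif last:
                    prefix = "E"
                else:
                    prefix = "I"
            tags[i] = f"{prefix}-{etype}"
    return tags


# Illustrative sentence (transliterated), with one five-token and one
# single-token entity.
tokens = ["Sidi", "Mohamed", "Ben", "Abdellah", "University", "is", "in", "Fez"]
spans = [(0, 5, "ORG"), (7, 8, "LOC")]

for scheme in ("IO", "BIO", "IOE", "BIOES"):
    print(scheme, list(zip(tokens, tag_tokens(len(tokens), spans, scheme))))
```

The label set grows with scheme complexity: for each entity type, IO needs one tag, BIO and IOE need two, and BIOES needs four. This is one plausible source of the trade-off between scheme complexity and computational cost that the abstract describes.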

Keywords:

Arabic named entity recognition, annotation schemes, general-domain NER, domain-specific NER, AraBERT

