An Efficient Uzbek Speaker Recognition System for Resource-Constrained Devices Using Compact Acoustic Features and Lightweight Deep Models
Received: 9 April 2026 | Revised: 28 April 2026 | Accepted: 9 May 2026 | Online: 6 June 2026
Corresponding author: Parakhat Nurimov
Abstract
Speaker recognition systems have achieved strong performance, but many high-performing approaches remain computationally expensive and are therefore poorly suited to resource-constrained devices. This limitation matters particularly in low-resource settings, including Uzbek speech applications, where few practical lightweight solutions exist. This study presents an efficient closed-set, text-independent speaker identification framework for Uzbek based on compact acoustic features and lightweight deep models. Two acoustic representations, MFCC-13 and Log-Mel-40, were evaluated with two lightweight convolutional architectures, Small CNN and Compact CNN. The systems were assessed on recognition accuracy, F1-score, parameter count, model size, and inference latency. The Log-Mel-40 + Compact CNN configuration achieved the best overall performance, with 96.44% accuracy and an F1-score of 0.8957, while maintaining a compact model size of 0.4606 MB and low inference latency. The findings indicate that practical Uzbek speaker recognition can be achieved on resource-constrained platforms through an appropriate combination of compact acoustic features and lightweight deep models.
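The relationship between parameter count and model size reported above can be illustrated with a short sketch. The layer shapes, the 50-speaker output, and the helper names below are hypothetical and are not taken from the paper, whose Small CNN and Compact CNN architectures are not specified in the abstract. The sketch also assumes 32-bit floating-point weights and reports size in units of 2**20 bytes, which may differ from the convention behind the reported 0.4606 MB.

```python
# Hypothetical illustration: none of these layer shapes come from the paper.

def conv2d_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel for a square-kernel 2-D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features


def size_mb(n_params, bytes_per_param=4):
    """Weight storage in units of 2**20 bytes; float32 (4 bytes) is an assumption."""
    return n_params * bytes_per_param / 2**20


def example_cnn_params(n_speakers):
    """Parameter count of a made-up three-block CNN with a linear classifier."""
    total = conv2d_params(1, 16, 3)         # single-channel feature map in
    total += conv2d_params(16, 32, 3)
    total += conv2d_params(32, 64, 3)
    total += dense_params(64, n_speakers)   # applied after global average pooling
    return total


if __name__ == "__main__":
    n = example_cnn_params(n_speakers=50)   # 50 speakers is an arbitrary choice
    print(f"parameters: {n}, approx. size: {size_mb(n):.4f} MB")
```

Under these assumptions, quantizing the weights to 8 bits would correspond to `bytes_per_param=1`, which reduces the estimated storage by a factor of four without changing the parameter count.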
Keywords:
speaker recognition, Uzbek speech, lightweight deep learning, MFCC, Log-Mel spectrogram, resource-constrained devices, compact convolutional neural networks
License
Copyright (c) 2026 Parakhat Nurimov, Narzillo Mamatov

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
