An Efficient Uzbek Speaker Recognition System for Resource-Constrained Devices Using Compact Acoustic Features and Lightweight Deep Models

Parakhat Nurimov; Narzillo Mamatov

doi:10.48084/etasr.19226

Authors

Parakhat Nurimov, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent, Uzbekistan
Narzillo Mamatov, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent, Uzbekistan

Volume: 16 | Issue: 3 | Pages: 36567-36573 | June 2026 | https://doi.org/10.48084/etasr.19226

Received: 9 April 2026 | Revised: 28 April 2026 | Accepted: 9 May 2026 | Online: 6 June 2026

Corresponding author: Parakhat Nurimov

Abstract

Speaker recognition systems have achieved strong performance, but many high-performing approaches remain computationally expensive and are therefore poorly suited to resource-constrained devices. This limitation matters most in low-resource settings, including Uzbek speech applications, where few practical lightweight solutions exist. This study presents an efficient closed-set, text-independent speaker identification framework for Uzbek, based on compact acoustic features and lightweight deep models. Two acoustic representations, MFCC-13 and Log-Mel-40, were evaluated with two lightweight convolutional architectures, Small CNN and Compact CNN. The systems were assessed on recognition accuracy, F1-score, parameter count, model size, and inference latency. The Log-Mel-40 + Compact CNN configuration achieved the best overall performance, reaching 96.44% accuracy and an F1-score of 0.8957 while keeping the model size at 0.4606 MB and the inference latency low. The findings indicate that practical Uzbek speaker recognition can be achieved on resource-constrained platforms through an appropriate combination of compact acoustic features and lightweight deep models.
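
As an illustration of the two acoustic representations named above, the following is a minimal NumPy sketch of how 40-band log-Mel features and 13 MFCCs are commonly computed. It is not the authors' pipeline: the 16 kHz sampling rate, the 25 ms window (400 samples), the 10 ms hop (160 samples), the Hann window, the HTK-style mel formula, and the unnormalized DCT-II are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # n_mels triangular filters, evenly spaced on the mel scale from 0 Hz to sr/2.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(signal, sr=16000, n_fft=400, hop=160, n_mels=40):
    # Returns an array of shape (n_frames, n_mels).
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)                   # small floor avoids log(0)

def mfcc(log_mel_feats, n_mfcc=13):
    # Unnormalized DCT-II over the mel axis; keeps the first n_mfcc coefficients.
    n_mels = log_mel_feats.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel_feats @ basis.T
```

Under these assumptions, a one-second signal of 16,000 samples yields 98 frames, so `log_mel` returns a 98 x 40 array and `mfcc` reduces it to 98 x 13. The MFCC input is thus roughly a third the size of the log-Mel input per frame, which is the kind of trade-off between feature compactness and spectral detail that the comparison in the abstract concerns.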

Keywords:

speaker recognition, Uzbek speech, lightweight deep learning, MFCC, Log-Mel spectrogram, resource-constrained devices, compact convolutional neural networks
