A Noise-Resilient Voice Command System for Smart Wheelchairs Using Gammatone Frequency Cepstral Coefficients and ResNet50
Received: 16 November 2025 | Revised: 29 December 2025, 19 January 2026, and 7 February 2026 | Accepted: 8 February 2026 | Online: 4 April 2026
Corresponding author: Fitri Utaminingrum
Abstract
This study introduces a voice-activated smart wheelchair system engineered to support individuals with physical disabilities, especially in noisy environments. The proposed system employs Gammatone Frequency Cepstral Coefficients (GFCC) for noise-resistant feature extraction and a ResNet50 deep learning architecture for command classification, implemented on an NVIDIA Jetson TX2 embedded platform. The model is designed to accurately identify Indonesian voice commands for wheelchair movement directions. The experimental evaluation encompasses epoch-wise performance analysis, confusion matrix evaluation, computational time measurement, and comprehensive testing in real-world environments under both quiet and noisy conditions. The best model was obtained at epoch 72, with a validation accuracy of 94.6%, a validation loss of 0.221, and macro-averaged precision, recall, and F1-score values of 0.955, 0.957, and 0.956, respectively. The average GFCC extraction and inference durations are 0.089 and 0.578 s, respectively, yielding a total system latency of 0.667 s and thereby meeting real-time control requirements. Integrated testing shows that the proposed system achieves 88% command-recognition accuracy in quiet settings and 73.33% in noisy ones. These findings demonstrate that the proposed GFCC–ResNet50 framework exhibits robust noise resistance and dependable real-time performance, rendering it appropriate for practical assistive mobility applications.
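The GFCC front end described above follows the standard recipe: frame and window the signal, compute the power spectrum, apply an ERB-spaced gammatone filterbank, compress the filter energies with a cube root (rather than the logarithm used in MFCC), and decorrelate with a DCT. The following is a minimal NumPy/SciPy sketch of that pipeline, not the authors' implementation; the frequency-domain fourth-order gammatone approximation, the 16 kHz sample rate, and the frame/hop sizes are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct


def erb(f):
    # Equivalent Rectangular Bandwidth (Glasberg & Moore) in Hz
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def gammatone_filterbank(n_filters, n_fft, sr, f_min=50.0):
    """Magnitude responses of a 4th-order gammatone filterbank on the rFFT bins.

    Uses the common frequency-domain approximation
    |H(f)| ~ [1 + ((f - fc)/b)^2]^(-2) with b = 1.019 * ERB(fc);
    centre frequencies are equally spaced on the ERB-rate scale.
    """
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centres = inv_erb_rate(
        np.linspace(erb_rate(f_min), erb_rate(sr / 2.0), n_filters)
    )
    fb = np.zeros((n_filters, freqs.size))
    for i, fc in enumerate(centres):
        b = 1.019 * erb(fc)
        fb[i] = (1.0 + ((freqs - fc) / b) ** 2) ** (-2)
    return fb


def gfcc(signal, sr=16000, n_fft=512, hop=160, n_filters=64, n_ceps=13):
    """Compute a (frames x n_ceps) GFCC matrix from a mono signal."""
    # frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # gammatone filterbank energies
    energies = power @ gammatone_filterbank(n_filters, n_fft, sr).T
    # cube-root loudness compression (GFCC), instead of MFCC's log
    compressed = np.cbrt(energies)
    # DCT-II decorrelation, keep the first n_ceps coefficients
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The resulting coefficient matrix can be treated as a 2-D "image" and fed to an image classifier such as ResNet50; the cube-root compression is the main source of the noise robustness the paper attributes to GFCC relative to MFCC.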
Keywords:
smart wheelchair, voice command recognition, GFCC, ResNet50, Jetson TX2, noisy environments
References
Global Report on Health Equity for Persons with Disabilities, 1st ed. World Health Organization, 2022.
B. S. P. Laksono, T. Syaifuddin, and F. Utaminingrum, "Voice Recognition to Classify ‘Buka’ and ‘Tutup’ Sound to Open and Closes Door Using Mel Frequency Cepstral Coefficients (MFCC) and Convolutional Neural Network (CNN)," Journal of Information Technology and Computer Science, vol. 9, no. 1, pp. 58–66, Apr. 2024. DOI: https://doi.org/10.25126/jitecs.202491579
M. Z. Abbiyansyah and F. Utaminingrum, "Voice Recognition on Humanoid Robot Darwin OP Using Mel Frequency Cepstrum Coefficients (MFCC) Feature and Artificial Neural Networks (ANN) Method," in 2022 2nd International Conference on Information Technology and Education (ICIT&E), Jan. 2022, pp. 251–256. DOI: https://doi.org/10.1109/ICITE54466.2022.9759883
A. A. Alasadi, T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, and A. S. Alshebami, "Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System," Engineering, Technology & Applied Science Research, vol. 10, no. 2, pp. 5547–5553, Apr. 2020. DOI: https://doi.org/10.48084/etasr.3465
P. Bawa, V. Kadyan, and G. Chhabra, "A Multifaceted Feature Extraction Approach for Noise-Robust Punjabi Spoken Digit Recognition System Under Low-Resource Conditions," in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Mar. 2024, pp. 1–6. DOI: https://doi.org/10.1109/ICRITO61523.2024.10522268
N. Boualoulou, T. Belhoussine Drissi, and B. Nsiri, "Comparison of Feature Extraction Methods Between MFCC, BFCC, and GFCC with SVM Classifier for Parkinson’s Disease Diagnosis," in IoT Based Control Networks and Intelligent Systems, 2024, pp. 231–247. DOI: https://doi.org/10.1007/978-981-99-6586-1_16
L. Borawar and R. Kaur, "ResNet: Solving Vanishing Gradient in Deep Networks," in Proceedings of International Conference on Recent Trends in Computing, 2023, pp. 235–247. DOI: https://doi.org/10.1007/978-981-19-8825-7_21
P. Nagpal, S. A. Bhinge, and A. Shitole, "A Comparative Analysis of ResNet Architectures," in 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Dec. 2022, pp. 1–8. DOI: https://doi.org/10.1109/SMARTGENCON56628.2022.10083966
B. Bangennavar, S. Patil, S. Kudal, B. D. Parmeshachari, and R. Latti, "People Tracking and Counting using Jetson TX2 Kit with Tracking Algorithm," in 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Nov. 2022, pp. 1–5. DOI: https://doi.org/10.1109/NKCon56289.2022.10126790
N. Takahashi, M. Gygli, B. Pfister, and L. V. Gool, "Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition," in Interspeech 2016, Sept. 2016, pp. 2982–2986. DOI: https://doi.org/10.21437/Interspeech.2016-805
S. Gondi and V. Pratap, "Performance Evaluation of Offline Speech Recognition on Edge Devices," Electronics, vol. 10, no. 21, Nov. 2021, Art. no. 2697. DOI: https://doi.org/10.3390/electronics10212697
N. M. Sharma, V. Kumar, P. K. Mahapatra, and V. Gandhi, "Comparative analysis of various feature extraction techniques for classification of speech disfluencies," Speech Communication, vol. 150, pp. 23–31, May 2023. DOI: https://doi.org/10.1016/j.specom.2023.04.003
Z. Mengxi and T. Zhiguo, "Research on Failure Identification of Partial Discharge Ultrasonic Signal Based on GFCC," in 2020 IEEE Electrical Insulation Conference (EIC), June 2020, pp. 412–416. DOI: https://doi.org/10.1109/EIC47619.2020.9158683
G. Sharma, K. Umapathy, and S. Krishnan, "Trends in audio signal feature extraction methods," Applied Acoustics, vol. 158, Jan. 2020, Art. no. 107020. DOI: https://doi.org/10.1016/j.apacoust.2019.107020
M. A. Islam, "Non-linear Power Exponent Effect in GFCC for Bangla and Malay speech Separation," in 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), Feb. 2022, pp. 206–211. DOI: https://doi.org/10.1109/ICISET54810.2022.9775917
License
Copyright (c) 2026 Fitri Utaminingrum, Aulia Riza Mufita, Aldiansyah Satrio Kabisat, I Komang Somawirata

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
