A Noise-Resilient Voice Command System for Smart Wheelchairs Using Gammatone Frequency Cepstral Coefficients and ResNet50
Received: 16 November 2025 | Revised: 29 December 2025, 19 January 2026, and 7 February 2026 | Accepted: 8 February 2026 | Online: 4 April 2026
Corresponding author: Fitri Utaminingrum
Abstract
This study introduces a voice-activated smart wheelchair system engineered to support individuals with physical disabilities, especially in noisy environments. The proposed system employs Gammatone Frequency Cepstral Coefficients (GFCC) for noise-resistant feature extraction and a ResNet50 deep learning architecture for command classification, implemented on an NVIDIA Jetson TX2 embedded platform. The model is designed to accurately identify Indonesian voice commands for wheelchair movement directions. The experimental evaluation encompasses epoch-wise performance analysis, confusion matrix evaluation, computational time measurement, and comprehensive testing in real-world environments under both quiet and noisy conditions. The best model was obtained at epoch 72, with a validation accuracy of 94.6%, a validation loss of 0.221, and macro-averaged precision, recall, and F1-score values of 0.955, 0.957, and 0.956, respectively. The average GFCC extraction and inference durations are 0.089 and 0.578 s, respectively, yielding a total system latency of 0.667 s and thereby meeting real-time control requirements. Integrated testing shows that the proposed system achieves 88% command-recognition accuracy in quiet settings and 73.33% in noisy ones. These findings demonstrate that the proposed GFCC–ResNet50 framework exhibits robust noise resistance and dependable real-time performance, rendering it appropriate for practical assistive mobility applications.
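The GFCC front end described above follows the standard recipe: frame and window the signal, compute the power spectrum, apply an ERB-spaced gammatone filterbank, compress the filter energies with a cube root (rather than the logarithm used in MFCC), and decorrelate with a DCT. The following is a minimal NumPy/SciPy sketch of that pipeline, not the authors' implementation; the frequency-domain fourth-order gammatone approximation, the 16 kHz sample rate, and the frame/hop sizes are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct


def erb(f):
    # Equivalent Rectangular Bandwidth (Glasberg & Moore) in Hz
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def gammatone_filterbank(n_filters, n_fft, sr, f_min=50.0):
    """Magnitude responses of a 4th-order gammatone filterbank on the rFFT bins.

    Uses the common frequency-domain approximation
    |H(f)| ~ [1 + ((f - fc)/b)^2]^(-2) with b = 1.019 * ERB(fc);
    centre frequencies are equally spaced on the ERB-rate scale.
    """
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centres = inv_erb_rate(
        np.linspace(erb_rate(f_min), erb_rate(sr / 2.0), n_filters)
    )
    fb = np.zeros((n_filters, freqs.size))
    for i, fc in enumerate(centres):
        b = 1.019 * erb(fc)
        fb[i] = (1.0 + ((freqs - fc) / b) ** 2) ** (-2)
    return fb


def gfcc(signal, sr=16000, n_fft=512, hop=160, n_filters=64, n_ceps=13):
    """Compute a (frames x n_ceps) GFCC matrix from a mono signal."""
    # frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # gammatone filterbank energies
    energies = power @ gammatone_filterbank(n_filters, n_fft, sr).T
    # cube-root loudness compression (GFCC), instead of MFCC's log
    compressed = np.cbrt(energies)
    # DCT-II decorrelation, keep the first n_ceps coefficients
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The resulting coefficient matrix can be treated as a 2-D "image" and fed to an image classifier such as ResNet50; the cube-root compression is the main source of the noise robustness the paper attributes to GFCC relative to MFCC.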
Keywords:
smart wheelchair, voice command recognition, GFCC, ResNet50, Jetson TX2, noisy environments
References
Global Report on Health Equity for Persons with Disabilities, 1st ed. World Health Organization, 2022.
B. S. P. Laksono, T. Syaifuddin, and F. Utaminingrum, "Voice Recognition to Classify ‘Buka’ and ‘Tutup’ Sound to Open and Closes Door Using Mel Frequency Cepstral Coefficients (MFCC) and Convolutional Neural Network (CNN)," Journal of Information Technology and Computer Science, vol. 9, no. 1, pp. 58–66, Apr. 2024. DOI: https://doi.org/10.25126/jitecs.202491579
M. Z. Abbiyansyah and F. Utaminingrum, "Voice Recognition on Humanoid Robot Darwin OP Using Mel Frequency Cepstrum Coefficients (MFCC) Feature and Artificial Neural Networks (ANN) Method," in 2022 2nd International Conference on Information Technology and Education (ICIT&E), Jan. 2022, pp. 251–256. DOI: https://doi.org/10.1109/ICITE54466.2022.9759883
A. A. Alasadi, T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, and A. S. Alshebami, "Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System," Engineering, Technology & Applied Science Research, vol. 10, no. 2, pp. 5547–5553, Apr. 2020. DOI: https://doi.org/10.48084/etasr.3465
P. Bawa, V. Kadyan, and G. Chhabra, "A Multifaceted Feature Extraction Approach for Noise-Robust Punjabi Spoken Digit Recognition System Under Low-Resource Conditions," in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Mar. 2024, pp. 1–6. DOI: https://doi.org/10.1109/ICRITO61523.2024.10522268
N. Boualoulou, T. Belhoussine Drissi, and B. Nsiri, "Comparison of Feature Extraction Methods Between MFCC, BFCC, and GFCC with SVM Classifier for Parkinson’s Disease Diagnosis," in IoT Based Control Networks and Intelligent Systems, 2024, pp. 231–247. DOI: https://doi.org/10.1007/978-981-99-6586-1_16
L. Borawar and R. Kaur, "ResNet: Solving Vanishing Gradient in Deep Networks," in Proceedings of International Conference on Recent Trends in Computing, 2023, pp. 235–247. DOI: https://doi.org/10.1007/978-981-19-8825-7_21
P. Nagpal, S. A. Bhinge, and A. Shitole, "A Comparative Analysis of ResNet Architectures," in 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Dec. 2022, pp. 1–8. DOI: https://doi.org/10.1109/SMARTGENCON56628.2022.10083966
B. Bangennavar, S. Patil, S. Kudal, B. D. Parmeshachari, and R. Latti, "People Tracking and Counting using Jetson TX2 Kit with Tracking Algorithm," in 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Nov. 2022, pp. 1–5. DOI: https://doi.org/10.1109/NKCon56289.2022.10126790
N. Takahashi, M. Gygli, B. Pfister, and L. V. Gool, "Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition," in Interspeech 2016, Sept. 2016, pp. 2982–2986. DOI: https://doi.org/10.21437/Interspeech.2016-805
S. Gondi and V. Pratap, "Performance Evaluation of Offline Speech Recognition on Edge Devices," Electronics, vol. 10, no. 21, Nov. 2021, Art. no. 2697. DOI: https://doi.org/10.3390/electronics10212697
N. M. Sharma, V. Kumar, P. K. Mahapatra, and V. Gandhi, "Comparative analysis of various feature extraction techniques for classification of speech disfluencies," Speech Communication, vol. 150, pp. 23–31, May 2023. DOI: https://doi.org/10.1016/j.specom.2023.04.003
Z. Mengxi and T. Zhiguo, "Research on Failure Identification of Partial Discharge Ultrasonic Signal Based on GFCC," in 2020 IEEE Electrical Insulation Conference (EIC), June 2020, pp. 412–416. DOI: https://doi.org/10.1109/EIC47619.2020.9158683
G. Sharma, K. Umapathy, and S. Krishnan, "Trends in audio signal feature extraction methods," Applied Acoustics, vol. 158, Jan. 2020, Art. no. 107020. DOI: https://doi.org/10.1016/j.apacoust.2019.107020
M. A. Islam, "Non-linear Power Exponent Effect in GFCC for Bangla and Malay speech Separation," in 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), Feb. 2022, pp. 206–211. DOI: https://doi.org/10.1109/ICISET54810.2022.9775917
License
Copyright (c) 2026 Fitri Utaminingrum, Aulia Riza Mufita, Aldiansyah Satrio Kabisat, I Komang Somawirata

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
