A Gamified Web Platform for the Automated Diagnosis of Childhood Phonological and Phonetic Disorders through Deep Learning

Josty Gerardo Tafur-Gonzales; Joao Arturo Basauri-Bazalar; Sandra Wong-Durand; Pedro Castaneda; Alejandra Onate-Andino

doi:10.48084/etasr.16859

Authors

Josty Gerardo Tafur-Gonzales Universidad Peruana de Ciencias Aplicadas, San Isidro, Lima, Peru
Joao Arturo Basauri-Bazalar Universidad Peruana de Ciencias Aplicadas, San Isidro, Lima, Peru
Sandra Wong-Durand Faculty of Information Systems Engineering, Universidad Peruana de Ciencias Aplicadas, San Isidro, Lima, Peru https://orcid.org/0000-0002-6154-2124
Pedro Castaneda Faculty of Systems Engineering and Electrical Mechanics, Universidad Nacional Toribio Rodriguez de Mendoza, Amazonas, Peru https://orcid.org/0000-0003-1865-1293
Alejandra Onate-Andino Escuela Superior Politecnica de Chimborazo (ESPOCH), Riobamba, Ecuador

Volume: 16 | Issue: 2 | Pages: 34301-34309 | April 2026 | https://doi.org/10.48084/etasr.16859

Received: 10 December 2025 | Revised: 21 January 2026 and 8 February 2026 | Accepted: 19 February 2026 | Online: 4 April 2026

Corresponding author: Sandra Wong-Durand

Abstract

This paper presents a gamified web platform for the automated diagnosis of phonetic–phonological disorders in children. The system combines deep learning models with acoustic representations extracted using Wav2Vec 2.0 and with structured linguistic coding. It was evaluated on a clinical corpus of over 700 recordings using cross-validation, comparing seven classification models. The model based on deep dense networks achieved an accuracy of 83.57%, exceeding the commonly accepted clinical threshold. In addition, the system reduced evaluation time by 49.6% compared to the traditional method. A preliminary evaluation with speech data collected from 10 children focused on technical feasibility and performance trends rather than definitive clinical validation. Although the results show promising classification accuracy, they should be interpreted as an initial proof of concept. Within these limits, they support the platform's potential as an objective, accessible, and scalable tool in clinical and educational contexts.
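
The evaluation protocol summarized in the abstract (cross-validation over several candidate classifiers, compared by accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors are synthetic stand-ins for Wav2Vec 2.0 embeddings, the class labels are hypothetical, and two toy classifiers (a majority-class baseline and a nearest-centroid rule) stand in for the seven models compared in the paper. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def make_synthetic_features(n_per_class, dim, seed=0):
    """Synthetic stand-in for embedding vectors (assumption: two classes)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("typical", 0.0), ("disordered", 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds while preserving class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_majority(X, y):
    """Baseline: remember the most frequent training label."""
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, vec):
    return model


def fit_centroids(X, y):
    """Nearest-centroid rule: store the mean vector of each class."""
    grouped = defaultdict(list)
    for vec, label in zip(X, y):
        grouped[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict_centroid(model, vec):
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], vec))


def cross_validated_accuracy(fit, predict, X, y, k=5, seed=0):
    """Pooled accuracy over all held-out predictions across k folds."""
    correct = 0
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        model = fit(train_X, train_y)
        correct += sum(predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Candidate models, each given as a (fit, predict) pair.
MODELS = {
    "majority baseline": (fit_majority, predict_majority),
    "nearest centroid": (fit_centroids, predict_centroid),
}

X, y = make_synthetic_features(n_per_class=50, dim=4)
for name, (fit, predict) in MODELS.items():
    print(f"{name}: {cross_validated_accuracy(fit, predict, X, y):.3f}")
```

Every model is scored on the same stratified folds, so differences in accuracy reflect the classifier rather than the split. When a corpus contains several recordings per child, a real evaluation would typically also group the folds by speaker so that no child appears in both training and held-out data; the sketch does not model that.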

Keywords:

speech sound disorders, deep learning, diagnostic automation, pediatric speech therapy, Wav2Vec 2.0, gamified platform, Spanish language processing, web-based evaluation tools
