A FastConformer Framework for Dialect-Inclusive Kannada Speech Recognition

Alaka Ananth; P. S. Venugopala; Sachin S. Bhat

doi:10.48084/etasr.18083

Authors

Alaka Ananth, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Udupi, India | Visvesvaraya Technological University, Belagavi, India
P. S. Venugopala, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Udupi, India
Sachin S. Bhat, Shri Madhwa Vadiraja Institute of Technology and Management, Bantakal, India

Volume: 16 | Issue: 3 | Pages: 35747-35755 | June 2026 | https://doi.org/10.48084/etasr.18083

Received: 10 February 2026 | Revised: 14 March 2026 and 27 March 2026 | Accepted: 6 April 2026 | Online: 6 June 2026

Corresponding author: Alaka Ananth

Abstract

Despite advances in Automatic Speech Recognition (ASR), low-resource languages such as Kannada still suffer from high Word Error Rates (WER), especially across regional dialects. This study addresses the problem with a robust multi-dialect Kannada ASR system that follows a linguistically informed methodology: a FastConformer architecture is fine-tuned on a carefully curated, dialect-balanced speech corpus representing six major regional dialects of Kannada. The approach introduces three novel elements: (1) dialect-aware curation, (2) a unified dialect-invariant architecture, and (3) a controlled baseline framework that quantifies the relative contributions of pretraining and architectural design. It employs character-level tokenization and full end-to-end adaptation, together with architectural features such as convolutional subsampling and relative positional encoding, tailored to the phonotactic richness and morphological complexity of Kannada. The experimental results demonstrate state-of-the-art performance on both validation and test sets, with a WER of 11.23% and a Character Error Rate (CER) of 5.31%, real-time inference capability, and consistent accuracy across dialectal boundaries. This represents a relative reduction of 15% compared to earlier Kannada baselines. Ablation and fine-tuning experiments confirm that each architectural component contributes significantly. The key contributions of this study are the development of the first multi-dialect Kannada speech corpus and the demonstration of an effective fine-tuning strategy for end-to-end speech recognition models. Beyond technical innovation, this work advances digital accessibility for Kannada speakers, enabling accurate and inclusive voice-driven technologies for diverse linguistic communities.
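As an illustration of how the WER and CER figures above are conventionally defined, the following minimal sketch computes both from a reference and a hypothesis transcript using Levenshtein edit distance. It is not the authors' evaluation code. The function names (`edit_distance`, `wer`, `cer`, `relative_reduction`) are hypothetical, and the choice to count characters as Unicode code points (rather than Kannada grapheme clusters) is an assumption; the study may normalize or segment text differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate over Unicode code points (assumption, see lead-in)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(list(ref), list(hyp)) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Illustrative transcripts only; not taken from the study's corpus.
    reference = "ನಾನು ಮನೆಗೆ ಹೋಗುತ್ತೇನೆ"
    hypothesis = "ನಾನು ಮನೆ ಹೋಗುತ್ತೇನೆ"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because Kannada is agglutinative, a single wrong suffix turns a whole word into an error under WER while changing only a few characters under CER, which is one reason both metrics are usually reported together.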

Keywords

FastConformer, Kannada, speech recognition, character embedding
