A Hybrid Vision Transformer for T1-Weighted MRI-Based Alzheimer's Disease Staging with Biomarker Fusion
Corresponding author: Sonali Deshpande
Abstract
Deep learning on Magnetic Resonance Imaging (MRI) is widely applied to computer-aided diagnosis of Alzheimer's Disease (AD). However, existing models generally struggle with small, imbalanced datasets and often fail to fully exploit anatomically meaningful biomarkers, which are essential for detecting AD in its early stages. In this work, we address these shortcomings by introducing the Hybrid Transformer for Alzheimer's Diagnosis (HyTraAD), a hybrid transformer-based model for four-stage AD classification, covering Cognitively Normal (CN), Early Mild Cognitive Impairment (EMCI), Mild Cognitive Impairment (MCI), and AD, from T1-weighted structural MRI. Our approach combines a ResNet-50 (Residual Network) feature extractor with a lightweight Vision Transformer (ViT) encoder and directly fuses three structural biomarkers (hippocampal volume, temporoparietal cortical thickness, and ventricular volume) into the learned representation. To address dataset imbalance and improve robustness, a noise-tolerant preprocessing pipeline is introduced that combines Tomek Links, which remove borderline samples, with the Synthetic Minority Over-sampling Technique (SMOTE), which balances underrepresented classes. The model was evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, consisting of 1,850 subjects. Experimental results show that HyTraAD achieves 99.81% overall accuracy, a macro F1-score of 0.99, and an EMCI recall of 0.95 on the test set, outperforming recent hybrid Convolutional Neural Network (CNN)-ViT architectures such as Hybrid ResNet-50 + ViT (RViT) and the Visual Geometry Group (VGG)-based TSwinformer. Ablation studies further confirm that both biomarker integration and the proposed Tomek Links-SMOTE preprocessing strategy contribute significantly to the performance improvements, particularly by enhancing sensitivity to EMCI cases.
Collectively, the results demonstrate that HyTraAD provides a flexible and interpretable framework for MRI-based staging of AD, which is promising for future deployment in clinically oriented decision-support systems, particularly in multi-center and multimodal diagnostic settings.
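The Tomek Links-SMOTE preprocessing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the order of the two steps or the feature space they operate on, so the example assumes Tomek cleaning first and SMOTE second, applied to small 2-D toy feature vectors. The function names (`remove_tomek_links`, `smote`), the policy of dropping the larger-class member of each link, and `k=3` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def nearest(i, X):
    """Index of the sample closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y):
    """Drop one member of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different labels.
    Assumed policy: remove the member belonging to the larger class.
    """
    nn = [nearest(i, X) for i in range(len(X))]
    counts = Counter(y)
    drop = set()
    for i, j in enumerate(nn):
        # i < j ensures each mutual pair is handled exactly once.
        if i < j and nn[j] == i and y[i] != y[j]:
            drop.add(i if counts[y[i]] >= counts[y[j]] else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class.

    Each synthetic point is a random interpolation between a class member
    and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # cannot interpolate from a single sample
        for _ in range(target - n):
            base = rng.choice(members)
            neighbours = sorted((m for m in members if m is not base),
                                key=lambda m: math.dist(base, m))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy data: six CN samples (one sitting on the class border) and three EMCI.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.4, 0.0], [0.98, 1.0],
     [1.0, 1.0], [1.3, 0.9], [0.9, 1.4]]
y = ["CN"] * 6 + ["EMCI"] * 3

X_clean, y_clean = remove_tomek_links(X, y)   # removes the borderline CN sample
X_bal, y_bal = smote(X_clean, y_clean)        # tops EMCI up to the CN count
print(Counter(y_bal))
```

In practice a maintained library implementation would normally be preferred over hand-written resampling, and resampling is usually restricted to the training split so that synthetic samples do not leak into evaluation; whether the paper follows that practice is not stated in the abstract.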
Keywords:
Alzheimer's Disease (AD), attention mechanism, residual feature encoding, cognitive impairment detection, deep learning, Vision Transformer (ViT)
License
Copyright (c) 2026 Sonali Deshpande, Nilima Kulkarni

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
