A Hybrid Vision Transformer for T1-Weighted MRI-Based Alzheimer's Disease Staging with Biomarker Fusion

Sonali Deshpande; Nilima Kulkarni

doi:10.48084/etasr.17151

Authors

Sonali Deshpande, Computer Science & Engineering Department, MIT SoC, MIT Art, Design and Technology University, Pune, 412201, Maharashtra, India
Nilima Kulkarni, Computer Science & Engineering Department, MIT SoC, MIT Art, Design and Technology University, Pune, 412201, Maharashtra, India

Volume: 16 | Issue: 2 | Pages: 34546-34552 | April 2026 | https://doi.org/10.48084/etasr.17151

Received: 29 December 2025 | Revised: 20 January 2026 and 3 February 2026 | Accepted: 7 February 2026 | Online: 18 March 2026
Corresponding author: Sonali Deshpande

Abstract

Magnetic Resonance Imaging (MRI) combined with deep learning is widely applied for computer-aided diagnosis of Alzheimer's Disease (AD); however, existing models generally struggle with small and imbalanced datasets and often fail to fully exploit anatomically meaningful biomarkers, which are essential for detecting AD in its early stages. In this work, we address these shortcomings by introducing the Hybrid Transformer for Alzheimer's Diagnosis (HyTraAD), a hybrid transformer-based model for four-stage AD classification, covering Cognitively Normal (CN), Early Mild Cognitive Impairment (EMCI), Mild Cognitive Impairment (MCI), and AD, from T1-weighted structural MRI scans. Our approach combines a Residual Network (ResNet)-50 feature extractor with a lightweight Vision Transformer (ViT) encoder and directly fuses three structural biomarkers (hippocampal volume, temporoparietal cortical thickness, and ventricular volume) into the learned representation. To address dataset imbalance and improve robustness, a noise-tolerant preprocessing pipeline is introduced, combining Tomek Links for removing borderline samples with the Synthetic Minority Over-sampling Technique (SMOTE) for balancing underrepresented classes. The model was evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort consisting of 1,850 subjects. Experimental results demonstrate that HyTraAD achieves 99.81% overall accuracy, a macro F1-score of 0.99, and an EMCI recall of 0.95 on the test set, outperforming recent hybrid Convolutional Neural Network (CNN)-ViT architectures such as Hybrid ResNet-50 + ViT (RViT) and the Visual Geometry Group (VGG)-based TSwinformer. Ablation studies further confirm that both biomarker integration and the proposed Tomek Links-SMOTE preprocessing strategy contribute significantly to the performance improvements, particularly by enhancing sensitivity to EMCI cases. Collectively, the results demonstrate that HyTraAD provides a flexible and interpretable framework for MRI-based staging of AD, which is promising for future deployment in clinically oriented decision-support systems, particularly in multi-center and multimodal diagnostic settings.
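
The Tomek Links-SMOTE preprocessing described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. The two-dimensional toy feature vectors, the two-class labels, the function names, the neighbour count `k`, and the order of the two steps (Tomek Links first, then SMOTE) are all assumptions made for illustration; the abstract does not specify the feature space or these settings.

```python
import math
import random
from collections import Counter


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def remove_tomek_links(points, labels):
    """Drop one member of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different labels.
    Assumption: the member from the larger of the two classes is removed.
    """
    counts = Counter(labels)
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # i < j ensures each mutual pair is handled exactly once.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            drop.add(i if counts[labels[i]] >= counts[labels[j]] else j)
    keep = [idx for idx in range(len(points)) if idx not in drop]
    return [points[idx] for idx in keep], [labels[idx] for idx in keep]


def smote(points, labels, k=3, seed=0):
    """Oversample every smaller class up to the size of the largest class.

    Each synthetic sample is a random interpolation between a class member
    and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_points, out_labels = list(points), list(labels)
    for cls, n in counts.items():
        members = [p for p, y in zip(points, labels) if y == cls]
        if len(members) < 2:
            continue  # at least two samples are needed to interpolate
        for _ in range(target - n):
            base_idx = rng.randrange(len(members))
            base = members[base_idx]
            neighbours = sorted(
                (m for idx, m in enumerate(members) if idx != base_idx),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            out_points.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
            out_labels.append(cls)
    return out_points, out_labels


# Hypothetical toy data: six CN samples (one lying on the class border)
# and three EMCI samples.
toy_points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.3), (0.4, 0.0), (1.0, 1.0),
    (1.1, 1.0), (1.5, 1.4), (1.4, 1.6),
]
toy_labels = ["CN"] * 6 + ["EMCI"] * 3

clean_points, clean_labels = remove_tomek_links(toy_points, toy_labels)
balanced_points, balanced_labels = smote(clean_points, clean_labels)
print(Counter(clean_labels), Counter(balanced_labels))
```

In this toy setting, the borderline CN sample at (1.0, 1.0) forms a Tomek link with its EMCI neighbour and is removed, after which SMOTE interpolates new EMCI samples until both classes are the same size. The same functions accept more than two classes, such as the four stages considered in the paper.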

Keywords

Alzheimer's Disease (AD), attention mechanism, residual feature encoding, cognitive impairment detection, deep learning, Vision Transformer (ViT)
