AugCXR Dataset: An Augmented Chest X-Ray Image Dataset for Robust Deep Learning Pneumonia Diagnosis
Received: 14 November 2025 | Revised: 6 January 2026, 16 February 2026, and 26 February 2026 | Accepted: 28 February 2026 | Online: 6 June 2026
Corresponding author: Tahir Mehmood
Abstract
Chest X-ray (CXR) imaging is commonly used to detect pneumonia, but reliance on expert radiologists may delay diagnosis and increase costs. The shortage of radiologists and the risk of diagnostic errors highlight the need for automated solutions. Deep learning models based on Convolutional Neural Networks (CNNs) have shown potential for computer-aided diagnosis of pneumonia from CXR images. Most studies use the publicly available dataset from Guangzhou Women and Children’s Hospital, which contains CXR images of children aged 1 to 5. However, the dataset is imbalanced, with more pneumonia cases than normal cases, which may affect model performance and generalizability. This study proposes a geometric data augmentation method that uses five transformations (rotation, width shift, height shift, zoom, and brightness adjustment) to balance the dataset and improve model accuracy. The resulting Augmented Chest X-ray (AugCXR) dataset was validated using three widely adopted architectures: Improved Visual Geometry Group-13 (IVGG13), MobileNetV2, and EfficientNetV2L. The results show that the proposed augmentation method improves classification performance across all three pretrained deep learning models.
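The balancing idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names (`sample_params`, `augment`, `balance`), the parameter ranges, the nearest-neighbour sampling, and the border clamping are all assumptions chosen for illustration. The sketch draws random settings for the five transformations and adds augmented copies of the smaller class until both classes have the same size.

```python
import math
import random


def sample_params(rng, rotation=10.0, shift=0.1, zoom=0.1, brightness=0.2):
    """Draw one random setting for the five transformations.

    The default ranges are illustrative assumptions, not values from the study.
    """
    return {
        "angle_deg": rng.uniform(-rotation, rotation),
        "dx_frac": rng.uniform(-shift, shift),   # width shift, fraction of width
        "dy_frac": rng.uniform(-shift, shift),   # height shift, fraction of height
        "zoom": rng.uniform(1.0 - zoom, 1.0 + zoom),
        "brightness": rng.uniform(1.0 - brightness, 1.0 + brightness),
    }


def augment(image, p):
    """Transform a grayscale image given as a list of rows with values 0-255."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(p["angle_deg"])
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = p["dx_frac"] * w, p["dy_frac"] * h
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Inverse mapping: undo the shift, then the zoom, then the
            # rotation about the image centre.
            u = (x - cx - dx) / p["zoom"]
            v = (y - cy - dy) / p["zoom"]
            src_x = cos_t * u + sin_t * v + cx
            src_y = -sin_t * u + cos_t * v + cy
            # Nearest-neighbour sampling, clamped to the image border.
            sx = min(max(int(round(src_x)), 0), w - 1)
            sy = min(max(int(round(src_y)), 0), h - 1)
            # Brightness is a simple multiplicative factor, clipped to 0-255.
            value = image[sy][sx] * p["brightness"]
            row.append(int(min(max(round(value), 0), 255)))
        out.append(row)
    return out


def balance(normal, pneumonia, seed=0):
    """Add augmented copies to the smaller class until both sizes match."""
    rng = random.Random(seed)
    classes = {"normal": list(normal), "pneumonia": list(pneumonia)}
    minority = min(classes, key=lambda name: len(classes[name]))
    originals = list(classes[minority])  # augment only from original images
    deficit = max(len(v) for v in classes.values()) - len(originals)
    for _ in range(deficit):
        classes[minority].append(augment(rng.choice(originals), sample_params(rng)))
    return classes["normal"], classes["pneumonia"]


if __name__ == "__main__":
    # Toy 4x4 "images" standing in for chest X-rays.
    toy = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
    normal, pneumonia = balance([toy] * 2, [toy] * 5)
    print(len(normal), len(pneumonia))
```

In this sketch the majority class is left untouched and new samples are drawn only from the original minority images, so augmented images are never re-augmented. A practical pipeline would more likely rely on an image library with interpolation rather than nested Python lists.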
Keywords:
pneumonia, chest X-ray, data augmentation, deep learning
License
Copyright (c) 2026 Waqar Ahmad, Deepak Panday, Muhammad Ibrahim, Tahir Mehmood, Muhammad Yaqoob

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
