BiPoP: Bipolar Disorder Optimized Preprocessing Framework for Stress Disorder Identification through Gene Expression Data using Deep Learning
Received: 30 November 2024 | Revised: 22 January 2025 and 5 February 2025 | Accepted: 7 February 2025 | Online: 3 April 2025
Corresponding author: M. Sudha
Abstract
With advances in computational tools for biology, gene expression data are widely used to diagnose diseases and to identify promising genes. Gene Expression Omnibus (GEO) datasets provide gene expression data for various diseases and disorders. For Bipolar Disorder, the GSE46449 dataset was obtained from the NCBI GEO repository. This study aimed to classify control (Normal) and case (Disordered) samples using Machine Learning (ML) and Deep Learning (DL) models. Preprocessing involved removing null values and normalizing the gene expression values in R. The second step focused on selecting an optimal subset of features (genes) from the gene expression dataset, for which the Pearson Correlation Coefficient (PCC) was used together with Principal Component Analysis (PCA). The samples were then classified using ML/DL models, and a Multi-Layer Perceptron (MLP) was used to validate the optimal feature set for separating healthy from disordered individuals. The proposed Bipolar Disorder Preprocessing Framework (BiPoP) was validated for its targeted use, demonstrating a multifunctional, fine-tuned approach to preprocessing and achieving a classification accuracy of 98.9%.
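The preprocessing, feature selection, and classification steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Python/NumPy instead of R, assumes z-score normalization, assumes PCC is computed between each gene and the binary case/control label to rank genes before PCA, and uses hypothetical settings (top 20 genes, 5 principal components, one hidden layer of 8 units). The helper names (`drop_null_genes`, `zscore`, `pcc_with_label`, `fit_pca`, `train_mlp`, `predict`) are illustrative, and the data are synthetic rather than GSE46449, so the printed accuracy says nothing about the reported 98.9%.

```python
import numpy as np


def drop_null_genes(X):
    """Remove gene columns that contain any missing (NaN) value."""
    keep = ~np.isnan(X).any(axis=0)
    return X[:, keep], keep


def zscore(X):
    """Scale each gene to zero mean and unit variance (assumed normalization)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def pcc_with_label(X, y):
    """Pearson correlation between every gene (column) and the 0/1 class label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    den[den == 0] = np.inf  # constant genes get r = 0
    return (Xc.T @ yc) / den


def fit_pca(X, n_components):
    """Return the training mean and the top principal axes from an SVD."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]


def train_mlp(X, y, hidden=8, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer MLP (tanh -> sigmoid) trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        g = (p - y) / len(y)  # gradient of mean binary cross-entropy w.r.t. the logit
        gH = np.outer(g, W2) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Return 0 (control) or 1 (case) for each sample."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2 > 0).astype(int)


# Synthetic stand-in for an expression matrix (samples x genes); NOT GSE46449.
rng = np.random.default_rng(42)
n, genes = 60, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, genes))
X[:, :5] += 3.0 * y[:, None]  # genes 0-4 carry the case/control signal
X[3, 10] = np.nan  # two missing values -> columns 10 and 50 are dropped
X[7, 50] = np.nan

X, kept = drop_null_genes(X)
X = zscore(X)

# Fit the label-dependent gene ranking and the PCA on the training split only.
# Hypothetical settings: top 20 genes by |PCC| with the label, 5 PCs.
idx = rng.permutation(n)
train, test = idx[:40], idx[40:]
top = np.argsort(-np.abs(pcc_with_label(X[train], y[train])))[:20]
mu, comps = fit_pca(X[train][:, top], n_components=5)
Z_train = (X[train][:, top] - mu) @ comps.T
Z_test = (X[test][:, top] - mu) @ comps.T

params = train_mlp(Z_train, y[train].astype(float))
preds = predict(params, Z_test)
acc = float((preds == y[test]).mean())
print(f"kept {kept.sum()} genes, test accuracy {acc:.2f}")
```

Ranking genes and fitting the PCA on the training split only keeps label information from the held-out samples out of feature selection; with a few dozen samples and thousands of genes, selecting genes on the full dataset before splitting would inflate the estimated accuracy.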
Keywords:
gene expression data, machine learning, feature selection, bipolar disorder, multilayer perceptron
License
Copyright (c) 2025 M. Sudha, M. Sarala Shobini

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.