Amalgamating Ensemble Machine Learning Soft Voting Classifier, SMOTE, and Pearson's Correlation Coefficient for Enhanced Malware Detection

Authors

  • Mustafa Jumaah, Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq
  • Ali A. Yassin, Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq
  • Zaid Ameen Abduljabbar, Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq | Department of Business Management, Al-Imam University College, Balad 34011, Iraq
  • Muwafaq Jawad, Directorate General of Education Basra, Ministry of Education, Basra 61001, Iraq
  • Vincent Omollo Nyangaresi, Department of Computer Science and Software Engineering, Jaramogi Oginga Odinga University of Science and Technology, Bondo 40601, Kenya | Department of Applied Electronics, Saveetha School of Engineering, SIMATS, Chennai, Tamil Nadu 602105, India
  • Ali Hassan Ali, Department of Mathematics, College of Education for Pure Sciences, University of Basrah, Basrah 61004, Iraq | Technical Engineering College, Al-Ayen University, Thi-Qar 64001, Iraq | Institute of Mathematics, University of Debrecen, Pf. 400, H-4002 Debrecen, Hungary
Volume: 15 | Issue: 3 | Pages: 22746-22752 | June 2025 | https://doi.org/10.48084/etasr.10420

Abstract

Obfuscated malware poses a significant threat to personal and Internet of Things (IoT) devices, and traditional detection methods often fall short against it in both capability and performance. This study proposes a malware detection approach that combines several Machine Learning (ML) classifiers in a soft voting ensemble and uses Pearson's correlation coefficient for feature selection, applied to the CIC-MalMem-2022 dataset. Class imbalance is addressed with the Synthetic Minority Oversampling Technique (SMOTE). The results show improved accuracy, precision, and recall in malware detection compared to single classifiers and traditional methods. Evaluated with a confusion matrix and standard evaluation metrics, the proposed model achieves 99.99% accuracy, classification rate, precision, recall, and F1 score, surpassing the results of previous studies. These results indicate that combining feature selection with ensemble learning can significantly improve the efficiency and security of high-performance malware prediction systems, paving the way for advanced threat mitigation strategies.
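
The three building blocks named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the correlation threshold of 0.5, the neighbour count k, the toy data, and every function name below are assumptions made for illustration, and the sketch assumes that features are ranked by their correlation with the class label, which the abstract does not specify. In practice, libraries such as scikit-learn (`VotingClassifier` with `voting="soft"`) and imbalanced-learn (`SMOTE`) provide production versions of these steps.

```python
import math
import random

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(X, y, threshold=0.5):
    """Indices of columns whose |r| with the label meets the assumed threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) >= threshold]

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((r for r in minority if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def soft_vote(probabilities):
    """Average per-class probabilities across classifiers; return
    (predicted class index, averaged probabilities)."""
    n_models = len(probabilities)
    avg = [sum(p[c] for p in probabilities) / n_models
           for c in range(len(probabilities[0]))]
    return avg.index(max(avg)), avg

# Toy data (hypothetical): column 0 tracks the label, column 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
selected = select_features(X, y)          # -> [0]

# Oversample a three-row minority class with four synthetic rows.
synthetic = smote([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=4)

# Two of three models lean towards class 1, but the confident first model
# pulls the averaged probabilities towards class 0.
label, avg = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]])  # label -> 0
```

The last call shows how soft voting differs from majority (hard) voting: a hard vote over the same three models would pick class 1, while averaging the probabilities lets one highly confident classifier outweigh two marginal ones.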

Keywords:

machine learning, malware detection, SMOTE, IoT, feature selection




How to Cite

[1]
M. Jumaah, A. A. Yassin, Z. A. Abduljabbar, M. Jawad, V. O. Nyangaresi, and A. H. Ali, “Amalgamating Ensemble Machine Learning Soft Voting Classifier, SMOTE, and Pearson’s Correlation Coefficient for Enhanced Malware Detection”, Eng. Technol. Appl. Sci. Res., vol. 15, no. 3, pp. 22746–22752, Jun. 2025.
