A Machine Learning Approach for Malware Detection in Database Systems

Abdulalem Ali; Arafat Al-Dhaqm; NZ Jhanjhi; Shukor Abd Razak; Doaa M. Bamasoud

doi:10.48084/etasr.18568

Authors

Abdulalem Ali, Institute of Computer Science and Digital Innovation, UCSI University, Federal Territory of Kuala Lumpur, Malaysia
Arafat Al-Dhaqm, School of Computer Science (SCS), Center for Intelligent and Innovation (CII), Taylor's University, Subang Jaya, Malaysia
NZ Jhanjhi, School of Computer Science (SCS), Center for Intelligent and Innovation (CII), Taylor's University, Subang Jaya, Malaysia
Shukor Abd Razak, Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Malaysia
Doaa M. Bamasoud, Department of Information Systems & Cyber Security, College of Computing and Information Technology, University of Bisha, Bisha, Saudi Arabia

Volume: 16 | Issue: 3 | Pages: 36514-36522 | June 2026 | https://doi.org/10.48084/etasr.18568

Received: 6 March 2026 | Revised: 31 March 2026, 18 April 2026, 20 April 2026, and 22 April 2026 | Accepted: 23 April 2026 | Online: 6 June 2026

Corresponding author: Arafat Al-Dhaqm

Abstract

The proliferation of malware over the past decade has resulted in significant business losses for many organizations. The increasing speed, volume, and complexity of malware make it progressively more difficult for the anti-malware community to detect and eliminate threats. In recent years, researchers and antivirus companies have begun applying Machine Learning (ML) and Deep Learning (DL) methods to malware recognition and analysis. This study proposes the Machine Learning Approach to Detect Malware in Database Systems (MLADMDBS), an ML-based malware detection system for databases. The proposed system employs a two-phase framework that combines offline model training with online real-time detection. The Random Forest classifier, trained on 18 handcrafted features derived from SQL queries and user-behavior data, achieves high performance on a temporally separated test set of 3,000 database activities, with accuracy, precision, and recall exceeding 99.8%. These results, obtained on a controlled synthetic dataset, represent an upper-bound estimate; real-world performance on production traffic is expected to differ and will be validated in future work. Per-attack-type analysis confirms strong detection across SQL injection, data exfiltration, and database structure manipulation categories. With an average inference latency of 0.8 ms per query, the system is well-suited for real-time monitoring of database activity and automated threat response.
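To illustrate what handcrafted features derived from SQL queries might look like, the following is a minimal sketch only. The abstract does not list the 18 features used by MLADMDBS, so every feature name, keyword list, and pattern below is a hypothetical stand-in rather than the authors' feature set, and the user-behavior features are omitted entirely.

```python
import re

# Hypothetical keyword list and patterns; the paper's actual 18 features
# are not enumerated in the abstract, so these are illustrative only.
DDL_KEYWORDS = ("drop", "alter", "truncate", "create")
TAUTOLOGY = re.compile(r"\bor\b\s+\d+\s*=\s*\d+", re.IGNORECASE)        # e.g. OR 1=1
UNION_SELECT = re.compile(r"\bunion\b\s+(all\s+)?select\b", re.IGNORECASE)
COMMENT = re.compile(r"(--|#|/\*)")                                      # inline SQL comment


def extract_features(query: str) -> dict:
    """Map one SQL query string to a small dictionary of numeric features."""
    tokens = re.findall(r"[a-z_]+", query.lower())
    return {
        "length": len(query),
        "num_tokens": len(tokens),
        "num_quotes": query.count("'") + query.count('"'),
        "num_semicolons": query.count(";"),
        "has_comment": int(bool(COMMENT.search(query))),
        "has_tautology": int(bool(TAUTOLOGY.search(query))),
        "has_union_select": int(bool(UNION_SELECT.search(query))),
        "num_ddl_keywords": sum(tokens.count(k) for k in DDL_KEYWORDS),
    }


if __name__ == "__main__":
    for q in (
        "SELECT name FROM users WHERE id = 42",
        "SELECT name FROM users WHERE id = 1 OR 1=1 -- ",
        "DROP TABLE users;",
    ):
        print(q, extract_features(q))
```

In a two-phase setup like the one described, vectors of this kind would be computed offline to train a classifier (for example, scikit-learn's `RandomForestClassifier`) and then recomputed for each incoming query at detection time. The sketch stops at feature extraction to stay dependency-free.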

Keywords:

database systems, Machine Learning (ML), Random Forest classifier
