Evaluating the Data Imputation Impact on Gradient Boosting Model Predictive Performance in Sensor Failure Recovery for Smart Irrigation Systems

Authors

  • Miftahul Walid, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia | Department of Informatics Engineering, Universitas Islam Madura, Pamekasan, Indonesia
  • Muhammad Ashar, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia
  • Heru Wahyu Herwanto, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia
Volume: 16 | Issue: 2 | Pages: 34452-34459 | April 2026 | https://doi.org/10.48084/etasr.17776

Abstract

The reliability of Internet of Things (IoT)-based irrigation systems depends heavily on the quality of the data acquired from their sensors. In real-world deployments, missing and faulty values caused by transmission faults or sensor failures can severely compromise Machine Learning (ML) models, particularly in critical tasks such as pump failure detection. This study investigates the impact of missing data imputation on the performance of an Extreme Gradient Boosting (XGBoost) classifier under a controlled setting. Missing data were simulated at rates ranging from 5% to 30%, and the imputation techniques applied included Multiple Imputation by Chained Equations (MICE), k-Nearest Neighbors (KNN), Random Forest (RF), and iterative gradient-boosting methods. The quality of the imputed data was evaluated using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), whereas classifier performance was assessed using accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC). At 5% missing data, MICE achieved the lowest errors (MAE = 0.018, RMSE = 0.026), whereas at 30% missing data, errors increased (MAE = 0.072, RMSE = 0.094). Classifier performance declined as the missing rate increased: accuracy decreased from 0.972 to 0.818 and recall from 0.961 to 0.742. Applying MICE mitigated this decline, maintaining an accuracy of 0.982 and a recall of 0.975 at 5% missing data, and an accuracy of 0.863 and a recall of 0.812 at 30% missing data, with AUC remaining above 0.880 in both cases. Notably, the imputation method with the lowest MAE and RMSE did not always produce the best classification performance, indicating that lower imputation error does not necessarily translate into better classification performance. These results highlight the importance of selecting imputation techniques based on the specific nature of the problem to preserve feature integrity and sustain the performance of Artificial Intelligence (AI)-based smart irrigation systems in real-world sensor environments.
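
The evaluation protocol summarized in the abstract (hide a known fraction of values, impute them, then score the imputed values against the hidden ground truth with MAE and RMSE) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic random values in [0, 1], the missingness is completely at random, the intermediate rates of 10% and 20% are chosen arbitrarily within the reported 5% to 30% range, and column-mean imputation stands in for MICE, KNN, RF, and the gradient-boosting imputers. The function names are hypothetical.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Cells are chosen completely at random (an assumption of this sketch).
    Also returns the list of (row, column) positions that were hidden.
    """
    masked = [list(r) for r in rows]
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = rng.sample(cells, int(round(rate * len(cells))))
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Fill each missing cell with the mean of the observed values in its column.

    This is a simple stand-in for the imputers named in the abstract.
    """
    means = []
    for j in range(len(masked[0])):
        observed = [r[j] for r in masked if r[j] is not None]
        # Fall back to 0.0 if a column happens to be fully missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in masked]


def mae_rmse(truth, imputed, hidden):
    """Compute MAE and RMSE over the hidden cells only."""
    errors = [imputed[i][j] - truth[i][j] for i, j in hidden]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Synthetic stand-in for scaled sensor readings (three hypothetical features).
rng = random.Random(0)
data = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]

results = {}
for rate in (0.05, 0.10, 0.20, 0.30):
    masked, hidden = mask_mcar(data, rate, rng)
    filled = impute_column_mean(masked)
    results[rate] = mae_rmse(data, filled, hidden)

for rate, (mae, rmse) in results.items():
    print(f"missing rate {rate:.0%}: MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

In a fuller experiment, `impute_column_mean` would be replaced by each imputer under comparison, and a classifier trained on the imputed features would be scored with accuracy, precision, recall, F1-score, and AUC. Scoring both stages matters because, as the abstract notes, the imputer with the lowest MAE and RMSE is not necessarily the one that yields the best classifier.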

Keywords:

data imputation, missing data, XGBoost, predictive maintenance, smart irrigation, sensor data quality, Machine Learning (ML), IoT systems

How to Cite

[1]
M. Walid, M. Ashar, and H. W. Herwanto, “Evaluating the Data Imputation Impact on Gradient Boosting Model Predictive Performance in Sensor Failure Recovery for Smart Irrigation Systems”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 34452–34459, Apr. 2026.
