The Impact of Data Splitting on Graph-Based Dropout Prediction Using Subgraph Matching and Graph Edit Distance

Authors

  • Meilia Nur Indah Susanti: Computer Science Department, Universitas Bina Nusantara, Jakarta, Indonesia | Faculty of Energy Telematics, Institut Teknologi PLN, West Jakarta, Indonesia
  • Yaya Heryadi: Computer Science Department, Universitas Bina Nusantara, Jakarta, Indonesia
  • Yusep Rosmansyah: School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
  • Widodo Budiharto: School of Computer Science, Universitas Bina Nusantara, Jakarta, Indonesia
Volume: 16 | Issue: 2 | Pages: 33916-33924 | April 2026 | https://doi.org/10.48084/etasr.17152

Abstract

Student dropout remains a persistent problem in higher education, undermining institutional effectiveness and student success rates. This paper proposes a graph-based predictive model that uses subgraph matching and Graph Edit Distance (GED) to identify students at high risk of dropping out. Students and courses are modeled as an undirected bipartite graph, which allows the system to detect structural similarities between student profiles. The model was evaluated on a dataset of 282 students from a private university in Indonesia under three training/testing data-splitting scenarios (70/30, 79/21, and 89/11), using precision, recall, F1-score, and accuracy as evaluation metrics. The best performance was obtained with the 89/11 split: an accuracy of 91%, a precision of 1.00, a recall of 0.88, and an F1-score of 0.94. These results suggest that allocating a larger proportion of the data to training improves generalization and prediction accuracy. GED proved effective at capturing subtle structural differences among student–course relationships, enabling early identification of dropout risk. The primary contribution of this study is a graph-analytic framework for dropout prediction. It offers an alternative to traditional models such as logistic regression and decision trees, which do not explicitly model the relational structure of the data. Future work will incorporate behavioral and socio-economic attributes to further improve prediction performance.
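The pipeline summarized in the abstract can be sketched in a few lines of Python. It covers the student-course bipartite graph, subgraph matching, GED, ratio-based data splitting, and confusion-matrix metrics. This is a minimal illustration under stated assumptions, not the authors' implementation. The following elements are all hypothetical:

* the `{course_id: outcome}` encoding of each student's neighbourhood;
* the restricted unit-cost model in which a course can only be matched to the same course;
* the 1-nearest-neighbour decision rule;
* the course IDs;
* the function names.

```python
import random

# Assumed encoding (not taken from the paper): a student's ego graph in the
# student-course bipartite graph is stored as {course_id: outcome}, i.e. the
# student's incident edges with the course outcome as the edge label.
Profile = dict[str, str]


def restricted_ged(a: Profile, b: Profile) -> int:
    """Exact GED between two star-shaped ego graphs under an assumed cost
    model: unit costs, and a course node may only be matched to the same
    course (no node relabeling). The two student centers are always matched."""
    cost = 0
    for course in a.keys() | b.keys():
        if course not in a or course not in b:
            cost += 2  # delete/insert the course node and its edge
        elif a[course] != b[course]:
            cost += 1  # relabel the edge (e.g. pass -> fail)
    return cost


def contains_pattern(profile: Profile, pattern: Profile) -> bool:
    """Labeled subgraph matching: because course labels are unique, the
    pattern star embeds in the profile star iff every pattern edge is
    present with the same label."""
    return all(profile.get(course) == outcome for course, outcome in pattern.items())


def split(items: list, train_frac: float, seed: int = 0) -> tuple[list, list]:
    """Shuffle, then cut at round(n * train_frac). The rounding rule is an
    assumption, so counts may differ slightly from the paper's partitions."""
    shuffled = random.Random(seed).sample(items, len(items))
    n_train = round(len(items) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def predict(train: list[tuple[Profile, int]], profile: Profile) -> int:
    """Hypothetical 1-nearest-neighbour rule: copy the label (1 = dropout)
    of the training student with the smallest edit distance."""
    _, label = min(train, key=lambda pair: restricted_ged(pair[0], profile))
    return label


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Confusion-matrix metrics with dropout (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
        "accuracy": (tp + tn) / len(pairs),  # assumes a non-empty test set
    }


def evaluate(data: list[tuple[Profile, int]], train_frac: float, seed: int = 0) -> dict[str, float]:
    """Split the labeled profiles, predict each test student, and score."""
    train, test = split(data, train_frac, seed)
    y_true = [label for _, label in test]
    y_pred = [predict(train, profile) for profile, _ in test]
    return metrics(y_true, y_pred)


if __name__ == "__main__":
    a = {"CS101": "pass", "MATH1": "fail"}
    b = {"CS101": "pass", "MATH1": "pass", "PHY1": "pass"}
    print(restricted_ged(a, b))                    # 3
    print(contains_pattern(a, {"MATH1": "fail"}))  # True
    # Reported precision 1.00 and recall 0.88 imply an F1 of about 0.936.
    print(round(f1(1.00, 0.88), 2))                # 0.94
```

Each student's neighbourhood is a star whose course leaves carry unique labels. Under this cost model, the node mapping is therefore fixed by the labels, and GED reduces to a single pass over the union of courses. General GED is NP-hard and usually requires exact search or heuristics. The last line is a consistency check: 2 × 1.00 × 0.88 / (1.00 + 0.88) ≈ 0.936, which rounds to the reported F1-score of 0.94.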

Keywords:

data split, dropout prediction, graph-based modeling, Graph Edit Distance (GED), subgraph matching

How to Cite

[1] M. N. I. Susanti, Y. Heryadi, Y. Rosmansyah, and W. Budiharto, “The Impact of Data Splitting on Graph-Based Dropout Prediction Using Subgraph Matching and Graph Edit Distance,” Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 33916–33924, Apr. 2026.