The Impact of Data Splitting on Graph-Based Dropout Prediction Using Subgraph Matching and Graph Edit Distance
Received: 24 December 2025 | Revised: 20 January 2026 and 27 January 2026 | Accepted: 29 January 2026 | Online: 4 April 2026
Corresponding author: Meilia Nur Indah Susanti
Abstract
Student dropout remains a persistent issue in higher education, affecting institutional effectiveness and student success rates. This paper proposes a graph-based predictive model that employs subgraph matching and Graph Edit Distance (GED) to identify students at high risk of dropout. By modeling students and courses as an undirected bipartite graph, the system detects structural similarities between student profiles. The proposed model was evaluated using a dataset of 282 students from a private university in Indonesia under three data-splitting scenarios: 70/30, 79/21, and 89/11. Evaluation metrics include precision, recall, F1-score, and accuracy. The model achieved its best performance at the 89/11 split, with an accuracy of 91%, precision of 1.00, recall of 0.88, and an F1-score of 0.94. Results suggest that increasing the proportion of training data enhances generalization and prediction accuracy. GED demonstrated effectiveness in capturing subtle structural distinctions among student–course relationships, enabling early dropout risk identification. The primary contribution of this study is the development of a graph-analytic framework for dropout prediction, offering an alternative to traditional models such as logistic regression and decision trees that lack relational awareness. Future work will incorporate behavioral and socio-economic attributes to further improve prediction outcomes.
Keywords:
data split, dropout prediction, graph-based modeling, Graph Edit Distance (GED), subgraph matching
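To make the abstract's core idea concrete, the following is a minimal sketch of how a student–course bipartite graph and a GED comparison could look in Python with networkx. It is not the authors' implementation. The course codes, the `outcome` edge attribute, the unit edit costs, and the nearest-neighbor decision rule in `predict` are all illustrative assumptions that the abstract does not specify.

```python
import networkx as nx


def student_graph(courses):
    """Build one student's undirected bipartite graph (a star around the student).

    `courses` maps a course code to an outcome label. This schema is a
    hypothetical stand-in for whatever attributes the real dataset carries.
    """
    g = nx.Graph()
    g.add_node("student", label="student")
    for code, outcome in courses.items():
        g.add_node(code, label=code)
        g.add_edge("student", code, outcome=outcome)
    return g


def ged(g1, g2):
    """Exact graph edit distance using networkx's default unit costs.

    Nodes match when their labels agree; edges match when outcomes agree.
    Exact GED is exponential in graph size, so this only suits small graphs.
    """
    return nx.graph_edit_distance(
        g1,
        g2,
        node_match=lambda a, b: a["label"] == b["label"],
        edge_match=lambda a, b: a["outcome"] == b["outcome"],
    )


def predict(query, labeled_graphs):
    """Assumed decision rule: return the label of the GED-nearest training graph."""
    _, nearest_label = min(labeled_graphs, key=lambda pair: ged(query, pair[0]))
    return nearest_label


# Toy training set with made-up course codes and labels.
train = [
    (student_graph({"MATH101": "pass", "CS101": "pass"}), "retained"),
    (student_graph({"MATH101": "fail", "CS101": "fail"}), "dropout"),
]

# A new student whose course outcomes resemble the second training profile.
query = student_graph({"MATH101": "fail", "CS101": "fail", "ENG101": "pass"})
prediction = predict(query, train)
```

In this toy setup, two students with identical course outcomes are at distance zero, and each differing outcome or extra course adds to the edit cost, which is the sense in which GED can capture small structural differences between profiles. For a dataset of the size reported here, an approximate GED heuristic would likely be needed in place of the exact computation.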
License
Copyright (c) 2026 Meilia Nur Indah Susanti, Yaya Heryadi, Yusep Rosmansyah, Widodo Budiharto

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
