An Optimized Data Partitioning Framework for Benchmarking Ensemble and Linear Regression Models in Used-Vehicle Price Prediction

Authors

  • Pathamakorn Netayawijit Department of Information Systems, Faculty of Business Administration and Information Technology, Rajamangala University of Technology Isan, Khon Kaen Campus, Khon Kaen, Thailand https://orcid.org/0009-0001-1424-5725
  • Wirapong Chansanam Department of Information Science, Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Kanda Sorn-In Department of Technology and Engineering, Faculty of Interdisciplinary Studies, Khon Kaen University, Nong Khai Campus, Nong Khai, Thailand https://orcid.org/0009-0003-3595-0858
Volume: 16 | Issue: 2 | Pages: 32800-32809 | April 2026 | https://doi.org/10.48084/etasr.16878

Abstract

Accurate used-vehicle price prediction is essential for consumers, dealers, and financial institutions, as pricing dynamics involve complex and non-linear relationships influenced by vehicle condition, depreciation patterns, and heterogeneous market factors. While ensemble learning models have demonstrated strong predictive capabilities, existing studies rarely compare them systematically with linear regression under multiple data partitioning strategies. This study proposes an Optimized Data Partitioning Framework (ODPF) to evaluate model performance and stability across four train–test split ratios (50–50, 60–40, 70–30, and 80–20) using a leakage-free preprocessing pipeline. The framework incorporates a variance-based stability index to quantify the effect of sampling variability, a methodological dimension largely absent from prior vehicle-pricing research. Six algorithms (Linear Regression, Decision Tree, Support Vector Regression (SVR), Random Forest, XGBoost, and LightGBM) were evaluated under consistent preprocessing and experimental conditions. The results indicate that ensemble methods outperform Linear Regression across all evaluation metrics, and Random Forest demonstrated the strongest performance, with a Root Mean Square Error (RMSE) and a coefficient of determination (R²) of 274.26 and 0.9995, respectively. XGBoost and LightGBM also exhibited high accuracy (R² > 0.998), whereas SVR showed limited generalization (RMSE = 1232.73; R² = 0.0800) for the sales analytics dataset. Stability analysis across repeated sampling identifies the 80–20 split as the most reliable configuration, exhibiting lower performance variance and stronger generalization consistency. Overall, the findings indicate that algorithm selection has a greater influence on predictive accuracy than the partition ratio alone, providing practical guidance for developing robust pricing models in the automotive domain.
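The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, a single-feature least-squares fit stands in for the six benchmarked models, and the stability index is assumed here to be the variance of RMSE across repeated random splits, since the abstract does not give its exact definition. All function names are hypothetical. The scaler statistics and regression coefficients are estimated from the training rows only, which is one way to keep preprocessing leakage-free.

```python
import math
import random
import statistics


def make_synthetic_data(n=400, seed=0):
    """Synthetic (age_years, price) rows; an assumption, NOT the paper's dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(0, 15)
        price = 30000 - 1500 * age + rng.gauss(0, 1000)
        rows.append((age, price))
    return rows


def fit_linear(train):
    """Fit a standardizer and one-feature OLS on the training rows only."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    zs = [(x - mu) / sd for x in xs]
    z_mean, y_mean = statistics.fmean(zs), statistics.fmean(ys)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    slope = sxy / sxx
    intercept = y_mean - slope * z_mean
    # Test rows are scaled with the training mean and standard deviation.
    return lambda x: intercept + slope * (x - mu) / sd


def evaluate(rows, train_frac, seed):
    """One random split: return (RMSE, R^2) measured on the held-out rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    predict = fit_linear(train)
    ys = [y for _, y in test]
    preds = [predict(x) for x, _ in test]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    y_mean = statistics.fmean(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return math.sqrt(ss_res / len(ys)), 1 - ss_res / ss_tot


def stability_report(rows, ratios=(0.5, 0.6, 0.7, 0.8), repeats=20):
    """Repeat each split ratio and summarize the mean and variance of RMSE."""
    report = {}
    for frac in ratios:
        scores = [evaluate(rows, frac, seed) for seed in range(repeats)]
        rmses = [rmse for rmse, _ in scores]
        r2s = [r2 for _, r2 in scores]
        report[frac] = {
            "mean_rmse": statistics.fmean(rmses),
            # Assumed stability index: lower variance means a more stable split.
            "rmse_variance": statistics.pvariance(rmses),
            "mean_r2": statistics.fmean(r2s),
        }
    return report


report = stability_report(make_synthetic_data())
for frac, stats in report.items():
    print(f"train fraction {frac:.1f}: "
          f"mean RMSE {stats['mean_rmse']:.1f}, "
          f"RMSE variance {stats['rmse_variance']:.1f}, "
          f"mean R2 {stats['mean_r2']:.4f}")
```

Comparing the `rmse_variance` values across the four ratios mirrors the kind of stability comparison the abstract reports. The numbers from this synthetic example carry no information about the paper's results.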

Keywords:

machine learning, ensemble methods, vehicle price prediction, linear regression, data split optimization

How to Cite

[1]
P. Netayawijit, W. Chansanam, and K. Sorn-In, “An Optimized Data Partitioning Framework for Benchmarking Ensemble and Linear Regression Models in Used-Vehicle Price Prediction”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 32800–32809, Apr. 2026.
