Edge-Based Deep Learning for Traffic Object Detection in Driving Scenarios

Hanif Rahmat; Suprapto; Moh. Edi Wibowo

doi:10.48084/etasr.17760

Authors

Hanif Rahmat Department of Computer Science and Electronics, Universitas Gadjah Mada, Sleman, Indonesia https://orcid.org/0009-0009-7207-6155
Suprapto Department of Computer Science and Electronics, Universitas Gadjah Mada, Sleman, Indonesia
Moh. Edi Wibowo Department of Computer Science and Electronics, Universitas Gadjah Mada, Sleman, Indonesia

Volume: 16 | Issue: 3 | Pages: 36864-36869 | June 2026 | https://doi.org/10.48084/etasr.17760

Received: 26 January 2026 | Revised: 7 April 2026 | Accepted: 17 April 2026 | Online: 6 June 2026

Corresponding author: Suprapto

Abstract

Object detection is a core task in autonomous driving, and edge implementations of deep learning-based detectors typically face a trade-off between accuracy and processing speed. This study compares the performance of deep learning-based object detection methods deployed at the edge for detecting traffic-related objects in driving scenarios. Its contribution is to fine-tune the detectors on a custom dataset of driving scenarios in Yogyakarta, Indonesia, and to deploy them on a resource-constrained edge computing device. PyTorch is used as the deep learning framework, with TorchVision as the main library, and Open Neural Network Exchange (ONNX) is used to convert the PyTorch models into a standardized graph for edge implementation. The experimental results show that the two-stage detector Faster R-CNN ResNet50 with an input size of 800 outperforms the other methods, achieving the best mAP (37.6%) and mAP_S (21.1%), but it also had the longest inference time, 78.89 s. In contrast, Faster R-CNN MobileNetV3-Large with an input size of 320 achieved the fastest inference time (0.58 s) but significantly lower mAP, mAP@0.5, and mAP@0.75 (11.2%, 19.5%, and 11.9%, respectively). Conditional DETR achieved a moderate inference time (17.31 s) and a considerable mAP (32.7%). FCOS with an input size of 800 had a higher mAP than Conditional DETR (34.8% vs. 32.7%) and about half the inference time of Faster R-CNN ResNet50 (38.92 s vs. 78.89 s). Therefore, Conditional DETR and FCOS, which balance accuracy against inference time, are preferable for resource-constrained edge computing implementations.
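The trade-off reported in the abstract can be made concrete with a small arithmetic sketch. It uses only the mAP and inference-time figures quoted above and takes Faster R-CNN ResNet50 (input size 800) as the baseline. The dictionary layout and the `compare` helper are illustrative assumptions, not part of the study's code.

```python
# Figures copied from the abstract: (mAP in %, inference time in seconds).
# Only configurations with both values reported are listed.
results = {
    "Faster R-CNN ResNet50 (800)": (37.6, 78.89),
    "FCOS (800)": (34.8, 38.92),
    "Conditional DETR": (32.7, 17.31),
    "Faster R-CNN MobileNetV3-Large (320)": (11.2, 0.58),
}

# Baseline: the most accurate (and slowest) configuration.
baseline_name = "Faster R-CNN ResNet50 (800)"
baseline_map, baseline_time = results[baseline_name]


def compare(name):
    """Hypothetical helper: return (speedup over the baseline,
    mAP percentage points lost relative to the baseline)."""
    model_map, model_time = results[name]
    return baseline_time / model_time, baseline_map - model_map


for name in results:
    speedup, map_drop = compare(name)
    print(f"{name}: {speedup:.2f}x faster, {map_drop:.1f} mAP points lower")
```

Under these figures, FCOS is about 2.0 times faster than the baseline for a loss of 2.8 mAP points, and Conditional DETR is about 4.6 times faster for a loss of 4.9 points, whereas the MobileNetV3-Large variant gives up 26.4 points.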

Keywords

deep learning, object detection, edge computing, traffic, driving scenarios, fine-tuning
