A Hardware-Aware Analysis of PTQ and QAT Quantized CNNs for Object Detection on FPGA

Authors

  • Noura Jariri, Laboratory of Electronic Systems, Information Processing, Mechanics and Energetics, Ibn Tofail University, Kenitra, Morocco
  • Kaoutar Allabouche, Laboratory of Electronic Systems, Information Processing, Mechanics and Energetics, Ibn Tofail University, Kenitra, Morocco
  • Mohamed Benaly, Innovate Systems Engineering Laboratory (ISI), National School of Applied Sciences of Tetouan (ENSA-Te), Abdelmalek Essaadi University, Tetouan, Morocco
  • Mohammed Chaman, Laboratory of Electronic Systems, Information Processing, Mechanics and Energetics, Ibn Tofail University, Kenitra, Morocco
  • Rania Majdoubi, Laboratory of Electronic Systems, Information Processing, Mechanics and Energetics, Ibn Tofail University, Kenitra, Morocco
  • Abdelkader Hadjoudja, Laboratory of Electronic Systems, Information Processing, Mechanics and Energetics, Ibn Tofail University, Kenitra, Morocco
Volume: 16 | Issue: 2 | Pages: 34498-34504 | April 2026 | https://doi.org/10.48084/etasr.17475

Abstract

Real-time object detection on embedded platforms is essential for safety-critical and industrial applications, but FPGA deployment remains challenging due to constraints on numerical precision, latency, and hardware resources. Although quantization is widely used to enable efficient FPGA inference, its impact on object-detection models that combine classification with bounding-box regression has not been systematically analyzed within an hls4ml-based workflow. This work compares Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) for deploying a lightweight CNN-based detector on an FPGA. An FP32 model is quantized to INT16 and INT8 using QKeras and subsequently converted to fixed-point hardware representations with hls4ml. The results show that PTQ severely degrades detection performance, reducing classification accuracy to approximately 10% and mean IoU to below 0.30. In contrast, QAT preserves near-floating-point performance, achieving ≈94% accuracy and ≈0.89 IoU at the software level for both INT16 and INT8. However, default HLS fixed-point configurations introduce software-hardware discrepancies, particularly in classification. A regression-aware refinement that increases the fractional precision of the bounding-box head restores hardware-level localization accuracy (IoU ≈0.89), while residual classification gaps remain due to fixed-point constraints. These findings demonstrate that reliable FPGA-based object detection requires both QAT and hardware-aware fixed-point design, and they provide practical guidelines for low-precision deployment using hls4ml.
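The abstract's regression-aware refinement hinges on how many fractional bits the fixed-point type of the bounding-box head retains. The sketch below is not the paper's implementation; it is a minimal, self-contained illustration of the underlying effect, using a hypothetical normalized box and a software model of hls4ml-style `ap_fixed<W,I>` rounding (W total bits, I integer bits including sign). With few fractional bits, normalized coordinates in [0, 1] snap to a coarse grid and IoU against the FP32 prediction drops; allocating more bits to the fraction restores it.

```python
def to_fixed(x, total_bits, int_bits):
    """Round x to a signed, saturating ap_fixed<total_bits, int_bits> value."""
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                      # smallest representable increment
    lo = -(2.0 ** (int_bits - 1))                 # most negative value
    hi = 2.0 ** (int_bits - 1) - step             # most positive value
    return min(max(round(x / step) * step, lo), hi)

def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Hypothetical FP32 prediction in normalized image coordinates.
box = (0.1037, 0.2251, 0.6173, 0.7442)

# Same 16-bit budget, split two ways: 4 vs. 10 fractional bits.
for total, integer in [(16, 12), (16, 6)]:
    q = tuple(to_fixed(v, total, integer) for v in box)
    print(f"ap_fixed<{total},{integer}>: IoU vs FP32 = {iou(box, q):.4f}")
```

Because the coordinates never exceed 1, the extra integer bits in `ap_fixed<16,12>` are wasted range, while `ap_fixed<16,6>` spends them on resolution; this is the same trade-off the regression-head refinement exploits.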

Keywords:

quantization, CNN, object detection, PTQ, QAT, hls4ml, FPGA




How to Cite

[1]
N. Jariri, K. Allabouche, M. Benaly, M. Chaman, R. Majdoubi, and A. Hadjoudja, “A Hardware-Aware Analysis of PTQ and QAT Quantized CNNs for Object Detection on FPGA”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 34498–34504, Apr. 2026.
