A Hardware Platform for Smart Video Monitoring Based on ESP32-CAM and Mojo FPGA (Spartan-6) with Event Activation Triggered by a PIR Sensor
Received: 24 February 2026 | Revised: 7 April 2026 | Accepted: 17 April 2026 | Online: 6 June 2026
Corresponding author: Aigerim Rakhysh
Abstract
This article presents an event-based edge video surveillance architecture for Internet of Things (IoT) systems in which a Passive Infrared (PIR) sensor initiates frame capture, an ESP32-CAM module (OV2640) handles control and network publishing, and a Mojo Field-Programmable Gate Array (FPGA) performs hardware-accelerated preprocessing and detection. The ESP32-CAM and the FPGA communicate over a Universal Asynchronous Receiver–Transmitter (UART) interface at 921,600 baud (optionally with INT/READY signals), and detection results are sent to the cloud over Wi-Fi using Message Queuing Telemetry Transport (MQTT) or Hypertext Transfer Protocol (HTTP). Two modes of operation are evaluated experimentally: ESP32-CAM only and ESP32-CAM + FPGA. End-to-end latency, measured from PIR triggering to receipt of the acknowledgment, is lower in every reported quantile when the FPGA is used: for N = 80 events, the P50 decreases from 620 to 480 ms, the P95 from 980 to 780 ms, and the P99 from 1,300 to 1,050 ms, confirming the performance gains achieved by offloading computation to dedicated hardware. The UART load is characterized quantitatively by transmitted data type. For compressed frames, the average packet size is approximately 1,200 B (P95: 1,600 B) at 25 packets/s, corresponding to approximately 30,000 B/s. For Regions of Interest (ROIs) or compact features, the average packet size is approximately 180 B (P95: 240 B) at 40 packets/s, corresponding to approximately 7,200 B/s. For detection results (bounding boxes, confidence values, and flags), the average packet size is approximately 64 B (P95: 96 B) at 40 packets/s, corresponding to approximately 2,560 B/s. These results demonstrate the advantage of transmitting feature-level data instead of full-frame data in bandwidth-constrained scenarios. Detection performance is evaluated using annotated views (60 windows per view). For human movement at a distance of 2 m (front view), accuracy/recall/F1-score values of 0.93/0.95/0.94 are achieved, whereas at 4 m (side view) the values are 0.87/0.85/0.86. The False Alarm Rate (FAR) is 0.15 and 0.25, respectively. For scenes without target movement, the FAR ranges from 0.03 to 0.17, depending on background conditions (idle scene, background movement, and lighting changes). The results demonstrate that the combination of the PIR sensor, ESP32-CAM, and FPGA provides an effective trade-off between latency, communication overhead, and detection performance, making it suitable as a minimal yet extensible platform for distributed security systems and industrial event-based monitoring.
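The arithmetic behind the figures quoted in the abstract can be reproduced with a short script. The sketch below is illustrative only and is not the authors' evaluation code. It rests on two assumptions that the abstract does not state: the UART uses 8N1 framing (10 bits on the wire per payload byte, with no protocol overhead counted), and the first detection metric is treated as precision, since the F1-score is by definition the harmonic mean of precision and recall. All function and variable names are hypothetical.

```python
# Sanity-check sketch for the figures quoted in the abstract.
# Assumption (not stated in the abstract): 8N1 UART framing,
# i.e. 1 start + 8 data + 1 stop = 10 bits per payload byte.
BAUD = 921_600
BITS_PER_BYTE_8N1 = 10
LINK_CAPACITY_BPS = BAUD // BITS_PER_BYTE_8N1  # payload bytes per second

# (average packet size in bytes, packets per second), from the abstract
UART_PAYLOADS = {
    "compressed frames": (1200, 25),
    "ROIs / compact features": (180, 40),
    "detection results": (64, 40),
}

# (ESP32-CAM only, ESP32-CAM + FPGA) latency in ms, from the abstract
LATENCY_MS = {"P50": (620, 480), "P95": (980, 780), "P99": (1300, 1050)}


def uart_load(avg_bytes, packets_per_s):
    """Return (bytes per second, fraction of the assumed link capacity)."""
    rate = avg_bytes * packets_per_s
    return rate, rate / LINK_CAPACITY_BPS


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for name, (size, pps) in UART_PAYLOADS.items():
        rate, share = uart_load(size, pps)
        print(f"{name}: {rate} B/s ({share:.1%} of assumed capacity)")
    for quantile, (before, after) in LATENCY_MS.items():
        print(f"{quantile}: {relative_reduction(before, after):.1%} lower")
    # Assumption: the first reported value is read as precision.
    for precision, recall in [(0.93, 0.95), (0.87, 0.85)]:
        print(f"F1({precision}, {recall}) = {f1_score(precision, recall):.2f}")
```

Under the 8N1 assumption the link carries at most 92,160 payload bytes per second, so the compressed-frame stream would occupy roughly a third of it, while the detection-result stream would occupy under 3%. The F1 values computed this way round to the reported 0.94 and 0.86.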
Keywords:
ESP32, ESP32-CAM, Mojo FPGA, video surveillance, hardware monitoring
License
Copyright (c) 2026 Saltanat Adilzhanova, Gulshat Amirkhanova, Murat Kunelbayev, Aigerim Rakhysh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
