An Efficient Lightweight CNN Autoencoder for Skeleton-Based Video Anomaly Detection
Received: 24 March 2026 | Revised: 26 April 2026 | Accepted: 1 May 2026 | Online: 6 June 2026
Corresponding author: Mostafa Ibrahim Labib
Abstract
Anomaly detection based on human motion has been an active area of research in computer vision, evolving from traditional handcrafted methods to deep learning–based representations. This paper proposes a lightweight Convolutional Neural Network (CNN) autoencoder for anomaly detection on skeleton image representations. Unlike prior Long Short-Term Memory (LSTM)-based approaches, which depend on temporal coordinate sequences, the proposed approach encodes skeletal joint relations as 2D spatial maps, enabling convolutional learning of both local and global structure. The model is trained exclusively on normal samples, drawn from structured datasets generated from pose-estimation outputs, so that it learns to reconstruct typical poses; the reconstruction error then serves as the anomaly score. Before training, skeleton images are preprocessed with grayscale conversion, Gaussian blurring, and resizing to ensure consistency, reduce noise, and improve efficiency. In experiments, the model achieved an accuracy of 59.6%, a precision of 61.0%, and a recall of 95.5%, yielding an F1-score of 74.5%. These results show that the CNN-based autoencoder identifies most anomalous poses (high recall) at moderate precision, supporting the use of spatial learning and augmentation for skeleton image anomaly detection.
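The scoring rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 4-pixel "images", and the threshold value are all assumptions made for illustration, and mean squared error is only one plausible choice of reconstruction error. The last helper recomputes the F1-score from the reported precision and recall.

```python
# Hypothetical sketch: all names, the toy data, and the threshold are
# illustrative assumptions, not details taken from the paper.

def mse(original, reconstructed):
    """Mean squared error between two equally sized, flattened images."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def is_anomalous(score, threshold):
    """Flag a pose as anomalous when its reconstruction error exceeds the threshold."""
    return score > threshold


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 4-pixel "skeleton images" with intensities in [0, 1].
    pose = [0.0, 1.0, 1.0, 0.0]
    good_reconstruction = [0.0, 1.0, 1.0, 0.0]   # autoencoder reproduces the pose
    poor_reconstruction = [1.0, 0.0, 1.0, 0.0]   # autoencoder fails on the pose

    threshold = 0.25  # assumed value, chosen only for this example

    print(is_anomalous(mse(pose, good_reconstruction), threshold))
    print(is_anomalous(mse(pose, poor_reconstruction), threshold))

    # Recompute F1 from the precision and recall reported in the abstract.
    print(round(f1_score(0.610, 0.955), 3))
```

With the rounded precision and recall from the abstract, the harmonic mean comes out at roughly 0.744, close to the reported 74.5%; the small gap is what one would expect from rounding of the published inputs.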
Keywords:
Convolutional Neural Network (CNN) autoencoder, data augmentation, lightweight deep learning, quantization
License
Copyright (c) 2026 Mostafa Ibrahim Labib, Fatma Harby Mohamed

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
