An Efficient Lightweight CNN Autoencoder for Skeleton-Based Video Anomaly Detection

Mostafa Ibrahim Labib; Fatma Harby Mohamed

doi:10.48084/etasr.18897

Authors

Mostafa Ibrahim Labib, Department of Computer Science, Higher Future Institute for Specialized Technological Studies, Egypt
Fatma Harby Mohamed, Department of Computer Science, Higher Future Institute for Specialized Technological Studies, Egypt

Volume: 16 | Issue: 3 | Pages: 36849-36855 | June 2026 | https://doi.org/10.48084/etasr.18897

Received: 24 March 2026 | Revised: 26 April 2026 | Accepted: 1 May 2026 | Online: 6 June 2026

Corresponding author: Mostafa Ibrahim Labib

Abstract

Anomaly detection based on human motion has been an active area of research in computer vision, evolving from traditional handcrafted methods to deep learning-based representations. This paper proposes a lightweight Convolutional Neural Network (CNN) autoencoder for anomaly detection on skeleton image representations. Unlike prior Long Short-Term Memory (LSTM)-based approaches, which depend on temporal coordinate sequences, the proposed approach encodes skeletal joint relations as 2D spatial maps, enabling convolutional learning of both local and global structure. Before training, skeleton images are preprocessed with grayscale conversion, Gaussian blurring, and resizing to ensure consistency, reduce noise, and improve efficiency. The model is trained to reconstruct typical poses using only normal samples from structured datasets generated from pose-estimation outputs, and the reconstruction error serves as the anomaly score. In experiments, the model achieved an accuracy of 59.6%, a precision of 61.0%, and a recall of 95.5%, yielding an F1-score of 74.5%. These results indicate that the CNN-based autoencoder identifies most anomalous poses while maintaining reasonable precision, supporting the benefits of spatial learning and augmentation for skeleton image anomaly detection.
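The scoring and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that reconstruction error is used as the anomaly score, so the use of mean squared error, the threshold value, the toy 2x2 inputs, and all function names below are assumptions chosen for illustration.

```python
from typing import Sequence


def reconstruction_error(image: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between a flattened image and its reconstruction.

    MSE is an assumed choice; the abstract does not name the error metric.
    """
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a sample when its reconstruction error exceeds the threshold."""
    return score > threshold


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical flattened 2x2 "skeleton image" and two candidate reconstructions.
normal_pose = [0.0, 1.0, 1.0, 0.0]
good_reconstruction = [0.1, 0.9, 1.0, 0.0]  # close to the input
poor_reconstruction = [0.9, 0.1, 0.2, 0.8]  # far from the input

THRESHOLD = 0.05  # assumed value; the abstract does not report a threshold

low = reconstruction_error(normal_pose, good_reconstruction)
high = reconstruction_error(normal_pose, poor_reconstruction)
print(is_anomalous(low, THRESHOLD), is_anomalous(high, THRESHOLD))

# F1 computed from the rounded precision and recall quoted in the abstract.
print(round(f1_score(0.610, 0.955), 3))
```

With the rounded precision and recall from the abstract, the harmonic mean comes out at roughly 0.744, in line with the reported F1-score of 74.5%; the small gap presumably reflects rounding of the published figures.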

Keywords:

Convolutional Neural Network (CNN) autoencoder, data augmentation, lightweight deep learning, quantization
