Physics-Informed Deep Learning for Human Action Recognition: A Biomechanical Approach

Authors

  • Zineb Haimer, Advanced Systems Engineering Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
  • Khalid Mateur, Advanced Systems Engineering Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
  • Youssef Farhan, Advanced Systems Engineering Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
  • Abdessalam Ait Madi, Advanced Systems Engineering Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
Volume: 16 | Issue: 2 | Pages: 33854-33865 | April 2026 | https://doi.org/10.48084/etasr.16856

Abstract

Human action recognition systems traditionally rely on learning statistical patterns from visual data without explicitly modeling the physical laws governing human motion. This paper presents a physics-informed neural network architecture that integrates biomechanical modeling directly into the learning process. The approach computes kinematic features (joint angles) and kinetic features (torque, energy) from estimated poses and fuses them with visual motion features within a Transformer encoder. A multi-objective loss function encourages physically plausible representations by penalizing biomechanically infeasible poses and energetically unrealistic movements. When tested on police traffic gesture recognition, the proposed method achieved 96.11% classification accuracy while maintaining biomechanical feasibility (0.998 average feasibility score). The integration of physics-based features enables the disambiguation of visually similar gestures through their underlying physical signatures. The approach also produces interpretable physical measurements that can be validated against biomechanical principles, making it particularly suitable for safety-critical applications where model transparency is essential.
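To illustrate the kind of kinematic feature and feasibility penalty the abstract describes, the following is a minimal sketch, not the authors' implementation: it computes a joint angle from estimated 2-D keypoints and applies a quadratic penalty outside an assumed anatomical range. The keypoint coordinates, the elbow range of roughly 10-160 degrees, and the penalty form are all illustrative assumptions.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at joint b, formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def feasibility_penalty(angle, lo, hi):
    """Zero inside the anatomical range [lo, hi], quadratic outside it.
    A term like this could be summed over joints as a loss component."""
    return max(0.0, lo - angle) ** 2 + max(0.0, angle - hi) ** 2

# Hypothetical shoulder-elbow-wrist keypoints from a pose estimator
shoulder = np.array([0.0, 1.0])
elbow    = np.array([0.0, 0.5])
wrist    = np.array([0.5, 0.5])

theta = joint_angle(shoulder, elbow, wrist)            # pi/2 for these points
penalty = feasibility_penalty(theta,
                              np.deg2rad(10), np.deg2rad(160))  # 0: in range
```

A pose deviating outside the assumed range would receive a positive penalty, steering training toward biomechanically plausible configurations in the spirit of the paper's multi-objective loss.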

Keywords:

physics-informed neural networks, action recognition, gesture recognition, biomechanics, transformer networks, computer vision




How to Cite

[1] Z. Haimer, K. Mateur, Y. Farhan, and A. A. Madi, "Physics-Informed Deep Learning for Human Action Recognition: A Biomechanical Approach", Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 33854–33865, Apr. 2026.
