Temporal Modeling and Real-Time Recognition Approaches in SLR Systems
This article analyzes advanced approaches to temporal modeling and real-time gesture recognition in sign language recognition (SLR) systems. Sign glosses are realized through the spatio-temporal structure of visual information, so their automatic recognition requires sequence-processing models. The study primarily evaluates the effectiveness of three key model families: Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCNs), and Transformer-based architectures.
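To make the comparison concrete, the sketch below shows minimal PyTorch versions of the three model families, each mapping the same per-frame feature sequence to per-frame gloss logits. The input layout (batch, time, features), the 150-dimensional keypoint features, and all layer sizes are illustrative assumptions, not the configurations evaluated in the study.

```python
import torch
import torch.nn as nn

class LSTMEncoder(nn.Module):
    """Recurrent temporal model: processes frames sequentially."""
    def __init__(self, feat_dim, hidden=256, num_glosses=100):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_glosses)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)                  # (batch, time, hidden)
        return self.head(out)                  # per-frame gloss logits

class TCNEncoder(nn.Module):
    """Temporal convolution: dilated 1-D convolutions over the time axis
    (non-causal padding here for brevity; real-time TCNs use causal padding)."""
    def __init__(self, feat_dim, hidden=256, num_glosses=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden, num_glosses)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h = self.net(x.transpose(1, 2))        # Conv1d expects (batch, feat, time)
        return self.head(h.transpose(1, 2))

class TransformerEncoderModel(nn.Module):
    """Self-attention: every frame attends to every other frame."""
    def __init__(self, feat_dim, hidden=256, num_glosses=100):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_glosses)

    def forward(self, x):
        return self.head(self.encoder(self.proj(x)))

frames = torch.randn(2, 64, 150)               # 2 clips, 64 frames, 150 keypoint coords
for Model in (LSTMEncoder, TCNEncoder, TransformerEncoderModel):
    print(Model.__name__, Model(150)(frames).shape)  # -> (2, 64, 100) each
```

The per-frame logits produced by any of these encoders can feed directly into the gloss-mapping step described later in this section.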
The article also examines methods used for real-time analysis of sign glosses, including:
Sliding window segmentation of video streams (a windowing sketch follows this list);
Self-attention mechanisms for capturing dependencies between gestures (the mechanism underlying the Transformer encoder sketched above);
Gloss mapping algorithms for linking recognized sign movements to linguistic units (see the decoding sketch after this list);
Ontological integration techniques for enhancing semantic accuracy (a toy illustration follows this list).
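For the sliding-window step, a minimal sketch is given below: a continuous keypoint stream is cut into fixed-length overlapping windows that can then be fed to any of the temporal encoders above. The window length of 32 frames and stride of 8 are hypothetical values, not parameters reported in the study.

```python
import torch

def sliding_windows(stream, win=32, stride=8):
    """Segment a continuous frame stream (time, feat) into overlapping windows."""
    return [stream[s:s + win] for s in range(0, stream.shape[0] - win + 1, stride)]

# Hypothetical usage: classify each window with any of the encoders above.
stream = torch.randn(200, 150)                 # 200 frames of 150-dim keypoints
windows = sliding_windows(stream)              # 22 windows of shape (32, 150)
print(len(windows), windows[0].shape)
```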
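Gloss mapping in continuous SLR is often realized via CTC-style decoding; the sketch below shows the greedy variant, which merges repeated per-frame predictions and drops blank labels. The three-gloss vocabulary is a made-up example, and the article does not confirm this specific decoder.

```python
BLANK = 0  # CTC blank label
GLOSS_VOCAB = {1: "HELLO", 2: "YOU", 3: "NAME"}   # hypothetical gloss inventory

def greedy_ctc_decode(frame_labels):
    """Collapse per-frame predictions into a gloss sequence:
    merge consecutive repeats, then drop blanks (standard greedy CTC decoding)."""
    glosses, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            glosses.append(GLOSS_VOCAB[lab])
        prev = lab
    return glosses

# Per-frame argmax output from a temporal model over one window:
print(greedy_ctc_decode([0, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3]))  # ['HELLO', 'YOU', 'NAME']
```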
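For ontological integration, the article does not specify a representation, so the following is only a toy illustration of the idea: each gloss is mapped to a semantic category, and candidate sequences whose adjacent categories are declared incompatible are rejected (or down-ranked) during contextual verification. The categories and compatibility rules are entirely hypothetical.

```python
# Hypothetical mini-ontology: gloss -> semantic category, plus a set of
# adjacent-category pairs that the verifier treats as semantically invalid.
GLOSS_CATEGORY = {"HELLO": "greeting", "YOU": "pronoun", "NAME": "noun"}
INCOMPATIBLE = {("greeting", "greeting")}     # e.g. two greetings in a row

def semantically_valid(glosses):
    cats = [GLOSS_CATEGORY[g] for g in glosses]
    return all((a, b) not in INCOMPATIBLE for a, b in zip(cats, cats[1:]))

print(semantically_valid(["HELLO", "YOU", "NAME"]))   # True
print(semantically_valid(["HELLO", "HELLO", "NAME"])) # False -> re-rank candidates
```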
Practical results indicate that combining temporal modeling with semantic analysis and contextual verification enables continuous, high-accuracy recognition of sign movements. In particular, multimodal systems (video + sensor + gloss) built on Transformer-based architectures achieved the strongest performance in converting continuous sign gloss streams into text in real time.
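A minimal sketch of such a multimodal pipeline is shown below, assuming time-aligned per-frame video and sensor features; the early concatenation fusion and all dimensions are illustrative choices, not the architecture evaluated in the article.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Early fusion: concatenate per-frame video and sensor features,
    then model the fused sequence with a Transformer encoder."""
    def __init__(self, video_dim=150, sensor_dim=24, hidden=256, num_glosses=100):
        super().__init__()
        self.proj = nn.Linear(video_dim + sensor_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_glosses)

    def forward(self, video_feats, sensor_feats):
        # Both streams assumed time-aligned: (batch, time, dim)
        fused = torch.cat([video_feats, sensor_feats], dim=-1)
        return self.head(self.encoder(self.proj(fused)))

model = MultimodalFusion()
logits = model(torch.randn(2, 64, 150), torch.randn(2, 64, 24))
print(logits.shape)  # (2, 64, 100): per-frame gloss logits for CTC-style decoding
```

Early concatenation is the simplest fusion strategy; late fusion (separate encoders per modality, merged before the output head) is a common alternative when the streams are not frame-aligned.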
The findings of this study hold practical significance for the development of smart assistive devices for automatic sign language translation, interactive interfaces for hearing-impaired users, and specialized SLR platforms for educational and instructional purposes.