Paper Details
Abstract
In this work, we present a Tiny Transformer architecture optimized for real-time gloss-to-text translation. Unlike conventional Transformer models, our approach explicitly addresses the structural and grammatical divergences between sign language glosses and natural language text, enabling efficient translation in low-resource settings. To bridge this gap, we introduce a pre-processing pipeline that includes gloss prefix removal, wh-question reordering, auxiliary verb insertion, and syntactic restructuring. These steps help the model learn more effective mappings from gloss sequences to fluent English sentences. Experimental evaluations show that our Tiny Transformer achieves competitive BLEU score improvements of up to 5 points compared to the original Transformer, while halving inference latency (35 ms on average) relative to the 4-layer model. This efficiency makes it well suited for deployment on edge devices and in real-time applications.
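To make the pre-processing pipeline concrete, the following is a minimal rule-based sketch of three of the four steps (gloss prefix removal, wh-question reordering, and auxiliary verb insertion; the broader syntactic restructuring is omitted). The gloss conventions assumed here, such as annotation prefixes like `DESC-` and sentence-final WH glosses, and all function names and rules are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of the gloss pre-processing pipeline described in the
# abstract; gloss conventions and rules are illustrative assumptions.

WH_WORDS = {"WHO", "WHAT", "WHERE", "WHEN", "WHY", "HOW"}


def remove_gloss_prefixes(tokens):
    """Strip annotation prefixes such as 'DESC-' or 'X-' from gloss tokens."""
    return [t.split("-", 1)[-1] if t.startswith(("DESC-", "X-")) else t
            for t in tokens]


def reorder_wh_question(tokens):
    """Move a sentence-final WH gloss to the front, mirroring English order."""
    if tokens and tokens[-1] in WH_WORDS:
        return [tokens[-1]] + tokens[:-1]
    return tokens


def insert_auxiliary(tokens):
    """Insert a dummy auxiliary after a fronted WH word (simplified rule)."""
    if tokens and tokens[0] in WH_WORDS:
        return [tokens[0], "DO"] + tokens[1:]
    return tokens


def preprocess(gloss: str) -> str:
    """Apply prefix removal, WH reordering, and auxiliary insertion in order."""
    tokens = gloss.upper().split()
    for step in (remove_gloss_prefixes, reorder_wh_question, insert_auxiliary):
        tokens = step(tokens)
    return " ".join(tokens)


if __name__ == "__main__":
    # "YOU LIVE WHERE" -> "WHERE DO YOU LIVE"
    print(preprocess("YOU LIVE WHERE"))
```

The intent of such rules is to bring the gloss word order closer to English before training, so the Tiny Transformer spends less capacity learning systematic reorderings.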