Paper Details

Abstract

Language models (LMs) play a crucial role in low-resource automatic speech recognition (ASR), particularly in code-switching scenarios where multiple languages are interleaved within a single utterance or conversation. These scenarios are often challenged by a lack of sufficient annotated training data. To mitigate this limitation, common strategies include leveraging multilingual pretraining, fine-tuning on linguistically related languages, and generating synthetic code-switched text from monolingual data. In this paper, we propose a novel method, Semi-Supervised Text Generation (SSTG), aimed at enhancing Mandarin-English code-switching speech recognition (CSSR). Our approach utilizes a semi-supervised acoustic model to generate synthetic code-switched transcriptions from untranscribed audio. The SEAME Mandarin-English code-switching corpus is used as supervised training data, while Part IV of the National Speech Corpus (NSC) serves as the source of untranscribed input. Experimental results demonstrate that the quality of the generated text is comparable to that of manually annotated transcripts, highlighting the effectiveness of our approach in improving language modeling for code-switching speech recognition.
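The abstract describes a data flow in which a semi-supervised acoustic model decodes untranscribed NSC Part IV audio into synthetic code-switched text, which is then pooled with the SEAME transcripts for language-model training. The sketch below only illustrates that pipeline shape under assumed, hypothetical names (decode_with_semisupervised_am, train_bigram_counts, the toy sentences); it is not the authors' implementation, and the decoding step is stubbed out so the example runs on its own.

```python
# Illustrative sketch of the SSTG-style data flow (not the paper's code).
from collections import Counter
from typing import Iterable, List


def decode_with_semisupervised_am(untranscribed_audio: Iterable[str]) -> List[str]:
    """Hypothetical placeholder for the semi-supervised acoustic model that
    decodes untranscribed NSC Part IV audio into code-switched pseudo-transcripts.
    Here it returns canned strings so the sketch is runnable end to end."""
    return ["我 觉得 this one 比较 好", "then 我们 就 go ahead"]


def train_bigram_counts(sentences: Iterable[str]) -> Counter:
    """Collect bigram counts; stands in for full n-gram LM estimation."""
    counts: Counter = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Supervised text: SEAME transcripts (toy examples here).
seame_text = ["okay 我们 start 了", "this 个 problem 很 difficult"]

# Synthetic text generated from (hypothetical) untranscribed audio files.
generated_text = decode_with_semisupervised_am(["utt_0001.wav", "utt_0002.wav"])

# Augment the LM training data with the generated transcripts.
lm_counts = train_bigram_counts(seame_text + generated_text)
print(lm_counts.most_common(5))
```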

Keywords
Language Model, Semi-Supervised Learning, Code-Switching Speech Recognition, Text Generation
Contact Information
Cao Hong Nga (Corresponding Author)
FPT University, Can Tho, Vietnam
0942150108

All Authors (1)

Cao Hong Nga

Affiliation: FPT University, Can Tho, Vietnam

Country: Vietnam

Email: caohongnga@gmail.com

Phone: 0942150108