Item type |
Documents(1) |
Publication date |
2025-03-26 |
Access rights |
open access |
Access rights URI |
http://purl.org/coar/access_right/c_abf2 |
Resource type identifier |
http://purl.org/coar/resource_type/c_6501 |
Resource type |
journal article |
Publisher version DOI |
Related identifier |
https://doi.org/10.1186/s13636-024-00360-8 |
Related name |
10.1186/s13636-024-00360-8 |
Publication type |
VoR |
Publication type resource |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
Title |
Recognition of target domain Japanese speech using language model replacement |
Authors |
Mori, Daiki
Ohta, Kengo
Nishimura, Ryota
Ogawa, Atsunori
Kitaoka, Norihide
|
Abstract |
Description |
End-to-end (E2E) automatic speech recognition (ASR) models, which consist of deep learning models, are able to perform ASR tasks using a single neural network. These models should be trained using a large amount of data; however, collecting speech data which matches the targeted speech domain can be difficult, so speech data that does not exactly match the target domain is often used instead, resulting in lower performance. In comparison to speech data, in-domain text data is much easier to obtain. Thus, traditional ASR systems use separately trained language models and HMM-based acoustic models. However, it is difficult to separate language information from an E2E ASR model because the model learns both acoustic and language information in an integrated manner, making it very difficult to create E2E ASR models for specialized target domains which are able to achieve sufficient recognition performance at a reasonable cost. In this paper, we propose a method of replacing the language information within pre-trained E2E ASR models in order to achieve adaptation to a target domain. This is achieved by deleting the "implicit" language information contained within the ASR model, by subtracting, in the logarithmic domain, the source-domain language model trained on a transcription of the ASR model's training data. We then integrate a target-domain language model through addition in the logarithmic domain. This subtraction and addition to replace the language model is based on Bayes' theorem. In our experiments, we first used two datasets of the Corpus of Spontaneous Japanese (CSJ) to evaluate the effectiveness of our method. We then evaluated our method using the Japanese Newspaper Article Speech (JNAS) and CSJ corpora, which contain audio data from the read speech and spontaneous speech domains, respectively, to test the effectiveness of our proposed method at bridging the gap between these two language domains. Our results show that our proposed language model replacement method achieved better ASR performance than both non-adapted (baseline) ASR models and ASR models adapted using the conventional Shallow Fusion method. |
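The log-domain subtraction and addition described in the abstract can be sketched as a simple hypothesis-rescoring step. This is a minimal, hypothetical illustration, not the paper's implementation: the function name `replaced_score`, the interpolation weights, and the toy scores are all assumptions; in practice the three log-probabilities would come from the E2E ASR decoder, the source-domain LM, and the target-domain LM.

```python
# Hypothetical sketch of log-domain language model replacement:
# subtract the source-domain LM score from the E2E score, then add
# the target-domain LM score. Weights are illustrative assumptions.

def replaced_score(log_p_e2e, log_p_src_lm, log_p_tgt_lm,
                   src_weight=0.3, tgt_weight=0.3):
    """Combine per-hypothesis scores in the logarithmic domain."""
    return log_p_e2e - src_weight * log_p_src_lm + tgt_weight * log_p_tgt_lm

# Toy n-best rescoring: (hypothesis, E2E score, source-LM score, target-LM score)
hyps = [
    ("hyp A", -12.0, -8.0, -5.0),
    ("hyp B", -11.5, -4.0, -9.0),
]
best = max(hyps, key=lambda h: replaced_score(h[1], h[2], h[3]))
```

With these toy numbers, a hypothesis favored by the target-domain LM ("hyp A") can overtake one with a slightly better raw E2E score, which is the intended effect of the replacement.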
Keywords |
End-to-end speech recognition |
Implicit language information |
Language model replacement |
Bibliographic information |
en : EURASIP Journal on Audio, Speech, and Music Processing
Volume 2024,
p. 40,
Issue date 2024-07-20
|
Source ID |
Source identifier type |
EISSN |
Source identifier |
1687-4722 |
Publisher |
Springer Nature |
Publisher |
BioMed Central |
Rights |
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. |
EID |
Identifier |
412238 |
Language |
eng |