End-to-end recognition of streaming Japanese speech using CTC and local attention

Chen, Jiahao; Nishimura, Ryota (西村, 良太); Kitaoka, Norihide (北岡, 教英)

DOI: https://doi.org/10.1017/ATSIP.2020.23

https://tokushima-u.repo.nii.ac.jp/records/2008800

File: atsip_9_e25.pdf (421 KB)

Item type

Literature / Documents(1)

Publication date

2021-06-11

Access rights

open access

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Publisher version DOI

Related name

10.1017/ATSIP.2020.23

Publication type

VoR

Publication type resource

http://purl.org/coar/version/c_970fb48d4fbd8a85

Title

End-to-end recognition of streaming Japanese speech using CTC and local attention

Alternative title

Other title

E2E SPEECH RECOGNITION WITH CTC AND LOCAL ATTENTION

Authors

Chen, Jiahao
西村, 良太

WEKO 942
Tokushima University Educator and Researcher Directory 346405/profile-ja.html

ja	西村, 良太 ISNI
ja-Kana	ニシムラ, リョウタ
en	Nishimura, Ryota

北岡, 教英

WEKO 728
e-Rad 10333501

ja	北岡, 教英 ISNI
ja-Kana	キタオカ, ノリヒデ
en	Kitaoka, Norihide

Abstract

Description

Many end-to-end, large-vocabulary continuous speech recognition systems can now outperform conventional systems. However, most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input to be entered before they can begin processing the data. The resulting time lag can be a serious drawback in some applications. An obvious solution to this problem is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained with the connectionist temporal classification (CTC) criterion, combined with local attention. Such an approach has not been well investigated for Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system during experimental evaluation was a character error rate of 9.87%.

Keywords

CTC
Local attention
Speech recognition
Streaming recognition

Bibliographic information

en : APSIPA Transactions on Signal and Information Processing

Volume 9, p. e25, issue date 2020-11-23

Source ID

Source identifier type

ISSN

Source identifier

2048-7703

Publisher

Cambridge University Press

Rights

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

EID

Identifier

372885

Language

eng

Versions

Ver.1

2024-11-22 07:44:53.654908