Detection of Arbitrary Wake Words by Coupling a Phoneme Predictor and a Phoneme Sequence Detector
https://tokushima-u.repo.nii.ac.jp/records/2012765
Item type | Documents (1) |
---|---|
Public release date | 2025-03-18 |
Access rights | open access |
Access rights URI | http://purl.org/coar/access_right/c_abf2 |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Resource type | journal article |
DOI (publisher's version) | 10.1561/116.20240014 |
DOI URL | http://dx.doi.org/10.1561/116.20240014 |
Version type | VoR (Version of Record) |
Version type URI | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
Title | Detection of Arbitrary Wake Words by Coupling a Phoneme Predictor and a Phoneme Sequence Detector |
Alternative title | Detection of Arbitrary Wake Words |
Authors | Nishimura, Ryota (西村, 良太); Uno, Takaaki; Yamamoto, Taiki; Ohta, Kengo; Kitaoka, Norihide (北岡, 教英) |
Abstract | Most wake word (WW) detection systems used in smartphones and smart speakers detect only specific, predefined WWs such as “Hey, Siri” or “OK, Google”. To build such a system, a large speech corpus containing many utterances of the selected WW must be collected to train the model. If the device is to detect a different WW, a new speech corpus must be collected and the model retrained. In this study, we propose a system that can detect any chosen WW without additional model training or a corpus of WW utterances, allowing users to select and use their preferred WW. Our system consists of a phoneme predictor (PP) and a phoneme sequence detector (PSD). The PP predicts phoneme sequences from the acoustic features of the input speech and outputs phoneme probability distributions. The acoustic models in the PP are trained with the Connectionist Temporal Classification (CTC) loss. The PSD takes the output of the PP as input and predicts the probability that the WW has been spoken. In our evaluation experiments, we performed detection of six-phoneme WWs, and the proposed method achieved 90% WW detection accuracy. |
Keywords | Wake word; CTC; end-to-end modeling; phoneme sequence detector |
Bibliographic information | APSIPA Transactions on Signal and Information Processing, Vol. 13, No. 1, p. e14, published 2024-08-22 |
Journal identifier type | EISSN |
Journal identifier | 2048-7703 |
Publisher | Cambridge University Press |
Rights | This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, for non-commercial use, provided the original work is properly cited. |
EID | 412237 |
Language | eng (English) |