Dual-TBNet : Improving the Robustness of Speech Features via Dual-Transformer-BiLSTM for Speech Emotion Recognition
https://tokushima-u.repo.nii.ac.jp/records/2011247
Name / File | License | Action
---|---|---
Download is available from 2025/6/1. | |
Field | Value
---|---
Item type | Documents (1)
Publication date | 2023-10-19
Access rights | embargoed access
Resource type identifier | http://purl.org/coar/resource_type/c_6501
Resource type | journal article
Publisher DOI | https://doi.org/10.1109/TASLP.2023.3282092 (identifier type: DOI)
Related title | 10.1109/TASLP.2023.3282092 (language: ja)
Version type | NA
Version type resource | http://purl.org/coar/version/c_be7fb7dd8ff6fe43
Title | Dual-TBNet : Improving the Robustness of Speech Features via Dual-Transformer-BiLSTM for Speech Emotion Recognition (en)
Authors | Liu, Zheng; 康, 鑫 (Kang, Xin); 任, 福継 (Ren, Fuji)
Abstract (en) | Speech emotion recognition has long attracted considerable attention from researchers. In traditional feature fusion methods, the speech features used come only from the dataset, and the weak robustness of these features can easily lead to model overfitting. In addition, these methods often fuse features by simple concatenation, which causes loss of speech information. In this article, to solve the above problems and improve recognition accuracy, we utilize self-supervised learning to enhance the robustness of speech features and propose a feature fusion model (Dual-TBNet) that consists of two 1D convolutional layers, two Transformer modules, and two bidirectional long short-term memory (BiLSTM) modules. Our model uses 1D convolution to take features of different segment lengths and dimension sizes as input, uses the attention mechanism to capture the correspondence between the two features, and uses the bidirectional time-series module to enhance the contextual information of the fused features. We designed a total of four fusion models to fuse five pre-trained features and acoustic features. In the comparison experiments, the Dual-TBNet model achieved a recognition accuracy and F1 score of 95.7% and 95.8% on the CASIA dataset, 66.7% and 65.6% on the eNTERFACE05 dataset, 64.8% and 64.9% on the IEMOCAP dataset, 84.1% and 84.3% on the EMO-DB dataset, and 83.3% and 82.1% on the SAVEE dataset. The Dual-TBNet model effectively fuses acoustic features of different lengths and dimensions with pre-trained features, enhances feature robustness, and achieves the best performance.
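The abstract describes using an attention mechanism to capture the correspondence between two feature sequences of different lengths and dimensions (pre-trained vs. acoustic features). The following is a minimal numpy sketch of that idea as scaled dot-product cross-attention, not the authors' implementation: all shapes, projection matrices, and names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats, d_model=8, seed=0):
    """Fuse two feature sequences of different lengths/dims:
    project both into a shared d_model space, then let the first
    sequence attend over the second (toy random projections)."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((query_feats.shape[1], d_model))
    Wk = rng.standard_normal((key_feats.shape[1], d_model))
    Wv = rng.standard_normal((key_feats.shape[1], d_model))
    Q, K, V = query_feats @ Wq, key_feats @ Wk, key_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_model))  # (len_q, len_k) weights
    return attn @ V                             # fused: (len_q, d_model)

# Toy stand-ins: 50 frames of 16-dim pre-trained features,
# 120 frames of 4-dim acoustic features.
pretrained = np.ones((50, 16))
acoustic = np.ones((120, 4))
fused = cross_attention(pretrained, acoustic)
print(fused.shape)  # (50, 8)
```

In the full model the fused sequence would then pass through the BiLSTM modules to add bidirectional context; that stage is omitted here for brevity.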
Keyword | Language | Subject scheme
---|---|---
Speech emotion recognition | en | Other
affective computing | en | Other
speech representation learning | en | Other
feature fusion transformer | en | Other
Bibliographic information | IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2193-2203, published 2023-06-01 (en)
Source identifier | Identifier type
---|---
2329-9290 | ISSN
2329-9304 | ISSN
AA12669539 | NCID
Publisher | IEEE (en)
Rights (en) | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
EID | 397694 (identifier type: URI)
Language | eng