Item type | Documents (1)
Release date | 2022-07-14
Access rights | open access
Resource type identifier | http://purl.org/coar/resource_type/c_6501
Resource type | journal article
Publisher's version DOI (identifier type: DOI) | https://doi.org/10.3390/electronics11010037
Related name | 10.3390/electronics11010037 (language: ja)
Publication type | VoR (Version of Record)
Publication type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85
Title | Diagnostic Evaluation of Policy-Gradient-Based Ranking (language: en)
Authors | Yu, Hai-Tao; Huang, Degen; 任, 福継; Li, Lishuang
Abstract (description type: Abstract; language: en) | Learning-to-rank has been intensively studied and has shown significantly increasing values in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, armed with the aforesaid popular techniques, most studies tend to show how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies are somehow overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on the widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to the gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the greater the impact policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
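The following sketch (Python with NumPy) illustrates the general idea the abstract refers to, namely gradient estimation based on sampled rankings: a linear scoring function induces a Plackett-Luce distribution over rankings, one ranking is sampled per query, and a REINFORCE-style gradient weighted by the NDCG of that sampled ranking updates the scorer. It is a minimal illustration under assumed toy data (random features, graded labels, learning rate), not the implementation evaluated in the paper.

# Minimal sketch of policy-gradient-based ranking (not the paper's code):
# a linear scorer defines a Plackett-Luce distribution over rankings,
# a ranking is sampled, and a REINFORCE gradient weighted by NDCG updates
# the scorer. Toy data and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def dcg(rels):
    # Discounted cumulative gain of relevance values taken in ranked order.
    gains = (2.0 ** rels - 1.0) / np.log2(np.arange(2, len(rels) + 2))
    return gains.sum()

def ndcg(ranking, labels):
    # NDCG of a permutation `ranking` given per-document relevance labels.
    ideal = dcg(np.sort(labels)[::-1])
    return dcg(labels[ranking]) / ideal if ideal > 0 else 0.0

def sample_ranking(scores, rng):
    # Sample a permutation from the Plackett-Luce model and return it together
    # with the gradient of its log-probability with respect to the scores.
    remaining = list(range(len(scores)))
    ranking, grad_s = [], np.zeros_like(scores)
    for _ in range(len(scores)):
        s = scores[remaining]
        p = np.exp(s - s.max())
        p /= p.sum()
        idx = rng.choice(len(remaining), p=p)
        chosen = remaining[idx]
        # d log P / d s: +1 for the chosen item, minus the softmax over
        # the items still available at this position.
        grad_s[remaining] -= p
        grad_s[chosen] += 1.0
        ranking.append(chosen)
        remaining.pop(idx)
    return np.array(ranking), grad_s

# Toy query: 20 documents, 5 features, graded relevance labels in {0, 1, 2}.
X = rng.normal(size=(20, 5))
labels = rng.integers(0, 3, size=20)
w = np.zeros(5)                      # linear scoring function f(x) = x @ w
baseline, lr = 0.0, 0.1

for step in range(200):
    scores = X @ w
    ranking, grad_s = sample_ranking(scores, rng)
    reward = ndcg(ranking, labels)
    # REINFORCE estimate with a running-mean baseline to reduce variance.
    w += lr * (reward - baseline) * (X.T @ grad_s)
    baseline = 0.9 * baseline + 0.1 * reward

print("NDCG of the final greedy (sorted-by-score) ranking:",
      round(ndcg(np.argsort(-(X @ w)), labels), 3))

Because the update relies on rankings sampled from the current policy, the gradient estimate can diverge sharply from the gradient at the ideal ranking, which is the weakness the abstract highlights.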
Keywords (subject scheme: Other; language: en) | learning-to-rank; policy gradient; reinforcement learning; adversarial learning; ranking sampling
Bibliographic information | Electronics (en), Vol. 11, No. 1, p. 37, issue date 2021-12-23
Source identifier (ISSN) | 2079-9292
Publisher | MDPI (language: en)
Rights (language: en) | This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
EID | 382584 (identifier type: URI)
Language | eng