Design and analysis of the bilinear bandit algorithms쌍선형 밴딧 해법에 관한 연구

Cited 0 time in webofscience Cited 0 time in scopus
  • Hit : 57
  • Download : 0
We consider the bilinear bandit problem where the learner chooses a pair of arms, each from two different action spaces of dimension $d_1$ and $d_2$, respectively. The learner then receives a reward whose expectation is a bilinear function of the two chosen arms with an unknown matrix parameter $\Theta^*\in\mathbb{R}^{d_1 \times d_2}$ with rank $r$. Despite abundant applications such as drug discovery, the optimal regret rate is unknown for this problem, though it was conjectured to be $\tilde O(\sqrt{d_1d_2(d_1+d_2)r T})$ by \citet{jun2019bilinear} where $\tilde O$ ignores polylogarithmic factors in $T$. In this paper, we make progress towards closing the gap between the upper and lower bound on the optimal regret. First, we reject the conjecture above by proposing algorithms that achieve the regret $\tilde O(\sqrt{d_1 d_2 (d_1+d_2) T})$ using the fact that the action space dimension $O(d_1+d_2)$ is significantly lower than the matrix parameter dimension $O(d_1 d_2)$. Second, we additionally devise an algorithm with better empirical performance than previous algorithms. Finally, we propose an algorithm for the problem of the bilinear bandit with warm-up vectors.
Advisors
Kang, Wanmoresearcher강완모researcher
Description
한국과학기술원 :수리과학과,
Publisher
한국과학기술원
Issue Date
2022
Identifier
325007
Language
eng
Description

학위논문(박사) - 한국과학기술원 : 수리과학과, 2022.2,[iv, 54 p. :]

URI
http://hdl.handle.net/10203/308550
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996370&flag=dissertation
Appears in Collection
MA-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0