Multi-armed Bandit with Additional Observations

Cited 0 time in webofscience Cited 0 time in scopus
  • Hit : 468
  • Download : 0
DC FieldValueLanguage
dc.contributor.authorYun, Donggyuko
dc.contributor.authorProutiere, Alexandreko
dc.contributor.authorAhn, Sumyeongko
dc.contributor.authorShin, Jinwooko
dc.contributor.authorYi, Yungko
dc.date.accessioned2019-01-22T08:38:07Z-
dc.date.available2019-01-22T08:38:07Z-
dc.date.created2018-11-28-
dc.date.created2018-11-28-
dc.date.created2018-11-28-
dc.date.issued2018-03-
dc.identifier.citationProceedings of the ACM on Measurement and Analysis of Computing Systems, v.2, no.1, pp.1 - 22-
dc.identifier.issn2476-1249-
dc.identifier.urihttp://hdl.handle.net/10203/249061-
dc.description.abstractWe study multi-armed bandit (MAB) problems with additional observations, where in each round, the decision maker selects an arm to play and can also observe rewards of additional arms (within a given budget) by paying certain costs. In the case of stochastic rewards, we develop a new algorithm KL-UCB-AO which is asymptotically optimal when the time horizon grows large, by smartly identifying the optimal set of the arms to be explored using the given budget of additional observations. In the case of adversarial rewards, we propose H-INF, an algorithm with order-optimal regret. H-INF exploits a two-layered structure where in each layer, we run a known optimal MAB algorithm. Such a hierarchical structure facilitates the regret analysis of the algorithm, and in turn, yields order-optimal regret. We apply the framework of MAB with additional observations to the design of rate adaptation schemes in 802.11-like wireless systems, and to that of online advertisement systems. In both cases, we demonstrate that our algorithms leverage additional observations to significantly improve the system performance. We believe the techniques developed in this paper are of independent interest for other MAB problems, e.g., contextual or graph-structured MAB.-
dc.languageEnglish-
dc.publisherACM Association for Computing Machinery-
dc.titleMulti-armed Bandit with Additional Observations-
dc.typeArticle-
dc.type.rimsART-
dc.citation.volume2-
dc.citation.issue1-
dc.citation.beginningpage1-
dc.citation.endingpage22-
dc.citation.publicationnameProceedings of the ACM on Measurement and Analysis of Computing Systems-
dc.identifier.doi10.1145/3179416-
dc.contributor.localauthorYi, Yung-
dc.contributor.nonIdAuthorYun, Donggyu-
dc.contributor.nonIdAuthorProutiere, Alexandre-
dc.contributor.nonIdAuthorAhn, Sumyeong-
dc.contributor.nonIdAuthorShin, Jinwoo-
dc.description.isOpenAccessN-
Appears in Collection
EE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0