Learning to Schedule Network Resources Throughput and Delay Optimally Using Q(+)-Learning

Cited 12 times in Web of Science; cited 0 times in Scopus
  • Hit : 336
  • Download : 0
DC Field: Value (Language)
dc.contributor.author: Bae, Jeongmin (ko)
dc.contributor.author: Lee, Joohyun (ko)
dc.contributor.author: Chong, Song (ko)
dc.date.accessioned: 2021-05-25T08:50:24Z
dc.date.available: 2021-05-25T08:50:24Z
dc.date.created: 2021-05-25
dc.date.issued: 2021-04
dc.identifier.citation: IEEE-ACM TRANSACTIONS ON NETWORKING, v.29, no.2, pp.750 - 763
dc.identifier.issn: 1063-6692
dc.identifier.uri: http://hdl.handle.net/10203/285347
dc.description.abstract: As network architectures become more complex and user requirements grow more diverse, efficient network resource management becomes increasingly important. However, existing throughput-optimal scheduling algorithms such as the max-weight algorithm suffer from poor delay performance. In this paper, we present reinforcement learning-based network scheduling algorithms for a single-hop downlink scenario that achieve throughput optimality and converge to minimal delay. To this end, we first formulate the network optimization problem as a Markov decision process (MDP) problem. Then, we introduce a new state-action value function called the Q(+)-function and develop a reinforcement learning algorithm called Q(+)-learning with UCB (Upper Confidence Bound) exploration, which guarantees small performance loss during the learning process. We also derive an upper bound on the sample complexity of our algorithm, which is more efficient than the best known bound for Q-learning with UCB exploration by a factor of gamma(2), where gamma is the discount factor of the MDP problem. Finally, via simulation, we verify that our algorithm achieves a delay reduction of up to 40.8% compared to the max-weight algorithm over various scenarios. We also show that Q(+)-learning with UCB exploration converges to a gamma-optimal policy 10 times faster than Q-learning with UCB.
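The abstract's core ingredients — a queue-scheduling MDP and Q-learning driven by a UCB exploration bonus — can be illustrated with a minimal sketch. This is not the paper's Q(+)-learning; the toy two-queue downlink environment, the negative-backlog reward (a proxy for delay), and all constants below are assumptions made for the example.

```python
import math
import random

random.seed(0)

N_QUEUES = 2     # two users sharing one downlink transmission slot
MAX_Q = 3        # per-queue backlog cap (keeps the state space tiny)
ARRIVAL_P = 0.4  # Bernoulli packet-arrival probability per queue/slot
GAMMA = 0.9      # MDP discount factor
ALPHA = 0.1      # Q-learning step size
UCB_C = 2.0      # exploration-bonus scale (assumed)

def step(state, action):
    """Serve one packet from the chosen queue, then apply arrivals.
    Reward is the negative total backlog, so maximizing reward keeps
    queues (and hence delay) small."""
    queues = list(state)
    if queues[action] > 0:
        queues[action] -= 1
    for i in range(N_QUEUES):
        if random.random() < ARRIVAL_P:
            queues[i] = min(queues[i] + 1, MAX_Q)
    return tuple(queues), -sum(queues)

Q = {}       # (state, action) -> value estimate
counts = {}  # (state, action) -> visit count

def ucb_action(state, t):
    """Greedy in Q-value plus an optimism bonus that shrinks as the
    state-action pair is visited more often."""
    scores = []
    for a in range(N_QUEUES):
        n = counts.get((state, a), 0)
        bonus = UCB_C * math.sqrt(math.log(t + 2) / (n + 1))
        scores.append(Q.get((state, a), 0.0) + bonus)
    return scores.index(max(scores))

state = (0, 0)
for t in range(20000):
    a = ucb_action(state, t)
    nxt, r = step(state, a)
    counts[(state, a)] = counts.get((state, a), 0) + 1
    target = r + GAMMA * max(Q.get((nxt, b), 0.0) for b in range(N_QUEUES))
    old = Q.get((state, a), 0.0)
    Q[(state, a)] = old + ALPHA * (target - old)
    state = nxt

def greedy(s):
    """Exploitation policy after learning: pick the highest-value action."""
    return max(range(N_QUEUES), key=lambda a: Q.get((s, a), 0.0))
```

On this symmetric toy instance the learned greedy policy tends to serve the longer queue, which matches the delay intuition behind throughput-optimal scheduling; the paper's contribution is a principled Q(+)-function and sample-complexity bound rather than this ad hoc setup.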
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Learning to Schedule Network Resources Throughput and Delay Optimally Using Q(+)-Learning
dc.type: Article
dc.identifier.wosid: 000641964600020
dc.identifier.scopusid: 2-s2.0-85100475440
dc.type.rims: ART
dc.citation.volume: 29
dc.citation.issue: 2
dc.citation.beginningpage: 750
dc.citation.endingpage: 763
dc.citation.publicationname: IEEE-ACM TRANSACTIONS ON NETWORKING
dc.identifier.doi: 10.1109/TNET.2021.3051663
dc.contributor.localauthor: Chong, Song
dc.contributor.nonIdAuthor: Lee, Joohyun
dc.description.isOpenAccess: N
dc.type.journalArticle: Article
dc.subject.keywordAuthor: Network resource management
dc.subject.keywordAuthor: throughput and delay optimality
dc.subject.keywordAuthor: reinforcement learning
dc.subject.keywordAuthor: upper confidence bound
dc.subject.keywordPlus: ALLOCATION
dc.subject.keywordPlus: OPTIMIZATION
dc.subject.keywordPlus: MODEL
Appears in Collection
AI-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.