Learning Collaborative Policies to Solve NP-hard Routing Problems

DC Field | Value | Language
dc.contributor.author | Kim, Minsu | ko
dc.contributor.author | Kim, Joungho | ko
dc.contributor.author | Park, Jinkyoo | ko
dc.date.accessioned | 2023-02-07T07:00:45Z | -
dc.date.available | 2023-02-07T07:00:45Z | -
dc.date.created | 2023-01-25 | -
dc.date.issued | 2021-12-06 | -
dc.identifier.citation | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 | -
dc.identifier.uri | http://hdl.handle.net/10203/305073 | -
dc.description.abstract | Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, showing a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while exploring the full combinatorial action space (i.e., the sequence of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward that encourages the seeder to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder; it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. Thus, the reviser is trained to improve the quality of each candidate solution while focusing on a reduced solution space, which is beneficial for exploitation. Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including the TSP, the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP). | -
dc.language | English | -
dc.publisher | Neural Information Processing Systems Foundation | -
dc.title | Learning Collaborative Policies to Solve NP-hard Routing Problems | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Virtual | -
dc.contributor.localauthor | Kim, Joungho | -
dc.contributor.nonIdAuthor | Park, Jinkyoo | -
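The abstract above describes a two-stage scheme: a diversity-driven seeder followed by a reviser that improves sub-tours of each candidate. As a purely illustrative aid (not the paper's implementation), the sketch below shows that control flow on a toy Euclidean TSP. The learned DRL policies are replaced by hypothetical stand-ins: random permutations for the seeder and a 2-opt pass for the reviser, and the names seeder, reviser, and lcp_style_solve are invented for this example.

    # Minimal, self-contained sketch of a seeder-reviser style solve on a toy TSP.
    # NOT the LCP implementation: the learned policies are replaced by simple stand-ins
    # so that only the two-stage structure (diverse seeds -> sub-tour revision) is shown.
    import numpy as np

    def tour_length(coords, tour):
        """Total cycle length of `tour` (a permutation of city indices) over `coords`."""
        pts = coords[tour]
        return float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())

    def seeder(n_cities, num_seeds, rng):
        """Stand-in for the seeder policy: emit diverse candidate tours.
        In LCP this is a DRL policy trained with an entropy-regularized reward
        (roughly: expected negative tour length plus an entropy bonus) to keep
        the seeds diverse; here we simply sample random permutations."""
        return [rng.permutation(n_cities) for _ in range(num_seeds)]

    def revise_subtour(coords, sub):
        """Stand-in for the reviser policy: improve one sub-tour with a 2-opt pass,
        keeping its endpoints fixed (the paper instead applies a learned policy
        to every sub-tour in parallel)."""
        sub = list(sub)
        improved = True
        while improved:
            improved = False
            for i in range(1, len(sub) - 2):
                for j in range(i + 1, len(sub) - 1):
                    a, b, c, d = sub[i - 1], sub[i], sub[j], sub[j + 1]
                    delta = (np.linalg.norm(coords[a] - coords[c])
                             + np.linalg.norm(coords[b] - coords[d])
                             - np.linalg.norm(coords[a] - coords[b])
                             - np.linalg.norm(coords[c] - coords[d]))
                    if delta < -1e-9:
                        sub[i:j + 1] = reversed(sub[i:j + 1])
                        improved = True
        return sub

    def reviser(coords, tour, num_segments):
        """Partition the full tour into sub-tours and revise each one independently."""
        segments = np.array_split(np.asarray(tour), num_segments)
        return np.concatenate([revise_subtour(coords, seg) for seg in segments])

    def lcp_style_solve(coords, num_seeds=16, num_segments=4, seed=0):
        """Seeder explores (diverse candidates); reviser exploits (local improvement)."""
        rng = np.random.default_rng(seed)
        seeds = seeder(len(coords), num_seeds, rng)
        candidates = [reviser(coords, t, num_segments) for t in seeds]
        return min(candidates, key=lambda t: tour_length(coords, t))

    if __name__ == "__main__":
        cities = np.random.default_rng(42).random((30, 2))  # 30 random cities in the unit square
        best = lcp_style_solve(cities)
        print("best tour length:", round(tour_length(cities, best), 3))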
Appears in Collection
EE-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
