DSpace at KOASAS: Heuristic search value iteration for constrained POMDPs


Heuristic search value iteration for constrained POMDPs (제약을 갖는 POMDP를 위한 휴리스틱 검색 가치 반복 알고리즘)


고봉석 / Goh, Bong-Seok

This thesis proposes CHSVI (constrained HSVI), a heuristic search value iteration (HSVI) algorithm for constrained partially observable Markov decision processes (CPOMDPs). HSVI is one of the efficient algorithms for finding an optimal policy of a partially observable Markov decision process (POMDP). It finds the optimal policy through point-based backups. To collect the beliefs (probability distributions over states) at which these backups are performed, it runs a heuristic search guided by an upper bound and a lower bound on the value function. CHSVI likewise runs a heuristic search guided by upper and lower bounds on the value function, but these bounds must be expressed in a way that accounts for the constraints. In addition, the optimal policy of a CPOMDP, like that of a constrained Markov decision process (CMDP), may be a randomized policy, and this must also be taken into account. The thesis covers the representation, initialization, evaluation, and update of the upper and lower bounds on the CPOMDP value function, and proposes a heuristic search that reflects randomized policies. On this basis it presents the CHSVI algorithm and confirms experimentally the performance of the optimal policy obtained from CHSVI.
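As a rough illustration of the point-based backup mentioned in the abstract, the sketch below applies one standard (unconstrained) backup to a lower bound represented as a set of alpha-vectors. It is not the thesis's CHSVI: it ignores cost constraints and randomized policies, and the two-state POMDP, its numbers, and the names `T`, `Z`, `R`, `lower_bound`, and `point_based_backup` are all assumptions made up for this example.

```python
# Minimal sketch of a point-based backup on a POMDP lower bound.
# All model numbers below are hypothetical and purely illustrative.

S, A, O = 2, 2, 2   # number of states, actions, observations
GAMMA = 0.9         # discount factor

# T[a][s][s2] = P(s2 | s, a)
T = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.5, 0.5], [0.5, 0.5]]]
# Z[a][s2][o] = P(o | s2, a)
Z = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.5, 0.5], [0.5, 0.5]]]
# R[a][s] = immediate reward for taking action a in state s
R = [[1.0, 0.0],
     [0.0, 0.5]]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def lower_bound(belief, alphas):
    """Lower-bound value at a belief: the best alpha-vector at that point."""
    return max(dot(alpha, belief) for alpha in alphas)


def point_based_backup(belief, alphas):
    """Return the alpha-vector produced by one Bellman backup at `belief`."""
    best = None
    for a in range(A):
        new_alpha = list(R[a])
        for o in range(O):
            # Project every existing alpha-vector back through (a, o).
            candidates = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(S))
                 for s in range(S)]
                for alpha in alphas
            ]
            # Keep the projection that is best at this particular belief.
            g = max(candidates, key=lambda v: dot(v, belief))
            for s in range(S):
                new_alpha[s] += GAMMA * g[s]
        if best is None or dot(new_alpha, belief) > dot(best, belief):
            best = new_alpha
    return best


# Usage: start from the trivial bound R_min / (1 - GAMMA) = 0 in every state.
alphas = [[0.0, 0.0]]
belief = [0.5, 0.5]
alphas.append(point_based_backup(belief, alphas))
```

Adding the backed-up vector can only raise the lower bound at the chosen belief. HSVI picks such beliefs by following the gap between its upper and lower bounds; per the abstract, CHSVI additionally needs bounds that account for the constraints.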

Advisors: Kim, Kee-Eung (김기응)

Description: Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2013

Identifier: 567066/325007 / 020114292

Language: kor

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2013.8, [ iii, 24 p. ]

Keywords: reinforcement learning (강화학습)

URI: http://hdl.handle.net/10203/196869

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=567066&flag=dissertation

Appears in Collection: CS-Theses_Master(석사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
