Use of lagged information in partially observable Markov decision process
간접관측이 가능한 마코브 의사 결정과정의 지연정보 이용 (Use of delayed information in an indirectly observable Markov decision process)

DC Field / Value
dc.contributor.advisor: Kim, Soung-Hie
dc.contributor.advisor: 김성희
dc.contributor.author: Jeong, Byung-Ho
dc.contributor.author: 정병호
dc.date.accessioned: 2011-12-14T02:37:25Z
dc.date.available: 2011-12-14T02:37:25Z
dc.date.issued: 1989
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=61352&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/40393
dc.description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), Department of Industrial Engineering, 1989.2, [iv], 124 p.
dc.description.abstract: This thesis studies the control of a finite-state, discrete-time Markov process with only incomplete state observation. This problem is generally called a Partially Observable Markov Decision Process (POMDP). The performance of such a system is affected by the measurement quality of the state observation, i.e., by the uncertainty about the state. Thus, to reduce this uncertainty, it is desirable to obtain additional information about the state of the Markov process whenever such information is available and valuable. Among the various possible additional-information structures, this study focuses on the case in which an uncertain, delayed observation of the state can be obtained after one transition. In other words, our interest lies in reducing the state uncertainty inherent in a general POMDP by using lagged information, and in controlling the Markov process with two types of observation obtained from separate information sources. That is, this study can be viewed as a Markov Decision Process (MDP) with lagged and current partial observations (see the belief-update sketch below). The thesis consists of three main parts. First, a finite-horizon POMDP with lagged and current partial observations is considered, and an algorithm is developed for finding an optimal policy and the minimum expected total cost of that policy. Second, the thesis considers a POMDP with only the current observation when the system has an infinite number of time periods. An algorithm is developed for finding an optimal stationary policy that minimizes the expected discounted cost. The algorithm is a modified version of the well-known policy iteration algorithm; the modification focuses on the value-determination routine. Some properties of the approximated functions for the expected discounted cost of a stationary policy are investigated, and the expected discounted cost of a stationary policy is approximated based on these properties. That is, the value-determination step adopts the successive-approximation concept. Lastly, this ...
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.title: Use of lagged information in partially observable Markov decision process
dc.title.alternative: 간접관측이 가능한 마코브 의사 결정과정의 지연정보 이용 (Use of delayed information in an indirectly observable Markov decision process)
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 61352/325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), Department of Industrial Engineering
dc.identifier.uid: 000835365
dc.contributor.localauthor: Kim, Soung-Hie
dc.contributor.localauthor: 김성희
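
The abstract describes combining two information sources when tracking the hidden state: a current partial observation of the new state and an uncertain, lagged observation of the previous state that arrives after one transition. The sketch below shows one plausible Bayesian belief update of that kind; it is an illustrative assumption, not the thesis's exact formulation, and all names (belief_update, T, O, L) are hypothetical.

```python
import numpy as np

def belief_update(b, a, o, z, T, O, L):
    """One belief-update step with current and lagged observations (illustrative sketch).

    b : (S,)        prior belief over the previous state
    a : int         action taken at the previous step
    o : int         current partial observation of the new state
    z : int         lagged observation of the previous state, received after the transition
    T : (A, S, S)   transition probabilities, T[a, s, s'] = P(s' | s, a)
    O : (S, Omega)  current-observation model, O[s', o] = P(o | s')
    L : (S, Z)      lagged-observation model, L[s, z] = P(z | s)
    """
    # Re-weight the prior by the likelihood of the lagged observation of the previous state.
    weighted_prior = b * L[:, z]            # P(z | s) * b(s)
    # Propagate the re-weighted belief through the transition model.
    predicted = T[a].T @ weighted_prior     # sum_s P(s' | s, a) P(z | s) b(s)
    # Weight by the likelihood of the current partial observation of the new state.
    posterior = O[:, o] * predicted
    return posterior / posterior.sum()      # normalize to a probability vector
```

Under this assumption, a finite-horizon algorithm or a policy-iteration scheme of the kind the abstract mentions would treat this belief vector as the (continuous) state on which policies and expected discounted costs are defined.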
Appears in Collection
IE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
