This thesis studies the control of a finite-state, discrete-time Markov process under incomplete state observation, a problem generally known as a Partially Observable Markov Decision Process (POMDP). The performance of such a system is affected by the quality of the state observations, i.e., by the uncertainty about the state. To reduce this uncertainty, it is worthwhile to obtain additional information about the state of the Markov process whenever such information is available and valuable. Among the various possible additional information structures, this study focuses on the case in which an uncertain, delayed observation of the state becomes available one transition later. In other words, our interest lies in reducing the state uncertainty inherent in a general POMDP by using lagged information, and in controlling the Markov process with two types of observation obtained from distinct information sources. This study can therefore be regarded as a Markov Decision Process (MDP) with lagged and current partial observations.
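As a concrete illustration (not taken from the thesis itself), the Bayesian belief update that fuses the two information sources might be sketched as follows. The names `P`, `O_cur`, and `L_lag` are assumed transition and observation-likelihood matrices introduced here for exposition only:

```python
def belief_update(b, P, O_cur, L_lag, o, z):
    """One-step Bayesian belief update for a finite-state Markov chain
    controlled with two information sources:
      - o: current (partial) observation of the new state s'
      - z: lagged, uncertain observation of the previous state s,
           received after the transition.
    b      : prior belief over states, a list of probabilities
    P      : P[s][s2]     = Pr(s2 | s) under the chosen action (assumed)
    O_cur  : O_cur[o][s2] = Pr(observe o | new state s2)   (assumed)
    L_lag  : L_lag[z][s]  = Pr(lagged signal z | old state s) (assumed)
    Returns the normalized posterior belief over the new state.
    """
    n = len(b)
    # Unnormalized posterior: weight each transition s -> s2 by the
    # lagged evidence on s and the current evidence on s2.
    post = [O_cur[o][s2] * sum(L_lag[z][s] * b[s] * P[s][s2]
                               for s in range(n))
            for s2 in range(n)]
    total = sum(post)  # Pr(o, z | b), used for normalization
    return [p / total for p in post]
```

The lagged signal sharpens the posterior over the previous state before it is pushed through the transition matrix, which is precisely how a delayed observation can still reduce current state uncertainty.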
This thesis consists of three main parts. First, a finite-horizon POMDP with lagged and current partial observations is considered, and an algorithm is developed for finding an optimal policy and its minimum expected total cost.
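A minimal sketch of how such a finite-horizon problem can be solved exactly (here with only a current observation, for brevity; the thesis's algorithm additionally handles the lagged signal) is backward recursion over the finite set of beliefs reachable from the initial belief. The matrices `P`, `O`, and `C` are assumed inputs:

```python
import math

def solve_finite_pomdp(b0, T, P, O, C):
    """Finite-horizon POMDP solved by backward recursion on beliefs:
        V_t(b) = min_a [ c(b, a) + sum_o Pr(o | b, a) * V_{t+1}(b_ao) ],
    with V_T = 0.  Only finitely many beliefs are reachable from b0
    in T steps, so plain memoized recursion is exact.
    b0 : initial belief, list of probabilities
    P  : P[a][s][s2] = Pr(s2 | s, a)   (assumed)
    O  : O[o][s2]    = Pr(o | s2)      (assumed)
    C  : C[a][s]     = immediate cost  (assumed)
    """
    n, n_act, n_obs = len(b0), len(P), len(O)
    cache = {}

    def V(b, t):
        if t == T:
            return 0.0
        if (b, t) in cache:
            return cache[(b, t)]
        best = math.inf
        for a in range(n_act):
            # expected immediate cost under belief b
            total = sum(b[s] * C[a][s] for s in range(n))
            for o in range(n_obs):
                # unnormalized successor belief after (a, o)
                nb = [O[o][s2] * sum(b[s] * P[a][s][s2] for s in range(n))
                      for s2 in range(n)]
                p_o = sum(nb)  # Pr(o | b, a)
                if p_o > 1e-12:
                    total += p_o * V(tuple(x / p_o for x in nb), t + 1)
            best = min(best, total)
        cache[(b, t)] = best
        return best

    return V(tuple(b0), 0)
```

Recording the minimizing action at each cached belief would recover the optimal policy alongside the minimum expected total cost.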
Second, the thesis considers a POMDP with only the current observation, for the case in which the system operates over an infinite number of time periods. An algorithm is developed for finding an optimal stationary policy that minimizes the expected discounted cost. The algorithm is a modified version of the well-known policy iteration algorithm; the modification concerns the value determination routine. Some properties of the approximating functions for the expected discounted cost of a stationary policy are investigated, and the expected discounted cost of a stationary policy is approximated based on these properties. That is, the value determination step adopts the successive approximation concept.
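The modification described above can be illustrated by the following sketch, shown on a fully observed finite-state MDP for simplicity (the thesis applies the idea to the belief-state process): the exact linear-system value determination is replaced by a fixed number of successive-approximation sweeps. The parameters `m` and `beta` are assumptions of this illustration:

```python
def modified_policy_iteration(P, C, beta, m=100):
    """Policy iteration in which value determination is replaced by
    m successive-approximation sweeps  V <- C_pi + beta * P_pi V,
    followed by the usual policy improvement step.  Iterates until
    the policy is stable.  Minimizes expected discounted cost.
    P    : P[a][s][s2] = Pr(s2 | s, a)      (assumed)
    C    : C[a][s]     = immediate cost     (assumed)
    beta : discount factor in (0, 1)
    """
    n, n_act = len(C[0]), len(C)
    policy = [0] * n
    V = [0.0] * n
    while True:
        # approximate value determination for the current policy
        for _ in range(m):
            V = [C[policy[s]][s]
                 + beta * sum(P[policy[s]][s][s2] * V[s2] for s2 in range(n))
                 for s in range(n)]
        # policy improvement: greedy one-step lookahead on V
        new_policy = [
            min(range(n_act),
                key=lambda a: C[a][s]
                + beta * sum(P[a][s][s2] * V[s2] for s2 in range(n)))
            for s in range(n)
        ]
        if new_policy == policy:
            return policy, V
        policy = new_policy
```

Because each sweep is a contraction with modulus `beta`, a moderate `m` already yields a value estimate accurate enough for the improvement step, avoiding the exact linear solve of classical policy iteration.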
Lastly, this ...