Recently, the importance of the recommender system has grown as personal preferences become more diversified. This dissertation proposes reinforcement learning algorithms for the recommender system that the system improves the performance by itself over time. I firstly model recommender systems as Markov decision processes that the agent iteratively selects items to recommend among the possible candidates and receives the selection results. In the first part, I introduce a concept of bankruptcy in the recommender system, so the agent needs to consider a trade-off between avoiding bankruptcy and maximizing the benefits. I propose a variant of the multi-armed bandit algorithm, which achieves the asymptotically optimal performances without any bankruptcy. In the second part, I develop a deep reinforcement learning algorithm where the agent recommends multiple items simultaneously. The algorithm takes advantage of graph neural networks and the idea of greedy algorithm to relieve the scalability issue of selecting the best combination of the multiple items once. I theoretically prove that the applied ideas never degrades the performance. The proposed algorithms in the dissertation are tested in various environments, and the results insist that the algorithms successfully improve their item selection policy over time.