The automated guided vehicle (AGV), a typical form of automated material handling system, generally draws electric power from an internally mounted battery pack. An AGV must therefore periodically visit a battery station and swap its battery to manage its state of charge, so an AGV system needs a swapping policy that determines when a vehicle should proceed to a battery station for battery replacement. In industrial practice, most swapping policies are conservative heuristics based on the experience of decision makers, which results in production inefficiency. The objective of this research is to develop a swapping strategy that improves the production efficiency of AGV systems. The proposed swapping policy makes sequential decisions that account for both current and future system conditions, formulating the problem as a Markov decision process and solving it with deep reinforcement learning. We present numerical experiments demonstrating that the proposed swapping policy outperforms heuristic policies. We also analyze the properties of the proposed swapping policy, and the results demonstrate its application potential for AGV systems.
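To make the sequential-decision framing concrete, the following is a minimal, self-contained sketch and not the authors' implementation: it replaces the deep reinforcement learning used in the paper with tabular Q-learning on a toy battery-swap Markov decision process. The state space (discretized battery level), reward values (throughput reward, swap detour cost, stranded-vehicle penalty), and all hyperparameters are hypothetical choices for illustration only.

```python
import random

random.seed(0)

FULL = 10          # discretized state of charge: battery levels 0..FULL
WORK, SWAP = 0, 1  # actions: keep transporting vs. go to the battery station
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.2

# Q-table indexed as Q[battery_level][action]
Q = [[0.0, 0.0] for _ in range(FULL + 1)]

def step(b, a):
    """Toy deterministic transition: working earns a throughput reward but
    drains the battery; swapping pays a fixed detour cost and recharges."""
    if a == SWAP:
        return FULL, -3.0   # detour/swap cost, battery replaced with a full one
    if b == 0:
        return FULL, -20.0  # stranded AGV: costly rescue, then battery replaced
    return b - 1, 1.0       # one unit of productive transport work

# Epsilon-greedy tabular Q-learning over the toy MDP
b = FULL
for _ in range(50_000):
    if random.random() < EPS:
        a = random.randrange(2)
    else:
        a = max((WORK, SWAP), key=lambda x: Q[b][x])
    nb, r = step(b, a)
    Q[b][a] += ALPHA * (r + GAMMA * max(Q[nb]) - Q[b][a])
    b = nb

# Greedy swapping policy per battery level
policy = [max((WORK, SWAP), key=lambda a: Q[s][a]) for s in range(FULL + 1)]
```

Under these assumed rewards, the learned policy keeps the vehicle working at high charge and sends it to the station only near depletion, illustrating how a learned policy can be less conservative than a fixed heuristic threshold.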