In this paper, we propose, for the first time, a policy gradient reinforcement learning (RL)-based method for optimal decoupling capacitor (decap) design in 2.5-D/3-D integrated circuits (ICs) using a transformer network. The proposed method provides an optimal decap design that meets the target impedance. Previous value-based RL methods rely on simple value approximators such as a multi-layer perceptron (MLP) or a convolutional neural network (CNN). In contrast, the proposed method directly parameterizes the policy with an attention-based transformer network. The model is trained with a policy gradient algorithm, which allows it to handle a larger action space, i.e., search space. For verification, we applied the proposed method to a test hierarchical power distribution network (PDN) and compared its convergence with that of the previous value-based RL method for different action-space sizes. The results validate that the proposed method can cover an action space four times larger than that of the previous work.
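To illustrate what "directly parameterizing the policy and training it with a policy gradient" means for sequential decap placement, here is a minimal REINFORCE sketch. It rests on several assumptions that are not taken from the method above. The transformer is replaced by a plain vector of per-port logits. The reward is a hypothetical per-port `benefit` sum rather than a PDN impedance evaluation. The helper names `softmax`, `sample_episode`, and `reinforce` are illustrative only. Ports that already hold a decap are masked out, so each episode places `k` decaps on distinct ports.

```python
import math
import random


def softmax(logits, mask):
    # Masked softmax: ports that already hold a decap get probability 0.
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_episode(logits, k, rng):
    """Place k decaps one at a time; return chosen ports and per-step probabilities."""
    n = len(logits)
    mask = [True] * n
    actions, probs_list = [], []
    for _ in range(k):
        probs = softmax(logits, mask)
        a = rng.choices(range(n), weights=probs)[0]
        actions.append(a)
        probs_list.append(probs)
        mask[a] = False  # a port cannot receive a second decap
    return actions, probs_list


def reinforce(benefit, k, iters=3000, lr=0.05, seed=0):
    """REINFORCE with a moving-average baseline (toy stand-in for the real policy/reward)."""
    rng = random.Random(seed)
    logits = [0.0] * len(benefit)
    baseline = 0.0
    for _ in range(iters):
        actions, probs_list = sample_episode(logits, k, rng)
        # Hypothetical reward: sum of per-port benefits (NOT a PDN impedance check).
        reward = sum(benefit[a] for a in actions)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log pi(a | mask) w.r.t. logits is onehot(a) - probs;
        # masked ports have probs 0 and are never chosen, so they get 0 gradient.
        grad = [0.0] * len(logits)
        for a, probs in zip(actions, probs_list):
            for i, p in enumerate(probs):
                grad[i] += (1.0 if i == a else 0.0) - p
        # Gradient ascent on expected reward.
        for i in range(len(logits)):
            logits[i] += lr * advantage * grad[i]
    return logits
```

For example, `reinforce([0.1, 0.9, 0.2, 0.8, 0.05], k=2)` should end up assigning the largest logits to ports 1 and 3. Because the policy outputs a distribution over ports instead of one value estimate per action, the same update rule applies unchanged as the number of candidate ports grows. This is the property the abstract attributes to the policy gradient formulation.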