This paper presents a scalable inference accelerator, the deep-learning specific instruction-set processor (DSIP), that supports a variety of convolutional neural networks (CNNs). Because CNNs require a large number of computations and memory accesses, a programmable master-slave instruction set architecture (ISA) is proposed to achieve high flexibility, processing speed, and energy efficiency. The master sends and receives feature maps so that neural networks can be handled in a scalable way, and the slave performs CNN operations, such as multiply-accumulate, max pooling, and activation functions, on the features received from the master. The master-slave ISA maximizes computation speed by overlapping off-chip data transmission with the CNN operations, and it reduces power consumption by performing the convolution incrementally so that input and partial-sum data are reused as much as possible. To enhance computation speed further, an inference system can be configured by connecting multiple DSIPs in either a 1-D or a 2-D chain structure. A prototype chip is implemented and evaluated on AlexNet. Compared to the state-of-the-art accelerator, the DSIP-based system improves energy efficiency by 2.17x.
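The two mechanisms named above, incremental convolution with partial-sum reuse and the overlap of data transmission with computation, can be illustrated with a minimal Python sketch. It is not the DSIP instruction set or its actual dataflow. It assumes that "incremental" means accumulating one input channel at a time into a locally held partial-sum buffer, and that the overlap behaves like simple double buffering in which the transfer of the next tile starts when computation on the current tile starts. The names `conv2d_single`, `incremental_conv`, and `pipelined_time` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_single(x: Matrix, k: Matrix) -> Matrix:
    """Valid (no padding, stride 1) 2-D correlation of one channel with one kernel."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + di][j + dj] * k[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def incremental_conv(channels: List[Matrix], kernels: List[Matrix]) -> Matrix:
    """Build one output map channel by channel, keeping partial sums in one buffer.

    Assumption: each loop iteration stands for one feature map sent by the
    master; the slave adds its contribution to the same partial-sum buffer
    instead of writing intermediate results off-chip.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    oh, ow = len(channels[0]) - kh + 1, len(channels[0][0]) - kw + 1
    psum = [[0.0] * ow for _ in range(oh)]  # hypothetical on-chip partial-sum buffer
    for x, k in zip(channels, kernels):
        contrib = conv2d_single(x, k)
        for i in range(oh):
            for j in range(ow):
                psum[i][j] += contrib[i][j]
    return psum


def pipelined_time(load: List[float], compute: List[float]) -> float:
    """Total time when loading tile i+1 overlaps with computing tile i.

    Assumption: idealized double buffering; only the first load is exposed.
    """
    total = load[0]
    for i in range(len(compute)):
        next_load = load[i + 1] if i + 1 < len(load) else 0.0
        total += max(compute[i], next_load)
    return total


if __name__ == "__main__":
    channels = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    ]
    kernels = [
        [[1, 0], [0, 1]],
        [[1, 1], [1, 1]],
    ]
    print(incremental_conv(channels, kernels))
    print(pipelined_time([2, 2, 2], [3, 3, 3]), "vs serial", sum([2, 2, 2]) + sum([3, 3, 3]))
```

In this toy model the partial sums never leave the `psum` buffer until the output map is complete, and the pipelined schedule hides every transfer except the first whenever computation on a tile takes at least as long as loading the next one.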