Superpixel generation is a common preprocessing step in vision processing that divides an image into non-overlapping regions. Simple Linear Iterative Clustering (SLIC) is a widely used superpixel algorithm that offers a good balance between performance and accuracy. However, its high computational and memory-bandwidth requirements yield performance and energy efficiency that fall short of what real-time embedded applications require. In this work, we explore the design of an energy-efficient superpixel accelerator for real-time computer vision applications. We propose a novel algorithm, Subsampled SLIC (S-SLIC), that uses pixel subsampling to reduce memory bandwidth by 1.8×. We integrate S-SLIC into an energy-efficient superpixel accelerator and perform an in-depth design space exploration to optimize the design. We completed a detailed design in a 16nm FinFET technology, using commercially available EDA tools for high-level synthesis to map the design automatically from a C-based representation to a gate-level implementation. The proposed S-SLIC accelerator achieves real-time performance (30 frames per second) with 250× better energy efficiency than an optimized SLIC software implementation running on a mobile GPU.
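The abstract does not say how S-SLIC chooses which pixels to skip. The sketch below is therefore an illustration only and should not be read as the authors' method. It follows the standard SLIC loop: centres start on a grid with interval S, each centre searches a 2S × 2S window, and the distance combines CIELAB colour distance with spatial distance scaled by S and a compactness weight m. On top of that loop it applies a *hypothetical* row-subsampling scheme. On iteration t, only rows with `y % stride == t % stride` are read, so each pass touches roughly 1/stride of the pixels. This shows one way subsampling can cut memory traffic. The function names, the `stride` parameter, and the assumption that the input is already in CIELAB are all illustrative.

```python
import math
from typing import List, Sequence, Tuple

Lab = Tuple[float, float, float]  # pixel colour, assumed to be CIELAB already


def slic_distance(p: Lab, c: Sequence[float], x: int, y: int,
                  cx: float, cy: float, step: int, m: float) -> float:
    """Standard SLIC distance: CIELAB colour distance plus spatial
    distance normalised by the grid interval S and weighted by compactness m."""
    dc2 = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    ds2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.sqrt(dc2 + (ds2 / step ** 2) * m ** 2)


def subsampled_slic(img: List[List[Lab]], k: int, m: float = 10.0,
                    iters: int = 4, stride: int = 2):
    """Hypothetical row-subsampled SLIC (not the paper's S-SLIC): iteration t
    only reads rows with y % stride == t % stride. Use iters >= stride so
    that every row is assigned at least once."""
    h, w = len(img), len(img[0])
    step = max(1, int(math.sqrt(h * w / k)))  # grid interval S
    # Centres are [L, a, b, x, y], initialised on a regular grid.
    centers = [[*img[cy][cx], float(cx), float(cy)]
               for cy in range(step // 2, h, step)
               for cx in range(step // 2, w, step)]
    labels = [[-1] * w for _ in range(h)]
    for it in range(iters):
        phase = it % stride
        dist = [[math.inf] * w for _ in range(h)]
        # Assignment: each centre only searches a 2S x 2S window around itself.
        for ci, c in enumerate(centers):
            cx, cy = c[3], c[4]
            x0, x1 = max(0, int(cx) - step), min(w, int(cx) + step + 1)
            y0, y1 = max(0, int(cy) - step), min(h, int(cy) + step + 1)
            for y in range(y0, y1):
                if y % stride != phase:
                    continue  # row not read in this iteration
                for x in range(x0, x1):
                    d = slic_distance(img[y][x], c[:3], x, y, cx, cy, step, m)
                    if d < dist[y][x]:
                        dist[y][x] = d
                        labels[y][x] = ci
        # Update: recompute each centre as the mean of its subsampled pixels.
        sums = [[0.0] * 5 for _ in centers]
        counts = [0] * len(centers)
        for y in range(phase, h, stride):
            for x in range(w):
                ci = labels[y][x]
                if ci < 0:
                    continue
                for j, v in enumerate((*img[y][x], x, y)):
                    sums[ci][j] += v
                counts[ci] += 1
        for ci, n in enumerate(counts):
            if n:  # keep the previous centre if no sampled pixel chose it
                centers[ci] = [s / n for s in sums[ci]]
    return labels, centers
```

For example, `subsampled_slic(img, k=4)` on an 8 × 8 image whose left half is black and right half is white assigns every left-half pixel to a black centre and every right-half pixel to a white centre. Each iteration reads only half of the rows. Rows skipped in a given pass keep their previous labels. A hardware design would also have to decide how to interleave the sampled rows across iterations.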