Training deep neural network (DNN) models is a resource-intensive, iterative process. For this reason, complex optimizers such as Adam are now widely adopted, as they increase the speed and efficiency of training. These optimizers, however, maintain additional state variables that raise the memory demand to 2× to 3× the size of the model parameters, worsening the memory capacity bottleneck. Moreover, as DNN models are projected to grow even further, it is not practical to assume that future models will fit in accelerator memory. This has triggered various efforts to offload models to flash-based storage. However, when the model, and especially the optimizer state, is offloaded to flash, the limited I/O bandwidth severely slows down the overall training process. To address this, we present OptimStore, a solid-state drive (SSD) system with an on-die processing (ODP) architecture for gradient descent-based machine learning models. OptimStore accelerates the training of such large-scale models by performing model optimization in the storage device, specifically inside the flash dies. The ODP capability of OptimStore eliminates the heavy data movement over both the external interconnect and the internal flash channels. Overall, OptimStore achieves, on average, a 2.8× speedup and a 3.6× improvement in energy efficiency in the weight update stage over baseline SSD offloading.
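To make the memory overhead mentioned above concrete, the following is a minimal sketch of one weight update step using the standard Adam rule. It is an illustration only, not OptimStore's implementation: the function names `adam_step` and `state_footprint`, the default hyperparameters, and the 4-byte value size are assumptions introduced here. The sketch shows that Adam keeps two extra values per parameter (`m` and `v`) and that each parameter's update reads and writes only its own entries.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one in-place Adam update to flat lists of floats.

    w: weights, g: gradients, m: first-moment state, v: second-moment state.
    t is the 1-based step count used for bias correction.
    All names and defaults here are illustrative assumptions.
    """
    for i in range(len(w)):
        # Update the two per-parameter optimizer state values.
        m[i] = beta1 * m[i] + (1 - beta1) * g[i]
        v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
        # Bias-corrected moment estimates.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Element-wise weight update: only index i is touched.
        w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def state_footprint(num_params, bytes_per_value=4):
    """Return (weight_bytes, optimizer_state_bytes) for Adam.

    Assumes every value is stored at the same width (hypothetical 4 bytes).
    """
    weight_bytes = num_params * bytes_per_value
    optimizer_state_bytes = 2 * weight_bytes  # m and v, one each per weight
    return weight_bytes, optimizer_state_bytes


if __name__ == "__main__":
    w, g = [1.0, -2.0], [0.5, -0.5]
    m, v = [0.0, 0.0], [0.0, 0.0]
    adam_step(w, g, m, v, t=1)
    print(w)
    print(state_footprint(1_000_000))
```

Under these assumptions, the weights plus the two state arrays occupy three times the space of the weights alone, which corresponds to the upper end of the 2× to 3× range cited above.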