This paper presents a parallel multiuser detector architecture for low-latency interleave division multiple access. To enable P-parallel processing, an interleaving pattern is divided into P disjoint subpatterns, and all subpatterns are designed to be identical without noticeably degrading error-rate performance. Because the subpatterns are disjoint, they can be processed in parallel. Moreover, because identical subpatterns access the same address in separate memory banks at the same time, the banks are merged into a single memory to minimize silicon area and power consumption. As a result, the proposed architecture reduces the latency by a factor of P at the cost of a small hardware overhead. A prototype 2-parallel 16-user detector in 65-nm CMOS completes the entire detection procedure in half the time of the state-of-the-art nonparallel detector, while occupying only 12% more silicon area and dissipating 20% more power.
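The memory-access idea can be illustrated with a small software sketch. It is not the paper's construction: the block layout (lane p owns indices p*M to p*M+M-1), the randomly shuffled base pattern, and all function names below are assumptions chosen only to show why P identical, disjoint subpatterns allow one shared address to serve P lanes per step.

```python
import random


def make_base_pattern(m, seed):
    """Hypothetical base subpattern: a random permutation of 0..m-1."""
    rng = random.Random(seed)
    base = list(range(m))
    rng.shuffle(base)
    return base


def full_pattern(base, p):
    """Build a length p*m pattern from p identical, disjoint subpatterns.

    Assumed layout: output position lane*m + t reads input lane*m + base[t].
    """
    m = len(base)
    return [lane * m + base[t] for lane in range(p) for t in range(m)]


def interleave_sequential(data, pattern):
    """Nonparallel reference: one element per step, len(pattern) steps."""
    return [data[src] for src in pattern]


def interleave_parallel(data, base, p):
    """P-parallel version using one merged memory.

    Every lane needs address base[t] at step t, so the p banks can be
    modeled as a single memory whose word at address a holds p values.
    """
    m = len(base)
    wide_memory = [tuple(data[lane * m + a] for lane in range(p))
                   for a in range(m)]
    out = [None] * (p * m)
    steps = 0
    for t, addr in enumerate(base):
        word = wide_memory[addr]  # a single read serves all p lanes
        for lane in range(p):
            out[lane * m + t] = word[lane]
        steps += 1
    return out, steps
```

Under these assumptions, the parallel routine produces the same output as the sequential one while using M = N/P read steps instead of N, which mirrors the factor-of-P latency reduction described above. Error-rate effects of constraining the pattern this way are not modeled here.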