We consider ﬂexible decoder implementation of low density parity check (LDPC) codes via compute-uniﬁed-device-architecture (CUDA) programming on graphics processing unit (GPU), a research subject of considerable recent interest. An important issue in LDPC decoder design based on CUDA-GPU is realizing coalesced memory access, a technique that reduces memory transaction time considerably. In previous works along this direction, it has not been possible to achieve coalesced memory access in both the read and write operations due to the asymmetric nature of the bipartite graph describing the LDPC code structure. In this paper, a new algorithm is proposed that
enables coalesced memory access in both the read and write operations for one half of the decoding process – either the bit-to-check or the check-to-bit message passing. For the remaining half of the decoding step our scheme requires address transformation in both the read and write operations but one translating array is sufﬁcient. We also describe the use of on-chip shared memory and texture cache. Overall, experimental results show that proposed GPU-based LDPC decoder achieves more than 234x-speedup compared to CPU-based LDPC decoders and also outperforms existing GPU-based decoders by a signiﬁcant margin.