A new discrete wavelet transform (DWT) architecture is proposed to realize a memory-efficient 2-D DWT processor. The proposed DWT processor conforms to dual-line scanning to remove the transpose buffer. In the previous single-line DWT architectures, the transpose buffer size is proportional to the row size of the image. The conventional dual-line DWT architecture is constructed by using the convolution-based filter structure and replicates registers to alternatively deal with two lines, resulting in a long delay, as well as a number of operators and registers. The proposed architecture is based on the lifting-based DWT to embed the additional registers in the middle of the DWT operation. In addition, the computation topology is optimized for the proposed dual-line DWT architecture to achieve almost the same hardware cost and critical path as the single-line DWT architecture.