This paper proposes a new K-best detection method that is efficient in both area and energy consumption. To reduce the complexity of tree expansion and sorting, children of a candidate that are estimated to be inferior are neither expanded nor considered in the sorting. In addition, we propose an efficient pipeline scheduling scheme, called early forwarding, that reduces the overall processing latency and the number of pipeline registers. Together, the relaxed expansion and the early forwarding reduce chip area, power consumption, and latency. To validate the proposed method, a multiple-input multiple-output (MIMO) detector integrating four K-best detection units is implemented for 4 × 4 16-QAM (quadrature amplitude modulation) systems. The four units operate in an interleaved manner to achieve high throughput. In a 0.18-µm CMOS technology, the entire detector occupies 1.9 mm² and achieves a throughput of 584 Mbit/s. The energy consumed by the proposed detector is 443 pJ per bit at 1.8 V.
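The idea of relaxed expansion can be illustrated with a minimal software sketch. It is not the proposed hardware architecture: it assumes the channel has already been QR-decomposed into an upper-triangular real-valued system z = Rx + n, and it assumes, purely for illustration, that "inferior" children are those farthest from the interference-cancelled residual. The function name `k_best_detect` and the parameter `max_children` are hypothetical and do not come from the paper.

```python
# Real-valued amplitude levels of 16-QAM per dimension (4-PAM).
PAM4 = (-3, -1, 1, 3)


def k_best_detect(R, z, K=4, max_children=2, levels=PAM4):
    """Sketch of a K-best tree search on z ~= R x with upper-triangular R.

    max_children is a hypothetical knob: only that many children of each
    survivor are expanded and passed to the sorter (assumed criterion:
    smallest distance increment). max_children == len(levels) gives the
    conventional full expansion.
    """
    n = len(z)
    # Each survivor is (partial Euclidean distance, symbols of layers i+1..n-1).
    survivors = [(0.0, ())]
    for i in range(n - 1, -1, -1):  # detect from the last layer upwards
        candidates = []
        for ped, tail in survivors:
            # Cancel the interference of the already-decided layers.
            interference = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            residual = z[i] - interference
            # Rank children by distance increment and keep only the best ones.
            ranked = sorted(levels, key=lambda s: abs(residual - R[i][i] * s))
            for s in ranked[:max_children]:
                increment = (residual - R[i][i] * s) ** 2
                candidates.append((ped + increment, (s,) + tail))
        # The sorter sees at most K * max_children candidates per layer.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:K]
    return list(survivors[0][1])


# Noise-free usage example with an arbitrary upper-triangular matrix.
R_demo = [
    [2.0, 0.5, -1.0, 0.25],
    [0.0, 1.5, 0.5, -0.5],
    [0.0, 0.0, 1.0, 0.75],
    [0.0, 0.0, 0.0, 2.0],
]
x_demo = [1, -3, 3, -1]
z_demo = [sum(R_demo[i][j] * x_demo[j] for j in range(4)) for i in range(4)]
detected = k_best_detect(R_demo, z_demo, K=4, max_children=2)
```

In this sketch, setting `max_children` to 2 instead of 4 halves the number of candidates entering the sort at every layer, which is the kind of saving the relaxed expansion targets. The actual inferiority estimate and the early-forwarding pipeline are hardware details that the sketch does not model.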