The two-dimensional discrete cosine transform (2-D DCT) has been widely recognized as a key processing unit for image data compression/decompression. In this paper, the implementation of a 200 MHz 13.3 mm(2) 8 x 8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT, and using distributed arithmetic is presented. The macrocell, fabricated using 0.8 mu m base-rule CMOS technology and 0.5 mu m MOSFET's, performs the DCT processing with 1 sample-(pixel)-per-clock throughput. The fast speed and small area are achieved by a novel sense-amplifying pipeline flip-flop (SA-F/F) circuit technique in combination with nMOS differential logic. The SA-F/F, a class of delay flip-flops, can be used as a differential synchronous sense-amplifier, and can amplify dual-rail inputs with swings lower than 100 mV. A 1.6 ns 20 bit carry skip adder used in the DCT macrocell, which was designed by the same scheme, is also described. The adder is 50% faster and 30% smaller than a conventional CMOS carry look ahead adder, which reduces the macrocell size by 15% compared to a conventional CMOS implementation.