We present a unified vision and graphics processor with three layers that provides a fast pipeline for augmented reality. In the image-level layer, a 153.6 GOPS massively parallel processing unit with eight SIMD processors, each containing 128 processing elements, performs highly data-parallel operations. In the sub-image layer, a rasterizer generates data-level parallelism and a pixel arranger reduces it. In the descriptor-level layer, a pose estimation engine executes sequential programs. The processor delivers images for augmented reality at 100 fps while consuming 413 mW, which is 39% faster than a comparable smartphone implementation. The chip is fabricated in a 0.18 μm CMOS process and contains 0.95M gates.
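
The headline figures can be related to one another with simple arithmetic. The sketch below is only a back-of-the-envelope check: the input constants are taken from the abstract, while the variable names and the derived quantities (per-element throughput, frame time, energy per frame) are illustrative and are not reported by the authors. It also assumes that the 153.6 GOPS peak is spread evenly over all processing elements and that the 413 mW figure holds while running at 100 fps.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 153.6          # peak throughput of the massively parallel unit
SIMD_PROCESSORS = 8        # number of SIMD processors
PES_PER_PROCESSOR = 128    # processing elements per SIMD processor
FRAME_RATE_FPS = 100       # augmented-reality output rate
POWER_MW = 413             # power consumption in milliwatts

# Derived quantities (illustrative; not stated in the abstract).
total_pes = SIMD_PROCESSORS * PES_PER_PROCESSOR   # 1024 elements in total

# Assumption: peak throughput is divided evenly across all elements.
mops_per_pe = PEAK_GOPS * 1000 / total_pes        # roughly 150 MOPS each

frame_time_ms = 1000 / FRAME_RATE_FPS             # time budget per frame

# mW divided by frames per second gives mJ per frame.
energy_per_frame_mj = POWER_MW / FRAME_RATE_FPS

print(f"processing elements: {total_pes}")
print(f"throughput per element: {mops_per_pe:.1f} MOPS")
print(f"frame time: {frame_time_ms:.1f} ms")
print(f"energy per frame: {energy_per_frame_mj:.2f} mJ")
```

If each processing element completed one operation per clock cycle, which the abstract does not state, the per-element throughput would correspond to a clock of about 150 MHz.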