This paper proposes a 3-D vertex processor for mobile multimedia applications that combines a floating-point, four-threaded, four-issue expanded VLIW architecture with vertex caches. The multi-threaded datapath prevents data hazards, and the multi-issue expanded VLIW architecture allows instructions to be executed in parallel and in a well-balanced way. Efficient vertex caches are proposed and implemented for embedded vertex processors to accelerate their geometry operations and to save bandwidth between hosts and vertex processors. Compared with a conventional single-threaded SIMD architecture, the proposed architecture with the vertex caches reduces the average total energy dissipation by 44.7%. The proposed vertex processor achieves a geometry performance of 120 Mvertices/s, which is 3.3 times faster than the previous result, and it supports OpenGL ES 2.0 and Vertex Shader Model 3.0. The processor is implemented in a 0.18-µm 1P4M CMOS process and operates at 100 MHz.
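To illustrate why a vertex cache can save host bandwidth, the following is a minimal sketch that counts how many vertices of an indexed triangle stream would have to be fetched and transformed with and without a small cache. The abstract does not describe the organization of the proposed caches, so the FIFO replacement policy, the cache sizes, and the function name `count_vertex_fetches` are assumptions made only for this example.

```python
from collections import OrderedDict


def count_vertex_fetches(indices, cache_size):
    """Count vertices that must be fetched and transformed for an indexed
    triangle stream when a FIFO cache of `cache_size` entries holds recently
    processed vertices.

    The FIFO policy and the cache size are illustrative assumptions, not the
    cache design of the proposed processor.
    """
    cache = OrderedDict()  # insertion order doubles as FIFO order
    misses = 0
    for idx in indices:
        if idx in cache:
            continue  # hit: reuse the cached result, no host transfer
        misses += 1  # miss: vertex is fetched from the host and processed
        cache[idx] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the oldest entry
    return misses


# Two triangles sharing the edge (1, 2): six indices, four distinct vertices.
stream = [0, 1, 2, 2, 1, 3]
no_cache = count_vertex_fetches(stream, cache_size=0)
with_cache = count_vertex_fetches(stream, cache_size=4)
```

In this toy stream, the shared vertices are processed once instead of twice when the cache is enabled, which is the kind of reuse that reduces both geometry work and host traffic in meshes with many shared vertices.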