A 195 mW, 9.1 Mvertices/s fully programmable 3-D graphics processor is designed and implemented for mobile devices. The mobile unified shader provides programmable per-vertex and per-pixel operations in a single hardware unit and thus achieves a 35% area reduction and a 28% power reduction compared with the previous architecture. Pixel-vertex multi-threading enhances 3-D graphics performance by allowing per-vertex and per-pixel operations to be computed at the same time. With pixel-vertex multi-threading, 94% of the per-vertex operations are interleaved into the per-pixel operations, which improves 3-D graphics performance in real applications. The logarithmic lighting engine and a specialized lighting instruction raise the vertex throughput, including transform and OpenGL lighting, up to 9.1 Mvertices/s, which is 2.5 times higher than previous works. The proposed 3-D graphics processor is implemented in 3.3 mm × 3.0 mm using a 0.13 µm CMOS process and was successfully demonstrated on the system evaluation board.
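
The abstract does not describe the internals of the logarithmic lighting engine. As general background only, the following minimal sketch illustrates why log-domain arithmetic can suit lighting: the specular power term of the form max(N·H, 0)^shininess becomes a single multiplication once the base is expressed as a base-2 logarithm. The function names and the floating-point formulation are assumptions made for illustration and are not taken from the processor's design.

```python
import math


def specular_direct(n_dot_h: float, shininess: float) -> float:
    """Reference specular term: clamp the dot product, then raise to a power."""
    base = max(n_dot_h, 0.0)
    return base ** shininess


def specular_log_domain(n_dot_h: float, shininess: float) -> float:
    """Same term computed through base-2 logarithms (hypothetical helper).

    In the log domain the power x**s turns into the product s * log2(x),
    followed by one conversion back with 2**(...).
    """
    base = max(n_dot_h, 0.0)
    if base == 0.0:
        # log2(0) is undefined, so handle the clamped case separately,
        # matching the behaviour of specular_direct.
        return 1.0 if shininess == 0.0 else 0.0
    log_base = math.log2(base)          # convert to the log domain
    log_result = shininess * log_base   # the power is now a multiplication
    return 2.0 ** log_result            # convert back to the linear domain
```

Both functions agree up to floating-point rounding for any non-negative shininess. A hardware implementation would typically use fixed-point approximations of the logarithm and antilogarithm rather than exact library calls, which this sketch does not model.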