Significant advances have recently been made in computational methods for predicting 3D scene structure from a single monocular image. However, their computational complexity severely limits the adoption of such technologies in many computer vision and pattern recognition applications. In this paper, we address the problem of inferring 3D scene geometry from a single monocular image of a man-made environment. Our goal is to estimate the 3D structure of a scene in real time, with a level of accuracy useful in practical applications. Toward this end, we decompose the three-dimensional world space into a set of geometrically inspired primitive subspaces. An important advantage of this approach is that the complex estimation problem can be systematically broken down into a sequence of subproblems that are easier to solve and more reliable, even in the presence of occlusion or clutter, without loss of generality. The proposed algorithm also serves as the technical foundation for an effective representation of 3D scene geometry based on a simple description of the textural patterns present in the image and their spatial arrangement. Extensive experiments were conducted on a large-scale, challenging dataset of real-world images. Our results demonstrate that the proposed method substantially outperforms recent state-of-the-art algorithms in both speed and accuracy. (C) 2012 Elsevier Ltd. All rights reserved.