In this paper, we propose a method for tracking a multi-view camera in order to model indoor environments without calibration patterns. For modeling backgrounds or objects, a multi-view camera is faster and easier to use than an expensive 3D scanner. However, it requires a good estimate of the camera's initial pose and motion, because the initial pose affects the overall accuracy. We therefore use the structural constraints of the multi-view camera and a coplanar calibration pattern to obtain good initial poses. We then estimate the camera motion by computing the rigid-body transformation between corresponding 3D points in the point-cloud sets. Finally, we perform bundle adjustment to optimize all camera poses. Because it yields absolute camera motion in a room without scene constraints, the proposed technique is more useful than conventional pose estimation for modeling indoor environments. The proposed method can be used to accurately augment a scene with virtual objects.
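As an illustration of the motion-estimation step, the following minimal sketch computes a least-squares rigid-body transformation between two sets of corresponding 3D points using the well-known SVD-based closed-form solution. It is an assumption that this solver resembles the one used in the proposed method; the function name `estimate_rigid_transform` and the NumPy array layout (one 3D point per row) are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 3:
        raise ValueError("src and dst must both have shape (N, 3)")
    if src.shape[0] < 3:
        raise ValueError("at least three correspondences are required")

    # Center both point sets on their centroids.
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    src_centered = src - src_centroid
    dst_centered = dst - dst_centroid

    # 3x3 cross-covariance matrix and its singular value decomposition.
    H = src_centered.T @ dst_centered
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    # rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the source centroid onto the target centroid.
    t = dst_centroid - R @ src_centroid
    return R, t
```

Given matched points `src` and `dst` from two point clouds, `R, t = estimate_rigid_transform(src, dst)` gives the motion that maps the first set onto the second. In a full pipeline such pairwise estimates would only serve as initial values for the subsequent bundle adjustment, and outlier correspondences would need to be rejected beforehand, which this sketch does not attempt.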