We present a hybrid approach that segments an object using both color and depth information obtained from a low-cost RGBD camera and sparsely located color cameras. Our system begins by generating a dense depth map for each target image using Structure from Motion and Joint Bilateral Upsampling. We formulate multi-view object segmentation as a Markov Random Field energy optimization on a graph constructed from superpixels. To ensure inter-view consistency of the segmentation results across color images that share too few color features, our local mapping method generates dense inter-view geometric correspondences from the dense depth maps. Finally, a pixel-level optimization step refines the boundaries obtained from the superpixel-based binary segmentation. We evaluate the validity of our method under various capture conditions, such as the number of views, camera rotations, and inter-camera distances. We also compare our method with state-of-the-art methods on standard multi-view datasets. The comparison shows that the proposed method is particularly effective in sparse, wide-baseline capture environments.
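To illustrate the depth-densification step, the following is a minimal sketch of joint bilateral upsampling, in which a low-resolution depth map is upsampled to the resolution of a color image that serves as the range-weighting guide. This is a generic sketch, not the paper's implementation; the function and parameter names are assumptions.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, sigma_s=1.0, sigma_r=0.1, radius=2):
    """Upsample a low-resolution depth map to the resolution of a
    high-resolution guide image. Each output depth is a weighted average
    of nearby low-res depths; weights combine a spatial Gaussian (in
    low-res coordinates) with a range Gaussian on guide-color difference.
    Illustrative sketch only; parameter names are assumptions."""
    H, W = color_hr.shape[:2]
    h, w = depth_lr.shape
    scale_y, scale_x = h / H, w / W
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            cy, cx = y * scale_y, x * scale_x  # position in the low-res grid
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ly = int(round(cy)) + dy
                    lx = int(round(cx)) + dx
                    if not (0 <= ly < h and 0 <= lx < w):
                        continue
                    # spatial weight in low-res coordinates
                    ws = np.exp(-((ly - cy) ** 2 + (lx - cx) ** 2)
                                / (2 * sigma_s ** 2))
                    # range weight from the color difference between the
                    # guide pixels corresponding to the two samples
                    gy = min(int(ly / scale_y), H - 1)
                    gx = min(int(lx / scale_x), W - 1)
                    dc = color_hr[y, x] - color_hr[gy, gx]
                    wr = np.exp(-np.dot(dc, dc) / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[ly, lx]
                    den += ws * wr
            out[y, x] = num / max(den, 1e-8)
    return out
```

The range term is what distinguishes this from plain bilinear upsampling: depth values are blended only across guide pixels with similar color, so upsampled depth edges snap to color edges rather than being smeared across them.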