We introduce a new method for finding a salient viewpoint with a deep representation, selected according to the ease of semantic segmentation. The main idea of our segmentation network is to use a multipath network with two informative views. To collect training samples, we assume that all design information for the components, including their error tolerances, is available. Before installing the actual camera layout, we simulate different model descriptions in a physically correct way and determine the viewing parameters that best retrieve the correct instance model from an established database. Selecting the salient viewpoint lets us better capture fine-grained shape variations in objects with specular materials. From a fixed top view, our system first predicts the 3-DoF pose of a test object in a data-driven way and then precisely aligns the model with a refined semantic mask. We validate the presented method experimentally under various conditions of our system setup and also successfully demonstrate a robotic assembly task using our vision solution.