Our final goal is to grasp objects underwater using the RGB camera mounted on an autonomous underwater vehicle (AUV). This thesis builds the base framework toward such grasping: we sequentially propose methods for improving underwater object detection and pose estimation.
First, underwater images suffer from various optical degradations such as color distortion, intensity degradation, and haze. To recognize objects in underwater images, an additional process that accounts for these optical effects is therefore required for accurate object detection. In particular, for deep learning-based object detection models, training on a dataset to which these effects have been applied is the most effective way to obtain strong performance. In this thesis, we propose a novel method for generating an underwater dataset. This dataset reflects the relevant optical conditions, namely color distortion, intensity degradation, and the haze effect; object occlusion is also included in our dataset generation process. In the experiments, we evaluate whether the generated dataset faithfully reflects the underwater environment.
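The three optical effects above can be illustrated with a simplified underwater image-formation model: the direct signal is attenuated per color channel (red is absorbed fastest in water) and mixed with a blue-green veiling light that produces the haze. The sketch below is only a minimal illustration of this standard attenuation-plus-veiling model; the specific coefficients, the assumed scene distance, and the function name are illustrative choices, not the generation procedure proposed in this thesis.

```python
import numpy as np

def synthesize_underwater(image, depth=5.0,
                          beta=(0.6, 0.25, 0.1),
                          background=(0.1, 0.45, 0.55)):
    """Apply a simplified underwater image-formation model.

    image:      float RGB array in [0, 1], shape (H, W, 3).
    depth:      assumed camera-to-scene distance in metres (illustrative).
    beta:       per-channel attenuation coefficients; red is largest
                because water absorbs long wavelengths fastest.
    background: veiling-light (haze) color, typically blue-green.
    """
    beta = np.asarray(beta, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    # Transmission decays exponentially with distance (Beer-Lambert law);
    # this models both intensity degradation and color distortion.
    t = np.exp(-beta * depth)                 # shape (3,)
    # Attenuated direct signal plus additive veiling light (haze).
    out = image * t + background * (1.0 - t)
    return np.clip(out, 0.0, 1.0)

# A uniform grey patch darkens and shifts toward blue-green:
img = np.full((4, 4, 3), 0.8)
uw = synthesize_underwater(img, depth=8.0)
```

With the illustrative coefficients above, the red channel of the output is dimmest and the blue channel brightest, matching the typical blue-green cast of underwater imagery.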
As the next step, we introduce rotational-primitive-prediction-based 6D object pose estimation from a single image. Our approach first trains a Variational AutoEncoder (VAE) to learn a latent code for each object, which is then further refined by a novel rotational primitive decoder. Doing so substantially improves orientation estimation in a direct-regression fashion as well as overall pose estimation performance. To better exploit the learned representation, we concatenate the sampled codes prior to orientation estimation. Translation is then inferred by an object relocalization module. Thanks to the enhanced rotationally discriminative code, high accuracy is achieved even for symmetric and occluded objects. Finally, to further improve the estimated pose, we propose an RGB-based pose refinement network.
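Two standard building blocks of such a pipeline can be sketched concretely: a directly regressed raw 4-vector must be normalized onto the unit-quaternion manifold to represent a valid orientation, and a relocalized 2D object center together with a predicted distance can be lifted to a 3D translation through the camera intrinsics under a pinhole model. The function names, the pinhole assumption, and the intrinsics below are illustrative, not the exact formulation used by the proposed relocalization module.

```python
import numpy as np

def normalize_quaternion(q):
    """Project a raw 4-vector from a direct rotation-regression
    head onto the unit-quaternion manifold."""
    q = np.asarray(q, dtype=np.float64)
    return q / np.linalg.norm(q)

def backproject_center(u, v, z, K):
    """Lift a predicted 2D object center (u, v) and distance z to a
    3D translation under a pinhole camera model (a common final step
    for relocalization-style translation estimation; an assumption
    here, not the thesis's exact module)."""
    K_inv = np.linalg.inv(K)
    return z * K_inv @ np.array([u, v, 1.0])

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

q = normalize_quaternion([1.0, 1.0, 1.0, 1.0])
# An object centered at the principal point at 2 m lies on the optical axis:
t = backproject_center(320.0, 240.0, 2.0, K)   # -> [0, 0, 2]
```

Normalizing the regressed quaternion is what makes direct orientation regression yield a valid rotation, and back-projection decouples translation accuracy from the orientation branch.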