From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition
2D-3D 시각 인식을 위한 적응적 정렬을 활용한 유니모달부터 멀티모달 학습 기법 연구

dc.description.abstract: This dissertation studies unimodal and multimodal learning with adaptive alignment for 2D-3D visual recognition on images and point clouds. For unimodal learning on 2D images, we investigate object detection and instance segmentation, which are commonly formulated as a two-stage pipeline consisting of a region proposal network (RPN) and an R-CNN. We propose Cascade RPN, which uses Adaptive Convolution to keep features aligned with the reference boxes, an alignment that progressive refinement requires. For the R-CNN stage, we revisit Cascade Mask R-CNN and propose SCNet, which aligns the sample distributions seen during training and inference in existing cascade architectures. For unimodal learning on 3D point clouds, we propose SoftGroup, which groups points using soft semantic scores rather than hard semantic predictions, so that semantic errors do not propagate into instance segmentation. We further extend SoftGroup to SoftGroup++ for scalable 3D instance segmentation, using an adaptive strategy that reduces time complexity and search space. Finally, we propose bird's-eye view (BEV) fusion for multimodal object detection: image and point features are aligned via BEV projection and then combined by weighted fusion, which addresses the sparsity of points on distant objects. Extensive experiments on various standard benchmark datasets demonstrate the superiority and generality of the proposed methods.
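The SoftGroup idea summarized in the abstract (grouping on soft scores instead of hard semantic labels) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `soft_grouping`, the parameters `score_thr`, `radius`, and `min_pts`, and the brute-force radius search are hypothetical choices for illustration, and any later refinement stages are omitted. The key point is that a point joins the candidate set of every class whose score exceeds the threshold, so a point with a wrong argmax label can still be grouped with its true instance.

```python
from collections import deque

import numpy as np


def soft_grouping(coords, sem_scores, score_thr=0.2, radius=0.04, min_pts=2):
    """Hypothetical sketch of grouping on soft semantic scores.

    coords:     (N, 3) point coordinates
    sem_scores: (N, C) per-point class probabilities
    Returns a list of (class_id, array of point indices) proposals.
    """
    proposals = []
    num_classes = sem_scores.shape[1]
    for c in range(num_classes):
        # Soft selection: keep every point whose class-c score exceeds the
        # threshold, not only points whose argmax label is c.
        cand = np.flatnonzero(sem_scores[:, c] > score_thr)
        if cand.size == 0:
            continue
        pts = coords[cand]
        # Brute-force pairwise distances (O(M^2)); a KD-tree would scale better.
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        adj = dist <= radius
        visited = np.zeros(cand.size, dtype=bool)
        for seed in range(cand.size):
            if visited[seed]:
                continue
            # Breadth-first search over the radius graph of class-c candidates.
            visited[seed] = True
            queue = deque([seed])
            members = []
            while queue:
                i = queue.popleft()
                members.append(i)
                for j in np.flatnonzero(adj[i] & ~visited):
                    visited[j] = True
                    queue.append(j)
            # Discard tiny clusters that are likely noise.
            if len(members) >= min_pts:
                proposals.append((c, cand[np.array(members)]))
    return proposals
```

For example, with `score_thr=0.3`, a point scored 0.45 for class 0 and 0.55 for class 1 still joins its nearby class-0 neighbors, whereas grouping on the argmax label would split it off from that instance. Because proposals are built per class, one point may appear in proposals for several classes; resolving such overlaps is left to later stages.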
dc.subject: Unimodal; Multimodal; Adaptive alignment; 2D-3D visual recognition; Deep neural network
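dc.subject (Korean): 유니모달 (unimodal); 멀티모달 (multimodal); 적응적 정렬 (adaptive alignment); 비주얼 인식 2D-3D (2D-3D visual recognition); 딥 뉴럴 네트워크 (deep neural network)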
dc.title: From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition
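dc.title.alternative (Korean): 2D-3D 시각 인식을 위한 적응적 정렬을 활용한 유니모달부터 멀티모달 학습 기법 연구 (A study on unimodal-to-multimodal learning techniques using adaptive alignment for 2D-3D visual recognition)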
