Light-weight encoder-decoder network for depth estimation (거리측정을 위한 인코더-디코더 네트워크 경량화)

DC Field | Value | Language
dc.contributor.advisor | 김준모 | -
dc.contributor.author | Kim, S. Hyejin | -
dc.contributor.author | 김혜진 | -
dc.date.accessioned | 2024-07-22T19:30:08Z | -
dc.date.available | 2024-07-22T19:30:08Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1044773&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/320305 | -
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Interdisciplinary Robotics Program, 2022.2, [vi, 53 p.] | -
dc.description.abstract | This dissertation addresses lightweight depth estimation with encoder-decoder networks. The research finds that, unlike in other lightweight computer vision methods, local texture information remains very important even in the last layer of a lightweight depth estimation network. It also finds that long-range shape information is important for improving network performance. Based on these findings, this thesis designs RRNet, whose RR blocks allow the number of layers to be increased, and long-range shape information thereby captured, without additional layer parameter cost. In addition, we propose the Condensed Dense Connection (CDC), which preserves local texture information in a lightweight manner through dense connections and reduces the weight of the decoder by a factor of 16 relative to the base model. Moreover, the CDC acts as a regularizer when training the parameter-shared RR block. The network also works well on the TX2, a mobile GPU. Compared to other comparable networks, its amount of computation and number of parameters are significantly lower, and it runs quite fast. On CPU, the proposed RRNet can run as fast as the network without depthwise convolution. Recent depth estimation methods have moved toward using encoders pretrained on ImageNet classification. Following this trend, the second proposed method is a lightweight decoder that can be applied to various encoders, so that its performance improves incrementally as encoders are enhanced. The proposed lightweight decoder uses axial attention~\cite{wang2020axial}, a self-attention approach known to capture long-range shape information. However, replacing all convolutions with axial attention destroys local texture. In~\cite{wang2020axial}, axial attention is applied to all layers for image segmentation or classification, where performance improves because these applications do not deal with local texture at the end of the network. To overcome this texture-vanishing problem, this study places the axial attention layer at the front end of the decoder, motivated by the study of StyleGAN, in which the generator obtains shape features in the first and second layers. In addition, to achieve the same effect as applying axial attention to multiple layers while applying it as few times as possible so that local shape information is not lost, upsampling by a factor of 8 is performed in a single step, and the upsampled values are taken from the axial attention layer. In this way, this thesis proposes a lightweight decoder network that preserves both long-range shape information and local texture well. The proposed lightweight method is evaluated on the NYU v2 and KITTI datasets, and performance improves greatly on KITTI. This confirms that the proposed method preserves long-range shape information well, because KITTI contains homogeneous, long-range shaped objects such as streets and walls. Finally, this lightweight depth estimation network is expected to be highly useful in manufacturing environments, so dimension measurement using depth estimation is also studied. Dimension estimates in a manufacturing environment are discontinuous, whereas depth estimation is in general a regression problem. In addition, it is difficult to measure exact dimensions from texture alone, because the textures of manufactured objects are much more homogeneous than those in other settings. To overcome this problem, this thesis proposes a magnifier loss that amplifies minute changes in texture so that dimensions can be measured accurately. | -
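dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (한국과학기술원) | -
dc.subject | Encoder-decoder network; lightweight distance-measurement network; amount of computation; distance-measurement technology; CPU; mobile GPU; long-range shape information; local texture; dimension measurement; manufacturing (translated from the Korean keywords) | -
dc.subject | Encoder-decoder network; Lightweight depth estimation; Computation; CPU; Mobile Graphical Processing Unit (GPU); Long range shape; local texture; dimension measurement; manufacturing | -
dc.title | Light-weight encoder-decoder network for depth estimation | -
dc.title.alternative | 거리측정을 위한 인코더-디코더 네트워크 경량화 (Lightweight encoder-decoder network for distance measurement) | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology: Interdisciplinary Robotics Program | -
dc.contributor.alternativeauthor | Kim, Junmo | -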
Appears in Collection
RE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
