Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection

Multispectral pedestrian detection has received great attention in recent years because multispectral modalities (i.e., color and thermal) can provide complementary visual information. However, multispectral pedestrian detection has two major inherent issues. First, the cameras of the two modalities have different fields of view (FoVs), so image pairs are often miscalibrated. Second, a modality discrepancy is observed because image pairs are captured at different wavelengths. In this paper, to alleviate these issues, we propose a new uncertainty-aware multispectral pedestrian detection framework. In our framework, we consider two types of uncertainty: (1) Region of Interest (RoI) uncertainty and (2) predictive uncertainty. For the miscalibration issue, we propose RoI uncertainty, which represents the reliability of the RoI candidates. Using the RoI uncertainty, we devise an uncertainty-aware feature fusion (UFF) module that, when combining the features of the two modalities, reduces the effect of RoI features with high RoI uncertainty. For the modality discrepancy, we propose an uncertainty-aware cross-modal guiding (UCG) module. The UCG module uses the predictive uncertainty, which indicates how reliable the prediction of the RoI feature is. Based on the predictive uncertainty, the UCG module guides the feature distribution of the modality with high predictive uncertainty (less reliable) to resemble that of the modality with low predictive uncertainty (more reliable). By guiding the feature distributions of the two modalities to be similar, the UCG module can encode more discriminative features. With comprehensive experiments on public multispectral datasets, we verified that our method reduces the effect of miscalibration and alleviates the modality discrepancy, outperforming existing state-of-the-art methods.
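The abstract does not give the formulas behind the UFF and UCG modules, so the following is only a minimal illustrative sketch of the two ideas, not the paper's implementation. It assumes an exponential weighting `exp(-u)` for fusion, Shannon entropy as a stand-in for predictive uncertainty, and a mean squared distance as the guiding objective; every function name and each of these choices is hypothetical.

```python
import math


def uncertainty_weighted_fusion(color_feat, thermal_feat, color_unc, thermal_unc):
    """Fuse two RoI feature vectors, down-weighting the more uncertain one.

    Assumption: weights are exp(-uncertainty), normalised over both modalities.
    The paper's actual UFF weighting is not stated in the abstract.
    """
    w_color = math.exp(-color_unc)
    w_thermal = math.exp(-thermal_unc)
    total = w_color + w_thermal
    return [
        (w_color * c + w_thermal * t) / total
        for c, t in zip(color_feat, thermal_feat)
    ]


def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector (assumed uncertainty proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def guiding_direction(color_probs, thermal_probs):
    """Return (guide, guided): the lower-entropy modality guides the other."""
    if predictive_entropy(color_probs) <= predictive_entropy(thermal_probs):
        return ("color", "thermal")
    return ("thermal", "color")


def guiding_loss(guide_feat, guided_feat):
    """Mean squared distance pulling the guided feature toward the guide.

    Assumption: a simple per-feature distance stands in for the
    distribution-level guidance described in the abstract.
    """
    return sum((g - s) ** 2 for g, s in zip(guide_feat, guided_feat)) / len(guide_feat)


if __name__ == "__main__":
    # A highly uncertain thermal RoI contributes little to the fused feature.
    fused = uncertainty_weighted_fusion([1.0, 1.0], [3.0, 3.0], 0.1, 2.0)
    print(fused)
    # A confident color prediction guides an ambiguous thermal prediction.
    print(guiding_direction([0.9, 0.1], [0.5, 0.5]))
```

In this sketch, equal uncertainties reduce the fusion to a plain average, and the guiding loss is zero when the two features already coincide.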
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Issue Date
2022-03
Language
English
Article Type
Article
Citation

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, v.32, no.3, pp.1510 - 1523

ISSN
1051-8215
DOI
10.1109/TCSVT.2021.3076466
URI
http://hdl.handle.net/10203/292815
Appears in Collection
EE-Journal Papers (Journal Papers)