Visual object tracking in crowded scenes faces many challenges, such as occlusion, similar-looking objects, and complex motion. This study presents a system composed of two modules: a feature-based tracking module and a detection module. The proposed system fuses the two modules by converting their otherwise incomparable responses into a common metric domain. Under an explicit combination rule, the outputs of the two modules are combined, and they are used for learning only when the two modules produce consistent results.
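The abstract does not specify how the responses are mapped into a common domain or what the combination rule is. The following is a minimal sketch under stated assumptions, not the authors' method. Each module's raw score (assumed here to be a matched-feature count for the tracker and a classifier margin for the detector) is mapped to [0, 1] with a logistic function. The two outputs count as consistent when their boxes overlap (IoU) and both confidences pass a threshold. Consistent outputs are fused by a confidence-weighted box average and flagged for learning. All names, calibration parameters, and thresholds are hypothetical.

```python
import math

Box = tuple  # (x, y, w, h); hypothetical box representation


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def to_confidence(raw: float, scale: float, offset: float) -> float:
    """Map a raw module response to [0, 1] with a logistic function (assumed calibration)."""
    return 1.0 / (1.0 + math.exp(-scale * (raw - offset)))


def fuse(track_box: Box, track_raw: float, det_box: Box, det_raw: float,
         iou_thr: float = 0.5, conf_thr: float = 0.5):
    """Return (output_box, learn) under a hypothetical consistency rule."""
    # Hypothetical calibration: tracker raw = matched-feature count,
    # detector raw = classifier margin. Both land in the same [0, 1] domain.
    c_t = to_confidence(track_raw, scale=0.2, offset=20.0)
    c_d = to_confidence(det_raw, scale=2.0, offset=0.0)

    consistent = iou(track_box, det_box) >= iou_thr and min(c_t, c_d) >= conf_thr
    if consistent:
        # Confidence-weighted average of the two boxes; safe to learn from it.
        w = c_t + c_d
        fused = tuple((c_t * a + c_d * b) / w for a, b in zip(track_box, det_box))
        return fused, True

    # Inconsistent: output the more confident module's box, but skip learning.
    return (track_box if c_t >= c_d else det_box), False


if __name__ == "__main__":
    print(fuse((10, 10, 50, 100), 40.0, (12, 11, 50, 100), 1.5))
    print(fuse((10, 10, 50, 100), 40.0, (200, 200, 50, 100), 1.5))
```

Gating the update on agreement is one simple way to keep a drifting tracker or a false detection from contaminating the learned appearance model. The paper's actual rule may differ.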
The performance of the proposed algorithm was quantitatively evaluated on the i-LIDS dataset and compared with that of other modern visual trackers.