Multi-target tracking is an important task in computer vision, with applications in automatic surveillance, traffic monitoring, image-based navigation, multi-agent missions, and so on. Successful implementation of these applications depends on the ability of the tracking system to detect targets continuously under various conditions. In this regard, convolutional neural networks (CNNs) have been applied to address the multi-target detection problem.
CNNs are deep neural network (DNN) architectures that have been developed as an effective class of models for image interpretation, and they have been shown to achieve state-of-the-art results in image recognition and object region proposal. Therefore, this thesis proposes combining CNN-based multi-target detection with a conventional multi-target tracking algorithm for monocular vision systems. CNN-based target detection methods use pre-trained target information, so a target can be detected and classified even when the background changes dynamically. In addition, based on the feature information of the pre-trained target, the relative distance to the target and the posture of the target can be recognized, which maximizes the information obtained from monocular vision.
This thesis proposes CNN architectures that can recognize the positions of a single target or multiple targets using only monocular vision sensors. The target information obtained from the proposed CNN model is used as the measurement input to a linear or nonlinear tracking filter that tracks single or multiple targets. To track multiple targets accurately, the measurements obtained by the CNN must be appropriately assigned to each target; the nearest neighbor method is used for this data association step. In this work, three-dimensional simulators were developed and used to analyze the performance of the proposed method. Using these simulators, a virtual indoor flight environment and a virtual space environment were implemented, and images for training the proposed CNN model were generated. In addition to the 3D simulations, experiments were conducted to analyze multi-target detection and tracking performance in indoor and outdoor environments.
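
As an illustration of the data association step, the following is a minimal sketch of one common nearest neighbor variant, not necessarily the exact procedure used in this thesis. It assumes that each track supplies a predicted position from the tracking filter and that each CNN detection is a position of the same dimension. The function name `nearest_neighbor_associate`, the Euclidean distance metric, and the `gate` threshold are assumptions introduced here for illustration only.

```python
import math


def nearest_neighbor_associate(predicted, measurements, gate=float("inf")):
    """Greedy nearest neighbor assignment of measurements to tracks.

    predicted:    list of predicted track positions (tuples of floats)
    measurements: list of detected positions (tuples of the same dimension)
    gate:         assumed maximum distance for a valid match (hypothetical)

    Returns a dict mapping track index -> measurement index.
    Tracks with no measurement inside the gate are left unassigned.
    """
    # Collect every (distance, track, measurement) pair inside the gate.
    candidates = []
    for t, p in enumerate(predicted):
        for m, z in enumerate(measurements):
            d = math.dist(p, z)  # Euclidean distance (Python 3.8+)
            if d <= gate:
                candidates.append((d, t, m))

    # Accept the closest pairs first; each track and each measurement
    # is used at most once.
    candidates.sort()
    assignment = {}
    used_measurements = set()
    for d, t, m in candidates:
        if t in assignment or m in used_measurements:
            continue
        assignment[t] = m
        used_measurements.add(m)
    return assignment


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (10.0, 10.0)]
    measurements = [(9.5, 10.2), (0.3, -0.1), (50.0, 50.0)]
    print(nearest_neighbor_associate(predicted, measurements, gate=2.0))
```

In this sketch, the third measurement lies outside the gate of both tracks and is therefore left unassigned. A tracking filter would typically use each assigned measurement in its update step and handle unassigned measurements separately, for example as clutter or as candidates for new tracks.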