This paper presents a vision-based tracking algorithm for real-time drone applications. The method consists of CNN-based object detection followed by object tracking that uses the detector's output. The detector outputs a class label and a binary mask for each object. The tracker uses this binary mask to separate the object's features from the background. We use this information to estimate the target location accurately and to track the target in each frame by comparing the similarity between the target's feature vector and the feature vector of each detected object. We validate the method in real-time experiments with a drone.
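The matching step described above can be illustrated with a minimal sketch. The abstract does not specify the feature type or the similarity measure, so this sketch assumes a per-channel intensity histogram computed only over masked pixels and cosine similarity between feature vectors. The function names (`masked_feature`, `cosine_similarity`, `match_target`, `mask_centroid`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def masked_feature(image, mask, bins=8):
    """Histogram feature over the pixels selected by a binary mask.

    Assumption: the feature is a per-channel intensity histogram;
    the paper does not state which features it extracts.
    image: (H, W, C) uint8 array, mask: (H, W) binary array.
    """
    pixels = image[mask.astype(bool)]  # shape (N, C), background excluded
    if pixels.size == 0:
        return np.zeros(bins * image.shape[2])
    hists = [
        np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
        for c in range(pixels.shape[1])
    ]
    feat = np.concatenate(hists).astype(np.float64)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat


def cosine_similarity(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_target(target_feat, image, masks):
    """Return (index, similarity) of the detection most similar to the target.

    Returns (None, 0.0) when there are no detections in the frame.
    """
    best_idx, best_sim = None, 0.0
    for i, mask in enumerate(masks):
        sim = cosine_similarity(target_feat, masked_feature(image, mask))
        if best_idx is None or sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx, best_sim


def mask_centroid(mask):
    """Estimate the object location as the (x, y) centroid of its mask."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())
```

In use, the target feature would be computed once from the mask of the selected object, and `match_target` would then be called on the masks the detector returns for each new frame. A real tracker would likely also apply a similarity threshold and handle frames in which the target is not detected, which this sketch omits.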