Advances in deep-learning-based object detection have achieved state-of-the-art accuracy in real time on high-end GPUs. However, their application to low-power computing systems (e.g., embedded GPUs on UAVs) is severely limited by their high computational requirements. We train a reinforcement learning agent that uses visual differences between input frames to decide whether to run object detection or tracking on a given image, maximizing accuracy relative to execution time. We validate our dynamic detection-tracking switching method on the Stanford Drone dataset for both detection accuracy and speed. Our model achieves accuracy comparable to the detector-only approach while running 4x faster.
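The switching idea above can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the learned RL agent is replaced here by a simple frame-difference threshold, and `run`-style function names and the threshold value are hypothetical placeholders.

```python
# Sketch of a detect-or-track switching loop. The threshold policy is a
# hypothetical stand-in for the learned RL agent; the paper's agent makes
# this decision from visual differences between frames.

def frame_difference(prev, curr):
    """Mean absolute per-pixel difference between two equal-length frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def choose_action(prev, curr, threshold=10.0):
    """Return 'detect' when the scene changed enough, else 'track'."""
    if prev is None or frame_difference(prev, curr) > threshold:
        return "detect"   # expensive detector refreshes the object boxes
    return "track"        # cheap tracker propagates the previous boxes

def process(frames, threshold=10.0):
    actions, prev = [], None
    for curr in frames:
        actions.append(choose_action(prev, curr, threshold))
        prev = curr
    return actions

# Example: a near-static scene followed by an abrupt change.
frames = [[0, 0, 0], [0, 0, 1], [50, 50, 50]]
print(process(frames))  # → ['detect', 'track', 'detect']
```

The speedup comes from invoking the costly detector only when the scene has changed enough that the tracker's propagated boxes would drift.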