This paper investigates a common task requiring temporal precision: the selection of a rapidly moving target on display by invoking an input event when it is within some selection window. Previous work has explored the relationship between accuracy and precision in this task, but the role of visual cues available to users has remained unexplained. To expand modeling of timing performance to multimodal settings, common in gaming and music, our model builds on the principle of probabilistic cue integration. Maximum likelihood estimation (MLE) is used to model how different types of cues are integrated into a reliable estimate of the temporal task. The model deals with temporal structure (repetition, rhythm) and the perceivable movement of the target on display. It accurately predicts error rate in a range of realistic tasks. Applications include the optimization of difficulty in game-level design.