Computers need to present information in a way that lets users anticipate it reliably and react to it quickly. Such temporal associations between output and input reflect good design and are useful for evaluating user interfaces. However, there has been no general technique for evaluating these associations that works both in the lab and in the wild. We propose a method to estimate the temporal association between a user's input and a computer's output using only a screen-capture video and button input logs. In response to a visual stimulus generated from a pixel on the screen, the user's button input may be a simple reaction, the result of anticipation, or independent of the visual stimulus. Using the expectation-maximization (EM) algorithm, we estimate which association type each output-to-input pair belongs to, together with the parameters of the corresponding likelihood distribution. In the first study, we demonstrated that our method could analyze a self-expanding target acquisition task and yield multiple estimates that distinguish between its conditions. In the second study, we found that our estimates correlated with the game score, the high-level index of a commercial game. Our method can thus analyze temporal associations in screen-based interactive systems, and it provides estimates that can predict a high-level indicator.
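To make the EM step concrete, the following is a minimal sketch of how output-to-input pairs might be softly assigned to the three association types. It is an illustration under stated assumptions, not the method as published: the abstract does not specify the likelihood distributions, so the sketch assumes a Gaussian for anticipatory inputs (delays clustered near zero), a Gaussian for reactive inputs (delays clustered at a positive lag), and a uniform distribution over a fixed window for inputs independent of the stimulus. The function names `fit_em` and `simulate`, the window bounds, and all initial values are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_em(delays, window=(-1.0, 1.0), n_iter=200):
    """Fit a three-component mixture to stimulus-to-input delays (seconds).

    Components (all distributional choices are assumptions of this sketch):
      0: anticipation, Gaussian
      1: reaction, Gaussian
      2: independent of the stimulus, uniform over `window`
    """
    lo, hi = window
    uniform_density = 1.0 / (hi - lo)
    mu = [0.0, 0.3]          # hypothetical initial means
    sigma = [0.1, 0.1]       # hypothetical initial standard deviations
    weights = [1.0 / 3.0] * 3
    resp = []
    n = len(delays)
    for _ in range(n_iter):
        # E-step: posterior probability of each type for each delay.
        resp = []
        for d in delays:
            p = [
                weights[0] * normal_pdf(d, mu[0], sigma[0]),
                weights[1] * normal_pdf(d, mu[1], sigma[1]),
                weights[2] * uniform_density,
            ]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate Gaussian parameters and mixing weights.
        for k in range(2):
            rk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * d for r, d in zip(resp, delays)) / rk
            var = sum(r[k] * (d - mu[k]) ** 2 for r, d in zip(resp, delays)) / rk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
        weights = [sum(r[k] for r in resp) / n for k in range(3)]
    return {"weights": weights, "mu": mu, "sigma": sigma, "resp": resp}


def simulate(n=600, seed=0):
    """Draw synthetic delays from the assumed mixture (for illustration only)."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n):
        u = rng.random()
        if u < 0.4:
            delays.append(rng.gauss(0.0, 0.05))    # anticipated input
        elif u < 0.8:
            delays.append(rng.gauss(0.35, 0.05))   # reactive input
        else:
            delays.append(rng.uniform(-1.0, 1.0))  # unrelated input
    return delays
```

Calling `fit_em(simulate())` returns the estimated mixing weights, the Gaussian means and standard deviations, and a per-pair responsibility vector in `resp`. In this sketch, the mixing weights play the role of the share of inputs attributed to each association type, and the responsibilities give the soft assignment of each individual pair.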