In this dissertation, I focus on autoregressive models among neural network-based automatic transcription models. Since every piano sound is produced solely by a note onset and the continuation of a previously onset note, an autoregressive model is expected to have an advantage in capturing the causal relationships in frame-by-frame prediction. I design an autoregressive prediction model that combines an acoustic module and a music language module. To exploit the strengths of the autoregressive approach, the model is built on a unidirectional RNN so that it can operate in real time, and I propose methods to overcome the drawbacks of autoregressive models, which receive less information than models using a bidirectional RNN and are vulnerable to exposure bias. For stable training, I propose a network and a training method that represent note states in finer detail and exploit recursive information effectively. In addition, I induce the model to learn the pitch-shift invariance of the piano and the independence of each pitch: in the acoustic module, neurons are separated per pitch and every pitch is processed by a shared network, and the music language module is likewise simplified to model the state progression of each pitch independently. The results show that an appropriately adjusted autoregressive model can also achieve high performance, and that each of the hypothesized factors contributes to the improvement. To confirm practical performance, the model was validated on multiple datasets with varied recording environments, and the effectiveness of the proposed elements was examined through a detailed note-level analysis. The proposed model operates in real time with low complexity and shows performance equivalent to non-real-time models.
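The pitch-wise weight sharing and causal recurrence described above can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's actual architecture: the feature and state sizes, the plain tanh recurrence, and the sigmoid output are all assumptions chosen for brevity. The key points it demonstrates are that one set of weights serves all 88 pitches (pitch-shift invariance and per-pitch independence) and that each frame's prediction depends only on past frames (unidirectional, real-time-capable operation).

```python
import numpy as np

N_PITCHES = 88
N_FEAT = 4      # per-pitch spectral features per frame (assumed size)
N_HIDDEN = 8    # recurrent state per pitch (assumed size)

rng = np.random.default_rng(0)
# One shared weight set for all 88 pitches: the same network processes
# every pitch, which encourages pitch-shift invariance.
W_in = rng.standard_normal((N_FEAT, N_HIDDEN)) * 0.1
W_rec = rng.standard_normal((N_HIDDEN, N_HIDDEN)) * 0.1
w_out = rng.standard_normal(N_HIDDEN) * 0.1

def step(x_frame, h):
    """One causal frame update (unidirectional recurrence).

    x_frame: (N_PITCHES, N_FEAT) per-pitch features for this frame.
    h:       (N_PITCHES, N_HIDDEN) previous per-pitch hidden states.
    Returns the new hidden states and per-pitch activation probabilities;
    the output depends only on the current frame and past states.
    """
    h_new = np.tanh(x_frame @ W_in + h @ W_rec)   # shared weights per pitch
    p = 1.0 / (1.0 + np.exp(-(h_new @ w_out)))    # per-pitch sigmoid
    return h_new, p

# Run over a short dummy sequence of frames, one step at a time,
# as a real-time system would.
h = np.zeros((N_PITCHES, N_HIDDEN))
for _ in range(5):
    x = rng.standard_normal((N_PITCHES, N_FEAT))
    h, p = step(x, h)
```

Because each pitch is an independent row processed by shared weights, the per-frame cost is a few small matrix products, which is what keeps the real-time constraint feasible.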