In this paper, a high performance asynchronous on-chip bus designed in a Globally Asynchronous Locally Synchronous (GALS) style is proposed. The asynchronous on-chip bus is capable of handling multiple outstanding transactions and in-order completion to achieve a high performance, which is implemented with distributed and modularized control unit in a layered interface. The architecture of asynchronous on-chip bus is discussed and implemented for simulations. Simulation results show that throughput of the proposed asynchronous on-chip bus with multiple outstanding transactions and in-order transaction completion is increased by 31.3%, while power consumption overhead is only 6.76%, as compared to an asynchronous on-chip bus with a single outstanding transaction.