In the past few years, exciting progress has been made on CSMA (Carrier Sense Multiple Access) algorithms that achieve throughput and utility optimality in wireless networks. However, most of these algorithms are known to exhibit poor delay performance, which makes them impractical to implement. Several recent papers have addressed the delay issue of CSMA, yet most of them are limited: they focus on specific network scenarios under certain conditions rather than general network topologies, achieve low delay at the cost of reduced throughput, or lack rigorous provable guarantees. In this paper, we focus on the recent idea of exploiting multiple channels (actual or virtual) for delay reduction in CSMA, and prove that it achieves order-optimal per-link delay, i.e., O(1) asymptotic delay per link, if the number of virtual channels is logarithmic in the mixing time of the underlying CSMA Markov chain. This logarithmic number is typically small, i.e., at most linear in the network size. In other words, our contribution provides not only a provable framework for multiple-channel CSMA, but also the explicit number of virtual channels required, which is of great importance for actual implementation. The key step of our analytic framework is to use quadratic Lyapunov functions in conjunction with (recursive applications of) the Lindley equation and Azuma's inequality to obtain an exponentially decaying property in certain queueing dynamics. We believe that our technique is of broader interest for analyzing the delay performance of queueing systems with multiple periodic schedulers.
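
For reference, the Lindley equation and Azuma's inequality are usually stated in the following textbook forms. The notation is assumed here for illustration and is not taken from the paper: $Q(t)$ is the queue length of a link at time slot $t$, $A(t)$ the arrivals and $S(t)$ the service offered in that slot, and $(X_k)_{k \ge 0}$ a martingale whose increments satisfy $|X_k - X_{k-1}| \le c_k$.

```latex
% Lindley recursion for a discrete-time queue (assumed notation)
\[
  Q(t+1) = \max\bigl\{\, Q(t) + A(t) - S(t),\; 0 \,\bigr\}
\]

% Azuma's inequality for a martingale with bounded increments
\[
  \Pr\bigl( |X_n - X_0| \ge \lambda \bigr)
  \;\le\; 2 \exp\!\left( - \frac{\lambda^2}{2 \sum_{k=1}^{n} c_k^2} \right),
  \qquad \lambda > 0 .
\]
```

Roughly speaking, a concentration bound of the second kind controls how far the accumulated arrivals minus service can deviate, which is one way exponentially decaying tails can arise for a recursion of the first kind. The paper's precise argument may differ from this sketch.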