Reducing cache misses without increasing cache associativity is critical for lowering both power consumption and cache access time. This paper focuses on the program stack, which often accounts for more than half of all memory accesses. We propose dynamic stack allocation, in which the stack pointer is shifted at run time to the memory location expected to cause the fewest cache misses. We implemented the scheme as a Dynamic Stack Allocator (DSA) consisting of a Cache Miss Predictor (CMP), which computes cache miss probability based on the Least Recently Used (LRU) policy, and a Stack Pointer Manager (SPM), which manages multiple stack locations. We verified the scheme on both FPGA and ASIC, using iNCITE and the Dong-Bu electronics 0.18 µm process, respectively. Experimental results show that dynamic stack allocation reduces cache misses by 1% to 42% across various benchmarks, with relatively small power consumption and no extra delay.
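To make the idea concrete, the following is a minimal software sketch of choosing a stack location by predicted cache conflict. It is not the hardware CMP or SPM described in this paper: the cache geometry (`LINE_SIZE`, `NUM_SETS`, direct-mapped indexing), the names `RecencyTracker`, `conflict_score`, and `choose_stack_top`, and the scoring rule (summing LRU recency ranks of the cache sets a new frame would occupy) are all assumptions introduced for illustration, standing in for the cache miss probability that the CMP computes.

```python
from collections import OrderedDict

# Assumed cache geometry for illustration only.
LINE_SIZE = 32   # bytes per cache line
NUM_SETS = 64    # number of sets, direct-mapped indexing


def set_index(addr):
    """Map a byte address to its cache set index."""
    return (addr // LINE_SIZE) % NUM_SETS


class RecencyTracker:
    """Keeps cache sets in LRU order (oldest first, most recent last)."""

    def __init__(self):
        self._order = OrderedDict()

    def touch(self, addr):
        """Record an access, moving its set to the most-recent position."""
        idx = set_index(addr)
        self._order.pop(idx, None)
        self._order[idx] = None

    def recency_rank(self, idx):
        """Return 0 for a never-touched set; larger means more recently used."""
        if idx not in self._order:
            return 0
        return list(self._order).index(idx) + 1


def conflict_score(tracker, stack_top, frame_bytes):
    """Hypothetical stand-in for miss probability: sum the recency ranks of
    the sets that a downward-growing frame below stack_top would occupy."""
    first_line = (stack_top - frame_bytes) // LINE_SIZE
    last_line = (stack_top - 1) // LINE_SIZE
    sets = {line % NUM_SETS for line in range(first_line, last_line + 1)}
    return sum(tracker.recency_rank(s) for s in sets)


def choose_stack_top(tracker, candidates, frame_bytes):
    """Pick the candidate stack location with the lowest conflict score."""
    return min(candidates,
               key=lambda top: conflict_score(tracker, top, frame_bytes))
```

In this sketch, if recently used data maps to sets 0 through 7, a candidate stack frame mapping to the same sets scores high and a frame mapping to untouched sets scores zero, so the latter is chosen. A real predictor would also have to account for associativity and for the cost of moving the stack pointer, which this sketch ignores.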