Virtually every vendor of {\em digital signal processors}~(DSPs) supports a Harvard architecture, which provides multiple on-chip memory banks that allow the processor to access multiple words of data from memory in a single instruction cycle. In addition, all existing {\em fixed-point} DSPs are known to have irregular architectures with {\em heterogeneous} registers: multiple register files that are distributed and dedicated to different sets of instructions. Although several studies have addressed the efficient assignment of data to multiple memory banks, most of them assumed processors with relatively simple, homogeneous general-purpose registers. Consequently, several vendor-provided compilers for DSPs that we examined were unable to assign data efficiently to multiple data memory banks, and thus often failed to generate highly optimized code for their machines. As a result, programmers for these DSPs often assign program variables to memories manually so as to fully utilize the multiple memory banks in their code. This paper reports our recent attempt to address this problem by presenting an algorithm that helps the compiler to assign data efficiently to multiple memory banks. Our algorithm differs from previous work in that it assigns variables to memory banks in separate, {\em decoupled} code generation phases, instead of a single, tightly-coupled phase. Our experimental results show that the decoupled algorithm greatly simplifies the code generation process; as a result, our compiler runs extremely fast, yet generates target code that is comparable in quality to the code generated by a coupled approach. We also present a runtime environment for dual data memory banks and a runtime memory optimization technique. Because this technique reduces runtime memory usage, larger programs can be executed in on-chip memory, which improves the performance of large programs.