Carry-save-adder (CSA) is one of the most widely used components for fast arithmetic in industry. This paper provides a solution to the problem of finding an optimal-timing allocation of CSAs in arithmetic circuits. Namely, we present a polynomial time algorithm which finds an optimal-timing CSA allocation for a given arithmetic expression. We then extend our result for CSA allocation to the problem of optimizing arithmetic expressions across the boundary of design hierarchy by introducing a new concept, called auxiliary ports. Our algorithm can be used to carry out the CSA allocation step optimally and automatically and this can be done within the context of a standard RTL synthesis environment.