In statistical parametric speech synthesis, a mixture density network (MDN) is employed to overcome the limitations of a linear output layer, such as fixed, pre-computed variances and the unimodal assumption. However, the MDN has a drawback: the static-dynamic constraint, which is needed during training for high-quality speech synthesis, cannot be applied to it. To cope with this problem, this paper proposes a training algorithm for the MDN based on the minimum trajectory error criterion. In addition, a modulation spectrum-constrained loss function is proposed to alleviate the over-smoothing effect. The experimental results confirm meaningful improvements in both objective and subjective performance measures.
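The abstract does not spell out the formulation, so the following is only a minimal NumPy sketch of how a trajectory-error loss with a modulation spectrum term can be computed. It assumes one feature dimension, static plus first-order delta features with the common delta window 0.5(c[t+1] - c[t-1]), a diagonal covariance, and per-frame means and variances already taken from a single mixture component of the MDN output (how the proposed method handles the mixture is not stated here). The function names and the weight `ms_weight` are hypothetical. Only the forward value of the loss is computed; for training, the same steps would be written in an automatic-differentiation framework so that gradients flow through the generated trajectory.

```python
import numpy as np


def window_matrix(T):
    """Static + delta window matrix W of shape (2T, T).

    Rows interleave [static_t, delta_t]. Assumed delta window:
    0.5 * (c[t+1] - c[t-1]), with frame indices clamped at the edges.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0  # static row picks out c[t]
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1, nxt] += 0.5  # delta row: +0.5 * c[t+1]
        W[2 * t + 1, prev] -= 0.5  # delta row: -0.5 * c[t-1]
    return W


def generate_trajectory(mu, var, W):
    """Maximum-likelihood trajectory under the static-dynamic constraint.

    Solves c = (W^T U^-1 W)^-1 W^T U^-1 mu with U = diag(var), where mu and
    var are the interleaved [static, delta] means and variances (length 2T).
    """
    P = W.T @ (W / var[:, None])  # W^T U^-1 W
    b = W.T @ (mu / var)  # W^T U^-1 mu
    return np.linalg.solve(P, b)


def modulation_spectrum(c, n_fft=64, eps=1e-10):
    """Log power spectrum of a trajectory along the time axis (n_fft >= T assumed)."""
    return np.log(np.abs(np.fft.rfft(c, n=n_fft)) ** 2 + eps)


def mte_ms_loss(mu, var, c_nat, ms_weight=0.1, n_fft=64):
    """Trajectory error plus a weighted modulation spectrum error (hypothetical weighting)."""
    W = window_matrix(len(c_nat))
    c_gen = generate_trajectory(mu, var, W)
    traj_err = np.mean((c_gen - c_nat) ** 2)
    ms_err = np.mean(
        (modulation_spectrum(c_gen, n_fft) - modulation_spectrum(c_nat, n_fft)) ** 2
    )
    return traj_err + ms_weight * ms_err, c_gen


# Usage: noisy predicted static/delta means for a random 20-frame target.
rng = np.random.default_rng(0)
c_nat = rng.standard_normal(20)
mu = window_matrix(20) @ c_nat + 0.05 * rng.standard_normal(40)
loss, c_gen = mte_ms_loss(mu, np.ones(40), c_nat)
```

Because the loss is measured on the generated trajectory rather than on frame-wise outputs, the static-dynamic relationship encoded in `W` is part of the training objective, and the modulation spectrum term penalizes trajectories whose temporal fluctuation is smoothed away.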