Crowd counting is one of the most important tasks in visual surveillance because it provides useful information such as the number of people in a scene and their spatial distribution. It remains very challenging, however, due to severe occlusions, large geometric deformations, and high visual clutter. To tackle this problem, we propose a novel CNN-based crowd density estimation network consisting of a backbone, a decoder, and a mapper, together with a multi-step quantization scheme that trains the network more effectively. We adopt ResNet as the backbone and add the decoder and mapper to handle the multi-scale nature of crowd counting and to generate high-resolution density maps. The multi-step quantization scheme discretizes the continuous value space of both the predicted and the ground-truth density maps, which reduces the network's search space and raises the matching ratio between predictions and ground truth. As a result, our method outperforms recent methods on four major datasets.
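
To make the quantization idea concrete, the following is a minimal illustrative sketch, not the scheme used in this work. It assumes uniform bins and a hypothetical coarse-to-fine list of step sizes. The function names `quantize`, `matching_ratio`, and `multi_step_ratios`, and the toy density values, are all assumptions introduced for illustration. It shows only that discretizing predicted and ground-truth density values into coarser bins makes more of them coincide.

```python
from typing import List

DensityMap = List[List[float]]


def quantize(density: DensityMap, step: float) -> List[List[int]]:
    """Map each continuous density value to the index of its uniform bin.

    Assumption: bins are uniform with width `step`; the abstract does not
    specify the binning rule.
    """
    return [[int(v // step) for v in row] for row in density]


def matching_ratio(pred: DensityMap, gt: DensityMap, step: float) -> float:
    """Fraction of pixels whose quantized prediction equals the quantized target."""
    q_pred = quantize(pred, step)
    q_gt = quantize(gt, step)
    total = sum(len(row) for row in q_gt)
    matches = sum(
        1
        for row_p, row_g in zip(q_pred, q_gt)
        for a, b in zip(row_p, row_g)
        if a == b
    )
    return matches / total


def multi_step_ratios(pred: DensityMap, gt: DensityMap, steps: List[float]) -> List[float]:
    """Evaluate the matching ratio at each quantization step (hypothetical schedule)."""
    return [matching_ratio(pred, gt, s) for s in steps]


# Toy 2x2 density maps (made-up values, for illustration only).
pred = [[0.10, 0.40], [0.80, 0.05]]
gt = [[0.20, 0.30], [0.70, 0.30]]

# Coarse-to-fine step sizes; powers of two keep the float arithmetic exact.
ratios = multi_step_ratios(pred, gt, [0.5, 0.25, 0.125])
print(ratios)  # [1.0, 0.5, 0.0]
```

In this toy case the coarsest step matches every pixel, while the finest step matches none, which is the sense in which quantization narrows the set of values the network must hit exactly.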