In this work, we present a highly scalable network intrusion detection system on many-core processors. To maximize the NIDS performance, we take advantage of the underlying hardware and adhere to four design principles: shared-nothing architecture, computation offoading, lightweight data structure, and flow offoading. Through the experimental results, we find that our design choices can significantly improve the NIDS performance (79 Gbps with 1514B synthetic packets). We believe that our design decisions can be easily extended to other many-core processors and programmable NICs.