The Click modular router has been one of the most popular software router platforms for rapid prototyping and new protocol development. Unfortunately, despite the benefit of flexible module composition, its internal architecture has not kept pace with recent hardware advancements, and its performance remains sub-optimal in high-speed networks. In this work, we identify the performance bottlenecks of the existing Click router and extend it to scale with modern computer systems. Our improvements focus on both I/O batching and computation batching, and include various optimizations for multi-core systems and multi-queue network cards. We find that these techniques improve performance by almost a factor of 10, and that the maximum throughput reaches 28 Gbps when forwarding minimum-sized IPv4 packets on a single machine.
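To illustrate the general idea of computation batching, here is a minimal Python sketch. It assumes a hypothetical `Element` class with `push` and `push_batch` methods and a hypothetical three-stage pipeline (`CheckHeader`, `DecTTL`, `Output`). These names are for illustration only and are not Click's actual C++ element API. The sketch counts element invocations. Passing packets one at a time costs one invocation per packet per element, while passing a batch lets each invocation cover up to `batch_size` packets. The batch size of 32 is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """Hypothetical pipeline stage (illustrative only, not Click's Element API)."""
    name: str
    action: Callable[[bytes], bytes]
    calls: int = 0  # number of times this element has been invoked

    def push(self, packet: bytes) -> bytes:
        # Per-packet path: one invocation handles exactly one packet.
        self.calls += 1
        return self.action(packet)

    def push_batch(self, batch: List[bytes]) -> List[bytes]:
        # Batched path: one invocation handles the whole batch,
        # amortizing the per-call overhead across len(batch) packets.
        self.calls += 1
        return [self.action(p) for p in batch]


def dec_ttl(p: bytes) -> bytes:
    # Decrement the IPv4 TTL field (byte offset 8 of the IPv4 header).
    return p[:8] + bytes([max(p[8] - 1, 0)]) + p[9:]


def make_pipeline() -> List[Element]:
    # Hypothetical three-stage forwarding path.
    return [
        Element("CheckHeader", lambda p: p),
        Element("DecTTL", dec_ttl),
        Element("Output", lambda p: p),
    ]


def run_per_packet(pipeline: List[Element], packets: List[bytes]) -> List[bytes]:
    out = []
    for p in packets:
        for elem in pipeline:  # each packet traverses every element on its own
            p = elem.push(p)
        out.append(p)
    return out


def run_batched(pipeline: List[Element], packets: List[bytes], batch_size: int) -> List[bytes]:
    out = []
    for i in range(0, len(packets), batch_size):
        batch = packets[i:i + batch_size]
        for elem in pipeline:  # each batch traverses every element once
            batch = elem.push_batch(batch)
        out.extend(batch)
    return out


if __name__ == "__main__":
    # 1000 dummy 20-byte IPv4 headers with TTL = 64.
    packets = [bytes([0x45] + [0] * 7 + [64] + [0] * 11) for _ in range(1000)]
    a, b = make_pipeline(), make_pipeline()
    assert run_per_packet(a, packets) == run_batched(b, packets, batch_size=32)
    print(sum(e.calls for e in a))  # 3000
    print(sum(e.calls for e in b))  # 96
```

Both paths produce identical output. The per-packet path makes 3,000 element invocations (1,000 packets times 3 stages), while the batched path makes 96 (32 batches times 3 stages). The sketch only counts calls and does not model real per-call costs such as function dispatch or cache behavior, which are what batching is meant to amortize in a software router.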