To protect computation against malicious attacks, symmetric-key cryptographic algorithms are commonly deployed on high-performance embedded systems such as network routers, database servers, and unified threat management (UTM) systems. Consequently, fast implementations of these algorithms are critical so as not to degrade the overall performance of those systems. This paper aims to optimize block ciphers by making full use of the available hardware resources. The target algorithms are ARIA, the Korean standard block cipher, and AES, both of which are used on such embedded systems. To this end, this paper proposes several low-level techniques that improve the software performance of ARIA and AES.
To improve the performance of ARIA, we propose three techniques. First, we apply software pipelining to ARIA to increase instruction-level parallelism. Second, we design a 64-bit S-box that exploits 64-bit processing and reduces the number of instructions. Finally, we apply low-level optimizations that reduce both the instruction count and instruction dependencies. By combining all three techniques, we improve ARIA performance by up to 42 percent over optimized compiler-generated code.
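The general idea behind a wide S-box table can be sketched as follows: each 8-bit input indexes a precomputed 64-bit entry that already holds the substituted byte placed in every output lane it contributes to, so one lookup and one XOR per input byte replace a byte-wise substitution followed by a separate binary mixing step. This is a minimal sketch under stated assumptions, not the authors' 64-bit S-box: `toy_sbox` and `LANE_MASK` are hypothetical placeholders (not ARIA's SB1/SB2 tables or its diffusion matrix), and the names `T`, `sub_mix_ref`, and `sub_mix_table` are introduced only for illustration.

```python
# Hypothetical 8-bit S-box for illustration only (NOT ARIA's SB1/SB2).
# x -> (7x + 0x63) mod 256 is a bijection because 7 is odd.
def toy_sbox(x: int) -> int:
    return (7 * x + 0x63) & 0xFF

# Hypothetical 8x8 binary mixing pattern (NOT ARIA's diffusion matrix):
# bit j of LANE_MASK[i] means substituted input byte i is XORed into output byte j.
LANE_MASK = [0x0E, 0x0D, 0x0B, 0x07, 0xE0, 0xD0, 0xB0, 0x70]


def build_tables() -> list[list[int]]:
    """Precompute T[i][v]: a 64-bit word holding toy_sbox(v) in every lane set in LANE_MASK[i]."""
    tables = []
    for mask in LANE_MASK:
        row = []
        for v in range(256):
            s = toy_sbox(v)
            # Lanes are disjoint, so summing the shifted bytes is the same as OR-ing them.
            row.append(sum(s << (8 * j) for j in range(8) if (mask >> j) & 1))
        tables.append(row)
    return tables


T = build_tables()  # 8 tables x 256 entries x 64 bits


def sub_mix_ref(x: int) -> int:
    """Byte-wise reference: substitute each byte, then apply the binary mixing."""
    out = [0] * 8
    for i in range(8):
        s = toy_sbox((x >> (8 * i)) & 0xFF)
        for j in range(8):
            if (LANE_MASK[i] >> j) & 1:
                out[j] ^= s
    return sum(b << (8 * j) for j, b in enumerate(out))


def sub_mix_table(x: int) -> int:
    """64-bit table version: one lookup and one XOR per input byte."""
    r = 0
    for i in range(8):
        r ^= T[i][(x >> (8 * i)) & 0xFF]
    return r
```

For example, `sub_mix_table(0x0123456789ABCDEF) == sub_mix_ref(0x0123456789ABCDEF)` holds for any 64-bit input. The trade-off is memory for instructions: the fused tables occupy 16 KiB here, but the per-byte shift, mask, and placement work disappears from the inner loop. ARIA's diffusion layer is also a binary matrix over bytes, which is what makes this style of lane-placement table applicable in principle; a full 128-bit ARIA state would need entries spanning two 64-bit words.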
For AES, there are two main approaches to improving performance. The first optimizes AES at the assembly level; the second applies bitslicing, which relies on SSE instructions and encrypts multiple blocks in parallel. Since the former uses general-purpose registers and the latter uses XMM registers, this paper proposes overlapping the two techniques to maximize utilization of CPU resources.
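Bitslicing can be illustrated with a small sketch: many inputs are transposed so that word k holds bit k of every input, and a cipher component written purely as AND/XOR gates is then evaluated on those words, processing all inputs at once. The sketch assumes Python integers of `WIDTH = 64` bits as stand-ins for SSE XMM registers (which hold 128 bits), and `toy_gates` is a hypothetical 4-bit Boolean function chosen for brevity, not the AES S-box; `scalar`, `to_bitslice`, `from_bitslice`, and `bitsliced` are likewise illustrative names.

```python
WIDTH = 64  # inputs processed in parallel (a 128-bit XMM register would hold 128 lanes)
MASK = (1 << WIDTH) - 1


def toy_gates(b0, b1, b2, b3):
    """Hypothetical 4-bit nonlinear map using only AND/XOR (NOT the AES S-box).
    The same code works on single bits or on whole bit-planes."""
    return (b0 ^ (b1 & b2), b1 ^ (b2 & b3), b2 ^ (b3 & b0), b3 ^ (b0 & b1))


def scalar(nibble):
    """Reference: apply toy_gates to one 4-bit value."""
    bits = [(nibble >> k) & 1 for k in range(4)]
    out = toy_gates(*bits)
    return sum(b << k for k, b in enumerate(out))


def to_bitslice(nibbles):
    """Transpose: plane k holds bit k of every nibble; nibble i sits at bit position i."""
    planes = [0, 0, 0, 0]
    for i, v in enumerate(nibbles):
        for k in range(4):
            planes[k] |= ((v >> k) & 1) << i
    return planes


def from_bitslice(planes, n):
    """Inverse transpose back to n nibbles."""
    return [sum(((planes[k] >> i) & 1) << k for k in range(4)) for i in range(n)]


def bitsliced(nibbles):
    """Evaluate toy_gates on up to WIDTH nibbles with 4 word-wide ANDs and 4 XORs."""
    assert len(nibbles) <= WIDTH
    planes = to_bitslice(nibbles)
    # Masking keeps results within WIDTH bits; it only matters once NOT gates are used.
    out = [p & MASK for p in toy_gates(*planes)]
    return from_bitslice(out, len(nibbles))
```

For example, `bitsliced(list(range(16)))` returns the same list as `[scalar(v) for v in range(16)]`, but the gate evaluation costs eight word operations regardless of how many of the 64 lanes are filled. In an SSE implementation the planes would live in XMM registers and the gates would map to intrinsics such as `_mm_and_si128` and `_mm_xor_si128`; a bitsliced AES S-box needs a circuit of roughly a hundred or more gates over eight bit-planes, and the transposition cost is amortized only when enough blocks are available.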