Field-programmable gate arrays (FPGAs) are a promising platform for designing hardware accelerators owing to their programming flexibility and fast development cycle. However, FPGA designs are constrained by the device’s limited hardware resources. To overcome this, the latest FPGAs have adopted a multi-die architecture that places multiple dies in a single package, providing abundant hardware resources with high yield and cost benefits. The multi-die architecture, however, causes critical timing issues when signal paths cross die boundaries, adding another design challenge for FPGA users. One standard solution is to insert enough pipeline registers in the cross-die paths and apply proper floorplanning, but this requires an understanding of physical-level design and tedious engineering effort. In this paper, we propose an open-source shell generation framework for high-performance design on multi-die FPGAs, which alleviates this engineering effort for FPGA designers. Based on the user’s design requirements, it generates an optimized shell for the target FPGA via die-level kernel encapsulation, automated system bus pipelining, customized floorplanning, and a scalable clocking scheme. To evaluate our shell generation, we compare its implementation results against Xilinx’s Vitis framework. The framework reduces the shell’s logic utilization by 20% on average while guaranteeing the same functionality and maximum external bandwidth for the target boards. To show its real-world practicality, we use the framework to design a machine learning accelerator that contains multiple systolic-array processors. It achieves a 22.92% higher memory frequency than Vitis while simultaneously guaranteeing the same kernel frequency, for an accelerator design that exceeds 90% logic utilization, without any back-end engineering effort.