Orthogonal Frequency Division Multiplexing (OFDM) has gained considerable attention in recent years. In this paper, we implement various FFT architectures for a 64-point FFT and present the implementation results. The implementation was carried out in two ways: a hardware (HW) implementation on an FPGA, the Xilinx Virtex2 xc2v6000, and a software (SW) implementation on a DSP chip and an ARM core, the TMS320C6416 and the ARM922T, respectively. The FFT architectures are implemented in Verilog HDL on the xc2v6000. Conventional FFT codes and algorithms are written in C/C++ on the TMS320C6416 and the ARM922T. The minimum processing times for the 64-point FFT were 0.0167 us, 4.58 us, and 28.27 us on the xc2v6000, TMS320C6416, and ARM922T, respectively. We present latency-based criteria for choosing among the FPGA, DSP, and ARM, and identify appropriate target devices according to the system timing constraint.