In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppress-ing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocksfor aggregatinginformationin thespatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal con-former with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three differ-ent characteristics of audio signals enables more compre-hensive enhancement in noisy and reverberant environ-ments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated based on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility.