Tucker decomposition is used extensively for modeling multi-dimensional data represented as tensors. Owing to the growing number of nonzero elements in real-world tensors, demand has emerged for fast and scalable Tucker decomposition techniques. Several graphics processing unit (GPU)-accelerated techniques have been proposed to reduce the decomposition time. However, these approaches often struggle with large tensors because their memory requirements exceed the available GPU memory. This study presents a scalable GPU-based technique for Tucker decomposition called GPUTucker. The proposed method partitions large tensors into smaller sub-tensors, referred to as tensor blocks, and implements a GPU-based data pipeline that processes these tensor blocks asynchronously. Extensive experiments demonstrate that GPUTucker outperforms state-of-the-art Tucker decomposition methods in terms of both decomposition speed and scalability.
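
To make the idea of tensor blocks concrete, the following is a minimal CPU-side sketch of how the nonzeros of a sparse tensor might be grouped into grid-aligned blocks. It is an illustration under stated assumptions, not the GPUTucker implementation: the function name `partition_into_blocks`, the `blocks_per_mode` parameter, and the uniform grid layout are all hypothetical, and the asynchronous GPU transfer of each block is omitted.

```python
from collections import defaultdict


def partition_into_blocks(entries, shape, blocks_per_mode):
    """Group sparse-tensor nonzeros into grid-aligned blocks (hypothetical helper).

    entries:         iterable of (coords, value) pairs, coords being a tuple
                     with one index per mode
    shape:           tensor dimensions, one per mode
    blocks_per_mode: number of blocks along each mode (assumed uniform grid)
    Returns a dict mapping block coordinates to a list of
    (local_coords, value) pairs.
    """
    # Edge length of a block along each mode, using ceiling division.
    block_len = [-(-dim // nb) for dim, nb in zip(shape, blocks_per_mode)]
    blocks = defaultdict(list)
    for coords, value in entries:
        # Which block this nonzero falls into, and its offset inside that block.
        block_id = tuple(c // b for c, b in zip(coords, block_len))
        local = tuple(c % b for c, b in zip(coords, block_len))
        blocks[block_id].append((local, value))
    return dict(blocks)


# A 4 x 4 x 4 tensor with four nonzeros, split into a 2 x 2 x 2 grid of blocks.
entries = [
    ((0, 0, 0), 1.0),
    ((1, 2, 3), 2.0),
    ((3, 3, 3), 3.0),
    ((2, 0, 1), 4.0),
]
blocks = partition_into_blocks(entries, shape=(4, 4, 4), blocks_per_mode=(2, 2, 2))
```

Because each block stores only its own nonzeros with block-local coordinates, a block can be sized to fit in GPU memory and handed to the device independently of the others, which is the property an asynchronous pipeline over blocks would rely on.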