Graph neural networks (GNNs) extract features by learning both the representation of each object (i.e., graph node) and the relationships across different objects (i.e., the edges that connect nodes), achieving state-of-the-art performance on a wide range of graph-based tasks. Despite their strengths, deploying these algorithms in a production environment faces several key challenges, as the number of graph nodes and edges ranges from several billion to hundreds of billions, requiring substantial storage space for training. Unfortunately, existing ML frameworks based on the in-memory processing model significantly hamper the productivity of algorithm developers, as they require the overall working set to fit within DRAM capacity constraints. In this work, we first study state-of-the-art, large-scale GNN training algorithms. We then conduct a detailed characterization of using capacity-optimized non-volatile memory solutions to store memory-hungry GNN data, exploring the feasibility of SSDs for large-scale GNN training.