As the complexity of deep learning (DL) models scales up, computer architects face a memory "capacity" wall, where the limited physical memory inside the accelerator device constrains the size of the models that can be trained and deployed. This article summarizes our recent work on designing an accelerator-centric, disaggregated memory system for DL.