Deep learning is one of the most promising machine learning methodologies and is widely used, e.g., in image recognition, voice recognition, and natural language processing. To improve learning accuracy, deep neural networks have evolved by (i) increasing the number of layers and (ii) increasing the number of parameters in massive models. Consequently, distributed deep learning platforms must also evolve to handle huge and complex deep learning models and to process massive training data on high-performance computing resources. The problems that distributed deep learning platforms should address are communicating deep learning parameters at high speed between distributed deep learning processes and reducing the parameter traffic. To exchange deep learning parameters fast, the inherent inefficiency of existing communication libraries and protocols must be overcome. First, this thesis proposes a novel virtual shared memory framework, called Soft Memory Box~(SMB), which enables distributed processes on computing servers to share the memory of remote servers with low overhead so as to improve communication performance. Second, this thesis proposes a new distributed deep learning platform, named ShmCaffe, which utilizes remote shared memory to reduce the communication overhead of sharing massive deep neural network training parameters. ShmCaffe is designed based on SMB, a virtual shared memory framework. In the ShmCaffe platform, the remote shared memory is used as a shared buffer for asynchronous sharing of massive parameters among many distributed deep learning processes. Moreover, a hybrid method that combines asynchronous and synchronous parameter update methods is also discussed to improve scalability.
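The asynchronous parameter sharing described above can be illustrated with a minimal single-process sketch: workers read the current parameters from a shared buffer, compute their updates locally, and write back without any synchronization barrier. All names here are hypothetical and for illustration only; SMB itself exposes remote server memory, which this toy stand-in does not model.

```python
import numpy as np

# Illustrative toy: a numpy array stands in for the remote shared-memory
# buffer that ShmCaffe's workers would share (hypothetical names, not the
# SMB API).
SHARED_PARAMS = np.zeros(4)

def worker_step(local_grad, lr=0.1):
    """One asynchronous worker step: read the shared parameters, apply a
    local gradient update, and write the result back immediately."""
    global SHARED_PARAMS
    snapshot = SHARED_PARAMS.copy()             # read current parameters
    SHARED_PARAMS = snapshot - lr * local_grad  # write back, no barrier

# Two "workers" push updates independently; neither waits for the other,
# which is the essence of asynchronous parameter sharing.
worker_step(np.array([1.0, 0.0, 0.0, 0.0]))
worker_step(np.array([0.0, 2.0, 0.0, 0.0]))
print(SHARED_PARAMS)
```

In the hybrid method, such unsynchronized write-backs would be interleaved with occasional synchronous averaging rounds across workers.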
According to the first performance evaluation results, communication via SMB is 2.1 times faster than via the message passing interface (MPI) in the scenario where computation and communication are sequential. In addition, in the parallel computation-communication scenario, the communication time of the SMB-based asynchronous parameter update becomes 2 to 7 times shorter than that using MPI, depending on the deep learning model and the number of deep learning workers. In the second evaluation, this thesis verifies that Inception_v1 model training using ShmCaffe converges while varying the number of workers. The scalability of ShmCaffe is evaluated by comparing the Inception_v1 training times of asynchronous ShmCaffe and hybrid ShmCaffe. ShmCaffe is 10.1 times faster than Caffe, 2.8 times faster than Caffe-MPI, and 2.6 times faster than TensorFlow when training Inception_v1 with 16 GPUs. The main benefits of ShmCaffe are that it reduces communication traffic and scales out to many deep learning workers. As a result, ShmCaffe improves the productivity of deep learning network developers, reduces cost by increasing the utilization of computation resources, and overcomes the heterogeneity of GPU servers.