(A) fast distributed deep learning platform based on virtual shared memory framework for high performance computing system
Korean title: 고성능 컴퓨팅 시스템을 위한 가상 공유 메모리 프레임워크 기반 고속 분산 딥러닝 플랫폼

Deep learning is one of the most promising machine learning methodologies and is widely used in, for example, image recognition, voice recognition, and natural language processing. To improve learning accuracy, deep neural networks have evolved by (i) increasing the number of layers and (ii) increasing the number of parameters in massive models. Distributed deep learning platforms must therefore evolve to handle huge, complex deep learning models and to process massive training data with high performance computing resources. The problems that such platforms should address are communicating deep learning parameters at high speed between distributed deep learning processes and reducing the parameter traffic. To exchange deep learning parameters quickly, the inherent inefficiency of existing communication libraries and protocols has to be overcome.
First, this thesis proposes a novel virtual shared memory framework, called Soft Memory Box (SMB), which enables distributed processes on the computing servers to share the memory of remote servers with lower overhead, thereby improving communication performance. Second, this thesis proposes a new distributed deep learning platform, named ShmCaffe, which uses remote shared memory to reduce the communication overhead of sharing parameters when training massive deep neural networks. ShmCaffe is built on the SMB virtual shared memory framework. In the ShmCaffe platform, the remote shared memory serves as a shared buffer for asynchronous sharing of massive parameters among many distributed deep learning processes. A hybrid method that combines asynchronous and synchronous parameter updates is also discussed for this platform to improve scalability.
According to the first performance evaluation, communication over the SMB is 2.1 times faster than over the message passing interface (MPI) in the scenario where computation and communication are sequential. In the parallel computation-communication scenario, the communication time of SMB-based asynchronous parameter updates is 2 to 7 times faster than that of MPI-based updates, depending on the deep learning model and the number of deep learning workers. The second evaluation verifies that Inception_v1 training with ShmCaffe converges as the number of workers is varied. The scalability of ShmCaffe is evaluated by comparing the Inception_v1 training times of asynchronous ShmCaffe and hybrid ShmCaffe. In training Inception_v1 with 16 GPUs, ShmCaffe is 10.1 times faster than Caffe, 2.8 times faster than Caffe-MPI, and 2.6 times faster than TensorFlow. The main benefits come from reducing communication traffic and from scaling out the deep learning workers. As a result, ShmCaffe improves the productivity of deep learning network developers, reduces cost by increasing the utilization of computation resources, and overcomes the heterogeneity of GPU servers.
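The asynchronous parameter sharing through a shared buffer that the abstract describes can be illustrated with a minimal sketch. This is not the SMB or ShmCaffe API: the names `SharedParameterBuffer`, `worker`, and `train_async` are hypothetical, an in-process array with threads stands in for remote shared memory and distributed processes, and the toy quadratic objective and its target values are assumptions chosen only to make the example self-contained.

```python
import threading
from array import array


class SharedParameterBuffer:
    """Hypothetical in-process stand-in for a remote shared-memory region."""

    def __init__(self, size):
        self._params = array("d", [0.0] * size)  # global parameters
        self._lock = threading.Lock()
        self.updates = 0  # number of accumulated updates

    def read(self):
        """Return a snapshot of the global parameters (may be stale once used)."""
        with self._lock:
            return list(self._params)

    def accumulate(self, delta):
        """Add one worker's update to the global parameters."""
        with self._lock:
            for i, d in enumerate(delta):
                self._params[i] += d
            self.updates += 1


def worker(buffer, target, steps, lr):
    """Toy worker: gradient descent on sum((w - t)^2) over shared parameters."""
    for _ in range(steps):
        weights = buffer.read()  # pull without waiting for other workers
        grad = [2.0 * (w - t) for w, t in zip(weights, target)]
        buffer.accumulate([-lr * g for g in grad])  # push without a barrier


def train_async(num_workers=4, steps=500, lr=0.01):
    target = [1.0, -2.0, 0.5]  # assumed toy optimum, not real model data
    buffer = SharedParameterBuffer(len(target))
    threads = [
        threading.Thread(target=worker, args=(buffer, target, steps, lr))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer.read(), target, buffer.updates


if __name__ == "__main__":
    weights, target, updates = train_async()
    print(updates, [round(w, 3) for w in weights], target)
```

Because the workers never wait for each other, an update may be computed from parameters that other workers have since changed. The small learning rate keeps this toy problem stable despite that staleness.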
Advisors
Kang, Sungwon (강성원)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Information and Communications Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

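Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Information and Communications Engineering, 2018.8, [vii, 104 p.]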

Keywords

High performance computing; distributed computing; soft memory box; shared memory; deep neural network; distributed deep learning

URI
http://hdl.handle.net/10203/265371
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828239&flag=dissertation
Appears in Collection
ICE-Theses_Ph.D. (doctoral theses)
Files in This Item
There are no files associated with this item.
