Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning. Lee, Cheolhyoung; Cho, Kyunghyun; Kang, Wanmo, Thirty-second Conference on Neural Information Processing Systems, Neural Information Processing Systems (NIPS) Foundation, 2018-12-08
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models. Lee, Cheolhyoung; Cho, Kyunghyun; Kang, Wanmo, International Conference on Learning Representations (ICLR), International Conference on Learning Representations, 2020-04-30