Alpha-stable convergence of heavy-/light-tailed infinitely wide neural networks

DC Field: Value (Language)
dc.contributor.author: Jung, Paul (ko)
dc.contributor.author: Lee, Hoil (ko)
dc.contributor.author: Lee, Jiho (ko)
dc.contributor.author: Yang, Hongseok (ko)
dc.date.accessioned: 2024-01-10T05:04:01Z
dc.date.available: 2024-01-10T05:04:01Z
dc.date.created: 2023-11-08
dc.date.issued: 2023-12
dc.identifier.citation: ADVANCES IN APPLIED PROBABILITY, v.55, no.4, pp.1415 - 1441
dc.identifier.issn: 0001-8678
dc.identifier.uri: http://hdl.handle.net/10203/317634
dc.description.abstract: We consider infinitely wide multi-layer perceptrons (MLPs), which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric alpha-stable distribution, where alpha in (0,2] may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric alpha-stable distribution having the same alpha parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically observed to emerge in trained deep neural nets such as the ResNet and VGG series, and have been proven to arise naturally via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric alpha-stable distributions, alpha in (0,2].
dc.language: English
dc.publisher: APPLIED PROBABILITY TRUST
dc.title: Alpha-stable convergence of heavy-/light-tailed infinitely wide neural networks
dc.type: Article
dc.identifier.wosid: 001168005000006
dc.identifier.scopusid: 2-s2.0-85165334144
dc.type.rims: ART
dc.citation.volume: 55
dc.citation.issue: 4
dc.citation.beginningpage: 1415
dc.citation.endingpage: 1441
dc.citation.publicationname: ADVANCES IN APPLIED PROBABILITY
dc.identifier.doi: 10.1017/apr.2023.3
dc.contributor.localauthor: Yang, Hongseok
dc.contributor.nonIdAuthor: Jung, Paul
dc.contributor.nonIdAuthor: Lee, Jiho
dc.description.isOpenAccess: N
dc.type.journal: Article
dc.subject.keywordAuthor: Heavy-tailed distribution
dc.subject.keywordAuthor: stable process
dc.subject.keywordAuthor: multi-layer perceptrons
dc.subject.keywordAuthor: infinite-width limit
dc.subject.keywordAuthor: weak convergence
Appears in Collection
CS-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
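
Illustrative note: the following is a minimal simulation sketch of the setup described in the abstract, not the authors' code. It initializes a one-hidden-layer MLP with i.i.d. symmetric alpha-stable weights and biases and applies the n^(-1/alpha) scaling to the next layer's pre-activation, where n is the hidden-layer width. The use of numpy and scipy.stats.levy_stable, the choice of tanh as a bounded activation, and all variable names are assumptions made for illustration only.

    # Minimal sketch (assumed setup, not the paper's code): wide hidden layer with
    # i.i.d. symmetric alpha-stable initialization; the next layer's pre-activation
    # is scaled by n^(-1/alpha), as in the abstract's stated convergence result.
    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(0)
    alpha = 1.5             # stability index for this layer, alpha in (0, 2]
    n = 5_000               # hidden-layer width; the theorem concerns n -> infinity
    d = 5                   # input dimension (fixed, so no scaling at the first layer)
    x = rng.normal(size=d)  # a fixed input

    # Layer 1: i.i.d. symmetric alpha-stable weights and biases (skewness beta = 0).
    W1 = levy_stable.rvs(alpha, 0.0, size=(n, d), random_state=rng)
    b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    h1 = np.tanh(b1 + W1 @ x)   # hidden-layer activations through a bounded nonlinearity

    # Layer 2: a single output node; its pre-activation uses the n^(-1/alpha) scaling.
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    b2 = levy_stable.rvs(alpha, 0.0, size=1, random_state=rng)[0]
    z2 = b2 + n ** (-1.0 / alpha) * (w2 @ h1)

    print("scaled second-layer pre-activation:", z2)

To examine the convergence stated in the abstract, one would repeat the draw over many independent initializations and compare the empirical distribution of z2 with levy_stable(alpha, 0), for instance via a quantile-quantile plot.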
