Large deviation behavior of the largest eigenvalue lambda(1) of Wigner matrices including those arising from an Erdos-Renyi random graph G(n, p) with i.i.d. random conductances on the edges has been the topic of considerable interest. However, despite several recent advances, not much is known when the underlying graph is sparse i.e., p -> 0, except the recent works (Bhattacharya et al., Ann Probab 49(4):1847-1885, 202 land Bhattacharya and Ganguly, SIAM J Discret Math, 2020) which consider the simpler case of the graph without additional edge weights. Under sufficiently general conditions on the conductance distribution, one expects the 'dense' behavior as long as the average degree np is at least logarithmic in n. In this article we focus on the case of constant average degree i.e., p = d/n for some fixed d > 0 with standard Gaussian weights. Results in Bandeira and Van Handel (Ann Probab 44(4):2479-2506, 2016) about general non-homogeneous Gaussian matrices imply that in this regime lambda(1) scales like root log n. We prove the following results towards a precise understanding of the large deviation behavior in this setting. 1. (Upper tail probabilities and structure theorem): For delta > 0, we pin down the exact exponent psi(delta) such that P(lambda(1) >= root 2(1 + delta) log n) = n(-psi(delta)+o(1)). Further, we show that conditioned on the upper tail event, with high probability, a unique maximal clique emerges with a very precise delta dependent size (takes either one or two possible values) and the Gaussian weights are uniformly high in absolute value on the edges in the clique. Finally, we also prove an optimal localization result for the leading eigenvector, showing that it allocates most of its mass on the aforementioned clique which is spread uniformly across its vertices. 2. (Lower tail probabilities): The exact stretched exponential behavior P(lambda(1) <= root 2(1 - delta) log n) = exp (-n(l(delta)+o(1))) is also established. As an immediate corollary, one obtains that lambda(1) is typically (1+ o(1))root 2 log n, a result which surprisingly appears to be new. A key ingredient in our proofs is an extremal spectral theory for weighted graphs obtained by an l(1)-reduction of the standard l(2)-variational formulation of the largest eigenvalue via the classical Motzkin-Straus theorem , which could be of independent interest.