Rate Adaptation (RA) is a fundamental mechanism in 802.11 systems. It allows transmitters to adapt the coding and modulation scheme, as well as the MIMO transmission mode, to the radio channel conditions, and to learn and track the (mode, rate) pair providing the highest throughput. The design of RA mechanisms has been driven mainly by heuristics. In contrast, we rigorously formulate RA as an online stochastic optimization problem. We solve this problem and present G-ORS (Graphical Optimal Rate Sampling), a family of provably optimal (mode, rate) pair adaptation algorithms. Our main result is that G-ORS outperforms state-of-the-art algorithms such as MiRA and Minstrel HT, as demonstrated by experiments on an 802.11n network testbed. The design of G-ORS is supported by a theoretical analysis of its performance both in stationary radio environments, where the successful packet transmission probabilities at the various (mode, rate) pairs do not vary over time, and in non-stationary environments, where these probabilities evolve. We show that under G-ORS, the throughput loss due to the need to explore sub-optimal (mode, rate) pairs does not depend on the number of available pairs.
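
To make the objective concrete, the following minimal sketch illustrates what "the (mode, rate) pair providing the highest throughput" could mean in a stationary environment. It is not G-ORS itself: it assumes, for illustration only, that the expected throughput of a pair is its nominal rate multiplied by its successful packet transmission probability, and that these probabilities are known. An RA algorithm has to learn them from transmission outcomes. The mode names, rates, probabilities, and function names are all hypothetical.

```python
from typing import Dict, Tuple

# A (mode, rate) pair: MIMO mode label and nominal rate in Mbit/s (hypothetical values).
Pair = Tuple[str, float]


def expected_throughput(rate_mbps: float, success_prob: float) -> float:
    """Assumed model: expected throughput = nominal rate x success probability."""
    return rate_mbps * success_prob


def best_pair(success_probs: Dict[Pair, float]) -> Pair:
    """Return the pair with the highest expected throughput under the assumed model."""
    return max(
        success_probs,
        key=lambda pair: expected_throughput(pair[1], success_probs[pair]),
    )


# Hypothetical success probabilities for a stationary channel (not measured data).
SUCCESS_PROBS: Dict[Pair, float] = {
    ("single-stream", 13.5): 0.95,
    ("single-stream", 27.0): 0.90,
    ("single-stream", 54.0): 0.40,
    ("dual-stream", 54.0): 0.60,
    ("dual-stream", 108.0): 0.10,
}

if __name__ == "__main__":
    mode, rate = best_pair(SUCCESS_PROBS)
    print(f"best pair under the assumed model: mode={mode}, rate={rate} Mbit/s")
```

In this toy setting the highest nominal rate is not the best choice, because its success probability is too low. The online problem is harder: the probabilities are unknown, so the transmitter must occasionally sample sub-optimal pairs to estimate them, which is the source of the exploration loss mentioned above.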