Cooperative training methods for distributed machine learning are typically based on the exchange of local gradients or local model parameters. The latter approach is known as Federated Learning (FL). An alternative solution with reduced communication overhead, referred to as Federated Distillation (FD), was recently proposed; it exchanges only averaged model outputs. While prior work studied implementations of FL over wireless fading channels, here we propose wireless protocols for FD and for an enhanced version thereof that leverages an offline communication phase to communicate "mixed-up" covariate vectors. The proposed implementations consist of different combinations of digital schemes based on separate source-channel coding and of over-the-air computing strategies based on analog joint source-channel coding. It is shown that the enhanced version of FD has the potential to significantly outperform FL in the presence of limited spectral resources.