In this paper, we describe a user study evaluating the usability of an augmented reality (AR) multimodal interface (MMI). We have developed an AR MMI that combines free-hand gesture and speech input in a natural way using a multimodal fusion architecture. We describe the system architecture and present a study comparing the usability of the AR MMI with that of speech-only and 3D-hand-gesture-only interaction conditions. The interface was used in an AR application for selecting 3D virtual objects and changing their shape and color. For each interface condition, we measured task completion time, the number of user and system errors, and user satisfaction. We found that the MMI was more usable than the gesture-only interface condition, and users felt that the MMI was more satisfying to use than the speech-only interface condition; however, it was neither more effective nor more efficient than the speech-only interface. We discuss the implications of this research for designing AR MMIs and outline directions for future work. The findings could also help in developing MMIs for a wider range of AR applications, for example, AR navigation tasks, mobile AR interfaces, or AR games.