This paper describes a novel vision-based map-referenced navigation method for GNSS-denied environments. By visually classifying the overflown terrain, we extract scene features that are robust and time-invariant. Terrain classification is performed by semantic segmentation with Fully Convolutional Networks (FCN), whose training data are generated by combining satellite/aerial imagery with open GIS data. The onboard aerial image is then matched against the map database using the Iterative Closest Point (ICP) algorithm on their terrain semantics. Finally, the resulting position fix is fused with IMU measurements in an Extended Kalman Filter (EKF) to form a complete navigation solution. The proposed method is shown to be effective on simulated flights.
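
As a rough illustration of the map-matching step, the sketch below aligns semantically labelled 2-D points (e.g., sampled from the segmented aerial image) to map points with a class-constrained ICP. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, point representation, class labels, and convergence settings are all illustrative.

```python
# Minimal sketch of semantic ICP: nearest-neighbour correspondences are
# restricted to points of the same terrain class, then a rigid transform
# is refit in closed form each iteration. Illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def semantic_icp(src_pts, src_lbl, map_pts, map_lbl, iters=30, tol=1e-6):
    """Estimate rotation R (2x2) and translation t (2,) mapping src_pts onto map_pts."""
    R, t = np.eye(2), np.zeros(2)
    # One KD-tree per terrain class so matches never cross classes.
    classes = np.unique(map_lbl)
    trees = {c: cKDTree(map_pts[map_lbl == c]) for c in classes}
    index = {c: np.flatnonzero(map_lbl == c) for c in classes}
    prev_err = np.inf
    for _ in range(iters):
        moved = src_pts @ R.T + t
        pairs_src, pairs_map, dists = [], [], []
        for c, tree in trees.items():
            sel = src_lbl == c
            if not sel.any():
                continue  # class present in map but not in the image
            d, j = tree.query(moved[sel])
            pairs_src.append(src_pts[sel])
            pairs_map.append(map_pts[index[c][j]])
            dists.append(d)
        P, Q = np.vstack(pairs_src), np.vstack(pairs_map)
        err = np.mean(np.concatenate(dists))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        # Closed-form rigid alignment (Kabsch/SVD) on the matched pairs.
        Pc, Qc = P - P.mean(0), Q - Q.mean(0)
        U, _, Vt = np.linalg.svd(Pc.T @ Qc)
        D = np.diag([1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
        R = Vt.T @ D @ U.T
        t = Q.mean(0) - P.mean(0) @ R.T
    return R, t
```

Constraining correspondences to matching terrain classes is what lets ICP work here: raw point geometry alone is ambiguous over natural terrain, but class-consistent matches anchor the alignment to stable semantic boundaries such as roads, water, and forest edges.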