Facial expressions convey non-verbal information that helps humans perceive physical or psychophysical states. Accurate 3D imaging captures the stable topographic changes needed for reading facial expressions. In particular, light-field cameras (LFCs) have high potential for constructing 3D depth maps thanks to a simple configuration of microlens arrays and an objective lens. Here we report a machine-learned NIR-based light-field camera (NIR-LFC) for facial expression reading that extracts Euclidean distances between 3D facial landmarks in a pairwise fashion. The NIR-LFC comprises microlens arrays with an asymmetric Fabry-Perot filter and an NIR bandpass filter on a CMOS image sensor, fully packaged with two vertical-cavity surface-emitting lasers (VCSELs). The NIR-LFC not only increases image contrast by 2.1 times compared with conventional LFCs but also reduces reconstruction errors by up to 54%, regardless of ambient light conditions. A multi-layer perceptron (MLP) classifies input vectors of 78 pairwise distances extracted from 3D facial depth maps of happiness, anger, sadness, and disgust, achieving an average accuracy of 0.85 (p<0.05). The NIR-LFC provides a new platform for reading facial expressions and labeling emotions in point-of-care biomedical, social perception, and human-machine interaction applications.
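The 78-dimensional input vector follows from the pairwise construction: C(n, 2) = 78 implies n = 13 landmarks, since 13·12/2 = 78. A minimal sketch of this feature extraction step (the function name and randomly generated landmark coordinates are illustrative assumptions, not the authors' implementation):

```python
from itertools import combinations

import numpy as np


def pairwise_distance_vector(landmarks):
    """Flatten n 3D landmarks into a C(n, 2)-dimensional vector of
    pairwise Euclidean distances, in lexicographic pair order."""
    pts = np.asarray(landmarks, dtype=float)
    return np.array([np.linalg.norm(pts[i] - pts[j])
                     for i, j in combinations(range(len(pts)), 2)])


# 13 hypothetical landmarks from a 3D facial depth map ->
# 13*12/2 = 78 pairwise distances, matching the MLP input size in the text.
rng = np.random.default_rng(0)
landmarks = rng.uniform(size=(13, 3))
features = pairwise_distance_vector(landmarks)
print(features.shape)  # (78,)
```

Such a distance-based representation is invariant to rigid translation and rotation of the head, which is one reason pairwise landmark distances are a natural input encoding for an MLP classifier.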