Maintenance technologies have progressed from time-based to condition-based approaches. The fundamental idea of condition-based maintenance (CBM) is the real-time diagnosis of impending failures and/or the prognosis of the residual lifetime of equipment, achieved by monitoring health conditions with various sensors. The success of CBM therefore hinges on the ability to develop accurate diagnosis and prognosis models. Even though the number of methods for implementing such models is practically unlimited, the models can normally be classified into two categories according to their origins: those based on physical principles and those based on historical observations. We have focused on the latter (sometimes referred to as empirical models based on statistical learning) because of practical benefits such as context-free applicability, configuration flexibility, and customization adaptability. Although several pilot-scale systems using empirical models have been applied at work sites in Korea, these do not seem to be generally competitive with conventional physical models. Having investigated the bottlenecks of previous attempts, we recognized the need for a novel strategy for grouping correlated variables, such that an empirical model can incorporate not only statistical correlation but also some degree of physical knowledge of a system. The specific problems are as follows: (1) important signals missing from a group because of a lack of observations, (2) signals with time delay, and (3) selection of the optimal kernel bandwidth. This paper presents an improved statistical learning framework that includes the proposed strategy, together with case studies illustrating the performance of the method. (C) 2010 Elsevier Ltd. All rights reserved.