1.10.2 Use of simplified independent variables
As noted above, performance models are typically developed from data extracted from in-service pavements for use in asset management. These models have the benefit of describing actual pavement performance but, unlike experimental data, in-service data have multiple independent variables changing simultaneously.
Because of this effect, defined previously as multi-collinearity (also known as multi-correlation), results have shown that models based on simplified (linear) algorithms using fewer independent variables are typically better suited to predictive purposes. The most appropriate approach to model development is to focus on the influence of fewer, though more statistically significant, independent variables, as this has proven relatively successful in producing models that are more robust when transferred to new datasets or new situations. Again, caution must be exercised when applying models to new situations, to ensure that the data range of the new situation is within the range of the data used to develop the model.
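The range caution above can be turned into a routine check before a model is applied to a new network. The following is a minimal sketch only: the function name, the variable names (`age`, `traffic`) and the numbers are hypothetical and are not taken from any dataset discussed in this document.

```python
import numpy as np

def out_of_range_variables(X_fit, X_new, names):
    """Return the names of variables whose new values fall outside
    the minimum-maximum range seen in the model development data."""
    X_fit = np.asarray(X_fit, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo = X_fit.min(axis=0)   # lowest value of each variable in the fit data
    hi = X_fit.max(axis=0)   # highest value of each variable in the fit data
    outside = ((X_new < lo) | (X_new > hi)).any(axis=0)
    return [name for name, flag in zip(names, outside) if flag]

# Hypothetical development data: pavement age (years), cumulative traffic
X_fit = [[2, 0.5], [10, 3.0], [18, 6.5]]
# Hypothetical new sections: the second is older than any section in the fit data
X_new = [[12, 4.0], [25, 5.0]]

flagged = out_of_range_variables(X_fit, X_new, ["age", "traffic"])
print(flagged)
```

Any variable returned by the check indicates that the model would be extrapolating for at least one new section, so its predictions for that section should be treated with caution.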
More complicated models, that is, models with many independent variables, provide a better fit to the data from which they were derived but are more prone to ‘over-fitting’, and often perform poorly when applied to a new dataset or a new situation. This is because many of the independent variables are not statistically significant and have little or no explanatory power. Slightly simplified models with a lower fit to the data provide greater predictive confidence because they are far less prone to ‘over-fitting’.
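The contrast between fit and predictive performance can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real pavement data: the synthetic response depends on age only, `traffic` is constructed to be collinear with age, and fifteen further variables are pure noise. All names and coefficients are hypothetical. A one-variable linear model and a seventeen-variable linear model are fitted by ordinary least squares to small samples and scored with R² on both the fitting data and a fresh dataset.

```python
import numpy as np

def make_data(rng, n, n_noise=15):
    """Synthetic 'in-service' data with collinear and irrelevant variables."""
    age = rng.uniform(1, 20, n)                    # years (hypothetical)
    traffic = 0.3 * age + rng.normal(0, 0.3, n)    # strongly correlated with age
    irrelevant = rng.normal(size=(n, n_noise))     # no explanatory power
    y = 0.2 * age + rng.normal(0, 1.0, n)          # response depends on age only
    return np.column_stack([age, traffic, irrelevant]), y

def design(X):
    """Add an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def r2(coef, X, y):
    """Coefficient of determination of the fitted model on (X, y)."""
    resid = y - design(X) @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

def compare_models(n_trials=20, n_train=30, n_test=1000, seed=0):
    """Average R² on fitting data and on new data for both models."""
    rng = np.random.default_rng(seed)
    scores = {"simple_train": [], "simple_test": [],
              "complex_train": [], "complex_test": []}
    for _ in range(n_trials):
        X_tr, y_tr = make_data(rng, n_train)
        X_te, y_te = make_data(rng, n_test)
        # simple: age only; complex: every available variable
        for name, cols in (("simple", slice(0, 1)), ("complex", slice(None))):
            coef = fit(X_tr[:, cols], y_tr)
            scores[f"{name}_train"].append(r2(coef, X_tr[:, cols], y_tr))
            scores[f"{name}_test"].append(r2(coef, X_te[:, cols], y_te))
    return {k: float(np.mean(v)) for k, v in scores.items()}

results = compare_models()
for key, value in results.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the larger model reports the higher R² on the data it was fitted to, but the lower R² on the new data, which is the pattern described above. The size of the gap depends on the sample size and noise level chosen for the simulation.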
In summary, the correlations among multiple independent variables and the typically highly variable data, as noted earlier in this section, mean that simpler linear-based models are usually more robust when applied to new datasets and situations.