# PLS Regression

Partial Least Squares (PLS) regression is a recent technique that generalizes and combines features from Principal Component Analysis and Multiple Regression. It is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors). PLS can be a powerful method of analysis because it places minimal demands on measurement scales, sample size, and residual distributions. Although PLS can be used for theory confirmation, it can also be used to suggest where relationships might or might not exist and to suggest propositions for later testing.

Compared to the better-known factor-based covariance fitting approach for latent structural modeling (exemplified by software such as LISREL, EQS, COSAN, and EZPATH), the component-based PLS approach avoids two serious problems: inadmissible solutions and factor indeterminacy.

The philosophical distinction between these approaches is whether structural equation modeling is used for theory testing and development or for predictive applications. In situations where prior theory is strong and further testing and development is the goal, covariance-based full-information estimation methods (i.e., Maximum Likelihood or Generalized Least Squares) are more appropriate. However, because factor score estimates are indeterminate, these methods lose predictive accuracy. This is not a concern in theory testing, where the structural relationships among concepts (i.e., the parameter estimates) are of prime concern.

For application and prediction, a PLS approach is often more suitable. Under this approach, it is assumed that all the measured variance is useful variance to be explained. Since the approach estimates the latent variables as exact linear combinations of the observed measures, it avoids the indeterminacy problem and provides an exact definition of component scores. Using an iterative estimation technique, the PLS approach provides a general model which encompasses, among other techniques, canonical correlation, redundancy analysis, multiple regression, multivariate analysis of variance, and principal components. Because the iterative algorithm generally consists of a series of ordinary least squares analyses, identification is not a problem for recursive models, and the approach does not presume any distributional form for the measured variables.
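To make the idea of a component score as an exact linear combination concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case: a single response and a single component, extracted with least-squares steps only. It is not the full iterative algorithm for latent variable models, and the function names and the toy data are hypothetical.

```python
import math

def center(values):
    """Subtract the mean so the variable has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def first_pls_component(x_columns, y):
    """Extract one PLS component for a single response (illustrative only).

    x_columns: list of predictor columns, each a list of observations.
    y: list of response observations.
    Returns (weights, scores, slope).
    """
    xc = [center(col) for col in x_columns]
    yc = center(y)
    # Step 1: weight each predictor by its cross-product with y. Up to a
    # common scale factor this is the least-squares slope of that predictor
    # regressed on y. The weights are then scaled to unit length.
    w = [sum(a * b for a, b in zip(col, yc)) for col in xc]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Step 2: the component score of each observation is an exact weighted
    # sum of its centered observed measures, so nothing is indeterminate.
    t = [sum(w[j] * xc[j][i] for j in range(len(xc))) for i in range(len(yc))]
    # Step 3: ordinary least squares regression of y on the score.
    slope = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return w, t, slope

# Toy data, made up for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
weights, scores, slope = first_pls_component([x1, x2], y)
```

Each step is a closed-form least-squares computation, which is why no distributional assumption about the measured variables enters the calculation.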

PLS can also work with smaller samples. A strong rule of thumb suggests that the sample size be at least the larger of the following: (1) ten times the number of indicators in the scale with the largest number of formative (i.e., causal) indicators (scales for constructs with reflective indicators can be ignored), or (2) ten times the largest number of structural paths directed at any one construct in the structural model. A weak rule of thumb, similar to the heuristic for multiple regression, uses a multiplier of five instead of ten in these formulae.
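The rule of thumb reduces to a few lines of arithmetic. The sketch below is one way to write it, reading the rule as a minimum sample size. The function name, argument names, and the example model are hypothetical.

```python
def pls_rule_of_thumb_sample_size(formative_block_sizes, inbound_path_counts,
                                  multiplier=10):
    """Sample size suggested by the rule of thumb.

    formative_block_sizes: number of indicators in each formative scale
        (reflective scales are left out, as the rule allows).
    inbound_path_counts: number of structural paths directed at each construct.
    multiplier: 10 for the strong rule, 5 for the weak rule.
    """
    largest_block = max(formative_block_sizes, default=0)
    most_paths = max(inbound_path_counts, default=0)
    return multiplier * max(largest_block, most_paths)

# Hypothetical model: formative scales with 4 and 6 indicators, and
# constructs that receive 2, 3, and 7 structural paths.
strong = pls_rule_of_thumb_sample_size([4, 6], [2, 3, 7])
weak = pls_rule_of_thumb_sample_size([4, 6], [2, 3, 7], multiplier=5)
print(strong, weak)  # prints: 70 35
```

In this example the construct receiving seven paths drives the result, not the largest formative scale.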

Second-order factors can be approximated using various procedures. One of the easiest to implement is the repeated indicators approach, known as the hierarchical component model. In essence, the second-order factor is measured directly by the observed variables of all its first-order factors. Although each manifest variable is therefore used more than once, the model can still be estimated by the standard PLS algorithm. This procedure works best when each construct has the same number of indicators.
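The repeated indicators idea can be sketched as a simple data-preparation step: the second-order construct receives a copy of every indicator from its first-order constructs. The construct and indicator names below are hypothetical, and the dictionary layout is only an assumption about how a model specification might be stored, not the input format of any particular PLS package.

```python
def add_second_order_block(first_order_blocks, second_order_name):
    """Return a new block specification that includes a second-order
    construct measured by all first-order indicators (repeated indicators).

    first_order_blocks: dict mapping construct name -> list of indicator names.
    """
    if second_order_name in first_order_blocks:
        raise ValueError("second-order name clashes with a first-order construct")
    blocks = {name: list(inds) for name, inds in first_order_blocks.items()}
    # Every manifest variable is reused once for the second-order construct.
    blocks[second_order_name] = [
        ind for inds in first_order_blocks.values() for ind in inds
    ]
    return blocks

# Hypothetical example with equal-sized blocks, the case the text
# describes as working best.
first_order = {
    "Trust": ["t1", "t2", "t3"],
    "Commitment": ["c1", "c2", "c3"],
}
blocks = add_second_order_block(first_order, "RelationshipQuality")
print(blocks["RelationshipQuality"])
# prints: ['t1', 't2', 't3', 'c1', 'c2', 'c3']
```

With unequal block sizes, the first-order construct with more indicators would contribute more variables to the second-order block, which may be one reason equal numbers of indicators are preferred.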

Finally, PLS is considered better suited for explaining complex relationships. “PLS comes to the fore in larger models, when the importance shifts from individual variables and parameters to packages of variables and aggregate parameters.”