Residual analysis for checking model assumptions

1) E(εi) = 0. This is addressed by proper model selection.

2) Var(εi) = σε² is constant. Check plots of the residuals against the fitted values; when the variance is not constant, transformations or weighted least squares methods may be considered.

3) εi are normally distributed. Check with a normal probability (Q-Q) plot of the residuals.

4) εi are independent. For time-ordered data, serial correlation can be assessed with the Durbin-Watson test statistic; values below about 1.5 or above about 2.5 are cause for suspicion. (A sketch of these checks in code follows this list.)
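A minimal sketch of checks 2) through 4), assuming statsmodels, scipy, and matplotlib are available; the simulated data and variable names are illustrative, not part of the notes.

```python
# Residual checks: constant variance, normality, independence.
# Simulated straight-line data; names and cutoffs are illustrative.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

fit = sm.OLS(y, sm.add_constant(x)).fit()
resid = fit.resid

# 2) Constant variance: residuals vs. fitted values should show no pattern.
plt.scatter(fit.fittedvalues, resid)
plt.axhline(0, color="gray")
plt.xlabel("fitted values"); plt.ylabel("residuals")
plt.show()

# 3) Normality: normal probability (Q-Q) plot of the residuals.
stats.probplot(resid, dist="norm", plot=plt.gca())
plt.show()

# 4) Independence (time-ordered data): Durbin-Watson statistic;
#    values well below 1.5 or above 2.5 are cause for suspicion.
print("Durbin-Watson:", durbin_watson(resid))
```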

 

Transformations

In some cases the data may be transformed to better meet the distributional assumptions. Square root transformations are commonly used for counts, log transformations for right-skewed positive responses, and the arcsine of the square root is appropriate for proportions.
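A small sketch of these three transformations; the example arrays are made up for illustration.

```python
# Common variance-stabilizing transformations; the arrays are illustrative.
import numpy as np

counts = np.array([1, 4, 9, 16, 25], dtype=float)        # e.g. count data
skewed = np.array([1.2, 3.5, 10.0, 45.0], dtype=float)   # right-skewed, positive
props  = np.array([0.05, 0.20, 0.50, 0.95])              # proportions in [0, 1]

y_sqrt   = np.sqrt(counts)            # square root transformation
y_log    = np.log(skewed)             # log transformation (requires y > 0)
y_arcsin = np.arcsin(np.sqrt(props))  # arcsine square root for proportions
```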

 

Other important issues in checking the regression model

Outlying data points can be detected from residual plots, often using standardized or studentized residuals.

Influential data points can be detected by diagnostic statistics such as the hat matrix diagonal value, which measures the leverage of the observation.
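A sketch of computing leverages from the hat matrix with plain NumPy; the simulated design and the common 2p/n flagging rule are illustrative assumptions.

```python
# Leverage (hat) values h_ii from the diagonal of H = X (X'X)^{-1} X'.
# Simulated x values with one far-out point; the 2p/n cutoff is a rule of thumb.
import numpy as np

rng = np.random.default_rng(2)
x = np.append(rng.uniform(0, 10, 19), 30.0)    # one extreme x value
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

n, p = X.shape
flagged = np.where(leverage > 2 * p / n)[0]    # flag high-leverage points
print("high-leverage points:", flagged, leverage[flagged])
```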

 

Nonlinear regression and other extensions

All of the models we have considered are linear in the parameters β, even

y = β0 + β1 x + β2 x² + β3 sin(x) + ε

Some models are not linear in the parameters, such as

y = α0 exp(-α1 x1) + α2 exp(-α3 x2) + ε

For these nonlinear models, we typically use nonlinear least squares procedures, which generally require iterative methods of estimation. Other types of models extend linear models to cases where the response is not normally distributed; these are called generalized linear models. One example of a generalized linear model analysis is logistic regression, which is appropriate for responses that are binary or proportions.
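A sketch of fitting the nonlinear model above by iterative nonlinear least squares (here via scipy's curve_fit) and of a logistic regression fit as a binomial GLM in statsmodels; the simulated data, parameter values, and starting guesses are assumptions for illustration.

```python
# Nonlinear least squares for y = α0 exp(-α1 x1) + α2 exp(-α3 x2) + ε
# (Python names a0..a3 stand for α0..α3), plus a logistic-regression GLM.
import numpy as np
from scipy.optimize import curve_fit
import statsmodels.api as sm

rng = np.random.default_rng(3)
x1 = rng.uniform(0, 5, 100)
x2 = rng.uniform(0, 5, 100)
y = 3.0 * np.exp(-0.8 * x1) + 1.5 * np.exp(-0.3 * x2) + rng.normal(0, 0.1, 100)

def model(X, a0, a1, a2, a3):
    x1, x2 = X
    return a0 * np.exp(-a1 * x1) + a2 * np.exp(-a3 * x2)

# Iterative estimation; p0 supplies the starting values for the iterations.
params, cov = curve_fit(model, (x1, x2), y, p0=[1.0, 1.0, 1.0, 1.0])
print("nonlinear LS estimates:", params)

# Logistic regression: a generalized linear model for a binary response.
z = rng.uniform(-2, 2, 200)
prob = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * z)))
yb = rng.binomial(1, prob)
logit_fit = sm.GLM(yb, sm.add_constant(z), family=sm.families.Binomial()).fit()
print(logit_fit.params)
```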

 

Another advertisement for using designed experiments for regression analysis:

1) You can set x values so as to minimize collinearity.

2) You can arrange to have replicate x values, allowing lack-of-fit tests that use a pure error term (a sketch follows this list).
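A sketch of a lack-of-fit F test that uses replicate x values to form the pure error term; the replicated design and simulated response here are illustrative assumptions.

```python
# Lack-of-fit F test for a straight-line fit, with pure error from replicates.
# The design (3 replicates at 5 x levels) and the response are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 3)                   # replicate x values
y = 1.0 + 2.0 * x + 0.3 * x**2 + rng.normal(0, 0.5, x.size)   # true curvature

b1, b0 = np.polyfit(x, y, 1)                 # straight-line fit
resid = y - (b0 + b1 * x)
sse = np.sum(resid**2)

# Pure error: variation of the replicates around their group means.
sspe = sum(np.sum((y[x == lv] - y[x == lv].mean())**2) for lv in np.unique(x))
sslf = sse - sspe                            # lack-of-fit sum of squares

n, m, p = x.size, np.unique(x).size, 2       # observations, distinct x's, parameters
F = (sslf / (m - p)) / (sspe / (n - m))
pval = stats.f.sf(F, m - p, n - m)
print(f"lack-of-fit F = {F:.2f}, p-value = {pval:.4f}")
```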