Some reasons for using multiple regression
1) Adjusting for explanatory variables (Seeing the effect of a new teaching method on a post-test score after adjusting for pre-test, gender, etc.)
2) Investigating which variables explain a response. (What habitat variables in the forest account for variation in tree growth?)
3) Prediction (What is the value of the stock market next week?)
4) Parameter estimation (Others have used a particular model before; how well do our data agree with theirs?)
For reasons 1, 2, and 3, it is rarely clear which model is best. This is most problematic for reason 2.
Building multiple regression models
1) Select the independent variables
2) Model formation
3) Residual analysis
Selecting the independent variables
1) Known, because the model is already specified
2) Use
Stepwise selection
Let α1 = significance level to enter; α2 = significance level to stay
Forward selection performs only the entry step (step 2); backward selection performs only the removal step (step 3).
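The following is a minimal sketch of one common formulation of stepwise selection, not necessarily the exact procedure intended in these notes. It assumes the entry step adds the candidate with the smallest p-value below the significance level to enter, and the removal step drops the selected variable with the largest p-value above the significance level to stay. The names `solve`, `ols_pvalues`, and `stepwise`, the default levels 0.05 and 0.10, and the simulated data are all illustrative assumptions. To stay within the standard library, p-values use a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import random
from statistics import NormalDist

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_pvalues(X, y, cols):
    """Fit y on an intercept plus columns `cols`; return {col: approx p-value}."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    n, k = len(rows), len(cols) + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (n - k)
    pvals = {}
    for j, c in enumerate(cols, start=1):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(XtX, unit)[j]  # j-th diagonal of (X'X)^-1
        z = beta[j] / var_j ** 0.5
        # ASSUMPTION: normal approximation to the t distribution (large n)
        pvals[c] = 2 * (1 - NormalDist().cdf(abs(z)))
    return pvals

def stepwise(X, y, alpha_enter=0.05, alpha_stay=0.10, max_steps=100):
    """Alternate entry and removal steps until neither changes the model."""
    selected = []
    for _ in range(max_steps):  # guard against cycling
        changed = False
        # Entry step: add the most significant candidate, if below alpha_enter
        best, best_p = None, alpha_enter
        for c in range(len(X[0])):
            if c not in selected:
                p = ols_pvalues(X, y, selected + [c])[c]
                if p < best_p:
                    best, best_p = c, p
        if best is not None:
            selected.append(best)
            changed = True
        # Removal step: drop the least significant variable, if above alpha_stay
        if selected:
            pv = ols_pvalues(X, y, selected)
            worst = max(selected, key=pv.get)
            if pv[worst] > alpha_stay:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Simulated data (illustrative): y depends on columns 0 and 2 only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2.0 * x[0] - 3.0 * x[2] + random.gauss(0, 1) for x in X]
selected = stepwise(X, y)
print("selected columns:", selected)
```

Skipping the removal step in this sketch gives forward selection; starting from all variables and running only the removal step gives backward selection. Noise columns can still enter by chance, which is one reason the "best" model is rarely clear.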
Model formation
Using the selected independent variables: What is the correct order of the terms? (Are squared or cross-product terms needed?)
Residual plots and variable selection methods can be used.
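As a rough illustration of how residuals can suggest a missing squared term, the sketch below fits a straight line to simulated data that actually follow a quadratic, then summarizes the residuals over the low, middle, and high thirds of x as a text stand-in for a residual plot. The helper `fit`, the simulated coefficients, and the three-way split are assumptions made for this example only.

```python
import random

def fit(rows, y):
    """Least-squares fit; returns (coefficients, residuals)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * d for a, d in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, resid

def mean(values):
    return sum(values) / len(values)

# Simulated data (illustrative): the true relationship has a squared term.
random.seed(1)
x = [i / 10 for i in range(-30, 31)]
y = [1.0 + 0.5 * xi + 2.0 * xi ** 2 + random.gauss(0, 1) for xi in x]

# First-order model: intercept and x only
_, resid_lin = fit([[1.0, xi] for xi in x], y)
# Second-order model: add the squared term
beta_quad, resid_quad = fit([[1.0, xi, xi ** 2] for xi in x], y)

# Mean residual in the low, middle, and high thirds of x (x is sorted)
low, mid, high = mean(resid_lin[:20]), mean(resid_lin[20:41]), mean(resid_lin[41:])
rss_lin = sum(e ** 2 for e in resid_lin)
rss_quad = sum(e ** 2 for e in resid_quad)

print("first-order residual means (low, mid, high):", low, mid, high)
print("RSS first-order:", rss_lin, " RSS second-order:", rss_quad)
```

A curved pattern in the first-order residuals (same sign at both ends, opposite sign in the middle) together with a large drop in the residual sum of squares is the kind of evidence that points to a squared term. The same idea extends to cross-product terms by plotting residuals against the product of two variables.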