Possible reasons:
1. Characterize a relationship (extent, direction, strength)
2. Seek a quantitative formula (prediction)
3. Describe a relationship controlling for other effects (like #1)
4. Determine which variables are important
5. Compare several relationships (do coefficients change by group?)
6. Obtain valid and precise coefficient estimates
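Note the distinction between statistical and deterministic models: a deterministic model gives an exact value of the dependent variable for each value of the independent variable, while a statistical model allows for random variation around that relationship.
Even with only one independent variable, there are two issues: (1) Which model is most appropriate? (2) What parameter values give the best fit for that model?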
Review page 574 on mathematical properties of a straight line
An example: Data for 5 breakfast cereal brands are displayed below. Each cereal's name is shown along with its fat per serving and total calories. These data come from a larger dataset, which is described and available at: http://lib.stat.cmu.edu/datasets/1993.expo/
Cereal data

Obs    name               calories    fat
  1    Crispix                 110      0
  2    Froot_Loops             110      1
  3    Muesli_Raisins,         150      3
  4    Oatmeal_Raisin_         130      2
  5    Post_Nat._Raisi         120      1
Here is a plot of the data from SAS:

Plot of calories*fat.  Legend: A = 1 obs, B = 2 obs, etc.

calories
    150 +                               A
        |
    140 +
        |
    130 +                     A
        |
    120 +           A
        |
    110 + A         A
        +-+---------+---------+---------+
          0         1         2         3
                         fat
Does the relationship between Fat and Calories appear linear? Can you suggest an equation for a line to relate Calories to Fat?
How about the equation Estimated Calories = 110 + 10(Fat)? How can we decide which slope and intercept are 'best' for predicting Calories from Fat?
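One way to make 'best' concrete is to score each candidate line by its sum of squared prediction errors (SSE) and prefer the line with the smallest score. The sketch below assumes that least-squares criterion, which these notes have not yet introduced, and applies it to the five cereals. The function names sse and least_squares are illustrative, not part of any SAS output.

```python
# Cereal data from the table above: fat per serving and total calories.
fat = [0, 1, 3, 2, 1]
calories = [110, 110, 150, 130, 120]


def sse(intercept, slope):
    """Sum of squared prediction errors for the line intercept + slope * fat."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(fat, calories))


def least_squares():
    """Return the (intercept, slope) of the straight line that minimizes sse."""
    n = len(fat)
    mean_x = sum(fat) / n
    mean_y = sum(calories) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in fat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, calories))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Compare the suggested line with the least-squares line (assumed criterion).
b0, b1 = least_squares()
print(f"Suggested line 110 + 10(Fat): SSE = {sse(110, 10):.1f}")
print(f"Least-squares line {b0:.2f} + {b1:.2f}(Fat): SSE = {sse(b0, b1):.1f}")
```

Under this criterion the suggested line has SSE = 200, while the least-squares line (intercept about 104.62, slope about 13.85) has SSE of about 123.1. The suggested line is therefore a reasonable guess but not the minimizer. Other criteria, such as the sum of absolute errors, could rank lines differently.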