Introduction to Regression Analysis

Regression Analysis

Possible reasons for carrying out a regression analysis:
1. Characterize a relationship (extent, direction, strength)
2. Seek a quantitative formula (prediction)
3. Describe a relationship controlling for other effects (like #1)
4. Determine which variables are important
5. Compare several relationships (do coefficients change by group?)
6. Obtain valid and precise coefficient estimates

Note the distinction between statistical and deterministic models.
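
The distinction can be illustrated with a minimal Python sketch. Everything in it is an assumption chosen for illustration and is not taken from these notes: the Celsius-to-Fahrenheit conversion stands in for a deterministic model, and the intercept, slope, and error standard deviation passed to the hypothetical statistical_response function are arbitrary.

```python
import random


def deterministic_fahrenheit(celsius):
    """Deterministic model: the same input always gives exactly the same output."""
    return 32 + 1.8 * celsius


def statistical_response(x, intercept, slope, error_sd, rng):
    """Statistical model: a straight-line trend plus a random error term."""
    return intercept + slope * x + rng.gauss(0, error_sd)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# The deterministic model returns the same value on every call.
print(deterministic_fahrenheit(100))

# The statistical model gives a different response each time, even at the same x.
# The values 110, 10, and 5 are illustrative assumptions only.
draws = [statistical_response(2, 110, 10, 5, rng) for _ in range(3)]
print(draws)
```

In the statistical model the straight line describes only the average response; individual observations scatter around it.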

Simple Linear Regression

Even with only one independent variable, there are two issues: (1) Which model is most appropriate? (2) What parameter values give the best fit for that model?

Review page 574 on mathematical properties of a straight line

An example: The data below are for 5 breakfast cereal brands. Each row shows the name of the cereal, its total calories, and its amount of fat per serving. These data come from a larger dataset, which is described and available at: http://lib.stat.cmu.edu/datasets/1993.expo/

                                     Cereal data         
                    Obs    name               calories    fat
                     1     Crispix               110       0
                     2     Froot_Loops           110       1
                     3     Muesli_Raisins,       150       3
                     4     Oatmeal_Raisin_       130       2
                     5     Post_Nat._Raisi       120       1

Here is a plot of the data from SAS:


                Plot of calories*fat.  Legend: A = 1 obs, B = 2 obs, etc.
       150 +                                                                    A
           |
           |
           |
       140 +
           |
  calories |
           |
       130 +                                              A
           |
           |
           |
       120 +                        A
           |
           |
           |
       110 +  A                     A
           +--+---------------------+---------------------+---------------------+--
              0                     1                     2                     3
                                              fat

Does the relationship between Fat and Calories appear linear? Can you suggest an equation for a line to relate Calories to Fat?
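
One rough way to examine linearity numerically is to look at how the average calories change from one fat level to the next. The sketch below is only an illustration, not a method prescribed by these notes; the variable names are hypothetical, and the data are copied from the cereal table (names kept exactly as printed, including truncation).

```python
from collections import defaultdict

# (name, calories, fat) rows copied from the cereal table.
cereals = [
    ("Crispix", 110, 0),
    ("Froot_Loops", 110, 1),
    ("Muesli_Raisins,", 150, 3),
    ("Oatmeal_Raisin_", 130, 2),
    ("Post_Nat._Raisi", 120, 1),
]

# Group the calorie values by fat level.
calories_by_fat = defaultdict(list)
for _name, calories, fat in cereals:
    calories_by_fat[fat].append(calories)

# Mean calories at each fat level, in increasing order of fat.
mean_calories = {
    fat: sum(values) / len(values)
    for fat, values in sorted(calories_by_fat.items())
}

# Change in mean calories between consecutive fat levels (0->1, 1->2, 2->3).
levels = list(mean_calories)
steps = [mean_calories[b] - mean_calories[a] for a, b in zip(levels, levels[1:])]

print(mean_calories)  # {0: 110.0, 1: 115.0, 2: 130.0, 3: 150.0}
print(steps)          # [5.0, 15.0, 20.0]
```

If the points fell exactly on a line, the steps would all be equal. They are not, so any straight line is an approximation here, and with only five observations the plot remains the main guide.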

How about the equation Estimated Calories = 110 + 10(Fat)? How can we decide which slope and intercept are 'best' for predicting Calories from Fat?
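
One common criterion, assumed here because the notes have not yet named one, is the sum of squared differences between observed and estimated calories: the smaller the sum, the better the line fits. The sketch below scores the suggested line and compares it with the line that minimizes this sum (the least-squares line). The function names sse and least_squares are hypothetical helpers written for this illustration.

```python
# Fat and calories for the 5 cereals, in the order of the table.
fat = [0, 1, 3, 2, 1]
calories = [110, 110, 150, 130, 120]


def sse(intercept, slope):
    """Sum of squared errors for the line: estimated calories = intercept + slope * fat."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(fat, calories))


def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared errors (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Score the line suggested in the text.
candidate_sse = sse(110, 10)
print(candidate_sse)  # 200

# Compare with the least-squares line.
b0, b1 = least_squares(fat, calories)
print(round(b0, 2), round(b1, 2), round(sse(b0, b1), 2))  # 104.62 13.85 123.08
```

Under this criterion the suggested line (sum of squared errors 200) fits worse than the least-squares line (about 123.08), which has a lower intercept and a steeper slope. Other criteria, such as the sum of absolute errors, could lead to a different choice.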