Stat 407/507 Fall 2022 Exam 1 (take home only)

Due on September 19 on Canvas

Please remember that this is an exam: do your own work and do not communicate with other students about it.



1. Do salaries differ by region in the U.S.? To address this question, a sample of states was taken from each of the five U.S. regions (Midwest, Northeast, Southeast, Southwest, and West), and the 2018 annual mean wage was recorded for each sampled state. The data are in this file. Here is some code to help read the data into R and SAS. Remember to comment on your work for each problem and sub-problem.

a. Use a boxplot to visualize potential effects of region on wages. What does the plot suggest about group differences and model assumptions?

b. Conduct a one-way analysis of variance (ANOVA) to test the null hypothesis that mean wages are equal across regions. Show the ANOVA table, including sums of squares, mean squares, F statistic, and P value, and be sure to include a line for total SS and df.

c. Examine a residual-by-predicted plot and a normal probability plot of the residuals to assess the model assumptions of equal variance and normality.

d. Use the Box-Cox procedure to find a recommended transformation of the response. Would you reject the null hypothesis that the power parameter equals 1?

e. Select a transformation of the wages, and repeat the ANOVA on the transformed data. Do the results change? Which analysis is more appropriate? What is your conclusion about equality of wages between regions?
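
For part (b), it may help to see how the rows of a one-way ANOVA table fit together. The following is a minimal sketch in Python using only the standard library. The group labels and wage values are made-up placeholders, not the exam data, and the function name `anova_table` is hypothetical. The sketch stops at the F statistic, because the P value requires the F distribution (available in R, SAS, or a statistics library).

```python
from statistics import mean

def anova_table(groups):
    """Return the one-way ANOVA decomposition for a dict {label: [values]}."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations

    # Between-group, within-group, and total sums of squares
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "region": (df_between, ss_between, ms_between),
        "error": (df_within, ss_within, ms_within),
        "total": (n_total - 1, ss_total, None),  # no mean square on the total line
        "F": ms_between / ms_within,
    }

# Made-up wages (in thousands) for three hypothetical groups; NOT the exam data.
toy = {"A": [40.0, 42.0, 44.0], "B": [50.0, 52.0, 54.0], "C": [45.0, 47.0, 49.0]}
table = anova_table(toy)
for source in ("region", "error", "total"):
    print(source, table[source])
print("F =", table["F"])
```

The region and error sums of squares add up to the total sum of squares, and the same holds for the degrees of freedom, which is a useful check on any ANOVA table you report.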


2. A future study is being planned to compare the wages in these regions. If we use our current data (problem 1, on the original scale) to estimate the variance of the wages, how many states do we need to sample to detect a difference if the true mean wages are 47, 57, 45, 46, and 55 (in thousands), at alpha = 0.05 with 80% power?
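
The sample-size question in problem 2 turns on the noncentrality parameter of the F test. The following is a minimal sketch, assuming equal sample sizes n per region and an error standard deviation sigma taken from the root mean squared error of the problem 1 fit, so that lambda = n * sum((mu_i - mu_bar)^2) / sigma^2. The value `sigma_placeholder` is a placeholder, not an estimate from the exam data, and the function name `noncentrality` is hypothetical. Power itself then comes from the noncentral F distribution with k - 1 and k(n - 1) degrees of freedom, which this sketch does not compute.

```python
from statistics import mean

def noncentrality(means, sigma, n_per_group):
    """Noncentrality parameter of the one-way ANOVA F test with equal group sizes."""
    mu_bar = mean(means)
    ss_means = sum((mu - mu_bar) ** 2 for mu in means)
    return n_per_group * ss_means / sigma ** 2

means = [47, 57, 45, 46, 55]   # hypothesized mean wages (thousands), from the problem
sigma_placeholder = 5.0        # placeholder only; replace with the root MSE from problem 1

for n in (2, 5, 10):
    lam = noncentrality(means, sigma_placeholder, n)
    df_error = len(means) * (n - 1)
    print(f"n = {n:2d}  lambda = {lam:.2f}  df = ({len(means) - 1}, {df_error})")
```

Because lambda grows linearly in n while the error degrees of freedom also increase, power rises with n, and the required sample size is the smallest n at which the noncentral F power reaches the target.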