Stat 407/507 Fall 2022 Exam 4 (take-home part)

Due on Canvas on December 13

Please remember that this is an exam, so you are to do your own work and not communicate with other students about the exam.


1. With cold weather starting, our food researchers are taking a break from their studies of ramen soup to look at hot chocolate. They are satisfied with their basic recipe for hot chocolate, but they plan to investigate the possible benefits of three additives: mini-marshmallows, peppermint, and Kahlua liqueur. Because taste-test subjects cannot try all eight combinations, the researchers used a confounded block design in which each subject received a half fraction of the 2^3 combinations, with the three-way interaction confounded with subjects. They recruited eight subjects, and each tried the four combinations from one of the two half fractions. The data are in this file. Here is some code to help read in all of the data sets. Remember to show your work (and code) and comment on your results for each problem and sub-problem.

a) Use boxplots to visualize the potential effects of the three additives. What do the plots suggest about group differences and model assumptions?

b) Conduct an analysis to test for effects of the additives and their interactions (show the ANOVA table, including sums of squares, mean squares, F statistics, and P values).

c) Examine residual plots to assess model assumptions. If there are problems, try a transformation to address the problem and redo the analysis.
 
d) For significant interactions, use interaction plots for interpretation. For significant main effects that are not involved in a significant interaction, compute group means to understand the effect.

e) Summarize your conclusions from the experiment. Recommend one or two best combinations and justify your choice(s).

f) Verify that the three-way interaction was confounded in this design. If the two-way interaction of mini-marshmallows and peppermint had been confounded instead, what two sets of combinations would have been used? (Use the letters K, M, and P to denote the additives.)
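
As a reminder of the mechanics behind confounding, here is a minimal Python sketch of how a 2^3 factorial is split into two half fractions by the sign of one interaction contrast. The function name split_by_contrast and the -1/+1 coding are illustrative assumptions, not part of the course code, and the labels follow the usual convention of listing the additives that are present, with (1) for none. The demonstration uses the three-way interaction, which is the effect confounded in this design; if each subject's four combinations all fall in one of the two sets, the contrast is constant within subject and so cannot be separated from subject differences.

```python
from itertools import product


def split_by_contrast(factors, word):
    """Split the 2^k treatment combinations into two sets by the sign of
    the contrast for the interaction `word` (hypothetical helper)."""
    blocks = {+1: [], -1: []}
    for levels in product((-1, +1), repeat=len(factors)):
        coded = dict(zip(factors, levels))  # e.g. {"K": -1, "M": 1, "P": 1}
        sign = 1
        for f in word:
            sign *= coded[f]  # product of the coded levels in the word
        # Label: letters of the additives present, "(1)" if none are present.
        label = "".join(f for f in factors if coded[f] == 1) or "(1)"
        blocks[sign].append(label)
    return blocks


# Three-way interaction of K (Kahlua), M (mini-marshmallows), P (peppermint).
blocks = split_by_contrast("KMP", "KMP")
print("contrast = +1:", blocks[+1])
print("contrast = -1:", blocks[-1])
```

Passing a different interaction word as the second argument gives the split for confounding that effect instead.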

2. Does the popularity (amount of ticket sales) of a movie differ depending on its genre? To investigate this question a small data set was assembled of movies from 1994 for the genres of drama, action, and thriller. The movie rating was also recorded in case it helps to predict ticket sales. The data are in this file, where the ticket sales are called 'BoxOffice' and are in millions of dollars.

a) Do an analysis of variance to compare the ticket sales for the three genres, ignoring ratings.

b) Produce a plot of ticket sales against ratings, using symbols or colors to label the genre. What does it suggest about the covariate and about group differences?

c) Conduct an analysis of covariance (show the ANOVA table, including sums of squares, mean squares, F statistics, and P values) to test for differences in ticket sales by genre after adjusting for ratings.

d) Examine residual plots to assess the usual model assumptions. If there are problems, try a transformation to address the problem and redo the analysis.

e) Calculate the adjusted means of ticket sales per genre. Use your own calculation with the formula from the notes to verify that the adjusted mean for drama is correct. How do these three adjusted means compare to the unadjusted means?
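
For orientation, a commonly used form of the adjusted mean in an ANCOVA with one covariate is shown below. The notation is an assumption and may differ from the notes: \bar{y}_{i\cdot} is the unadjusted mean ticket sales for genre i, \bar{x}_{i\cdot} is the mean rating for genre i, \bar{x}_{\cdot\cdot} is the overall mean rating, and \hat{\beta} is the common slope estimated from the ANCOVA model.

```latex
\bar{y}_{i,\mathrm{adj}} = \bar{y}_{i\cdot} - \hat{\beta}\,\bigl(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\bigr)
```

Under this form, a genre whose mean rating is above the overall mean has its mean adjusted downward when the slope is positive, and upward when the slope is negative.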

f) Test the assumption of parallelism for the ANCOVA (show the ANOVA table, including sums of squares, mean squares, F statistics, and P values). Also discuss whether the 'treatment' may have affected the covariate.
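
One standard way to frame the parallelism test is as a comparison of two nested models: the common-slope ANCOVA model (reduced) and a model with a separate slope for each genre (full). The symbols below are assumptions for illustration and may not match the notes: SSE denotes an error sum of squares and df the corresponding error degrees of freedom.

```latex
F = \frac{\bigl(\mathrm{SSE}_{\mathrm{reduced}} - \mathrm{SSE}_{\mathrm{full}}\bigr) / \bigl(\mathrm{df}_{\mathrm{reduced}} - \mathrm{df}_{\mathrm{full}}\bigr)}
         {\mathrm{SSE}_{\mathrm{full}} / \mathrm{df}_{\mathrm{full}}}
```

With three genres, the numerator has 2 degrees of freedom, matching the genre-by-rating interaction line of the ANOVA table.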

g) Summarize your findings. How did using the ratings help in testing for differences between genres?