Stages of a scientific investigation:

1) Define the problem.

2) Design a study (an experiment, observational study, survey, etc.) to address the problem.

3) Collect the data, enter them into a computer, and produce initial graphical displays.

4) Analyze the data. Use graphs to understand and validate the results.

5) Interpret the results and communicate them to others, using graphical displays where they help.

 

The defective baseball problem

Suppose that you are the resident statistician for a professional baseball team. The team receives baseballs from a manufacturer that claims that 1% of the baseballs may have some defect. By examining the records for the last 500 baseballs, you notice that 10 of them were defective. Does the manufacturer's claim of a 1% defect rate appear valid?

Solution: The outcome is binary, so we are interested in a proportion. The population parameter of interest is p, the probability that a baseball is defective. If we assume that p is the same for all baseballs produced and that whether one baseball is defective or not is independent of whether other baseballs are defective, then the count of the number of defective baseballs follows a binomial distribution:

$$P(Y = y \mid p) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}, \qquad y = 0, 1, \ldots, n$$

where Y is the number of defective baseballs out of n baseballs. To conduct a hypothesis test of

H0: p = .01 versus Ha: p > .01 ,

we could calculate the P value directly as

$$P(Y \ge 10 \mid p = .01) = P(Y = 10 \mid p = .01) + \cdots + P(Y = 500 \mid p = .01)$$

or we can use the fact that if both np and n(1 - p) are at least 5, the binomial distribution can be approximated by a normal distribution with mean np and standard deviation sqrt(np(1 - p)). Here np = 500(.01) = 5, right at the boundary of this rule of thumb, and n(1 - p) = 495. For our example z = (10 - 5)/sqrt(4.95) ≈ 2.25, so P(Y ≥ 10 | p = .01) ≈ P(Z > 2.25) ≈ .012. This probability can be read from a table of standard normal probabilities. Since the significance level that most people use is α = .05 and P < α, we would reject the null hypothesis and conclude that the data support Ha: p > .01. At this point meetings occur between businessmen, statisticians, and lawyers.
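The calculations above can be checked with a short script. This is a minimal sketch using only the Python standard library (Python 3.9+ assumed, for `math.comb` and the `tuple[...]` annotation); the helper names `binom_pmf`, `exact_upper_p_value`, and `normal_upper_p_value` are hypothetical, chosen for illustration. It computes the exact binomial P value by summing the pmf, the normal approximation used in the text, and, as an added comparison not used in the text, the normal approximation with a continuity correction (10 replaced by 9.5).

```python
import math


def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y | p) for Y ~ Binomial(n, p): n!/(y!(n-y)!) * p^y * (1-p)^(n-y)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def exact_upper_p_value(y_obs: int, n: int, p0: float) -> float:
    """Exact one-sided P value P(Y >= y_obs | p = p0), summing the pmf from y_obs to n."""
    return sum(binom_pmf(y, n, p0) for y in range(y_obs, n + 1))


def normal_upper_p_value(y_obs: int, n: int, p0: float,
                         continuity: bool = False) -> tuple[float, float]:
    """Normal approximation to P(Y >= y_obs | p = p0).

    Returns (z, P(Z > z)). With continuity=True, y_obs - 0.5 replaces y_obs.
    """
    mean = n * p0                         # np
    sd = math.sqrt(n * p0 * (1 - p0))     # sqrt(np(1 - p))
    y = y_obs - 0.5 if continuity else y_obs
    z = (y - mean) / sd
    # Upper-tail standard normal probability: P(Z > z) = erfc(z / sqrt(2)) / 2
    return z, 0.5 * math.erfc(z / math.sqrt(2))


n, y_obs, p0 = 500, 10, 0.01
exact = exact_upper_p_value(y_obs, n, p0)
z, approx = normal_upper_p_value(y_obs, n, p0)
z_cc, approx_cc = normal_upper_p_value(y_obs, n, p0, continuity=True)

print(f"exact binomial P(Y >= 10)     = {exact:.4f}")
print(f"normal approx, z = {z:.2f}       = {approx:.4f}")
print(f"continuity-corrected, z = {z_cc:.2f} = {approx_cc:.4f}")
```

Because np = 5 sits right at the rule-of-thumb boundary, the exact binomial P value (about .03) is noticeably larger than the uncorrected normal approximation (about .012), and the continuity correction lands in between. All three are below α = .05, so the conclusion in this example does not change, but the comparison shows why the exact calculation is worth doing when np is small.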

 

Topics to remember from introductory statistics:

Data description using the sample mean, variance, standard deviation, and range

Sample statistics versus population parameters

Basic probability

The normal distribution

The binomial distribution

The sampling distribution of a statistic

Confidence intervals, hypothesis tests, and related terminology

Z tests and t tests for a population mean

Comparing two groups: the independent samples t test and the paired samples t test

The chi-squared test for a population variance, the F test for comparing two population variances.