Chapter 1 highlights

A hypothesis test for the median of a single sample

Assume X is a continuous random variable with pdf f. If the distribution is skewed, the median (theta.5) is generally a more appropriate location parameter than the mean; if the distribution is symmetric (and the mean exists), the two parameters are equal.

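Suppose we wish to test H0: theta.5 = thetaH versus Ha: theta.5 > thetaH, using a sample of n observations.

Under H0, P(X > thetaH) = P(X < thetaH) = .5 (and, because X is continuous, P(X = thetaH) = 0). Thus the number S of observations greater than thetaH follows a binomial distribution with n trials and p = .5. Large values of S are evidence for Ha, so for an observed count s the p-value can be calculated directly from the binomial distribution,

$$ p\text{-value} = P(S \ge s) = \sum_{k=s}^{n} \binom{n}{k} \left(\tfrac{1}{2}\right)^{n}, $$

or, in large samples, from a normal approximation with mean n/2 and variance n/4, where Φ is the standard normal cdf:

$$ p\text{-value} \approx 1 - \Phi\!\left( \frac{s - n/2}{\sqrt{n/4}} \right). $$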
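A minimal Python sketch of this one-sided test, using only the standard library. The function name `sign_test_upper` and the sample values are hypothetical, and dropping observations exactly equal to thetaH is an assumed convention (such ties have probability 0 for a continuous X but can appear after rounding).

```python
from math import comb, sqrt
from statistics import NormalDist


def sign_test_upper(data, theta_h):
    """One-sided sign test of H0: theta.5 = theta_h vs Ha: theta.5 > theta_h.

    Returns (s, exact_p, approx_p) where s = number of observations > theta_h.
    """
    # Assumed convention: drop observations equal to theta_h (probability 0
    # for a continuous X, but possible after rounding) and reduce n accordingly.
    values = [x for x in data if x != theta_h]
    n = len(values)
    s = sum(1 for x in values if x > theta_h)

    # Exact p-value: P(S >= s) with S ~ Binomial(n, 0.5).
    exact_p = sum(comb(n, k) for k in range(s, n + 1)) / 2**n

    # Normal approximation: mean n/2, variance n/4, no continuity correction.
    z = (s - n / 2) / sqrt(n / 4)
    approx_p = 1 - NormalDist().cdf(z)
    return s, exact_p, approx_p


data = [1.2, 3.4, 2.2, 5.1, 0.8, 4.4, 6.0, 2.9, 3.8, 7.3]  # hypothetical sample
print(sign_test_upper(data, 2.0))
```

For this sample with thetaH = 2.0, eight of the ten observations exceed thetaH, so the exact p-value is (45 + 10 + 1)/1024 ≈ .055, while the uncorrected normal approximation gives about .029. In samples this small, a continuity correction (using s - 0.5 in place of s) brings the approximation much closer to the exact value.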

A confidence interval for the median of a single sample

The order statistics from a sample satisfy X(1) < X(2) < ... < X(n) (ties have probability 0 for a continuous distribution). To develop a confidence interval for the median, we look for ranks a < b with P(X(a) < theta.5 < X(b)) = 1 - alpha; because the binomial distribution is discrete, in practice a and b are chosen so that this probability is at least 1 - alpha. The event X(a) < theta.5 < X(b) occurs exactly when at least a of the observations are less than theta.5 and at most b - 1 of them are. Because the number K of observations less than theta.5 follows a binomial distribution with n trials and p = .5, the probability we want to calculate is

$$ P\left(X_{(a)} < \theta_{.5} < X_{(b)}\right) = P(a \le K \le b - 1) = \sum_{k=a}^{b-1} \binom{n}{k} \left(\tfrac{1}{2}\right)^{n}, $$

or, in large samples, it can be calculated using the normal approximation to the binomial distribution.
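The search for suitable ranks can be sketched in Python as follows. Choosing the symmetric pair b = n + 1 - a and taking the largest a whose exact coverage still reaches 1 - alpha is an assumption of this sketch (other pairs of ranks are possible), and the names `coverage`, `median_ci`, and `approx_rank` are hypothetical. The large-sample helper uses the normal approximation with mean n/2 and variance n/4 plus a continuity correction, which gives a = floor((n + 1)/2 - z sqrt(n)/2).

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def coverage(n, a, b):
    """P(X(a) < theta.5 < X(b)) = sum_{k=a}^{b-1} C(n, k) (1/2)^n."""
    return sum(comb(n, k) for k in range(a, b)) / 2**n


def median_ci(data, alpha=0.05):
    """Distribution-free CI (X(a), X(b)) for the median with b = n + 1 - a.

    Uses the largest a whose exact coverage is still >= 1 - alpha.
    Returns (lower, upper, a, b, coverage).
    """
    xs = sorted(data)
    n = len(xs)
    best = None
    for a in range(1, n // 2 + 1):
        # Coverage shrinks as a grows, so stop at the first a that falls short.
        if coverage(n, a, n + 1 - a) < 1 - alpha:
            break
        best = a
    if best is None:
        raise ValueError("sample too small for the requested confidence level")
    a, b = best, n + 1 - best
    return xs[a - 1], xs[b - 1], a, b, coverage(n, a, b)


def approx_rank(n, alpha=0.05):
    """Large-sample a from the normal approximation (mean n/2, variance n/4)
    with a continuity correction: a = floor((n + 1)/2 - z * sqrt(n) / 2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return max(1, floor((n + 1) / 2 - z * sqrt(n) / 2))


data = [1.2, 3.4, 2.2, 5.1, 0.8, 4.4, 6.0, 2.9, 3.8, 7.3]  # hypothetical sample
print(median_ci(data))
print(approx_rank(len(data)))
```

For n = 10 and alpha = .05, the exact search gives a = 2 and b = 9 with coverage 1002/1024 ≈ .979 (a = 3 would give only 912/1024 ≈ .891), and the continuity-corrected normal approximation also gives a = 2.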

The empirical cumulative distribution function (ecdf) and a confidence interval for F(x) at a single point

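The ecdf is $\mathrm{ecdf}(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$, the proportion of observations <= x. Since the number of observations less than or equal to x is a binomial random variable with n trials and p = F(x), ecdf(x) has mean F(x) and standard deviation sqrt(F(x)(1 - F(x))/n). We can therefore develop a confidence interval for F evaluated at x using either the binomial distribution or, in large samples, the normal approximation ecdf(x) ± z(1 - alpha/2) sqrt(ecdf(x)(1 - ecdf(x))/n), in which the unknown F(x) in the standard deviation is replaced by ecdf(x).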
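A Python sketch of both intervals for F(x), using only the standard library. The large-sample interval is the normal-approximation interval described above. For the binomial version, this sketch swaps in the Clopper-Pearson construction, which finds the values of p at which each binomial tail probability equals alpha/2; it solves for them by bisection rather than through a beta quantile function. All function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def ecdf(data, x):
    """Proportion of observations <= x."""
    return sum(1 for v in data if v <= x) / len(data)


def wald_ci(data, x, alpha=0.05):
    """Large-sample CI for F(x): ecdf(x) +/- z * sqrt(ecdf(x)(1 - ecdf(x))/n), clipped to [0, 1]."""
    n = len(data)
    p_hat = ecdf(data, x)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def _solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move toward the side where the root must lie.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(data, x, alpha=0.05):
    """Clopper-Pearson CI for F(x) based on K = #{X_i <= x} ~ Binomial(n, F(x))."""
    n = len(data)
    k = sum(1 for v in data if v <= x)
    # Lower limit: P(K >= k) = alpha/2, and P(K >= k) increases with p.
    lower = 0.0 if k == 0 else _solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper limit: P(K <= k) = alpha/2, and P(K <= k) decreases with p.
    upper = 1.0 if k == n else _solve(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


data = [1.2, 3.4, 2.2, 5.1, 0.8, 4.4, 6.0, 2.9, 3.8, 7.3]  # hypothetical sample
print(ecdf(data, 3.0), wald_ci(data, 3.0), exact_ci(data, 3.0))
```

With ecdf(3.0) = .4 from these ten values, the normal approximation gives roughly (.096, .704). Note that the normal-approximation interval collapses to a single point when ecdf(x) is 0 or 1, because its estimated standard deviation is then 0; the binomial-based interval does not have this problem.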