Instructions for using R code for Estimation of disease prevalence from imperfect diagnostic tests from pooled data with varying pool sizes ---------------------------------------------------------------------- To run the R programs, you need to: 1) Install the R package ( go to http://www.r-project.org/ and download from a mirror site near you) 2) After installing R, start R and paste the code from the link "R functions for prevalence estimation using a Metropolis sampler" at http://www.webpages.uidaho.edu/~chrisw/research/prevalence/ into the R window. 3) Enter your data into R in the correct format 4) Run the program ----------------------------------------------------------------------- Here are instructions for steps 3 and 4: Step 3: ------- For each pool size, record the data in the form (poolsize, # negative pools, # positive pools). For the Truckee River data, the results were: Poolsize Number negative Number positive 2 0 1 3 2 0 4 8 2 5 11 1 6 2 0 These data need to be entered into R as a matrix, using the matrix function. Here is an R command to read in the Truckee data above: truckee1 <- matrix(c(2,0,1,3,2,0,4,8,2,5,11,1,6,2,0),nrow=5,ncol=3,byrow=T) Note that the data are entered rowwise, and the nrow, ncol, and byrow commands tell R that the data come from a 5 x 3 matrix. Step 4: ------- Now we can run the program. The primary function is called poolprevsample1, and its arguments are the data set object, values for the alpha and beta parameters of the beta prior distributions for prevalence, sensitivity, and specificity, and how long to run the Metropolis chain (burnin) and how large a sample to collect after burnin. Optionally, the means and standard deviations of the normal proposals can be input, and extra plots can be produced with the output option. If we wanted to analyze the data in the truckee1 data object, using a prevalence prior of alpha = beta = 1, a sensitivity prior of alpha = 24, beta = 6, a specificity prior of alpha = 81, beta = 9, with a burnin of 100000 samples and collection of 50000 samples afterward, using a standard deviation of the normal proposals of .55 for each parameter, we can do this by calling the function with these arguments: poolprevsample1(truckee1,1.,1.,24,6,81,9,100000,50000,std=c(.55,.55,.55)) Below is listed some of the output from running this function. Note that the acceptance rate for the example is about 38%. It is generally a good idea to have the acceptance rate between 33 and 50%. To increase the acceptance rate, lower the value for the standard deviation of the normal proposals from the .55 in the example. To lower the acceptance rate, increase the standard deviation values. -------------------------------------------------------------------------- Example Metropolis run with Truckee data > poolprevsample1(truckee1,1.,1.,24,6,81,9,100000,50000,std=c(.55,.55,.55)) [1] "Quantiles for pi (Prevalence)" 1% 2.5% 5% 10% 25% 50% 0.0003870529 0.0010662614 0.0024566558 0.0051087575 0.0126385942 0.0259105246 75% 90% 95% 97.5% 99% 0.0434183275 0.0631777073 0.0790792017 0.0940010439 0.1132836729 [1] "Quantiles for eta (Sensitivity)" 1% 2.5% 5% 10% 25% 50% 75% 90% 0.5894864 0.6251633 0.6545705 0.6901026 0.7454641 0.7989575 0.8444463 0.8808413 95% 97.5% 99% 0.8994901 0.9152295 0.9314328 [1] "Quantiles for lambda (Specificity)" 1% 2.5% 5% 10% 25% 50% 75% 90% 0.8250863 0.8398550 0.8510695 0.8636489 0.8834727 0.9040362 0.9226832 0.9370757 95% 97.5% 99% 0.9445400 0.9510353 0.9580801 [1] "Mean and SD for pi" [1] 0.03117857 0.02471908 [1] "Mean and SD for eta" [1] 0.79094672 0.07449228 [1] "Mean and SD for lambda" [1] 0.90183952 0.02886135 [1] "Proportion of accepted proposals is " [1] 0.3809333 [1] "total time in minutes is " [1] 1.154167 -------------------------------------------------------------------------- In most cases running the poolprevsample1 function repeatedly should give similar results, but they may be slightly different. For very small data sets, the results, particularly of the higher quantiles (90th percentile, etc.), may fluctuate. If this is observed, the function below may be used to calculate quantities from the posterior distribution via integration of a discretized parameter space. This function takes much longer than poolprevsample1, but will always give the same answer. More on this function to come ... Example Integration of discretized parameter space with Truckee data compenummat1a(cbind(truckee1[,2:3],truckee1[,1]),1.,1.,24,6,81,9) ... $x [1] 0.005 0.010 0.025 0.050 0.100 0.250 0.500 0.750 0.900 0.950 0.975 0.990 0.995 $y [1] NA NA NA NA NA 0.01170293 0.02439053 0.04233997 [9] 0.06310762 0.07797756 0.09266591 0.11235874 0.12785948 [1] "Mean and SD for pi" [1] 0.03275283 0.02507062 [1] "total time in minutes is " [1] 11.81483