Instructions for using R code for Estimation of disease prevalence 
from imperfect diagnostic tests from pooled data with varying pool sizes

 ----------------------------------------------------------------------

To run the R programs, you need to:

1) Install the R package ( go to http://www.r-project.org/ and download from a mirror site near you)

2) After installing R, start R and paste the code from the link
   "R functions for prevalence estimation using a Metropolis sampler"
   at http://www.webpages.uidaho.edu/~chrisw/research/prevalence/ into the R window.

3) Enter your data into R in the correct format

4) Run the program

 -----------------------------------------------------------------------

Here are instructions for steps 3 and 4:

Step 3:
-------

For each pool size, record the data in the form (poolsize, # negative pools, # positive pools).

For the Truckee River data, the results were:

Poolsize    Number negative     Number positive
   2             0                   1
   3             2                   0 
   4             8                   2 
   5             11                  1
   6             2                   0 

These data need to be entered into R as a matrix, using the matrix function.
Here is an R command to read in the Truckee data above:

truckee1 <- matrix(c(2,0,1,3,2,0,4,8,2,5,11,1,6,2,0),nrow=5,ncol=3,byrow=T)

Note that the data are entered rowwise, and the nrow, ncol, and byrow
commands tell R that the data come from a 5 x 3 matrix.

Step 4:
-------

Now we can run the program.  The primary function is called poolprevsample1, and its 
arguments are the data set object, values for the alpha and beta parameters of the 
beta prior distributions for prevalence, sensitivity, and specificity, and how long 
to run the Metropolis chain (burnin) and how large a sample to collect after burnin. 

Optionally, the means and standard deviations of the normal proposals can be input, 
and extra plots can be produced with the output option.

If we wanted to analyze the data in the truckee1 data object, using a prevalence 
prior of alpha = beta = 1, a sensitivity prior of alpha = 24, beta = 6, a specificity 
prior of alpha = 81, beta = 9, with a burnin of 100000 samples and collection of 50000 samples 
afterward, using a standard deviation of the normal proposals of .55 for each parameter,
we can do this by calling the function with these arguments:

poolprevsample1(truckee1,1.,1.,24,6,81,9,100000,50000,std=c(.55,.55,.55))

Below is listed some of the output from running this function.  Note that the acceptance 
rate for the example is about 38%.  It is generally a good idea to have the acceptance 
rate between 33 and 50%.  To increase the acceptance rate, lower the value for the 
standard deviation of the normal proposals from the .55 in the example.  To lower the 
acceptance rate, increase the standard deviation values.

 --------------------------------------------------------------------------

Example Metropolis run with Truckee data

> poolprevsample1(truckee1,1.,1.,24,6,81,9,100000,50000,std=c(.55,.55,.55))

[1] "Quantiles for pi (Prevalence)"
          1%         2.5%           5%          10%          25%          50% 
0.0003870529 0.0010662614 0.0024566558 0.0051087575 0.0126385942 0.0259105246 
         75%          90%          95%        97.5%          99% 
0.0434183275 0.0631777073 0.0790792017 0.0940010439 0.1132836729 
[1] "Quantiles for eta (Sensitivity)"
       1%      2.5%        5%       10%       25%       50%       75%       90% 
0.5894864 0.6251633 0.6545705 0.6901026 0.7454641 0.7989575 0.8444463 0.8808413 
      95%     97.5%       99% 
0.8994901 0.9152295 0.9314328 
[1] "Quantiles for lambda (Specificity)"
       1%      2.5%        5%       10%       25%       50%       75%       90% 
0.8250863 0.8398550 0.8510695 0.8636489 0.8834727 0.9040362 0.9226832 0.9370757 
      95%     97.5%       99% 
0.9445400 0.9510353 0.9580801 
[1] "Mean and SD for pi"
[1] 0.03117857 0.02471908
[1] "Mean and SD for eta"
[1] 0.79094672 0.07449228
[1] "Mean and SD for lambda"
[1] 0.90183952 0.02886135
[1] "Proportion of accepted proposals is "
[1] 0.3809333
[1] "total time in minutes is "
[1] 1.154167

 --------------------------------------------------------------------------

In most cases running the poolprevsample1 function repeatedly should give similar results, 
but they may be slightly different.  For very small data sets, the results, particularly of 
the higher quantiles (90th percentile, etc.), may fluctuate.  If this is observed, the 
function below may be used to calculate quantities from the posterior distribution via 
integration of a discretized parameter space.  This function takes much longer than 
poolprevsample1, but will always give the same answer.

More on this function to come ...



Example Integration of discretized parameter space with Truckee data


compenummat1a(cbind(truckee1[,2:3],truckee1[,1]),1.,1.,24,6,81,9)

...
$x
 [1] 0.005 0.010 0.025 0.050 0.100 0.250 0.500 0.750 0.900 0.950 0.975 0.990 0.995
$y
 [1]         NA         NA         NA         NA         NA 0.01170293 0.02439053 0.04233997
 [9] 0.06310762 0.07797756 0.09266591 0.11235874 0.12785948
[1] "Mean and SD for pi"
[1] 0.03275283 0.02507062
[1] "total time in minutes is "
[1] 11.81483