This file describes how to run the CAROL script written by Margarida Lopes 
in the Applied Statistical Genetics group at the Wellcome Trust Sanger Institute.

The analysis requires only one source file called CAROL_script.r.
This script takes a single tab-delimited input file with the following
columns, obtained from the outputs of PolyPhen-2 (Adzhubei et al. 2010)
and SIFT (Ng et al. 2003, 2002). Note that the first line of the file should
be a header containing the column names listed below:

1)	RS_ID (i.e. non-synonymous variant ID)
2)	POLYPHEN_PREDICTION
3)	POLYPHEN_SCORE (corresponds to pph2_prob in PolyPhen-2)
4)	SIFT_PREDICTION
5)	SIFT_SCORE
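
As a quick sanity check before running CAROL, the header line can be compared
against the column names listed above. The sketch below is only an
illustration: check_carol_header and example_input.txt are hypothetical names
that are not part of CAROL_script.r, and the single data row uses made-up
placeholder values rather than real PolyPhen-2 or SIFT output.

```shell
# Expected header: the five column names from the list above, tab-separated.
expected_header="$(printf 'RS_ID\tPOLYPHEN_PREDICTION\tPOLYPHEN_SCORE\tSIFT_PREDICTION\tSIFT_SCORE')"

# Hypothetical helper (not part of CAROL_script.r): checks the first line of a file.
check_carol_header() {
    # Read only the first line and drop a trailing carriage return, if any.
    first_line="$(head -n 1 "$1" | tr -d '\r')"
    if [ "$first_line" = "$expected_header" ]; then
        echo "header OK"
    else
        echo "unexpected header"
        return 1
    fi
}

# Build a tiny example file. The ID, labels, and scores are placeholders.
printf '%s\n' "$expected_header" > example_input.txt
printf 'rs0000001\tbenign\t0.010\tTOLERATED\t0.50\n' >> example_input.txt

check_carol_header example_input.txt
```

The check is strict about column order and spelling, on the assumption that
the script expects the columns exactly as listed.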

The command to call CAROL on a Unix system is as follows, where file_name
is the input file described above:

>R CMD BATCH '--args file_name' CAROL_script.r CAROL_results.Rout

The results can then be found in the CAROL_results.txt file in the
current working directory. CAROL_results.Rout contains the R session log.
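
The call can also be wrapped in a small shell function that checks the input
file before starting R and confirms afterwards that a results file exists.
This is a minimal sketch, not part of the CAROL distribution: run_carol is a
hypothetical name, and it assumes that CAROL_script.r sits in the current
working directory and that the input file name contains no spaces.

```shell
# Hypothetical wrapper around the command shown in this README.
run_carol() {
    input_file="$1"

    # Stop early if the input file is missing or unreadable.
    if [ ! -r "$input_file" ]; then
        echo "cannot read input file: $input_file" >&2
        return 1
    fi

    # Same invocation as above, with the file name substituted in.
    R CMD BATCH "--args $input_file" CAROL_script.r CAROL_results.Rout || return 1

    # The results are expected in CAROL_results.txt; the .Rout file is the R log.
    if [ ! -s CAROL_results.txt ]; then
        echo "CAROL_results.txt is missing or empty; check CAROL_results.Rout" >&2
        return 1
    fi
    echo "results written to CAROL_results.txt"
}

# Usage (the file name is a placeholder):
#   run_carol my_variants.txt
```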

The output file is a tab-delimited file consisting of the following 9 columns:

1)	RS_ID
2)	POLYPHEN_PREDICTION
3)	P_POLY1 (similar to POLYPHEN_SCORE with some rules applied)
4)	W_POLY (weights for PolyPhen-2)
5)	SIFT_PREDICTION
6)	SIFT_SCORE1 (similar to SIFT_SCORE with some rules applied)
7)	W_SIFT (weights for SIFT)
8)	CAROL_PREDICTION
9)	CAROL_SCORE (probabilistic score between 0 and 1; variants with a score > 0.98 are considered deleterious, all others neutral)
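
To pull out the variants that pass the 0.98 threshold, the output file can be
filtered with awk. The sketch below is one possible way to do this and is not
part of CAROL: list_deleterious is a hypothetical name, and the code assumes
the output file starts with a header line carrying the column names listed
above. Columns are located by name, so the order of the columns does not
matter.

```shell
# Hypothetical helper: print RS_ID and CAROL_SCORE for rows with CAROL_SCORE > 0.98.
list_deleterious() {
    awk -F '\t' '
        # Drop a trailing carriage return so the last column name matches.
        { sub(/\r$/, "") }

        # Header line: record the position of each named column.
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            id = col["RS_ID"]
            score = col["CAROL_SCORE"]
            if (!id || !score) {
                print "RS_ID or CAROL_SCORE column not found" > "/dev/stderr"
                exit 1
            }
            next
        }

        # Data lines: keep scores strictly greater than 0.98.
        ($score + 0) > 0.98 { print $id "\t" $score }
    ' "$1"
}

# Usage:
#   list_deleterious CAROL_results.txt
```

The comparison is strict, so a score of exactly 0.98 is treated as neutral,
following the "> 0.98" rule stated above.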