--------------------------------
cnD - Copy number variant caller

cnD is a program to detect copy number variants from short-read sequence data.
The target organism is assumed to be inbred, and therefore homozygous, so
regions of apparent heterozygous SNPs (as called by samtools) can be used to
detect copy number gains. cnD uses both the rate of these paralogous sequence
variants and the raw sequence depth to call copy number gains and losses using
a hidden Markov model.

--------
Building

The program is written in the D programming language
(http://www.digitalmars.com/d/) using the Tango library
(www.dsource.org/projects/tango). Assuming the environment is set up correctly
(gdc 0.24 and Tango are installed), running 'make' from the src/ directory
should build the executable in bin/. A prebuilt x86-64 binary is provided in
the bin/ directory.

-------
Running

cnD requires a tab-delimited file containing the read depth, number of
heterozygous SNPs and average mapping quality of reads for 1kb windows of the
genome. You can generate this file from a SAM/BAM file using the script
bin/pileup2win.pl:

    samtools pileup -c mouse.bam | bin/pileup2win.pl > mouse.win

You can then call candidate CNVs from the file mouse.win using cnD:

    bin/cnD --prefix=mouse --smooth=100 mouse.win

Two output files will be created:

    mouse_summary.txt -- Emission distribution parameters for each state.
    mouse_viterbi.txt -- Tab-separated file with the data points annotated
                         with their Viterbi-decoded state.

The data format is:

---------------
Post-processing

Run the metaCaller.pl script in the bin/ directory to extract consensus calls
from the Viterbi file:

    bin/metaCaller.pl --threshold 0.5 --window 10 mouse_viterbi.txt > mouse_metacalls.txt

This will extract 10-window regions where >=50% of the windows have the same
state. All other regions are classified as "X" (complex). In this step, all
gain states are considered to be equivalent.
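
The consensus rule described above can be illustrated with a short Python
sketch. This is not the metaCaller.pl implementation, only a minimal reading
of the rule under stated assumptions: regions are taken as non-overlapping
blocks of --window states, gain states are assumed to carry labels starting
with "G" (for example "G1", "G2"), and the labels "N" (normal) and "L" (loss)
are hypothetical. The real state names and the way metaCaller.pl forms its
regions may differ.

```python
from collections import Counter


def collapse_gain(state):
    # Assumption: every gain state label starts with "G" (e.g. "G1", "G2").
    # All gain states are treated as one equivalent state, "G".
    return "G" if state.startswith("G") else state


def consensus_calls(states, window=10, threshold=0.5):
    """Return one call per non-overlapping block of `window` states.

    A block is called as its most common state if that state covers at
    least `threshold` of the block; otherwise it is called "X" (complex).
    """
    calls = []
    for start in range(0, len(states), window):
        block = [collapse_gain(s) for s in states[start:start + window]]
        top_state, count = Counter(block).most_common(1)[0]
        # The last block may be shorter than `window`, so divide by its
        # actual length rather than by `window`.
        calls.append(top_state if count / len(block) >= threshold else "X")
    return calls


# Hypothetical per-window Viterbi states for three 10-window blocks.
example = (
    ["N"] * 10                             # all normal
    + ["G1"] * 3 + ["G2"] * 3 + ["N"] * 4  # 6/10 gain once gains are merged
    + ["N"] * 4 + ["L"] * 3 + ["G1"] * 3   # no state reaches 50%
)
print(consensus_calls(example))  # ['N', 'G', 'X']
```

In the second block neither "G1" nor "G2" reaches 50% alone, but together they
do, which is why merging the gain states before counting matters.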

The gain/loss regions can then be extracted by running:

    bin/extractCNChanges.pl mouse_metacalls.txt

This script requires that the Set::IntSpan Perl module is installed
(http://search.cpan.org/dist/Set-IntSpan/).

---------------
Credits/Contact

Ideas by Jared Simpson, David Adams and Richard Durbin.
Code by Jared Simpson (js18@sanger.ac.uk).
The design of the dhmm library was inspired by the BioJava interface
(http://biojava.org/wiki/Main_Page).