*** Overview ***

The VCF files contain small indel calls for the CEU and YRI trios. The calls were made from MAQ-aligned, Broad-recalibrated Illumina BAM files using the indel caller Dindel (Albers et al.). 454 and SOLiD data were not considered in the analysis, neither for the actual calling nor for the detection of candidate indels.

*** Procedure ***

The calls were made as follows:

1. Extract all indels in MAQ alignments from the BAM file. These are the candidate indels.
2. Generate genotype likelihoods using Dindel for every candidate indel.
3. Assuming Mendelian inheritance, calculate the posterior probability for each indel genotype in all three trio members jointly.

Thus, the VCF files only contain indels that are consistent with Mendelian inheritance. Independent analysis of the trio members indicated that there is a significant number of cases where the trio child shows evidence for an indel but neither parent does; these cases are not included in the VCF files. Such cases are likely explained by reduced sensitivity for detecting indels as compared to SNPs and by mapping errors, and some may be the result of de novo mutations. We hope to address these issues in future releases by using more sensitive mapping and indel calling algorithms.

NOTE As described above, all reported calls are consistent with Mendelian inheritance by construction. These VCFs are therefore not useful for de novo mutation analysis.

NOTE The second sample in the genotype file is the father, the third sample is the mother.

*** LOF variants ***

In coding regions, indel rates are significantly lower due to selection. Since noise levels (factors resulting in false positives) are expected to be approximately constant across the genome, the false discovery rate of the indel call set will be higher in coding regions.
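The argument about the false discovery rate in coding regions can be made concrete with a small calculation. The sketch below is illustrative only: the per-base rates are made-up numbers, not estimates from this call set, and the function name is hypothetical. It only shows that a constant false positive rate combined with a lower true indel rate gives a higher false discovery rate.

```python
def false_discovery_rate(true_indel_rate, noise_rate):
    """Expected FDR = FP / (FP + TP), using per-base rates of false and true calls."""
    return noise_rate / (noise_rate + true_indel_rate)


# Hypothetical per-base rates, chosen only for illustration.
NOISE_RATE = 1e-5          # false positive calls per base, assumed constant genome-wide
GENOME_INDEL_RATE = 1e-4   # true indels per base outside coding regions
CODING_INDEL_RATE = 1e-5   # true indels per base in coding regions (reduced by selection)

genome_fdr = false_discovery_rate(GENOME_INDEL_RATE, NOISE_RATE)
coding_fdr = false_discovery_rate(CODING_INDEL_RATE, NOISE_RATE)

print(f"genome-wide FDR: {genome_fdr:.3f}")
print(f"coding FDR:      {coding_fdr:.3f}")
```

With these assumed rates the false discovery rate rises from roughly 0.09 genome-wide to 0.5 in coding regions, even though the noise level itself has not changed.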
To lower the number of false positive indel calls, we applied a more stringent filter to the subset of indels that were called in the genome-wide set and were predicted to fall into the loss-of-function (LOF) class.

The stringent filter considers the range of positions where an indel would yield the same alternative haplotype sequence as the originally called indel (for instance, in a repeat, the deletion of any repeat unit gives the same alternative haplotype), plus 4 bases of reference sequence on both sides of this range. It requires that this region be covered by at least one read on the forward strand and at least one read on the reverse strand, each with at most one mismatch between the read and the alternative haplotype sequence resulting from the indel (regardless of base qualities).

This filter removed the excess of 1-bp frameshift insertions seen in CHBJPT with respect to CEU in the less stringently filtered genome-wide indel call set, although it is expected to remove a significant number of true positive calls as well. The indels that pass this stringent filter are annotated in the VCF files with 'SF' in the INFO field.

If you have any questions, please contact me (Kees) at caa (at) sanger.ac.uk

Kees Albers
Gerton Lunter
Richard Durbin
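
*** Example: selecting SF-annotated indels ***

As noted in the LOF section, indels that pass the stringent filter carry 'SF' in the INFO field. A minimal sketch of how such records could be selected from a VCF file is given below. It assumes that 'SF' appears as one of the semicolon-separated entries in the INFO column, either as a bare flag or as a key with a value. The function names and the example records (including the DP entry) are hypothetical and are not taken from the released files.

```python
def has_info_flag(vcf_line, flag="SF"):
    """Return True if the INFO column (8th column) of a VCF data line contains `flag`."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF data line
    # INFO entries are separated by ';' and are either 'KEY' or 'KEY=value'.
    info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
    return flag in info_keys


def stringent_indels(lines, flag="SF"):
    """Yield header lines unchanged, plus only those data lines that carry `flag`."""
    for line in lines:
        if line.startswith("#") or has_info_flag(line, flag):
            yield line


# Made-up records for illustration; real files also contain the trio genotype columns.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tAT\tA\t50\tPASS\tDP=30;SF\n",
    "1\t2000\t.\tG\tGC\t40\tPASS\tDP=25\n",
]
kept = list(stringent_indels(example))
print("".join(kept), end="")
```

The same generator can be applied to an open file handle instead of the example list, since it only iterates over lines.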