These VCF files were called with the QCALL SNP finder on the full project BAM files derived from the 20100311 sequence.index files. Only the five populations with the most samples were called, each independently of the others. Samples without HapMap3 genotype data were not called.

The calls are not directly comparable to the pilot calls, for several reasons:
- they are the product of a single caller, not a consensus of calls made by 2 or 3 of 3 callers
- they are filtered only for call quality, not with the following filters used for the pilot:
  - depth
  - proximity to indels (nor were reads locally realigned to avoid possible new indels)
  - too high a fraction of mapping-quality-0 Illumina reads

We suggest these calls may be used to provide additional genotypes at previously known or called sites (remembering that these are also genotypes from only one caller, not consensus genotypes as for the pilot), or to provide candidate variable sites for other analyses. They should not be used as final SNP sites or genotypes for population genetic studies.

Basic statistics for these calls, compared with the 0912 (Dec-2009) QCALL calls from pilot data that contributed to the final pilot call set:

Population  #calls VCF  #calls Dec-2009  Overlap
CEU         10250970     9535019         9017704
JPT          9047387     7503638 (*)
CHB          7045299
TSI         10611134
YRI         15274485    12062137

(*): JPT+CHB

MD5 checksums:
3711f27bd5c714c40968e0766f618be7  CHB.March2010.VCF.gz
bd53cc4b656f99c9be519eaaf682d95b  JPT.March2010.VCF.gz
953a95bc9983525417a61e3d9938ed80  TSI.March2010.VCF.gz
b03cb53c2669d63f50f16215ae3e5765  YRI.March2010.VCF.gz
790601958f1e53c1d83c793f136be6f9  CEU.March2010.VCF.gz

Quang Le, Richard Durbin, Sendu Balasubramanian, June 2010
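To check a download against the MD5 checksums listed for the five March2010 VCF files, a small script like the one below can be used. This is a minimal sketch, not part of the release: it assumes the files were saved under their listed names in a single local directory, and the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# Expected MD5 checksums, copied from the release note.
EXPECTED_MD5 = {
    "CHB.March2010.VCF.gz": "3711f27bd5c714c40968e0766f618be7",
    "JPT.March2010.VCF.gz": "bd53cc4b656f99c9be519eaaf682d95b",
    "TSI.March2010.VCF.gz": "953a95bc9983525417a61e3d9938ed80",
    "YRI.March2010.VCF.gz": "b03cb53c2669d63f50f16215ae3e5765",
    "CEU.March2010.VCF.gz": "790601958f1e53c1d83c793f136be6f9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".", expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == md5) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "MISSING"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use flat even for multi-gigabyte .VCF.gz files. Because the checksum list uses the `md5sum` output layout (checksum, two spaces, file name), saving it to a file and running `md5sum -c` on that file performs the same check from the shell.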